Kafka Double Payment Bug
When Your Payment System Goes Rogue
Picture this: Your payment worker processes a $500 transfer, successfully sends the money, then crashes right before telling Kafka "hey, I handled this message." When the replacement worker starts up, it sees the same unprocessed message and cheerfully processes it again.
Congratulations - you just paid someone twice.
This isn't a hypothetical nightmare. It's the at-least-once delivery guarantee that makes Kafka reliable (no lost messages), but also terrifying for financial systems because duplicates are possible.
Why This Happens (And Why It's So Common)
Kafka's default behavior is simple: if a consumer doesn't commit an offset, the message gets redelivered. This works great for most scenarios - you'd rather process a message twice than lose it completely.
But payments are different. Money doesnât have an undo button.
The problem follows this brutal timeline:
# Worker receives payment message
payment = consumer.poll()

# Processes the payment successfully
send_money(payment.amount, payment.recipient)

# Worker crashes HERE, before the commit
# A new worker starts and sees the same message
# Double payment created
The gap between "work completed" and "offset committed" is where financial disasters live.
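The failure window is easy to reproduce without a broker. This toy simulation (all names are illustrative; no real Kafka involved) models a worker that crashes after paying but before committing its offset:

```python
# Toy model of at-least-once redelivery: the "broker" redelivers any
# message whose offset was never committed.
messages = [{"offset": 0, "amount": 500, "recipient": "bob"}]
committed_offset = -1   # nothing committed yet
ledger = []             # money actually sent

def run_worker(crash_before_commit):
    global committed_offset
    for msg in messages:
        if msg["offset"] <= committed_offset:
            continue                      # already committed, skip
        ledger.append(msg["amount"])      # send_money(...)
        if crash_before_commit:
            return                        # crash: offset never committed
        committed_offset = msg["offset"]  # consumer.commit()

run_worker(crash_before_commit=True)   # first worker dies right after paying
run_worker(crash_before_commit=False)  # replacement sees the same message
print(ledger)  # [500, 500]: the double payment
```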
The Idempotency Key Solution
The most robust fix is making your payment processing idempotent - meaning processing the same message multiple times has the same effect as processing it once.
Hereâs how to implement it:
def process_payment(message):
    # Generate a consistent key from the message content.
    # Use stable identifiers (avoid volatile fields like timestamps).
    idempotency_key = f"{message.sender}:{message.recipient}:{message.amount}:{message.payment_id}"

    # Check if already processed
    if payment_exists(idempotency_key):
        return "already_processed"

    # Process the payment with the key stored atomically
    with database.transaction():
        create_payment_record(idempotency_key, message)
        # Call the external payment API with an idempotency header and a timeout
        send_money(message.amount, message.recipient,
                   headers={"Idempotency-Key": idempotency_key}, timeout=5)

    # Commit the offset only after the transaction completes
    consumer.commit()
    return "processed"
The key insight: store the idempotency key in the same transaction as the payment. If the worker crashes after sending money but before committing, the database rollback removes the payment record too.
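A minimal runnable sketch of that pattern, using SQLite as a stand-in for the production database (the key format and `payment_exists` helper mirror the snippet above; everything else is illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payments (idempotency_key TEXT PRIMARY KEY, amount REAL)")

def payment_exists(key):
    row = db.execute("SELECT 1 FROM payments WHERE idempotency_key = ?",
                     (key,)).fetchone()
    return row is not None

def process_payment(key, amount):
    if payment_exists(key):
        return "already_processed"
    with db:  # one transaction: commits on success, rolls back on any exception
        db.execute("INSERT INTO payments VALUES (?, ?)", (key, amount))
        # send_money(...) would happen here, inside the same transaction
    return "processed"

print(process_payment("alice:bob:500:tx-1", 500.0))  # processed
print(process_payment("alice:bob:500:tx-1", 500.0))  # already_processed
```

If the send step raises, the `with db:` block rolls back the insert too, so the message can be retried cleanly.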
Database-Level Prevention
For even stronger guarantees, use your database's unique constraints:
-- Create a table with a unique constraint
CREATE TABLE processed_payments (
    idempotency_key VARCHAR(255) PRIMARY KEY,
    amount DECIMAL(10,2),
    status VARCHAR(50),
    created_at TIMESTAMP
);
def process_payment_safely(message):
    key = generate_idempotency_key(message)
    try:
        # The insert fails with a unique-constraint violation if the key
        # already exists (in Postgres you could instead use
        # ON CONFLICT DO NOTHING and check the affected row count)
        db.execute("""
            INSERT INTO processed_payments (idempotency_key, amount, status)
            VALUES (?, ?, 'processing')
        """, [key, message.amount])
        send_money(message.amount, message.recipient)
        # Update the status and commit the offset together
        with db.transaction():
            db.execute("UPDATE processed_payments SET status = 'completed' WHERE idempotency_key = ?", [key])
            consumer.commit()
    except UniqueConstraintViolation:
        # Payment already processed, safe to ignore
        consumer.commit()
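The same insert-first idea can be demonstrated end to end with SQLite, where `IntegrityError` plays the role of `UniqueConstraintViolation` (the table mirrors the one above; the `sent` list stands in for the payment API):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE processed_payments (
    idempotency_key TEXT PRIMARY KEY,
    amount REAL,
    status TEXT
)""")

sent = []  # stands in for the external payment API

def process_payment_safely(key, amount):
    try:
        # Insert-first: a duplicate key raises before any money moves
        db.execute("INSERT INTO processed_payments VALUES (?, ?, 'processing')",
                   (key, amount))
        sent.append((key, amount))  # send_money(...)
        db.execute("UPDATE processed_payments SET status = 'completed' "
                   "WHERE idempotency_key = ?", (key,))
        db.commit()
        return "processed"
    except sqlite3.IntegrityError:
        db.rollback()  # discard the failed insert attempt
        return "duplicate_ignored"

print(process_payment_safely("tx-1", 500.0))  # processed
print(process_payment_safely("tx-1", 500.0))  # duplicate_ignored
print(len(sent))  # 1: money moved exactly once
```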
Kafka's Exactly-Once Semantics
Modern Kafka offers exactly-once semantics (EOS) through transactional producers and consumers. This requires more setup but provides the strongest guarantees. Remember, though: EOS only makes Kafka writes and offset commits atomic. Your downstream systems (like payment gateways) must still be idempotent.
# Configure the consumer for exactly-once
consumer = KafkaConsumer(
    isolation_level='read_committed',
    enable_auto_commit=False
)

# Configure a transactional producer
producer = KafkaProducer(
    transactional_id='payment-processor-1',
    enable_idempotence=True
)
producer.init_transactions()  # must run once before the first transaction

def process_with_transactions(message):
    producer.begin_transaction()
    try:
        # Process the payment
        send_money(message.amount, message.recipient)

        # Commit the offset within the same transaction
        producer.send_offsets_to_transaction(
            {TopicPartition(message.topic, message.partition):
                OffsetAndMetadata(message.offset + 1)},
            'payment-consumer-group'
        )
        producer.commit_transaction()
    except Exception:
        producer.abort_transaction()
        raise
This ensures that either everything commits (payment + offset) or nothing commits.
Choosing Your Defense Strategy
Use idempotency keys when you need maximum reliability and can't afford any duplicate processing. It works with any Kafka setup and gives you full control.
Use exactly-once semantics when you're building new systems and can design around Kafka's transactional requirements. It's cleaner but requires careful configuration.
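For reference, the settings EOS hinges on look roughly like this. The key names follow the librdkafka/Java configuration convention, which is an assumption about your client library; adjust the spelling for yours:

```python
# Producer side: transactional, idempotent writes
producer_config = {
    "transactional.id": "payment-processor-1",  # stable ID per producer instance
    "enable.idempotence": True,                 # broker dedupes producer retries
    "acks": "all",                              # required with idempotence
}

# Consumer side: only read committed transactional data,
# and commit offsets through the producer, not automatically
consumer_config = {
    "group.id": "payment-consumer-group",
    "isolation.level": "read_committed",
    "enable.auto.commit": False,
}
```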
Never rely on hoping workers won't crash - because they will, and always at the worst possible moment.
The scariest bugs in distributed systems aren't the ones that break loudly. They're the ones that silently create duplicate payments while your monitoring shows everything as "healthy." Fix this before it finds you.