Are Kafka write commits agnostic to producers?


I am learning the design of Kafka replication and producers.
To my surprise, I have reached the conclusion that it is impossible for a producer to know accurately whether its messages have been committed by the broker — in other words, the durability of writes is opaque to the producer.

I hope I am wrong about this. Below I list the steps of my analysis; hopefully I can get some reviews here.

STEP1:

**Key information I get from the Kafka docs:**

  1. A write to a Kafka partition is not considered committed until all in-sync replicas (ISR) have received the write.
  2. The ISR set is dynamically maintained which means if we have 3 replicas of a partition, the ISR can range from one to three.

Based on these two statements from the Kafka documentation, I reach my first conclusion:

My-Conclusion-1: The number of replicas behind a commit is dynamic.
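Conclusion-1 can be sketched with a toy model (replica names are hypothetical; this just illustrates the two statements above, not Kafka internals):

```python
# "Committed" = received by every *current* ISR member, so the number of
# physical copies behind a commit shrinks and grows with the ISR itself.
isr = {"r1", "r2", "r3"}            # all 3 replicas keeping up
copies_behind_commit = len(isr)      # a commit implies 3 copies
assert copies_behind_commit == 3

isr -= {"r2", "r3"}                  # slow followers drop out of the ISR
copies_behind_commit = len(isr)      # now a commit implies only 1 copy
assert copies_behind_commit == 1
```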

STEP2:

Then I want to link producer writes to the commit of data.
I found 2 key configurations: the producer's acks AND the broker's min.insync.replicas.
Quoting their explanation directly:

When a producer sets acks to “all” (or “-1”), min.insync.replicas specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful. If this minimum cannot be met, then the producer will raise an exception (either NotEnoughReplicas or NotEnoughReplicasAfterAppend).
When used together, min.insync.replicas and acks allow you to enforce greater durability guarantees. A typical scenario would be to create a topic with a replication factor of 3, set min.insync.replicas to 2, and produce with acks of “all”. This will ensure that the producer raises an exception if a majority of replicas do not receive a write.
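The quoted scenario can be checked with a small model (a hypothetical helper, assuming acks=all means the leader waits for every current ISR member and rejects the write outright when the ISR is below min.insync.replicas):

```python
def copies_on_success(isr_size, min_isr):
    """Number of replicas holding a write that was acknowledged with
    acks=all, under the assumptions stated above."""
    if isr_size < min_isr:
        raise RuntimeError("NotEnoughReplicas")
    return isr_size  # every current ISR member received the write

# Replication factor 3, min.insync.replicas=2: any acknowledged write
# sits on at least 2 replicas, so it survives one replica failure.
assert min(copies_on_success(n, 2) for n in (2, 3)) == 2
```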

Here I find that although we can set the producer's acks=all, we CANNOT set min.insync.replicas=all, because it requires a specific number. So I get:

My-Conclusion-2:
For a partition with f+1 replicas:

  1. If a produce with acks=all and min.insync.replicas=f is successful: there is still some chance the message is lost, because committing a message requires all ISR members (which can be f+1 at that point) to receive it;

  2. If a produce with acks=all and min.insync.replicas=f+1 fails: there is still some chance the message actually succeeded, because committing a message requires all ISR members (which can be fewer than f+1 at that point) to receive it.
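Both cases can be sketched in a toy model (the names and the model itself are illustrative, reflecting the reasoning above rather than Kafka's actual protocol):

```python
def produce(isr, min_isr):
    """acks=all: fail fast if |ISR| < min.insync.replicas; otherwise
    every current ISR member receives the write."""
    if len(isr) < min_isr:
        return None            # NotEnoughReplicas-style failure
    return set(isr)            # replicas that now hold the message

replicas = {"r1", "r2", "r3"}  # f+1 = 3, so f = 2

# Case 1: the ISR shrank to f = 2 members; with min.insync.replicas = f
# the produce succeeds, yet only 2 of the 3 replicas hold the message.
holders = produce(isr={"r1", "r2"}, min_isr=2)
assert holders is not None     # producer sees success...
assert holders != replicas     # ...but not every replica has the write

# Case 2: min.insync.replicas = f+1 = 3 while the ISR has only 2 members;
# the produce fails even though all *current* ISR members could have it.
assert produce(isr={"r1", "r2"}, min_isr=3) is None
```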

STEP3:

I learned from Kafka videos that:

  • On failover, Kafka simply picks one member of the ISR as the new leader.
  • Kafka's failover and rejoin policy truncates messages that are not fully committed.

My-Conclusion-3:
Uncommitted messages can be lost even if a majority of the ISR has the message, when
the one ISR member that doesn't have the message gets elected after the leader fails.
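Conclusion-3 can be illustrated with a toy failover (hypothetical replica names; truncating followers to the new leader's log is a simplification of Kafka's actual log-reconciliation protocol):

```python
# m1 is committed (all ISR members have it); m2 reached a majority of
# replicas but was never committed.
logs = {"r1": ["m1", "m2"], "r2": ["m1", "m2"], "r3": ["m1"]}

# The old leader fails and r3 -- an ISR member missing m2 -- is elected.
new_leader = "r3"

# On rejoin, the other replicas truncate past the new leader's log end.
for r in ("r1", "r2"):
    logs[r] = logs[r][:len(logs[new_leader])]

# m2 is gone everywhere, even though a majority held it.
assert all(log == ["m1"] for log in logs.values())
```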

STEP-FINAL:

Now I start to think about what I can get with my producers:

For a partition with f+1 replicas, we have at least 2 options:

  1. A producer with acks=all and min.insync.replicas=f:
  • if it is successful, messages can be lost OR can be safe — we don't know for sure; we have false positives;
  2. A producer with acks=all and min.insync.replicas=f+1:
  • if it is successful, we are sure messages cannot be lost;
  • if it fails, messages can be lost OR can be safe — we don't know for sure; we have false negatives.

If we don't care about retry latency/throughput/SLA and we are very sensitive to data loss,
we should use the second option.
If we don't care about data loss but want high SLA/throughput/low retry latency, we should use Option 1 or even more relaxed settings.

I would like Kafka experts to help review my analysis and conclusions, and comment on whether they are wrong or make sense.
