Tag Archive for apache-kafka

Message delivery guarantee in Kafka

My question is really about my own confusion over Kafka's delivery guarantees: I have looked everywhere for information, and the sources contradict each other. As I understand it, a delivery guarantee in Kafka describes how messages will be delivered to the target (the Kafka broker, or the consumer). One article said that before a Kafka update in 2017, Kafka supported the at-least-once and at-most-once guarantees. My first question: are these guarantees on the producer side or on the consumer side, and which parameter achieves them? My assumption is that it is the acks parameter, which indicates whether the producer has to wait for confirmation or not.

The article said that transactions were introduced in Kafka in 2017 and that they help achieve exactly-once. Another article said that transactions work only at the topic level and have nothing to do with exactly-once: they let messages be written to two topics within one transaction, and consumers can read those messages only after they are committed. I have also heard about idempotence on the producer side: Kafka can deduplicate by saving a kind of message ID, which prevents duplicates when a message is re-sent after a producer or Kafka failure.

So how can we achieve full exactly-once, with a guarantee that the whole process, from the producer sending to Kafka through to the consumer reading and processing, completes exactly once?
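For illustration, here is a minimal sketch of the end-to-end exactly-once pattern (consume, process, produce, tied together by a transaction), assuming the Confluent.Kafka .NET client; the bootstrap server, topic names, group id, and transactional id are placeholders I made up. Setting a transactional.id makes the producer idempotent (the broker deduplicates retried batches by producer id and sequence number), and the consumer's offsets are committed inside the producer's transaction, so the output record and the offset advance atomically:

using System;
using Confluent.Kafka;

class ExactlyOnceRelay
{
    static void Main()
    {
        // Consumer side: manual commits, and read_committed so that
        // messages from aborted transactions are never seen.
        var consumerConfig = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092",   // placeholder
            GroupId = "relay-group",               // placeholder
            EnableAutoCommit = false,
            IsolationLevel = IsolationLevel.ReadCommitted
        };

        // Producer side: transactional.id implies enable.idempotence
        // and acks=all, so broker-side deduplication of retries is on.
        var producerConfig = new ProducerConfig
        {
            BootstrapServers = "localhost:9092",   // placeholder
            TransactionalId = "relay-tx-1"         // placeholder, stable per instance
        };

        using var consumer = new ConsumerBuilder<string, string>(consumerConfig).Build();
        using var producer = new ProducerBuilder<string, string>(producerConfig).Build();

        producer.InitTransactions(TimeSpan.FromSeconds(30));
        consumer.Subscribe("input-topic");         // placeholder

        while (true)
        {
            var result = consumer.Consume(TimeSpan.FromSeconds(1));
            if (result == null) continue;

            producer.BeginTransaction();
            try
            {
                // The output record and the consumer's progress commit
                // atomically: either both become visible or neither does.
                producer.Produce("output-topic", new Message<string, string>
                {
                    Key = result.Message.Key,
                    Value = result.Message.Value.ToUpperInvariant() // stand-in for real processing
                });
                producer.SendOffsetsToTransaction(
                    new[] { new TopicPartitionOffset(result.TopicPartition, result.Offset.Value + 1) },
                    consumer.ConsumerGroupMetadata,
                    TimeSpan.FromSeconds(30));
                producer.CommitTransaction(TimeSpan.FromSeconds(30));
            }
            catch (KafkaException)
            {
                // Abort and reprocess; nothing from this attempt was committed.
                producer.AbortTransaction(TimeSpan.FromSeconds(30));
            }
        }
    }
}

That also answers the framing above: acks and idempotence govern the producer-to-broker hop (at-most-once, at-least-once, or deduplicated), while transactions plus a read_committed consumer are what extend the guarantee across the whole consume-process-produce cycle; any side effects the consumer performs outside Kafka must still be made idempotent by the application itself.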

Duplicated Kafka messages in output of Embulk collection

We are using Embulk version v0.10.12.
We collect files using the sftp input and push them to Kafka using embulk-output-kafka.
From time to time we see duplicated messages in our output Kafka topic, although the Embulk logs show that each file is processed only once and the Embulk Kafka producer pushes each message only once.
What could be the reason for this duplication?
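A common cause, though only a guess given the logs described: Kafka producers are at-least-once by default, so if a produce request times out after the broker has in fact written the batch, the client retries and the broker writes the batch a second time; a retried Embulk task after a partial failure has the same effect. Kafka's answer to the first case is the idempotent producer. I have not verified which of these knobs embulk-output-kafka exposes, so purely as an illustration of the mechanism, here is what it looks like with the Confluent.Kafka client (broker address and topic are placeholders):

using System;
using Confluent.Kafka;

class IdempotentProducerSketch
{
    static void Main()
    {
        var config = new ProducerConfig
        {
            BootstrapServers = "localhost:9092", // placeholder
            // The broker deduplicates retried batches using the producer id
            // and per-partition sequence numbers, so a retry after a
            // timed-out-but-successful request does not create a duplicate.
            EnableIdempotence = true,
            Acks = Acks.All  // implied by idempotence, shown for clarity
        };

        using var producer = new ProducerBuilder<Null, string>(config).Build();
        producer.Produce("file-ingest-topic",    // placeholder
            new Message<Null, string> { Value = "one line of the collected file" });
        producer.Flush(TimeSpan.FromSeconds(10));
    }
}

If duplicates survive even with idempotence enabled, task-level retries are the likelier culprit, and the usual fallback is deduplicating downstream by a stable key (for example, file name plus line number).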

Kafka batch consumer not consuming the latest message from each partition

I have a .NET Core Kafka consumer that consumes in batches, with a batch size of 100 and a batch delay of 5 seconds. The problem is that my consumer group gets stuck on the second-to-last offset of each partition of the Kafka topic. Since no new messages arrive in the topic, the consumer never processes the latest message in the partitions.
This is the consumer configuration:

{
    new KeyValuePair<string, string>("group.id", <groupname>),
    new KeyValuePair<string, string>("bootstrap.servers", <server>),
    new KeyValuePair<string, string>("enable.auto.commit", "false"),
    new KeyValuePair<string, string>("auto.offset.reset", "earliest"),
    new KeyValuePair<string, string>("retries", "0"),
    new KeyValuePair<string, string>("batch.num.messages", "100"),
    new KeyValuePair<string, string>("socket.nagle.disable", "true"),
    new KeyValuePair<string, string>("queue.buffering.max.ms", "0"),
    new KeyValuePair<string, string>("partition.assignment.strategy", "roundrobin"),
    new KeyValuePair<string, string>("auto.commit.interval.ms", "0"),
    new KeyValuePair<string, string>("max.poll.interval.ms", "300000"),
    new KeyValuePair<string, string>("heartbeat.interval.ms", "3000"),
    new KeyValuePair<string, string>("session.timeout.ms", "30000")
}
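One thing worth checking, as a guess from the symptom rather than a diagnosis: Kafka's committed offset is defined as the offset of the next message to read, not the last message read. If the batch loop commits result.Offset instead of result.Offset + 1, the group's committed position sits one message behind the end of every partition, which looks exactly like being stuck on the second-to-last offset. Here is a minimal sketch of a size-or-delay batch loop with the off-by-one handled, using the Confluent.Kafka client (server, group, and topic are placeholders):

using System;
using System.Collections.Generic;
using Confluent.Kafka;

class BatchConsumerSketch
{
    static void Main()
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092", // placeholder
            GroupId = "batch-group",             // placeholder
            EnableAutoCommit = false,
            AutoOffsetReset = AutoOffsetReset.Earliest
        };

        using var consumer = new ConsumerBuilder<string, string>(config).Build();
        consumer.Subscribe("my-topic");          // placeholder

        var batch = new List<ConsumeResult<string, string>>();
        var batchDeadline = DateTime.UtcNow;

        while (true)
        {
            if (batch.Count == 0)
                batchDeadline = DateTime.UtcNow.AddSeconds(5); // 5s delay counted from the first message

            var result = consumer.Consume(TimeSpan.FromMilliseconds(200));
            if (result != null)
                batch.Add(result);

            // Flush on size OR delay: the delay path is what guarantees a
            // partially filled batch (e.g. the final message in a partition)
            // still gets processed and committed.
            if (batch.Count >= 100 || (batch.Count > 0 && DateTime.UtcNow >= batchDeadline))
            {
                Process(batch);
                foreach (var last in LastPerPartition(batch))
                {
                    // Commit last.Offset + 1: the committed offset is the NEXT
                    // offset to read. Committing the consumed offset itself
                    // leaves the group one message behind on every partition.
                    consumer.Commit(new[]
                    {
                        new TopicPartitionOffset(last.TopicPartition, last.Offset.Value + 1)
                    });
                }
                batch.Clear();
            }
        }
    }

    static void Process(List<ConsumeResult<string, string>> batch) { /* application logic */ }

    static IEnumerable<ConsumeResult<string, string>> LastPerPartition(
        List<ConsumeResult<string, string>> batch)
    {
        var last = new Dictionary<TopicPartition, ConsumeResult<string, string>>();
        foreach (var r in batch) last[r.TopicPartition] = r;
        return last.Values;
    }
}

Note that the client's Commit(ConsumeResult) overload already adds the +1 for you; the explicit TopicPartitionOffset form above just makes the convention visible.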

Automatic deletion of topics in Kafka

I have a Kafka MSK cluster. Many field devices send messages to this cluster, each to its corresponding topic. Some devices may not send any messages for a long time. Since the number of devices keeps growing, I would like to delete the topics of idle devices and reclaim the resources. What would be the correct approach to this?
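One possible approach, sketched here with the Confluent.Kafka .NET client rather than anything MSK-specific: ask every partition of every topic for the first offset at or after a cutoff timestamp; if no partition has one, the topic has been silent since the cutoff and is a deletion candidate. The broker address, cutoff, and group id are placeholders, and topic deletion is destructive, hence the dry run:

using System;
using System.Collections.Generic;
using System.Linq;
using Confluent.Kafka;

class IdleTopicReaper
{
    static void Main()
    {
        const string brokers = "localhost:9092";   // placeholder
        var cutoff = DateTime.UtcNow.AddDays(-30); // placeholder "idle" threshold

        using var admin = new AdminClientBuilder(
            new AdminClientConfig { BootstrapServers = brokers }).Build();
        using var consumer = new ConsumerBuilder<Ignore, Ignore>(
            new ConsumerConfig { BootstrapServers = brokers, GroupId = "idle-topic-scan" }).Build();

        var metadata = admin.GetMetadata(TimeSpan.FromSeconds(30));
        var idleTopics = new List<string>();

        foreach (var topic in metadata.Topics.Where(t => !t.Topic.StartsWith("__")))
        {
            // For each partition, ask for the earliest offset whose timestamp
            // is at or after the cutoff. A negative sentinel offset in the
            // answer means no message that recent exists in that partition.
            var queries = topic.Partitions.Select(p =>
                new TopicPartitionTimestamp(
                    new TopicPartition(topic.Topic, p.PartitionId),
                    new Timestamp(cutoff)));
            var answers = consumer.OffsetsForTimes(queries, TimeSpan.FromSeconds(30));
            if (answers.All(a => a.Offset.Value < 0))
                idleTopics.Add(topic.Topic);
        }

        Console.WriteLine("Idle topics: " + string.Join(", ", idleTopics));
        // Destructive; inspect the dry-run output first, then uncomment:
        // admin.DeleteTopicsAsync(idleTopics).GetAwaiter().GetResult();
    }
}

Deletion also requires delete.topic.enable=true in the broker (MSK cluster) configuration.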