What is the expected result of deserializing an Avro message when you do not have the Java POJO?

Suppose I have a number of Avro schema definitions: (i) Event1, (ii) Event2, (iii) EventWrapper.

EventWrapper is a record with one field (named payload) which is a union of Event1, Event2.

I also have a single topic, and the Confluent Schema Registry has been set up so that when a message is sent to that topic, the subject name resolves to the EventWrapper schema.

There is a producer and consumer which have access to the generated POJO classes for the schemas mentioned above. Everything works fine – the producer produces an EventWrapper message and the consumer, using the KafkaAvroDeserializer with specific.avro.reader: true, is able to deserialize the field within the EventWrapper message just fine to the correct POJO type (Event1 or Event2).

But now suppose I add a new event schema, Event3, and update EventWrapper to v2, whose union now includes Event1 | Event2 | Event3. The producer’s generated POJOs have been updated, but the consumer’s have not.

The producer goes and produces messages containing payloads of Event3 but the consumer has the old generated POJO definition for EventWrapper and does not have a generated POJO for Event3 at all.
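To make the scenario concrete, here is a minimal sketch of how the Event3 branch shows up on the wire. Avro's binary encoding writes a record as its fields in order, and a union as a zigzag varint branch index followed by the value, so for EventWrapper the first bytes of the Avro body identify the payload branch. The class and method names are hypothetical, and the sketch assumes the v2 union is declared in the order Event1, Event2, Event3 and that any registry framing has already been stripped from the bytes.

```java
import java.nio.ByteBuffer;

public class UnionBranchPeek {

    /** Reads one Avro zigzag-encoded, variable-length long from the buffer. */
    public static long readZigZagLong(ByteBuffer buf) {
        long raw = 0;
        int shift = 0;
        while (true) {
            byte b = buf.get();
            raw |= (long) (b & 0x7F) << shift;   // low 7 bits carry data
            if ((b & 0x80) == 0) {
                break;                           // high bit clear: last byte
            }
            shift += 7;
            if (shift > 63) {
                throw new IllegalArgumentException("varint too long");
            }
        }
        return (raw >>> 1) ^ -(raw & 1);         // undo zigzag encoding
    }

    /**
     * Returns the union branch index of EventWrapper.payload.
     * Assumption: avroBody starts at the first field of the record.
     */
    public static int payloadBranch(byte[] avroBody) {
        return (int) readZigZagLong(ByteBuffer.wrap(avroBody));
    }

    public static void main(String[] args) {
        // Branch index 2 (Event3 under the assumed union order) is zigzag-encoded as 0x04.
        byte[] body = {0x04};
        System.out.println(payloadBranch(body)); // prints 2
    }
}
```

A consumer built against v1 only knows branches 0 and 1, so an index of 2 has no counterpart in its generated classes.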

What should be the expected result when the consumer receives an EventWrapper message that contains Event3 as the payload?

  1. Should the consumer get a deserialization error?
  2. Should the consumer be able to deserialize Event3, but as an org.apache.avro.generic.GenericData.Record instead?

When using io.confluent:kafka-avro-serializer:7.6.0, my consumer deserializes it to a GenericData.Record, but with a previous version (7.2.x) it actually deserializes Event3 as Event1 (which I am pretty sure is a bug).

What is the correct behaviour? Is the deserialization to a Record expected because the consumer is able to get the schema of Event3 from the schema registry – but it just can’t be exposed as the Event3 POJO type?
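For context on why the consumer can see the Event3 schema at all: each message written by the Confluent serializer starts with a 5-byte header (a zero magic byte, then a 4-byte big-endian schema ID), followed by the Avro body. The ID refers to the writer's schema, which in this scenario would be EventWrapper v2 with Event3 embedded in it. Below is a minimal sketch of reading that header; the class and method names are hypothetical and this is not the deserializer's own code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ConfluentWireHeader {

    static final byte MAGIC_BYTE = 0x0;
    static final int HEADER_SIZE = 5; // 1 magic byte + 4-byte schema ID

    /** Extracts the writer schema ID from a raw Kafka record value. */
    public static int schemaId(byte[] message) {
        if (message == null || message.length < HEADER_SIZE) {
            throw new IllegalArgumentException("message shorter than header");
        }
        ByteBuffer buf = ByteBuffer.wrap(message); // big-endian by default
        if (buf.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("unknown magic byte");
        }
        return buf.getInt();
    }

    /** Returns the Avro-encoded body that follows the header. */
    public static byte[] avroBody(byte[] message) {
        schemaId(message); // reuse the header validation
        return Arrays.copyOfRange(message, HEADER_SIZE, message.length);
    }

    public static void main(String[] args) {
        // Made-up example: schema ID 42, followed by a one-byte Avro body.
        byte[] message = {0, 0, 0, 0, 42, 0x04};
        System.out.println(schemaId(message));        // prints 42
        System.out.println(avroBody(message).length); // prints 1
    }
}
```

Knowing the writer schema explains how a generic record for Event3 can be built, but it does not by itself say which behaviour the deserializer guarantees when the reader's generated classes lack that branch.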

Can I depend on the consumer always being able to deserialize messages as a GenericData.Record?
