Introduction
Avro Enums
"type": { "name": "color", "type": "enum", "symbols": ["red", "blue", "green"] }
Enum symbols are always declared as strings in Avro.
The Apache Avro documentation describes the specification for enums.
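The rules for an enum schema are few enough to check by hand. The sketch below is a minimal, standard-library-only check based on the Avro specification: symbols must be unique strings matching [A-Za-z_][A-Za-z0-9_]*, and an enum-level default, if present, must be one of the symbols. The helper name check_enum_schema is hypothetical and is not part of any Avro library.

```python
import json
import re

# Pattern the Avro specification requires for enum symbols.
SYMBOL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_enum_schema(schema):
    """Return a list of problems found in an enum schema (empty if none).

    Hypothetical helper for illustration; not part of any Avro library.
    """
    problems = []
    if schema.get("type") != "enum":
        problems.append("type must be 'enum'")
    symbols = schema.get("symbols")
    if not isinstance(symbols, list):
        problems.append("symbols must be a JSON array")
        return problems
    seen = set()
    for symbol in symbols:
        if not isinstance(symbol, str) or not SYMBOL_RE.fullmatch(symbol):
            problems.append("invalid symbol: %r" % (symbol,))
        elif symbol in seen:
            problems.append("duplicate symbol: %r" % (symbol,))
        else:
            seen.add(symbol)
    # An enum-level default, when present, must be a member of the symbols.
    if "default" in schema and schema["default"] not in symbols:
        problems.append("default must be one of the symbols")
    return problems


color = json.loads(
    '{"name": "color", "type": "enum", "symbols": ["red", "blue", "green"]}'
)
print(check_enum_schema(color))  # prints []
```

A real Avro library performs these checks, and more, when it parses a schema; the sketch only makes the rules visible.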
Avro Compatibility
Start with the following schema, stored in a file called v1.avsc
{ "name": "v1", "type": "record", "namespace": "com.acme", "doc": "Enum test", "fields": [ { "name": "color", "type": { "name": "enum", "type": "enum", "symbols": [ "unknown", "red", "blue", "green" ], "default": "unknown" }, "default": "unknown" } ] }
Create a JSON message in a file called v1.json
{ "color": "red" }
Now, convert the JSON to Avro using avro-tools:
java -jar ~/DevTools/avro-tools-1.11.1.jar fromjson --schema-file v1.avsc v1.json > v1.avro
Now we evolve the schema: add another symbol, "yellow", to the "color" enum and store the result in a file called v2.avsc
{ "name": "v1", "type": "record", "namespace": "com.acme", "doc": "Enum test", "fields": [ { "name": "color", "type": { "name": "enum", "type": "enum", "symbols": [ "unknown", "red", "blue", "green", "yellow" ], "default": "unknown" }, "default": "unknown" } ] }
Create a new JSON message in a file called v2.json that uses the new enum value
{ "color": "yellow" }
Let's convert this to Avro:
java -jar ~/DevTools/avro-tools-1.11.1.jar fromjson --schema-file v2.avsc v2.json > v2.avro
Now let's see what happens if you try to read the v2.avro file using the v1 schema (v1.avsc). Remember that the v1 schema does not have the symbol "yellow" in the enum.
We will use the tojson command from avro-tools to convert the Avro file to JSON.
$ java -jar ~/DevTools/avro-tools-1.11.1.jar tojson --reader-schema-file v1.avsc v2.avro
{"color":"unknown"}
As you can see, when the v2 Avro message is read with a schema that does not have the new enum symbol, the unrecognized symbol is replaced by the default value declared on that enum.
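The resolution rule at work here can be sketched in a few lines of Python. This is a simplified model of the rule in the Avro specification (use the writer's symbol if the reader knows it, otherwise fall back to the reader's enum default, otherwise fail), not the code avro-tools actually runs. The function name resolve_enum is hypothetical.

```python
def resolve_enum(writer_symbol, reader_enum):
    """Resolve a symbol from the written data against the reader's enum schema.

    Simplified, hypothetical model of Avro's enum resolution rule.
    """
    if writer_symbol in reader_enum["symbols"]:
        return writer_symbol
    if "default" in reader_enum:
        # The reader does not know the symbol, so its enum default is used.
        return reader_enum["default"]
    raise ValueError("No match for " + writer_symbol)


# Enum part of the v1 reader schema (v1.avsc).
v1_enum = {
    "name": "enum",
    "type": "enum",
    "symbols": ["unknown", "red", "blue", "green"],
    "default": "unknown",
}

print(resolve_enum("red", v1_enum))     # prints red
print(resolve_enum("yellow", v1_enum))  # prints unknown
```

Note that the fallback reads the default declared inside the enum type, not the default declared on the record field.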
Now, let's see what happens if the enum in the schema has no default value. Let's create a new schema without it, in a file called v1-nodefault.avsc
{ "name": "v1", "type": "record", "namespace": "com.acme", "doc": "Enum test", "fields": [ { "name": "color", "type": { "name": "enum", "type": "enum", "symbols": [ "unknown", "red", "blue", "green" ] }, "default": "unknown" } ] }
Use this schema to read the v2.avro file.
java -jar ~/DevTools/avro-tools-1.11.1.jar tojson --reader-schema-file v1-nodefault.avsc v2.avro
This results in the following exception, printed to stderr
Exception in thread "main" org.apache.avro.AvroTypeException: No match for yellow
    at org.apache.avro.io.ResolvingDecoder.readEnum(ResolvingDecoder.java:269)
    at org.apache.avro.generic.GenericDatumReader.readEnum(GenericDatumReader.java:268)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:182)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:260)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:248)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:263)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:248)
    at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:98)
    at org.apache.avro.tool.Main.run(Main.java:67)
    at org.apache.avro.tool.Main.main(Main.java:56)
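As you can see, without a default value on the enum in the reader schema, data written with the new schema cannot be read with the previous schema. The field-level default does not help here, because it applies only when the field is missing from the written data, not when the field holds an unrecognized symbol.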
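A check like this can be run before deploying a new writer schema. The sketch below is a simplified model that looks only at the enum part of each schema and lists the writer symbols a reader could not resolve. The function name unreadable_symbols is hypothetical, and a complete compatibility check would have to cover the rest of the schema as well.

```python
def unreadable_symbols(writer_enum, reader_enum):
    """List writer symbols the reader cannot resolve.

    Hypothetical helper; models only enum resolution, nothing else.
    """
    if "default" in reader_enum:
        # Any unknown symbol falls back to the reader's enum default.
        return []
    known = set(reader_enum["symbols"])
    return [s for s in writer_enum["symbols"] if s not in known]


# Enum parts of v2.avsc, v1.avsc and v1-nodefault.avsc.
v2_enum = {
    "name": "enum",
    "type": "enum",
    "symbols": ["unknown", "red", "blue", "green", "yellow"],
    "default": "unknown",
}
v1_enum = {
    "name": "enum",
    "type": "enum",
    "symbols": ["unknown", "red", "blue", "green"],
    "default": "unknown",
}
v1_nodefault_enum = {
    "name": "enum",
    "type": "enum",
    "symbols": ["unknown", "red", "blue", "green"],
}

print(unreadable_symbols(v2_enum, v1_enum))            # prints []
print(unreadable_symbols(v2_enum, v1_nodefault_enum))  # prints ['yellow']
```

An empty list means every symbol the writer can produce is either known to the reader or covered by the reader's enum default.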