Ddia-Encoding-Evolution

Encoding and Evolution

This chapter was a bit of a snoozefest, diving into the particulars of various encoding schemes. However, there were a couple of important takeaways.

To me, the most important theme was versioning, a problem that seems to crop up wherever I go. Because I've worked mostly at startups, it typically gets kicked down the road and deprioritized. The truth is, versioning is hard, but it's also a crucial thing to establish in an API. And ultimately, that trickles down to some form of versioning on first-class types (objects, structs, etc.).

In the DDIA setting, where you have multiple nodes (e.g. from replication or sharding), it's important to be able to roll out updates gradually, both for testing purposes and to facilitate zero downtime. (This helps achieve evolvability.) Hence, we need to anticipate multiple versions of code running at the same time. This requires the data flowing between nodes to have:

  • backward compatibility: new code can read old data
  • forward compatibility: old code can read new data
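To make the two directions concrete, here is a minimal sketch using plain JSON. The record shape and the reader names (`read_v1`, `read_v2`) are made up for illustration and are not from the book; the point is only that the old reader ignores fields it doesn't know about, and the new reader tolerates a field being absent.

```python
import json


def read_v1(payload: str) -> dict:
    """Old code (hypothetical): only knows `name`, ignores unknown fields."""
    record = json.loads(payload)
    return {"name": record["name"]}


def read_v2(payload: str) -> dict:
    """New code (hypothetical): also knows `email`, but treats it as optional."""
    record = json.loads(payload)
    return {"name": record["name"], "email": record.get("email")}


# Data written by the old and the new version of the code.
old_data = json.dumps({"name": "Ada"})
new_data = json.dumps({"name": "Ada", "email": "ada@example.com"})

# Backward compatibility: new code reads old data.
backward = read_v2(old_data)

# Forward compatibility: old code reads new data.
forward = read_v1(new_data)

print(backward)
print(forward)
```

Note that the new field had to be optional for this to work. If `read_v2` required `email`, it would break on every record written before the upgrade.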

Formats

While textual encoding formats (e.g. XML, JSON) are popular for RESTful HTTP APIs, binary encodings can make a lot more sense when it comes to moving large chunks of data. They take up less space, and many have built-in schemas with clearly defined forward and backward compatibility semantics.
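As a rough feel for the space savings, here is a small sketch comparing a JSON encoding with a hand-rolled fixed binary layout built with Python's `struct` module. The record and the layout are assumptions made up for this example, and this is not how any particular schema-driven format lays out its bytes. It only shows that once both sides agree on a schema, the field names no longer need to travel with every record.

```python
import json
import struct

# A made-up record for illustration.
record = {"user_id": 1337, "score": 0.5, "active": True}

# Textual encoding: field names are repeated inside every record.
text_bytes = json.dumps(record).encode("utf-8")

# Hypothetical fixed layout standing in for a schema:
# little-endian uint32, float64, bool (no padding with "<").
LAYOUT = struct.Struct("<Id?")
binary_bytes = LAYOUT.pack(record["user_id"], record["score"], record["active"])

# Decoding needs the same layout, i.e. reader and writer must share the schema.
decoded = LAYOUT.unpack(binary_bytes)

print(len(text_bytes), len(binary_bytes))
```

The catch with a fixed layout like this one is that it has no room for evolution: adding a field changes the byte layout for every reader. That gap is what the schema-driven formats are designed to close.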

Some popular schema-driven binary formats include:

  • Thrift
  • Protocol Buffers
  • Avro