At Ooyala Adtech we collect over 1B ad events per day and deliver them in near real-time to tens of different consumers, including the ad server's decision engine itself, forecasting, reporting and more. We used to run this on RabbitMQ and a home-built blob storage, but that setup was not reliable enough, had high operational costs and made onboarding new data streams difficult. We have recently finished replacing it with a new Kafka-based solution that does not have the above-mentioned limitations. We have switched all the consumers, changed the message format to Avro, and now use quite a lot of the Kafka ecosystem: Kafka Streams, the Google Pub/Sub connector, the S3 connector, Schema Registry and MirrorMaker, with KSQL and Debezium coming soon. For some of the stateful consumers we had to implement backup/restore of event topics to/from S3.
Some key takeaways:
- Kafka Streams was a perfect fit for our environment, where multiple teams develop microservices independently.
- Using Avro as the event format allows for generic processing components (e.g. the S3 sink), and the schemas double as documentation.
- Kafka's time index is a very powerful feature that allowed us to build a clean backup/restore process.
- Using a cloud-native messaging solution (e.g. Google Pub/Sub) with a connector to the on-premises Kafka cluster is currently the easiest and cheapest option for global streaming for us.
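The time-index takeaway above deserves a word of explanation. Kafka keeps a per-segment time index that maps record timestamps to offsets, and the consumer API exposes it via `offsetsForTimes` (`offsets_for_times` in the Python clients): given a timestamp, it returns the earliest offset whose record timestamp is at or after it, which is exactly what a point-in-time restore needs. The following stdlib-only sketch is purely illustrative of that lookup semantics (the names and the in-memory index are ours, not Kafka internals):

```python
import bisect

def offset_for_time(time_index, target_ts):
    """Illustrative model of Kafka's time-index lookup.

    time_index: list of (timestamp_ms, offset) pairs, sorted by timestamp.
    Returns the earliest offset whose timestamp is >= target_ts,
    or None if no such record exists (mirroring offsetsForTimes
    returning null for timestamps past the end of the log).
    """
    timestamps = [ts for ts, _ in time_index]
    i = bisect.bisect_left(timestamps, target_ts)
    if i == len(time_index):
        return None
    return time_index[i][1]

# Example: three records written at t=1000, 2000, 3000 ms.
index = [(1000, 0), (2000, 42), (3000, 97)]
offset_for_time(index, 1500)  # earliest record at/after t=1500 -> offset 42
```

In a restore, each partition's consumer seeks to the offset returned for the backup's cut-off timestamp and replays from there, so no separate offset bookkeeping is needed in the backup itself.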
We believe our story of migrating to and adopting the Kafka ecosystem may be interesting to others.