Kafka isn’t just for data engineers or distributed computing enthusiasts, but streaming data doesn’t become useful on its own: it needs to be accessible to the people who will use it to build mobile games, performance dashboards, instant messaging systems (you get the picture). My team set up Etsy’s first Kafka cluster in 2014 as a pit stop for our clickstream data before it made its way to Hadoop for batch processing. It quickly became one of Etsy’s most reliable systems, as well as one of its most underutilized. It wasn’t until 2017 that engineers outside the data team began to make use of our Kafka pipeline and the data we had been streaming through it for years.
Now we’re using Kafka to develop user-facing applications and machine learning pipelines. The catalyst for this change was the deployment and monitoring platform we built to make our data sources accessible and usable. This talk will center on the evolution of Kafka at Etsy: how we built out a platform that would empower all of Etsy engineering to explore and experiment with our existing sources of streaming data. I’ll describe how we architected the deployment and monitoring platforms that allow us to support a diverse range of streaming data uses, the benefits and drawbacks of having a centralized platform for multiple services, and how we structured those services for fault isolation. I’ll also talk about how we introduced Kafka and the concepts of streaming architecture and distributed systems to engineers who had never worked with data apps: in other words, how we approached user education. The main takeaways will be both technical (tips on architecture and tooling that allow for speed of development as well as functional isolation) and practical (how to support data-driven experimentation in all areas of your organization).