At Workday, Kafka is the data backbone powering our search and analytics infrastructure in production. Our customers, which include the world's largest companies, rely on our search system for recruitment, employee professional development, and other business-critical applications. Enterprise search must also respect highly configurable security and access-restriction models, so our search indices need to be both consistent and up-to-date. Our search indexing pipeline processes hundreds of terabytes of data every few hours. On top of that, we need a real-time incremental indexing pipeline that captures tens of thousands of transactions per second, processes them, and updates the indices with the relevant data.
To meet these challenges, we built a robust, scalable stream processing engine backed by Kafka that fetches data from our primary data stores, processes it, and indexes it into our Elasticsearch cluster.
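The engine itself is built on Akka Streams over Kafka; as a minimal sketch of the pipeline's shape only, the fetch, process, and index stages can be modeled as plain functions over an in-memory batch. All names here (`fetch`, `process`, `index_docs`) and the sample records are hypothetical, and the per-tenant dictionary merely stands in for the Elasticsearch cluster.

```python
from collections import defaultdict

def fetch():
    """Stage 1: pull a batch of raw (tenant, payload) records from the
    primary data store. Stubbed with sample data for illustration."""
    return [("acme", "resume-1"), ("globex", "review-7")]

def process(record):
    """Stage 2: per-record transformation before indexing
    (a trivial uppercase stands in for real enrichment)."""
    tenant, payload = record
    return (tenant, payload.upper())

def index_docs(docs):
    """Stage 3: group documents into a per-tenant 'index', standing in
    for writes to the Elasticsearch cluster."""
    index = defaultdict(list)
    for tenant, body in docs:
        index[tenant].append(body)
    return dict(index)

# Run the three stages end to end over one batch.
index = index_docs(process(r) for r in fetch())
```

In the real system each stage is a streaming operator with backpressure rather than a batch function; the sketch only shows how data flows from the primary store to the index.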
In this talk, we will share our experience building a custom streaming engine on Akka Streams and integrating it with Kafka. We will discuss how we handle the unique challenges of working with business-critical, sensitive enterprise data, including per-tenant encryption and decryption, replication, error handling, and data loss. We will share how we configured Kafka for our workload and how we partition data so that the most critical data is always up-to-date. We will also describe our experience deploying and operating the system in production, explain how we keep this workhorse running at scale, and share performance numbers. Last but not least, we will touch on how our Kafka/Akka streaming engine is used across our product portfolio, powering applications such as collecting user actions for analytics, training machine-learning models, and ferrying data across microservices for our ML inference engines.
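To make the partitioning idea concrete, here is one possible (assumed, not the talk's actual) policy for keeping critical data up-to-date: pin the most critical tenants to reserved partitions so their updates are never queued behind bulk traffic, and hash all other tenants across the remaining partitions. The tenant names, partition counts, and the `partition_for` helper are all illustrative assumptions.

```python
import zlib

NUM_PARTITIONS = 12
RESERVED = 4  # partitions set aside for critical tenants (assumed number)
CRITICAL_TENANTS = {"acme": 0, "globex": 1}  # hypothetical pinned tenants

def partition_for(tenant: str) -> int:
    """Map a tenant to a Kafka partition: pinned tenants get a dedicated
    partition; everyone else is hashed over the non-reserved range."""
    if tenant in CRITICAL_TENANTS:
        return CRITICAL_TENANTS[tenant]
    # crc32 gives a stable hash across processes, unlike Python's built-in
    # salted str hash, which varies per run.
    return RESERVED + (zlib.crc32(tenant.encode()) % (NUM_PARTITIONS - RESERVED))
```

The design choice this illustrates is isolation: a burst of bulk reindexing for ordinary tenants cannot add lag to a critical tenant's partition, because consumers of the reserved partitions only ever see critical-tenant records.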