Kafka Connect is a powerful, flexible and tremendously useful application that enables effortless, real-time data integration. With any new tool, however, comes a unique set of operational considerations. How do you make sure your Kafka Connect deployment hums along quietly at all hours of the day? Drawing primarily on my years of experience running a large distributed Kafka Connect cluster at Stitch Fix, this talk will give an overview of:
Stitch Fix’s Kafka Connect deployment model and use cases.
The most useful operational tools for making Kafka Connect run smoothly, e.g., admin services, CLIs, jobs, alerts and dashboards.
How to do end-to-end monitoring.
Lessons learned from production issues and painful migrations (why, oh why, did we not use schemas from the beginning? Pausing connectors doesn’t do what you think it does. Rebalancing is tricky, but has gotten a lot better. Jar hell is a thing of the past: upgrade and use plugin.path!).
If you are an engineer who is curious about Kafka Connect or currently maintains a modest-sized Kafka Connect cluster, you will walk away from this talk with increased confidence in deploying and maintaining a large, mission-critical Kafka Connect cluster. I’m personally a huge fan of Kafka Connect, and I’m excited to share some lessons learned with the community so that more people can have a successful experience using it.