Testing Kafka sounds easy, especially as there are scripts and guides that get the software running within minutes. These make it easy to test and demonstrate ‘Kafka 101.’ At the other extreme there are hard-core tests, for instance using Jepsen to simulate network partitions. However, we’ve yet to discover the middle ground that would enable teams to test, easily and effectively, the key aspects of using Kafka as a platform in their own context, environment, and applications. In October 2017 we started testing Kafka’s suitability for a global, pan-data-centre deployment.
This includes testing:
- Performance and scalability using representative data, configurations, environments and consumers.
- Service operability to assess practical aspects of operating Kafka longer-term including equipment upgrades, reconfigurations, migrations, etc.
- Robustness and FMEDA (failure modes, effects and diagnostic analysis) of Kafka as a service, where networks, nodes, clusters, and clients misbehave, crash, and fail.
Vitally, we include human factors and test standard operating procedures, e.g., how to recover from problems that may occur when using Kafka in production. We also assess fitness for purpose: Kafka is one possible solution, so how well does it suit the business needs? From a technical perspective, which Kafka implementation, API, versions, and build processes should we use? And how well does Kafka run, behave, and integrate in the client’s environment?
Our approach to testing includes iterating quickly using non-confidential data, equipment, and environments outside the client’s domain. We then adapt the tests to increase their fidelity and relevance for the client’s domain and run them in the client’s confidential environments. This talk provides a case study of the testing we designed and implemented. Some of the work is being open-sourced to help others perform similar testing. The audience has the opportunity to learn from our experiences and adapt them to improve their own testing and assessment of Kafka and related technologies.