Kafka Summit Logo
Organized by

Kafka Summit SF 2019

September 30 - October 1, 2019 | San Francisco

Why Stop the World When you Can Change it? Design and Implementation of Incremental Cooperative Rebalancing

Session Level: Advanced

Since its initial release, the Kafka group membership protocol has offered Connect, Streams and Consumer applications an ingenious and robust way to balance resources among distributed processes. The process of rebalancing, as it’s widely known, allows Kafka APIs to define an embedded protocol for load balancing within the group membership protocol itself. Until now, rebalancing has been working under the simple assumption that every time a new group generation is created, the members join after first releasing all of their resources, getting a whole new load assignment by the time the new group is formed. This allows Kafka APIs to provide task fault-tolerance and elasticity on top of the group membership protocol. However, due to its side-effects on multi-tenancy and scalability this simple approach in rebalancing, also known as stop-the-world effect, is limiting larger scale deployments. Because of stop-the-world, application tasks get interrupted only for most of them to receive the same resources after rebalancing. In this technical deep dive, I’ll discuss the proposition of Incremental Cooperative Rebalancing as a way to alleviate stop-the-world and optimize rebalancing in Kafka APIs.

We’ll cover:
* The internals of Incremental Cooperative Rebalancing
* Uses cases that benefit from Incremental Cooperative Rebalancing
* Implementation in Kafka Connect
* Performance results in Kafka Connect clusters

We use cookies to understand how you use our site and to improve your experience. Click here to learn more or change your cookie settings. By continuing to browse, you agree to our use of cookies.