Kafka Summit London 2019

Streaming platforms at massive scale.

May 13-14, 2019 | London

Using Machine Learning to Understand Kafka Runtime Behavior

Session Level: Intermediate

Apache Kafka is now nearly ubiquitous in modern data pipelines and use cases. While the Kafka development model is elegantly simple, operating Kafka clusters in production environments is a challenge. Misbehaving Kafka clusters are hard to troubleshoot, especially when there are potentially hundreds or thousands of topics, producers, and consumers, and billions of messages.

A real-time application may lag because of an application problem, such as poor data partitioning or load imbalance, or because of a Kafka problem, such as resource exhaustion or suboptimal configuration. As a result, getting the best performance, predictability, and reliability from Kafka-based applications can be difficult. In the end, the operation of your Kafka-powered analytics pipelines could itself benefit from machine learning (ML).

In this presentation, we will explain how recent advances in machine learning and AI form the basis of a methodology for applying statistical learning to the rich monitoring data that is available from Kafka. This monitoring data includes metrics from Kafka brokers, producers, consumers, and infrastructure, as well as logs from various components of the Kafka ecosystem. We will also discuss how to use ML to identify root causes for a number of Kafka-based application bottlenecks, slowdowns, and failures.
