Comcast manages over 2 million miles of fiber and coax, and over 40 million in home devices. This “outside plant” is subject to adverse conditions from severe weather to power grid outages to construction-related disruptions. Maintaining the health of this large and important infrastructure requires a distributed, scalable, reliable and fast information system capable of real-time processing and rapid analysis and response. Using Apache Kafka and the Kafka Streams Processor API, Comcast built an innovative new system for monitoring, problem analysis, metrics reporting and action response for the outside plant.
In this talk, you’ll learn how topic partitions, state stores, key mapping, source and sink topics and processors from the Kafka Streams Processor API work together to build a powerful dynamic system. We will dive into the details about the inner workings of the state store—how it is backed by a Kafka “changelog” topic, how it is scaled horizontally by partition and how the instances are rebuilt on startup or on processor failure. We will discuss how these state stores essentially become like materialized views in a SQL database but are updated incrementally as data flows through the system, and how this allows the developers to maintain the data in the optimal structures for performing the processing. The best part is that the data is readily available when needed by the processors. You will see how a REST API using Kafka Streams “interactive queries” can be used to retrieve the data in the state stores. We will explore the deployment and monitoring mechanisms used to deliver this system as a set of independently deployed components.