SolarWinds MSP collects and aggregates information from millions of agents via hundreds of intermediate services deployed across the globe. It provides business intelligence, reporting and analytical capabilities to both internal and external clients. After a massive expansion over the past few years, traditional Extract, Transform, Load (ETL) pipelines can no longer provide the agility the business needs to deliver world-class features with minimal engineering friction. The fabric of our data has evolved from independent cold-storage silos into distributed, interconnected, continuous flows of information that demand high resilience and configurable delivery semantics in near real time.
This talk presents DaVinci EventBus, the fully cloud-native eventing backbone of SolarWinds MSP. Built for scalability, it connects millions of agents through hundreds of micro-services deployed in four geographical regions, which together exchange tens of billions of messages per day. It exposes a unified gRPC interface that allows clients written in different programming languages to interact seamlessly with topics across multiple Kafka clusters. DaVinci EventBus uses Akka to implement self-service topic management, provide high-throughput batch publication, coordinate consumer groups and replicate data while guaranteeing sequential consistency across multiple Kafka clusters.
We dive deep into the design of DaVinci EventBus and show how Akka can be used to implement an external coordination mechanism that federates multiple Kafka clusters. We discuss our journey of breaking monolithic legacy systems down into a set of resilient, event-driven micro-services. We show how our event-driven approach greatly reduced the network traffic needed for data propagation and simplified data manipulation and analysis, enabling us to deliver new features such as automated anomaly detection to our end users. Finally, we outline our plans to support multiple consumption mechanisms on a single event firehose, automated on-demand Kafka cluster deployment, and asynchronous workflow management across micro-service boundaries.