In the beginning was PIPs, an API backed by a relational database used to store all the BBC’s programme metadata. But as more clients arrived, with more requirements and ever more complex queries, it became untenable to build one system that could serve them all and still maintain performance. Each client wanted a simple interface for asking its own specific, complex questions about subjects like availability and scheduling.
The Programme Metadata team turned to a combination of Kafka and Clojure (a functional Lisp dialect built around immutable data) running in AWS to produce multiple pipelines, one per client requirement. This setup turns the normal ETL pipeline on its head, with one homogeneous backend and multiple heterogeneous outputs. At each level you can see the same pattern repeated, which extends even into the structure of the Clojure code itself. In this talk we’ll go through some of the things we’ve learned, look at how the structure of Clojure mirrors and supports the way Kafka is used, and see how simple commodity microservices can be reused in multiple pipelines to rapidly satisfy new client requirements.