Betfair is the largest online betting operation in the world, at times busier than the London Stock Exchange. Our biggest customers trade at ultra-high frequencies, pouring millions of dollars into our trading systems, so low latency and reliability are key to everything we build. Our customers need to be able to view the current positions on offer in a market, place their orders accordingly and see them fulfilled reliably, all within a few milliseconds. As a global business, we are also growing steadily in the number of jurisdictions we operate in, so the number of customers trading at such frequencies is going up every single day. We need our exchange trading platform to be resilient, reliable, fast and easily scalable. On a busy Saturday afternoon when popular football matches are being played, we see in excess of 200k transactions per second across our estate, with 99.9% of them served within an SLA of 10 ms. A few years ago this figure was about 40k transactions per second. To get from that point to the present day, we needed to fundamentally re-engineer our backend systems to be largely event driven, and Kafka was the perfect tool to help us solve this problem. On the back of the success of that platform, we are now rebuilding the core of our exchange platform, which accepts and matches up to 25,000 orders per second. Ordering of events is key to achieving this reliably, and again Kafka is at the centre of our solution.
This presentation details some of the key scalability, reliability and resiliency challenges that we faced in this migration, and how we overcame them to rebuild our entire exchange with Kafka at its core.