Running Apache Kafka can present interesting challenges, especially at scale. In this talk we share some of our experiences operating Apache Kafka as a service across a large company. What happens when you create a lot of partitions and then need to restart brokers? What if you find yourself needing to reassign almost all partitions in all of your clusters? How do you track progress on large-scale reassignments? How do you make sure that moving data between nodes in a cluster does not impact the producers and consumers connected to it? Join us as we dive into a few of the issues we have encountered and share our debugging and mitigation strategies.