We are seeking a Site Reliability Engineer to join our growing SRE team as its third member, playing an essential role in maturing the company’s approach to service reliability and continuity.
You and your team will be directly responsible for solutions around the reliability of the platform, including availability, latency, performance, efficiency, capacity planning, and incident response.
You will be required to work with engineering teams on complex problems and projects in which analyzing a situation or data calls for in-depth evaluation of multiple factors, and you must be able to make wise trade-offs between competing ones.
You have a passion for helping others and making their lives better: in doing so, you seek to simplify complex systems to make them understandable and operable. You are able to communicate decisions, ideas, designs, and the operation of systems and services to others clearly and concisely.
You are both a generalist, capable of picking up and working with multiple, disparate systems, and an expert, having an ability to dive deep into specific topics and quickly master them. You comfortably move between system, service, and instance level views.
You have a love of stateful systems containing treasured data, ensuring we continue to protect customer data from loss caused by outages.
Things you will do:
Your background and skills will include:
We would be thrilled if you: