JOB TITLE: Senior Site Reliability Engineer
LOCATION: London, Edinburgh, Leeds or Bristol
HOURS: Full time
WORKING PATTERN: Our work style is hybrid, which involves spending at least two days per week, or 40% of our time, at one of the above locations.
About this opportunity
We’re seeking an experienced Site Reliability Engineer to join the Cloud Enabling team within the Personalised Experiences and Communication Platform. This role is central to strengthening our SRE capability and improving the resiliency, availability and security of our platforms. The ideal candidate will have experience in SRE, software engineering, data engineering or AI/MLOps, with a proven track record of supporting high‑throughput systems at scale. Strong experience with hybrid‑cloud architectures, Kubernetes‑based workloads, networking, and monitoring/logging solutions is essential, along with an engineering mindset suited to large, complex organisations.
What you’ll do:
Support platforms serving millions of customers and billions of requests each month, ensuring availability, scalability and resiliency.
Act as a key technical contributor within PEC, working with SRE guilds to improve cloud deployments, monitoring, CI/CD pipelines and cost efficiency.
Explore and adopt new technologies and practices to advance SRE capabilities, including AI‑driven tooling and automation.
Apply hands‑on experience running high‑throughput production systems to deliver customer value beyond POCs.
Define and implement SLAs, SLOs and SLIs across software and data teams.
Improve incident management through better tooling, alerting, runbooks and automated remediation.
Act as a subject matter expert in site reliability engineering, contributing to technical discussions and fostering a culture of continuous learning across the lab.
About us
We're on an exciting transformation journey and there could not be a better time to join us. The investments we're making in our people, data, and technology are leading to innovative projects, fresh possibilities and countless new ways for our people to work, learn, and thrive.
What you’ll need
Proven hands-on experience of software development, testing, monitoring and operational stability at scale.
Production experience with Kubernetes (k8s) and monitoring tools such as Datadog or Dynatrace.
Proven experience and knowledge of automation, CI/CD and best practices.
Proven experience of running postmortems, defining SLAs/SLIs/SLOs and participating in support rotas.
Coding/scripting experience developed in a commercial/industry setting (Python/Bash).
Knowledge of databases, streaming and batch operations, and API design.
Proficient with Kubernetes (ideally microservice architectures using the Istio service mesh).
Extensive experience of cloud-native solutions (ideally Google Cloud).
Good understanding of cloud storage, networking, and resource provisioning.
About working for us
Our focus is to ensure we're inclusive every day, building an organisation that reflects modern society and celebrates diversity in all its forms.
We want our people to feel that they belong and can be their best, regardless of background, identity or culture.
We were one of the first major organisations to set goals on diversity in senior roles, and to create a menopause health package and a dedicated Working with Cancer initiative.
And it’s why we especially welcome applications from under-represented groups.
We’re disability confident. So, if you’d like reasonable adjustments to be made to our recruitment processes, just let us know.
We also offer a wide-ranging benefits package.
Ready for a career where you can have a positive impact as you learn, grow and thrive? Apply today and find out more!