Senior Site Reliability Engineer
GoDaddy
About the role
Location Details:
At GoDaddy, the future of work looks different for each team. Some teams work in the office full-time, others have a hybrid arrangement (working remotely some days and in the office on others), and some work entirely remotely.
This position may be hybrid or fully remote, as decided by your manager. If designated as hybrid, you’ll divide your time between working remotely from your home and an office location, so you should live within commuting distance. If designated as remote, you’ll work remotely from your home and may occasionally visit a GoDaddy office to meet with your team for events or meetings. Your hiring manager can share more about this role’s hybrid or remote designation.
Join our team
Our Global Sustaining Engineering team sits at the intersection of software engineering and infrastructure, ensuring the services our customers depend on are fast, resilient, and always available. As a Senior Site Reliability Engineer, you'll take direct ownership of production services — from initial design through day-to-day operation — while partnering with product, engineering, and security teams to build and maintain business-critical systems. This is a role where you'll deepen your technical expertise, grow your leadership presence by mentoring the next generation of SREs, and gain hands-on experience integrating AI-driven tooling into real-world operational workflows.
What you'll get to do...
Design, implement, and operate scalable, highly available production services while diagnosing and resolving complex infrastructure, network, and application issues
Build and maintain alerting pipelines, dashboards, and SLO-driven monitoring strategies using Icinga, Prometheus, and Grafana
Lead incident response end-to-end — performing root-cause analysis, authoring blameless post-mortems, and driving corrective actions to closure
Develop and extend Infrastructure as Code coverage and build internal tooling that eliminates manual, repetitive operational work
Mentor SRE I and SRE II engineers through code reviews, debugging sessions, and knowledge-sharing talks
Apply LLM-driven log analysis, anomaly detection, and generative AI tools to accelerate incident response and runbook creation — validating all outputs before use
Your experience should include…
6+ years of professional experience in Site Reliability Engineering or Platform Engineering with demonstrated success leading organization-wide reliability programs
Deep hands-on expertise with Kubernetes (deployments, operators, custom resources) and Docker in production environments
Advanced Linux troubleshooting skills covering kernel internals, TCP/IP, DNS, and load balancers
Proficiency in Python for production-grade automation and scripting, with working knowledge of Bash
Expertise in Ansible and at least one additional Infrastructure as Code tool such as Terraform or Pulumi, with hands-on mastery of Icinga, Prometheus, and Grafana
Understanding of l