Senior Site Reliability Engineer

GoDaddy

Bulgaria · Remote · IT Ops · 6+ yrs

About the role

Location Details:

At GoDaddy the future of work looks different for each team. Some teams work in the office full-time, others have a hybrid arrangement (they work remotely some days and in the office some days) and some work entirely remotely.

This position may be hybrid or fully remote, as decided by your manager. If designated as hybrid, you’ll divide your time between working from home and working at an office location, so you should live within commuting distance. If designated as remote, you’ll work from home and may occasionally visit a GoDaddy office to meet with your team for events or meetings. Your hiring manager can share more about this role’s hybrid or remote designation.

Join our team

Our Global Sustaining Engineering team sits at the intersection of software engineering and infrastructure, ensuring the services our customers depend on are fast, resilient, and always available. As a Senior Site Reliability Engineer, you'll take direct ownership of production services — from initial design through day-to-day operation — while partnering with product, engineering, and security teams to build and maintain business-critical systems. This is a role where you'll deepen your technical expertise, grow your leadership presence by mentoring the next generation of SREs, and gain hands-on experience integrating AI-driven tooling into real-world operational workflows.

What you'll get to do...

Design, implement, and operate scalable, highly available production services while diagnosing and resolving complex infrastructure, network, and application issues

Build and maintain alerting pipelines, dashboards, and SLO-driven monitoring strategies using Icinga, Prometheus, and Grafana

Lead incident response end-to-end — performing root-cause analysis, authoring blameless post-mortems, and driving corrective actions to closure

Develop and extend Infrastructure as Code coverage and build internal tooling that eliminates manual, repetitive operational work

Mentor SRE I and SRE II engineers through code reviews, debugging sessions, and knowledge-sharing talks

Apply LLM-driven log analysis, anomaly detection, and generative AI tools to accelerate incident response and runbook creation — validating all outputs before use

Your experience should include…

6+ years of professional experience in Site Reliability Engineering or Platform Engineering with demonstrated success leading organization-wide reliability programs

Deep hands-on expertise with Kubernetes (deployments, operators, custom resources) and Docker in production environments

Advanced Linux troubleshooting skills covering kernel internals, TCP/IP, DNS, and load balancers

Proficiency in Python for production-grade automation and scripting, with working knowledge of Bash

Expertise in Ansible and at least one additional Infrastructure as Code tool such as Terraform or Pulumi, with hands-on mastery of Icinga, Prometheus, and Grafana

Understanding of l
