Senior Manager, Systems Engineering

GoDaddy

Colombia | Remote | Manager | Core | 5+ yrs

About the role

Location Details: Colombia, remote.

At GoDaddy, the future of work looks different for each team. Some teams work in the office full-time; others have a hybrid arrangement (they work remotely some days and in the office some days), and some work entirely remotely.

This is a remote position, so you’ll be working from home. You may occasionally visit a GoDaddy office to meet with your team for events or meetings.

Join Our Team

Join GoDaddy's Forge Ops team at the intersection of Data, Infrastructure, and AI-driven operations. As Senior Manager, Systems Engineering, you will lead the reliability, cost efficiency, and agentic operation of the Data & AI ecosystem that serves GoDaddy. This is a deeply technical leadership role, not a hands-off manager position. You will operate as GoDaddy's L1/L2 authority over critical analytics and data platforms while advancing Forge Operations: a structured operating model designed to transition platform operations from hero-based, expert-dependent support to system-based, agent-assisted, self-improving operations. If you can translate a business problem into a technical architecture and that architecture into team execution, and you want to build the AI Ops pattern for a large-scale data organization, this role is for you.

What you'll get to do...

Own and operate GoDaddy's analytical and data intelligence platforms (Redshift, QuickSight, FeedDB, Protegrity, Alation) as the authoritative L1/L2 platform owner, driving reliability, deployment standards, cost optimization, and user enablement across an ecosystem with a 50PB+ data lake and thousands of consumers.

Lead 24/7 incident management and production operations across 10+ Data & AI platforms, owning MTTR/MTTD targets, AAR rigor, and a root-cause-to-control loop that converts every incident into a runbook, monitoring improvement, or automation — not just a resolved ticket.

Architect and advance Forge Ops OS, the team's agent-based operating model. This model uses history-informed early warning, auto-recovery agents, runbook intelligence, and bounded agentic orchestration. The team transitions from operating systems directly to leading the agents that operate those systems.

Drive data platform cost efficiency through unit economics (cost per query, cost per workload, cost per dashboard visit), translating AWS spend into measurable business metrics and continuous optimization across Redshift, QuickSight, DPaaS, and ML infrastructure.

Manage operational planning and executive reporting weekly, monthly, and quarterly. Run a sprint-based improvement program with a near 70% strategic allocation. Provide clear traceability from team execution to company goals and landmark outcomes.

Your experience should include...

5+ years of validated 24/7 production operations leadership: leading incident response end-to-end, owning MTTR performance, leading post-mortems (AARs) that produce controls, and driving the systemic fixes that reduce incident recurren