Staff SRE Engineer (DevOps+SRE+Platform Eng+IDP)
Vonage
About the role
Join Vonage and help us innovate cloud communications for businesses worldwide!
Why this role matters:
As a Staff Platform/SRE Engineer, you are a key technical leader for our India-based engineering team, acting as a bridge between high-level architectural vision and world-class execution. Working closely with Senior Management and the Global Architecture team, you will help shape Vonage’s engineering culture. You will drive the implementation of our long-term technical roadmap for our Cloud-Native Kubernetes platform, ensuring our global production APIs remain resilient, cost-efficient, and secure. Your mission is to eliminate systemic friction through software engineering, ensuring a seamless experience for hundreds of developers interacting with the cloud.
Your key responsibilities:
Collaborative Technical Leadership: Work in partnership with the architecture team to scope, size, and execute large-scale initiatives. You serve as a primary technical authority and escalation point for the team in India.
Predictive Problem Solving: Use deep systems knowledge to foresee architectural bottlenecks. You proactively design solutions for future scale, ensuring the platform stays ahead of business demands.
Mentorship & Multiplier: Act as a lead mentor for junior and senior engineers. You provide the guidance needed to raise the engineering bar, fostering a culture of technical excellence and self-sufficiency.
Infrastructure as Code (IaC) Culture: Drive the evolution of our IaC practices using tools like Terraform and Crossplane, moving the organization toward a Kubernetes-native management style.
Reliability & Systems Debugging: Respond effectively to service failures by diving into AWS/EKS, Linux, and ArgoCD. Lead the team in performing thorough, blame-free Root Cause Analysis (RCA).
CI/CD Modernization: Support the transition from legacy systems to modern automation. While expertise in GitHub Actions and enterprise-grade workflows is highly valued, your primary goal is ensuring a robust and automated deployment pipeline.
What you'll bring
Experience: 12+ years of progressive experience in software engineering, systems design, and cloud architecture.
AWS & Kubernetes Excellence: Expert-level experience managing massive, high-traffic, multi-tenant Kubernetes environments specifically on AWS (EKS).
Cloud-Native Proficiency: Deep understanding of the AWS ecosystem, including networking, storage, and security layers.
Observability Architecture: Experience implementing and managing observability stacks using tools like Prometheus, Grafana, VictoriaMetrics, or Thanos.
FinOps & Efficiency: A proven track record of optimizing cluster topologies to balance performance with cost-efficiency goals.
Treats AI-assisted engineering as a core part of modern software practice, leveraging AI tools to enhance design, coding, testing, troubleshooting, and documentation, while guiding the team in adopting effective patterns and evolving best practices for