Principal Production Engineer (SRE)
Legion
About the role
Hybrid, Bucharest, Romania
JOB OVERVIEW
Are you passionate about automation, cloud infrastructure, Kubernetes, and reliability engineering? As a Principal Production Engineer (SRE) at Legion, you will build and operate a secure, highly scalable, and cost-effective AWS/Kubernetes-based cloud platform. You will work across infrastructure automation, CI/CD pipelines, observability, and production reliability. Simply put, the SRE team ensures Legion’s platform is reliable, scalable, and continuously improving for our customers.
This role includes participation in an on-call rotation.
RESPONSIBILITIES AND DUTIES
Support and operate Legion’s AWS-based cloud platform and Kubernetes (EKS) environments.
Leverage GenAI tools (e.g., Claude Code, Codex, or similar) to accelerate infrastructure development, automation, and auto-remediation of common production issues.
Build and maintain infrastructure-as-code using Terraform.
Develop automation and internal tooling using Go or Python.
Improve CI/CD pipelines to increase deployment safety and velocity.
Define and improve monitoring, alerting, and observability systems.
Respond to production incidents, conduct root cause analysis, and implement systemic improvements.
Develop and automate operational runbooks and remediation workflows.
Support production deployments, including during off-hours as needed.
REQUIRED SKILLS AND QUALIFICATIONS
8+ years of experience in SRE, DevOps, or SaaS production operations.
5+ years of hands-on experience operating large-scale production workloads in AWS.
Strong experience with Terraform and infrastructure-as-code practices.
5+ years of experience with containerized environments using Docker and Kubernetes (EKS preferred); familiarity with Helm.
Proficiency in Go or Python (or similar programming language).
Experience building and maintaining CI/CD systems (Git-based workflows, Argo, Jenkins, or similar).
Strong Linux/Unix systems experience.
Bachelor’s degree in Computer Science or equivalent practical experience.
PREFERRED QUALIFICATIONS AND ATTRIBUTES
Experience with observability tools such as Datadog, CloudWatch, ELK stack, Prometheus, or similar.
Experience managing AWS RDS and/or Aurora MySQL, including slow query analysis, replication, and upgrade operations.
Experience implementing SLIs/SLOs and reliability best practices.
Experience working effectively with remote, distributed teams.
Experience supporting SOC 2 / ISO 27001 audits.
AWS certification preferred.
ABOUT LEGION
Join Legion's mission to turn hourly jobs into good jobs. We're a mission-driven team seeking exceptional talent to propel this vision. Embrace a culture that's collaborative, fast-paced, and entrepreneurial. With us, you'll grow your skills, work closely with experienced executives, and contribute significantly to our mission. Our award-winning AI-native workforce management platform is intelligent, automated, and employee-centric and proven to d