Backend Engineer - Studio Media Platform
Sarvam AI
About Sarvam
Sarvam is building the bedrock of Sovereign AI for India: a full-stack sovereign AI platform spanning research, models, infrastructure, and applications, with a singular focus on making AI genuinely work for India. Sarvam works with leading enterprises and public institutions and is backed by Lightspeed, Peak XV, and Khosla Ventures. Its partners include leading Indian brands such as Tata Capital, SBI Life, CRED, IDFC, and LIC.
About the Role
We are hiring a Backend Engineer to work across Sarvam’s Studio media platform — spanning AI dubbing, live translation, and the shared service foundation that powers all Studio products (voice cloning, stem separation, lip sync, music generation, and more). You will build and maintain production services, ML pipeline libraries, and platform SDKs that together enable multilingual media processing at scale for enterprise customers and Sarvam Studio users.
The work cuts across multiple codebases: a core ML pipeline library (ASR, translation, TTS, audio processing), production services for dubbing and live translation, and a shared platform SDK that provides common capabilities to every Studio service.
What You’ll Do
Service & Infrastructure
- Design and optimize production FastAPI services for dubbing and live translation — multi-stage task orchestration, rate-limited scheduling, and backpressure controls for concurrent workloads
- Build and maintain distributed worker architectures with independent scaling per pipeline stage and automatic recovery of stuck or failed tasks
- Own the data layer — async ORM models, schema migrations, and query optimization on PostgreSQL
- Implement real-time features — WebSocket-based job tracking for dubbing and streaming audio pipelines for live translation
- Manage Kubernetes deployments — Helm charts, secrets management, ingress configuration, and multi-role container images
ML Pipeline & Library
- Extend and maintain the core dubbing library across all pipeline stages: audio extraction, VAD, speech recognition, translation, QC, TTS, and final video stitching
- Integrate and optimize ML model serving — remote inference server clients and local model inference for audio analysis and vocal separation
- Build and improve QC orchestration — automated scoring, tempo analysis, guided normalization, and pronunciation verification
- Design async-first pipelines with efficient concurrency patterns for CPU-bound audio processing
- Maintain and evolve LLM integration layers for translation, QC, and pre-processing across multiple provider backends
Platform SDK & Shared Services
- Build and maintain the shared Studio service SDK — reusable FastAPI middleware and routers for authentication, billing, workspace isolation, and input validation
- Design media storage abstractions — upload, signed URL generation, retention policies, and cloud blob storage integration
- Implement cross-cutting concerns: rate