Staff Engineer, API Platform

Sarvam AI

Bengaluru · Engineering

About Sarvam

Sarvam is building the bedrock of sovereign AI for India: a full-stack platform spanning research, models, infrastructure, and applications, with a singular focus on making AI genuinely work for India. Backed by Lightspeed, Peak XV, and Khosla Ventures, Sarvam works with leading enterprises and public institutions and partners with India’s leading brands, including Tata Capital, SBI Life, CRED, IDFC, and LIC.

About the Role

Sarvam's model APIs (ASR, TTS, Vision, LLM, Translate) are how developers and enterprises ship products on our foundation models. The platform handles tens of millions of API calls every day, over HTTP and WebSockets, with metering, billing, prepaid wallets, and rate-limiting all part of the same stack. Today it's FastAPI and Python; we're in the middle of rewriting it in Go.

We're hiring a Staff Engineer to own this platform end-to-end. The architecture. The reliability. The performance. The standards. This is an IC role.

What You’ll Own

- The end-to-end design and evolution of the platform, from the moment a request hits the edge to the response that goes back out

- The Python→Go rewrite: leading the architecture, setting the patterns, and bringing the platform across without compromising reliability

- Audio pipeline engineering: TTS chunking around model context limits, sample-rate adjustments and format encoding, VAD-based silence detection for ASR

- Vision pipelines: orchestrating OCR, layout detection, and VLM harnesses for structured data extraction

- Streaming infrastructure: WebSocket connections, queue-based batch processing, backpressure, low-latency model invocation

- The commercial layer: metering, billing, prepaid wallet management, and rate-limiting, engineered to the same bar as the model APIs themselves

- Performance and reliability for a multi-tenant platform operating under sub-second latency SLOs

- Observability (logging, metrics, tracing), so the team can debug production confidently and quickly

- Integration testing infrastructure: the harness that lets us ship fast without breaking customer-facing APIs

- Close partnership with the Inference and MLOps teams, mentoring them on production engineering so model serving is fast, reliable, and operable

- Design-first, documentation-first engineering culture within the team: RFCs before code, decisions written down, no tribal knowledge

What We’re Looking For

- 7+ years building production backend systems at scale

- Strong Go in production: you've designed, built, and operated Go services under real load

- Comfortable in Python: you'll work in both languages through the rewrite, and FastAPI is the current foundation

- A track record of designing and operating high-scale, low-latency, multi-tenant distributed systems

- Hands-on experience with real-time / streaming systems: WebSockets, long-lived connections, backpressure

- Strong PostgreSQL and Redis