Staff Data Engineer

Sarvam AI

Bengaluru · Engineering · 5+ yrs

About the role

About Sarvam

Sarvam is building the bedrock of Sovereign AI for India: a full-stack sovereign AI platform spanning research, models, infrastructure, and applications, with a singular focus on making AI genuinely work for India. Sarvam is backed by Lightspeed, Peak XV, and Khosla Ventures, and works with public institutions and leading enterprises, including Tata Capital, SBI Life, CRED, IDFC, and LIC.

The Role

We're hiring a Staff Engineer to design and build Sarvam's Data & Analytics Platform from the ground up. This is a high-ownership, high-leverage role — you will be the technical owner of a system that every Sarvam product writes to and every Sarvam customer reads from.

The platform has three jobs:

1. Ingest every meaningful event from every Sarvam product (outbound calls, agent turns, user turns, knowledge-base lookups, lead creation, deployments, model invocations, and more) through a clean API layer that handles automatic schema evolution and dynamic table creation per event type.

2. Replicate product databases into the analytics store via CDC, so operational data stays queryable alongside event streams without putting load on transactional systems.

3. Expose the data through a tenant-aware query API and a no-code dashboarding layer — powering customer-facing product analytics, internal BI, and the source-of-truth feed for our finance team's billing pipeline.

You will own architecture, build the core systems hands-on, set the engineering bar, and shape the team that grows around this platform. Databases, Kafka, and internal stores are never exposed directly — every read and write goes through APIs you design.

What you'll build

- A high-throughput event ingestion API in Go that accepts arbitrary event payloads, validates them, and lands them in ClickHouse with automatic table creation, column addition, and safe schema evolution as event shapes change.

- CDC pipelines using Debezium and Kafka to mirror product Postgres/MySQL databases into ClickHouse, with deduplication, ordering guarantees, and replay tooling.

- A multi-tenant query API with strict RBAC, per-tenant isolation, and query budgets: one surface powering customer-facing dashboards, internal analytics, and the finance team's billing extracts.

- A no-code dashboard layer that lets customers build product-analytics views without writing SQL — and lets internal teams ship dashboards in hours, not days.

- The operational backbone: ClickHouse cluster design (sharding, replication, MergeTree families, materialized views), capacity planning, cost controls, observability, on-call playbooks, and SLOs.

- The engineering culture for this team — design reviews, RFC process, testing standards, and the bar that future hires will be measured against.

What we're looking for

Experience

- 5+ years of backend / data-platform engineering, with a meaningful chunk spent designing a
