Embedded Infrastructure Engineer, Chanakya
Sarvam AI
About Sarvam
Sarvam is building the bedrock of Sovereign AI for India. The company is developing India's full-stack sovereign AI platform, building across research, models, infrastructure and applications with a singular focus on making AI genuinely work for India. Sarvam is backed by Lightspeed, Peak XV, and Khosla Ventures. It works with leading enterprises and public institutions, and its partners include leading Indian brands such as Tata Capital, SBI Life, CRED, IDFC, and LIC.
About the Role
As an Embedded Infrastructure Engineer, you design, build, and maintain the data infrastructure that underpins AI system deployments at client sites. You work alongside Embedded Data Scientists and Strategic Deployment Engineers to ensure that terabyte-scale datasets can be ingested, stored, queried, and served to AI reasoning engines reliably and with good performance.
This means building and operating data platforms that handle terabyte-scale persistent data stores and sustained large daily ingestion volumes across structured records, documents, imagery, audio, and geospatial data. You will design and maintain the databases, object stores, ingestion pipelines, and processing layers that make this possible.
You will make critical decisions about storage architecture, indexing strategies, pipeline orchestration, and system performance — often in constrained, air-gapped, or operationally sensitive environments where you cannot rely on managed cloud services or standard enterprise tooling. You will own the reliability and performance of the infrastructure layer in your assigned accounts.
What You'll Do
• Design and operate data storage architectures (relational, document, vector, object storage) capable of managing terabyte-scale datasets across multiple modalities
• Build and maintain ingestion pipelines that reliably process daily data influx — including batch and streaming workloads — with monitoring, error handling, and backpressure management
• Implement indexing, partitioning, and query optimisation strategies that allow AI systems and data scientists to retrieve and reason over large datasets with acceptable latency
• Work with Embedded Data Scientists to translate ontologies, schemas, and semantic structures into performant physical data models and storage configurations
• Deploy and manage database systems, vector stores, and search infrastructure in air-gapped, on-premise, or security-constrained environments
• Build observability into the data platform: monitor pipeline health, storage utilisation, query performance, and ingestion lag
• Own capacity planning and scaling decisions for data infrastructure across assigned client deployments
• Collaborate with product and engineering teams to feed infrastructure learnings back into the core platform and tooling
What We're Looking For
• 4–8 years in data infrastructure, data engineering, platform engineering, or site reliability engineering, ideally at organisations operating at significant data s