Linux Monitoring Desk Associate - L1
Neysa
About the role
Job Title: Monitoring Desk Associate – Service Assurance Team
Location: Mumbai
Type: Onsite - Work from office
About Neysa:
Neysa is an AI Acceleration Cloud System provider, dedicated to democratizing AI adoption with purpose-built platforms and services for AI-native applications and workloads. Co-founded by industry leaders, we empower businesses to discover, deploy, and scale Generative AI (Gen AI) and AI use cases securely and cost-effectively. Our flagship platforms—Neysa Velocis, Neysa Overwatch, and Neysa Aegis—accelerate AI deployment, optimize network performance, and safeguard AI/ML landscapes. We are committed to enabling AI-led innovation across industries and geographies.
Position Overview:
We are looking for a Monitoring Desk Associate to join our Service Assurance Team. This position plays a key role in ensuring the optimal performance of Neysa’s AI platforms by monitoring system health, responding to incidents, and troubleshooting and resolving issues in real time. The ideal candidate will have hands-on experience with Linux systems, a passion for operational excellence, and the ability to quickly resolve issues impacting service availability.
Key Responsibilities:
Incident Monitoring & Response: Monitor Neysa's AI platforms and infrastructure for any system alerts, performance issues, or service disruptions. Respond to incidents promptly and escalate issues as needed to ensure timely resolution.
Incident Management: Follow defined processes for incident identification, classification, and escalation. Ensure incidents are managed effectively, with minimal disruption to service and in alignment with service level agreements (SLAs).
Troubleshooting & Resolution: Use your Linux expertise to investigate, diagnose, and resolve incidents affecting system performance. Troubleshoot system-level issues, application failures, and network-related problems.
Proactive Monitoring: Continuously monitor the operational status of servers, applications, and networks, proactively identifying potential issues before they impact customers. Utilize monitoring tools such as Nagios, Prometheus, or Grafana to track system health.
Documentation & Reporting: Accurately document incidents, actions taken, and resolutions in incident management systems. Provide detailed reports on recurring issues, root causes, and preventive measures for the Service Assurance team.
Collaboration with Technical Teams: Work closely with system administrators, engineers, and developers to identify areas of improvement, share insights, and ensure issues are resolved with minimal business impact.
Root Cause Analysis: Participate in post-incident reviews to analyze root causes, provide feedback, and suggest improvements to incident management processes.
System Maintenance: Support periodic system checks, patch management, and routine maintenance to ensure systems are secure, optimized, and operating at peak efficiency.
Qualifications:
Experience: 1-5 years