Responsibilities
- Architect and implement distributed systems that autonomously detect, diagnose, and resolve issues within large-scale cloud environments
- Set technical direction for solving complex problems and building platforms that improve security, system reliability, and operational visibility at enterprise scale
- Create robust AI agents using large language models and agentic architectures to speed up incident analysis, diagnostics, root cause identification, and automated runbook workflows
- Build cloud-based services that ensure system reliability, observability, and maintainability across Linux/BSD, bare-metal infrastructure, and Kubernetes clusters
- Conduct in-depth system investigations involving operating systems, network protocols, and distributed computing components, utilizing performance metrics, packet analysis, and memory dumps as needed
Benefits
- Multiple health insurance plan options
- Flexible time off policies covering vacation and illness
- Comprehensive parental leave benefits
- Retirement savings plans
- Reimbursement for educational pursuits
- On-site amenities and additional workplace perks
Work Arrangement
Hybrid


