Responsibilities
- Lead the transition from manual operations to fully automated, code-based infrastructure provisioning using Terraform.
- Create and maintain reusable Terraform modules for AWS services such as VPC, EC2, EKS, RDS, Aurora, and IAM.
- Develop and maintain Ansible playbooks for managing configurations across Windows and Linux systems.
- Advocate for making all infrastructure changes through code and version control, with no exceptions.
- Design and implement GitOps workflows using GitHub Actions or ArgoCD for deploying infrastructure and applications.
- Implement branch protection rules, code review requirements, and automated validation for infrastructure pull requests.
- Define and enforce GitOps standards across the team, guiding team members in adopting new workflows.
- Ensure cloud infrastructure on AWS remains reliable, available, and scalable across environments.
- Establish and monitor SLOs and SLIs, and develop alerts, dashboards, and runbooks to maintain high operational standards.
- Respond to production incidents and lead post-mortem analyses to implement long-term corrective actions.
- Drive cost efficiency by implementing optimization strategies and enforcing resource tagging and governance policies.
- Guide and train members of the Cloud Operations team in infrastructure-as-code and automation techniques.
- Document architectural decisions, patterns, and standards to make knowledge accessible across the team.
- Amplify team impact by enabling others to build effectively, with success measured by team outcomes rather than individual output alone.
Work Arrangement
Remote — Bangalore