Requirements
- 7-12 years of experience in SRE, DevOps, or Platform Engineering, with progressive responsibility
- Proven experience designing, implementing, deploying, and operating highly available, fault-tolerant, auto-scaling, and auto-healing distributed systems
- Strong proficiency in Python, Bash, Java, and configuration languages (JSON/YAML)
- Advanced experience with Infrastructure as Code using Terraform and CloudFormation
- Expertise in CI/CD pipeline design and implementation using Jenkins and GitLab
- Extensive hands-on experience with observability platforms (Elastic Stack, Grafana, Prometheus), including designing monitoring strategies
- Expert-level Linux administration and troubleshooting in enterprise environments
- Strong understanding of systems architecture, networking, security, and distributed systems principles
- Track record of delivering and maintaining systems with 99.9%+ uptime SLAs
- Experience running disaster recovery exercises and implementing zero-downtime deployment solutions
Nice to Have
- Kubernetes and container orchestration at scale
- Experience in banking/financial services, with an understanding of regulatory and compliance requirements
- Deep expertise in AWS cloud architecture and services at enterprise scale


