Senior Site Reliability Engineer
Join a team focused on building and maintaining robust, scalable systems that power critical financial services. In this role, you'll bridge infrastructure and development to enhance system reliability, performance, and automation across complex environments.
What You'll Do
- Architect and deploy resilient, high-performance infrastructure to support mission-critical applications
- Partner with engineering teams to embed reliability practices into development workflows
- Lead incident response efforts and conduct thorough post-mortems to drive systemic improvements
- Develop and maintain monitoring and alerting pipelines using Prometheus and Grafana
- Automate deployment processes using ArgoCD or FluxCD to ensure consistent, reliable releases
- Manage containerized workloads on Kubernetes across multiple cloud platforms
- Use Terraform and Ansible to maintain infrastructure as code at scale
- Support 24/7 operations by participating in scheduled on-call rotations
What You Bring
- 2–4 years of hands-on experience with distributed systems and infrastructure automation
- Proven expertise in Kubernetes, container orchestration, and cloud platforms (AWS, GCP, Oracle Cloud)
- Strong experience managing MongoDB and PostgreSQL in production environments
- Deep familiarity with CI/CD pipelines and GitOps tools such as ArgoCD or FluxCD
- Proficiency with observability stacks including Prometheus and Grafana
- Experience using Terraform and Ansible for infrastructure provisioning and configuration
- Commitment to operational excellence, security standards, and system uptime


