Role Overview
This role provides technical leadership for cloud infrastructure, with a focus on operational reliability, performance, and cost efficiency at scale. You will work across AWS, Google Cloud, and Azure, applying AI-driven insights and automation to improve system resilience.
Key Responsibilities
- Lead the design and implementation of secure, high-availability cloud architectures spanning multiple regions and providers.
- Apply expertise in cloud-native technologies to improve scalability, security, and operational efficiency.
- Manage and optimize middleware and data services, including SQL and NoSQL databases, message brokers, and stream processing systems.
- Guide the selection and deployment of CloudOps and DevOps tooling, with a focus on automation and infrastructure as code.
- Diagnose and resolve complex production issues, contributing to root cause analysis and long-term fixes.
- Collaborate with engineering and SRE teams during system design, architecture reviews, and incident response.
- Develop and refine deployment pipelines for Kubernetes-hosted microservices using tools such as Terraform, Helm, and ArgoCD.
- Monitor system performance, identify bottlenecks, and recommend improvements to enhance availability and responsiveness.
- Support capacity planning and proactive scaling strategies to meet evolving service demands.
Required Qualifications
- Bachelor’s degree in a technical field, preferably Computer Science or Engineering.
- Minimum of 8 years in CloudOps or DevOps roles supporting large-scale environments.
- Proven experience with at least one major cloud provider (AWS, Azure, or Google Cloud).
- Strong understanding of Linux, networking, and security fundamentals.
- Hands-on experience with containerization (Docker, Kubernetes) and infrastructure automation tools.
- Familiarity with data platforms such as PostgreSQL, Redis, Kafka, Elasticsearch, and Flink.
- Experience using monitoring solutions like Prometheus, Grafana, or Nagios.
- Demonstrated ability to troubleshoot complex distributed systems.
- Ability to collaborate effectively in a globally distributed team environment.
Preferred Qualifications
- Background in cloud security, compliance frameworks, or identity management.
Technology Environment
The stack spans AWS, Google Cloud, and Azure, with Kubernetes for orchestration, Terraform for infrastructure provisioning, Helm for application packaging and deployment, and ArgoCD for GitOps workflows. Data and messaging systems include Kafka, RabbitMQ, Flink, and PostgreSQL, with monitoring provided by Prometheus and Grafana.
Work Environment
The team is globally distributed, so the role requires flexibility across time zones. The culture is collaborative and inclusive, with an emphasis on innovation, continuous improvement, and technical excellence.


