We are looking for a Senior Software Engineer with deep expertise in infrastructure reliability and cloud technologies. This position plays a key role in building and managing scalable systems that support cloud-native applications across major platforms including AWS, GCP, and Azure.
Key Responsibilities
- Architect and manage cloud-based infrastructure to ensure scalability, reliability, and security.
- Design and maintain CI/CD pipelines to enable automated testing and deployment processes.
- Monitor system performance, respond to incidents, and conduct thorough root cause investigations.
- Implement infrastructure as code using tools such as Terraform, Ansible, or CloudFormation.
- Build observability frameworks with monitoring, logging, and alerting solutions using platforms like Prometheus, Grafana, ELK, or Datadog.
- Work closely with development teams to integrate reliability, scalability, and security into application design.
- Automate routine operations and enhance system efficiency through scripting in Python, Bash, or Go.
- Guide junior engineers in best practices for site reliability and operations engineering.
- Support compliance initiatives and contribute to security audits.
Qualifications
Candidates should hold a degree in Computer Science, Engineering, or a related field, and bring at least five years of software engineering experience, including a minimum of three years focused on cloud platforms, data systems, or MLOps. A strong foundation in DevOps or SRE roles is essential.
- Proven experience with public cloud environments: GCP, AWS, or Azure.
- Hands-on skill with containerization (Docker) and orchestration (Kubernetes).
- Familiarity with CI/CD automation tools such as Jenkins, GitHub Actions, or GitLab CI.
- Proficiency in scripting or programming languages such as Python, Bash, or Go.
- Experience with monitoring, logging, and alerting ecosystems.
- Knowledge of security standards and compliance requirements in production systems.
Preferred Background
- Cloud or DevOps-related certifications.
- Background in distributed, high-availability systems.
- Exposure to database management and optimization.
- Experience with configuration management technologies.
- Strong analytical, communication, and teamwork abilities.


