Senior Production Engineer
Role Overview
We are looking for a Senior Production Engineer to design, operate, and scale robust cloud infrastructure built on AWS and Kubernetes (EKS). You will play a key role in keeping our systems reliable, secure, and efficient while supporting rapid development and deployment cycles. This position includes participation in an on-call rotation to maintain production stability.
Key Responsibilities
- Operate and maintain cloud infrastructure on AWS, with a focus on EKS and containerized workloads.
- Use generative AI coding tools such as Claude Code or Codex to accelerate infrastructure development and enable automated responses to recurring production issues.
- Design and manage infrastructure-as-code using Terraform for consistent, repeatable deployments.
- Build automation scripts and internal tools using Go or Python to improve operational efficiency.
- Enhance CI/CD pipelines to support faster, safer software delivery using Git-based workflows and tools like Argo or Jenkins.
- Develop and refine monitoring, alerting, and observability practices using platforms like Datadog, Prometheus, or CloudWatch.
- Lead incident response efforts, perform root cause analysis, and implement long-term fixes to prevent recurrence.
- Create and automate runbooks and remediation procedures for common operational scenarios.
- Support production releases, including occasional off-hours coordination when necessary.
Required Qualifications
- 5+ years of experience in site reliability engineering, DevOps, or SaaS platform operations.
- Minimum of 3 years managing production systems in AWS environments.
- Proven expertise with Terraform and infrastructure-as-code methodologies.
- 3+ years working with Docker and Kubernetes (EKS preferred), including Helm for package management.
- Strong programming skills in Go or Python (or equivalent language).
- Experience building and maintaining CI/CD pipelines using Git, Argo, Jenkins, or similar tools.
- Deep familiarity with Linux/Unix systems and command-line environments.
- Bachelor’s degree in Computer Science or equivalent hands-on experience.
Preferred Qualifications
- Experience with observability platforms such as Datadog, ELK stack, Prometheus, or CloudWatch.
- Background managing AWS RDS or Aurora MySQL, including query optimization, replication, and upgrades.
- Knowledge of SLI/SLO frameworks and reliability engineering best practices.
- Proven ability to collaborate effectively with remote, distributed teams.
- Experience supporting SOC 2 or ISO 27001 compliance audits.
- AWS certification is a plus.
Technology Environment
AWS, Kubernetes (EKS), Terraform, Docker, Helm, Go, Python, CI/CD pipelines, Argo, Jenkins, Git, Datadog, CloudWatch, ELK stack, Prometheus, RDS, Aurora MySQL, Linux/Unix, Claude Code, Codex
Compensation & Benefits
Base salary range: $165,000 – $195,000 annually. The package includes stock equity and comprehensive benefits. Medical, dental, and vision plans are available with a $0 monthly premium and take effect on day one. Additional offerings include a 401(k) plan, discretionary paid time off, paid holidays, parental leave, a monthly wellness reimbursement, and a monthly lunch stipend. This is a fully remote position open to candidates in the United States.
Work Environment
This role is part of a fully remote, globally distributed team with flexible scheduling. We value collaboration, innovation, and a fast-moving, entrepreneurial culture. Our team thrives on solving complex infrastructure challenges and continuously improving system resilience.
Diversity & Inclusion
We are committed to building an inclusive workplace where all individuals are treated with fairness and respect. We welcome applicants regardless of race, color, religion, sex, age, national origin, ancestry, citizenship status, disability, marital status, familial status, sexual orientation, pregnancy, genetic information, gender identity, gender expression, veteran status, or any other protected characteristic under applicable law. All qualified candidates will be considered equally for employment opportunities.


