As a Site Reliability Engineer, you will play a key role in ensuring the reliability, performance, and scalability of customer-facing platforms. Your responsibilities include managing both Windows and Linux systems, monitoring service health, and proactively addressing potential issues before they impact users. You will work closely with development, DevOps, and database teams to deliver resilient and efficient infrastructure solutions.
Key Responsibilities
- Operate and maintain highly available systems across hybrid environments, with a focus on stability and performance
- Analyze system and application logs, including IIS, security events, and AWS CloudTrail, to detect and resolve issues
- Use infrastructure-as-code tools such as Terraform, ARM Templates, and CloudFormation to manage environments consistently
- Design, implement, and maintain CI/CD pipelines using tools like GitHub Actions, Jenkins, Octopus Deploy, and Azure DevOps
- Ensure robust backup strategies and disaster recovery readiness
- Apply security best practices throughout the development lifecycle, from design to deployment
- Support containerized workloads using Docker and Kubernetes, including AKS and Azure-hosted services
- Champion observability using platforms like New Relic, Datadog, Application Insights, or AppDynamics
- Advocate for and follow ITIL standards in incident, change, and problem management processes
- Provide expertise on cloud platforms, with a strong emphasis on AWS and Azure services
Required Qualifications
- Minimum of 5 years of experience in system administration or SRE roles with a focus on high-availability environments
- Proven experience with AWS services and cloud infrastructure management
- Strong scripting skills in Bash, PowerShell, or Python for automation tasks
- Hands-on experience with configuration management and infrastructure-as-code tools such as Ansible or Terraform
- Familiarity with networking concepts including virtual networks (VNets), subnets, private links, and peering
- Experience administering SQL Server, IIS, and ASP.NET applications in production
- At least 1 year of working experience with container technologies like Docker and Kubernetes
- Background in Agile methodologies such as Scrum, Kanban, or Lean
- 3+ years of experience with CI/CD tooling and cloud platforms (AWS, Azure)
Preferred Skills
- Direct experience managing and maintaining SQL Server databases in enterprise environments


