Guidewire is hiring a Site Reliability Engineer

About the Role

This role combines software engineering and operations to build and maintain reliable, scalable systems. Responsibilities include incident management, automation, monitoring, and system performance improvement.

Responsibilities

  • Design and implement monitoring solutions for system health and performance
  • Respond to and resolve production incidents promptly
  • Develop automation tools to reduce manual operational tasks
  • Collaborate with development teams to improve system reliability
  • Participate in on-call rotations for critical system support
  • Analyze system failures and implement preventive measures
  • Optimize system performance and scalability
  • Maintain and improve deployment pipelines
  • Ensure high availability of services and infrastructure
  • Troubleshoot complex issues in distributed systems
  • Support incident post-mortem processes with actionable recommendations
  • Implement and manage configuration management tools
  • Work on capacity planning and resource forecasting
  • Enforce security and compliance standards in production systems
  • Contribute to disaster recovery planning and execution
  • Develop and maintain technical documentation
  • Drive adoption of best practices in reliability engineering
  • Evaluate and integrate new technologies for operational efficiency
  • Monitor and report on service level objectives and error budgets
  • Collaborate on system architecture improvements
  • Support cloud infrastructure management and optimization
  • Promote a culture of blameless post-mortems and continuous learning
  • Assist in code reviews with a focus on operational impact
  • Ensure systems meet defined reliability and uptime targets
  • Participate in system design reviews for new features

Compensation

Competitive salary and benefits package

Work Arrangement

Hybrid work model with flexibility for remote and office-based work

Team

Collaborative engineering team focused on scalable systems and operational excellence

Why This Role Matters

This position plays a critical role in maintaining the stability and performance of large-scale software systems. The engineer ensures that services remain available and responsive under varying loads and helps bridge the gap between development and operations.

Technology Stack

The team uses modern cloud infrastructure, Kubernetes for orchestration, Prometheus and Grafana for monitoring, GitLab for CI/CD, and a microservices-based architecture built with Java and Go.

Growth Opportunities

Engineers are encouraged to lead initiatives, mentor peers, and contribute to cross-team projects. There are clear pathways for technical and leadership advancement.

Visa sponsorship available for qualified candidates

Required Skills

  • AWS
  • Kubernetes
  • EKS
  • Terraform
  • Helm
  • Docker
  • PostgreSQL
  • Git
  • Infrastructure as Code

About the Company

Guidewire is the platform property and casualty (P&C) insurers trust to engage, innovate, and grow efficiently. It combines digital, core, analytics, and AI to deliver its platform as a cloud service. More than 540 insurers in 40 countries run on Guidewire.

Job Details

Category: Infrastructure
Posted: 10 months ago