We're looking for a Senior Site Reliability Engineer to help sustain and evolve the infrastructure behind one of the world's most visited platforms. You'll play a key role in ensuring reliability, performance, and scalability across a vast, distributed system used by millions daily. This position is central to maintaining operational excellence while advancing automation, observability, and resilience.

What You’ll Do

Manage and optimize production systems through deployment, configuration, and ongoing maintenance using modern DevOps practices
Design and implement automation for provisioning, scaling, and monitoring services using tools like Puppet and Kubernetes
Collaborate with engineering teams to shape scalable architectures and guide best practices in system design
Respond to incidents as part of a rotating on-call schedule, leading diagnosis, resolution, and post-mortem analysis to strengthen system resilience
Diagnose complex issues across layers, from network protocols to application performance, using deep knowledge of TCP/IP, HTTP, TLS, and DNS
Contribute to a culture of continuous improvement by identifying inefficiencies and driving automation initiatives
Mentor team members and share expertise across a globally distributed, asynchronous work environment
Travel one to two times per year for team gatherings and in-person collaboration

What We’re Looking For

At least six years of experience in site reliability, systems engineering, or DevOps roles within large-scale environments
Strong scripting ability in Python, Bash, or similar, with hands-on experience in configuration management (especially Puppet)
Proven skill in Linux system administration, particularly on Debian-based systems, including package management and kernel-level troubleshooting
Deep understanding of distributed systems, caching architectures, and performance optimization
Experience with incident response, root cause analysis, and implementing preventive measures
Excellent written and verbal communication skills in English, with the ability to work independently across time zones

Nice to Have

Background in tuning Linux kernels for high-throughput services
Familiarity with caching and reverse proxies such as Varnish, Nginx, or Envoy
Experience with monitoring and alerting stacks like Prometheus and Grafana
Contributions to open-source projects or active participation in developer communities
Knowledge of PHP, HHVM, Redis, or MediaWiki ecosystems
Experience defining and managing service-level objectives (SLOs) across teams

Our Environment

We operate as a remote-first organization with team members across more than 40 countries. All code, configuration, and documentation are publicly accessible, reflecting our commitment to open-source principles. Our culture values diversity, transparency, and continuous learning. We prioritize equitable compensation, inclusive hiring, and accessibility for all applicants and employees.

Wikimedia Foundation is hiring a Senior Site Reliability Engineer
