Florida, Uruguay (Remote, Global)

Wikimedia Foundation is hiring a Senior Site Reliability Engineer

About the Role

We're looking for a Senior Site Reliability Engineer to help sustain and evolve the infrastructure behind one of the world's most visited platforms. You'll play a key role in ensuring reliability, performance, and scalability across a vast, distributed system used by millions daily. This position is central to maintaining operational excellence while advancing automation, observability, and resilience.

What You’ll Do

  • Manage and optimize production systems through deployment, configuration, and ongoing maintenance using modern DevOps practices
  • Design and implement automation for provisioning, scaling, and monitoring services using tools like Puppet and Kubernetes
  • Collaborate with engineering teams to shape scalable architectures and guide best practices in system design
  • Respond to incidents as part of a rotating on-call schedule, leading diagnosis, resolution, and post-mortem analysis to strengthen system resilience
  • Diagnose complex issues across every layer, from network protocols to application performance, drawing on deep knowledge of TCP/IP, HTTP, TLS, and DNS
  • Contribute to a culture of continuous improvement by identifying inefficiencies and driving automation initiatives
  • Mentor team members and share expertise across a globally distributed, asynchronous work environment
  • Travel 1–2 times per year for team gatherings and in-person collaboration

What We’re Looking For

  • At least six years of experience in site reliability, systems engineering, or DevOps roles within large-scale environments
  • Strong scripting ability in Python, Bash, or similar, with hands-on experience in configuration management (especially Puppet)
  • Proven skill in Linux system administration, particularly on Debian-based systems, including package management and kernel-level troubleshooting
  • Deep understanding of distributed systems, caching architectures, and performance optimization
  • Experience with incident response, root cause analysis, and implementing preventive measures
  • Excellent written and verbal communication skills in English, with the ability to work independently across time zones

Nice to Have

  • Background in tuning Linux kernels for high-throughput services
  • Familiarity with caching proxies such as Varnish, Nginx, or Envoy
  • Experience with monitoring and alerting stacks like Prometheus and Grafana
  • Contributions to open-source projects or active participation in developer communities
  • Knowledge of PHP, HHVM, Redis, or MediaWiki ecosystems
  • Experience defining and managing service-level objectives (SLOs) across teams

Our Environment

We operate as a remote-first organization with team members across more than 40 countries. All code, configuration, and documentation are publicly accessible, reflecting our commitment to open-source principles. Our culture values diversity, transparency, and continuous learning. We prioritize equitable compensation, inclusive hiring, and accessibility for all applicants and employees.

Required Skills
Python, Go, Bash, Ruby, Puppet, Ansible, Kubernetes, Linux, Debian, TCP/IP, HTTP, TLS, DNS, Distributed Caching, Scripting
About company
Wikimedia Foundation
The Wikimedia Foundation is the nonprofit organization that operates Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge.
Job Details
Category: Infrastructure
Posted a month ago