Role Overview
You'll play a key role in maintaining and evolving a high-availability infrastructure environment. Your focus will be on ensuring system reliability, scalability, and operational excellence across distributed Unix-based systems.
Key Responsibilities
- Operate and support large-scale Unix-based distributed systems
- Maintain and optimize on-premises Kubernetes clusters and container orchestration platforms
- Design, implement, and refine CI/CD pipelines to support fast, reliable software delivery
- Enhance DevOps practices across teams through automation and feedback loops
- Manage observability systems including monitoring, alerting, logging, and tracing
- Lead incident diagnosis and resolution, contributing to system stability and uptime
- Collaborate with engineering teams on infrastructure-wide technical initiatives
Required Skills
- Proven background in Unix/Linux system administration and operations
- Direct experience managing Kubernetes and Docker in on-premises environments
- Familiarity with CI/CD tools such as Jenkins or equivalent
- Proficiency with automation frameworks like Ansible
- Working knowledge of monitoring and logging stacks including Prometheus, Grafana, and the Elastic ecosystem
- Solid understanding of network protocols, including TCP/IP, HTTP, DNS, and LDAP
- Experience with databases including MySQL, PostgreSQL, MongoDB, and Redis
- Scripting ability in Bash, Python, or similar languages
- Comfort working extensively with open-source technologies
- Experience with version control using Git
Our Values
We believe technology should serve people. Our culture emphasizes inclusion, ethical innovation, and human-centered design. We promote based on merit, welcome diverse perspectives, and actively support accessibility and anti-discrimination practices. Everyone, including people with disabilities, is encouraged to apply.


