Shape the future of secure, resilient systems by designing and managing high-availability infrastructure that supports vital national operations. As a Senior Site Reliability Engineer, you'll ensure systems remain robust, scalable, and compliant within complex, restricted environments.
Key Responsibilities
- Architect and maintain production infrastructure with a focus on performance, security, and long-term stability.
- Respond to critical incidents with in-depth analysis, driving solutions that prevent recurrence through automation.
- Reduce operational burden by building self-healing systems, advanced monitoring, and comprehensive technical documentation.
- Enhance system resilience across multi-cloud and hybrid network environments to maintain continuous service delivery.
- Guide engineering teams by promoting SRE principles, operational excellence, and continuous learning.
Qualifications
- Active TS/SCI security clearance with polygraph
- Proven experience with FedRAMP and DoD IL6 compliance requirements
- Bachelor’s degree in Computer Science or equivalent technical background
- Extensive knowledge of AWS services including VPCs, Transit Gateways, Route Tables, ELBs, and NACLs
- Strong automation skills using Terraform or CloudFormation
- Expertise in Linux systems administration and scripting with Go, Python, Bash, or Ruby
- Experience managing containerized workloads with Docker, and web and application servers such as Apache HTTP Server and Tomcat
- Firm grasp of networking protocols, IP routing, and multi-cloud architecture
Work Environment
This hybrid role is based in the Washington, D.C. area and occasionally requires in-person attendance for onboarding. Candidates must be able to work on-site as needed.