Responsibilities
- Develop and implement strategies leveraging FOSS, COTS, and GOTS technologies to enhance the reliability, resiliency, and scalability of the platform.
- Conduct lab-based SWIL and HWIL testing to validate system performance and ensure components meet scalability and operational requirements.
- Identify performance bottlenecks, analyze usage patterns, and recommend improvements to enhance system efficiency and scalability.
- Identify, diagnose, and address recurring incidents by performing root cause analysis and implementing preventative measures.
- Produce and brief comprehensive resiliency and scalability assessments, providing insights into system behavior under load, failure modes, and recovery conditions.
- Translate findings into inputs for SLAs and KPPs to support informed decision-making by leadership.
- Prepare, maintain, and execute a Systems Engineering Plan (SEP) for managing all systems architecture and systems engineering aspects of the program.
- Conduct systems engineering activities required to specify, build, and maintain system engineering designs for the System.
- Design, prepare, and document systems engineering and cybersecurity artifacts for the System.
- Support the Government in recommending and conducting enterprise system architecture activities.
- Define, document, maintain, and promulgate APIs and technical standards for using the System and for interoperating within and beyond it.
- Design, engineer, integrate, and continuously improve the underlying infrastructure of the System.
- Identify, prepare, track, secure, and integrate government, commercial, and open-source tools and services into the System.
- Design, architect, engineer, and continuously improve the UI and UX components of the Platform.
- Perform site reliability engineering to build and maintain a reliable, scalable, and efficient System by applying software engineering principles to operational tasks.
Requirements
- Active Top Secret (TS) clearance with SCI eligibility.
- Bachelor’s degree in Computer Science, Engineering, Information Systems, or related technical discipline and 8–12 years of relevant experience OR Master’s degree in a related field and 6–10 years of relevant experience.
- Experience engineering and supporting enterprise cloud environments (AWS, Azure, or GCP).
- Experience implementing monitoring, observability, and performance management solutions.
- Experience conducting root cause analysis and implementing systemic reliability improvements.
- Experience integrating reliability engineering practices into DevSecOps pipelines.
- Experience operating within SAFe or large-scale Agile frameworks supporting enterprise systems.
- Experience with FOSS, COTS, and GOTS technologies.
- Proven experience conducting SWIL and HWIL testing.
- Strong understanding of system performance analysis and optimization.
- Experience in root cause analysis and implementing preventative measures.
- Ability to produce and brief comprehensive technical assessments.
- Experience preparing and maintaining Systems Engineering Plans (SEPs).
- Strong documentation and communication skills.
Nice to Have
- Active TS/SCI clearance.
- Experience with DoD systems and environments.
- Familiarity with NIST security controls and Zero Trust compliance.
- Experience in defining and tracking KPIs and SLOs.
- Knowledge of AI/ML model serving and deployment.
- Experience in participating in Engineering Control Board (ECB) processes.
- Familiarity with cloud environments and DevSecOps practices.
- SAFe Agilist (SA) or related SAFe certification.
- Experience supporting multi-enclave DoD cloud environments.
- Experience implementing automated failover, redundancy, and capacity management solutions.
- Experience supporting enterprise-scale data, analytics, or AI platforms.
- Experience implementing Zero Trust-aligned resiliency patterns.
- Relevant cloud certification (AWS, Azure, or GCP).
Additional Information
- If you're looking for comfort, keep scrolling. At Leidos, we outthink, outbuild, and outpace the status quo — because the mission demands it. We're not hiring followers. We're recruiting the ones who disrupt, provoke, and refuse to fail. Step 10 is ancient history. We're already at step 30 — and moving faster than anyone else dares.
- For U.S. Positions: While subject to change based on business needs, Leidos reasonably anticipates that this job requisition will remain open for at least 3 days with an anticipated close date of no earlier than 3 days after the original posting date as listed above.
