NVIDIA is hiring a Senior Systems Engineer, Artificial Intelligence Operations

About the Role

NVIDIA is looking for a Senior Systems Engineer, Artificial Intelligence Operations to drive improvements in AI cluster reliability and performance. You will be at the intersection of customer needs and technical innovation, working in a diverse, supportive environment where everyone is inspired to do their best work.

What You'll Do

  • Gather and understand internal and external customer requirements to improve AI cluster resiliency and design AIOps-based solutions.
  • Develop automated workflows for issue detection and root cause analysis, and collaborate closely with operators to debug complex, full-stack AI cluster problems.
  • Deliver compelling technical presentations and lead hands-on demos or training.
  • Handle evaluation deployments (POC/POV) and ensure smooth, reliable installations by staying engaged throughout the customer journey.

What We're Looking For

  • Bachelor of Science or equivalent experience.
  • 12+ years of networking experience in enterprise or service provider environments, with strong hands-on expertise in routing and switching.
  • Proficient in scripting and automation using Python or similar languages, with strong Linux expertise.
  • Proven experience working directly with customers to resolve issues and ensure success in Systems Engineer or SRE roles.
  • Exceptional oral, written, and presentation skills for clearly communicating complex technical topics.
  • Demonstrated ability to collaborate effectively across teams, partnering with operations, engineering, and product development.

Nice to Have

  • Experience with data center infrastructure and cloud architectures.
  • Background in network performance monitoring or observability.
  • Previous experience working at a technology startup.

Technical Stack

  • Python
  • Linux

NVIDIA is an equal opportunity employer.

Required Skills

  • Python
  • Linux
  • AI Infrastructure
  • Systems Engineering
  • Cloud Platforms
  • Networking
  • Automation
  • Monitoring
  • Performance Tuning
  • Distributed Systems
  • Containerization
  • Scripting

About the Company
NVIDIA
NVIDIA builds accelerated computing platforms and AI technologies that power advancements in areas such as generative AI, data centers, robotics, and digital twins.
Job Details
Category: Infrastructure
Posted: 6 months ago