Australia

NVIDIA is hiring a Solution Architect

About the Role

NVIDIA is hiring a Senior AI/HPC Engineer to join its Infrastructure Specialist team. In this role, you will be the technical face to the customer, interacting with partners and internal teams to analyze, define, and implement large-scale AI/HPC projects across Networking, System Design, and Automation.

What You'll Do

  • Deploy, manage, and maintain AI/HPC infrastructure in Linux-based environments.
  • Act as the domain expert for customers from planning calls through implementation.
  • Create comprehensive handover documentation and perform knowledge transfers for sophisticated systems.
  • Provide feedback to internal teams through bug reports, workarounds, and suggested improvements.

What We're Looking For

  • A BS/MS/PhD or equivalent experience in Computer Science, Engineering, Physics, Mathematics, or a related field.
  • 5+ years providing in-depth support and deployment services for hardware and software products.
  • Expert knowledge of and experience with Linux system administration, including process management, package management, kernel management, boot troubleshooting, and performance optimization.
  • Experience with cluster management technologies and schedulers such as SLURM, LSF, or UGE.
  • Scripting proficiency, strong organizational skills, and the ability to prioritize tasks with limited supervision.
  • Excellent verbal and written English skills and good interpersonal skills for resolving critical customer issues.
  • Industry-standard Linux certifications.
  • Experience with advanced networking, including routing, tuning, and monitoring.

Nice to Have

  • Hands-on experience with MPI (e.g., OpenMPI, MPICH), including distributed communication programming and cluster debugging.
  • In-depth understanding of NCCL principles and expertise in collective communication optimization for NVIDIA GPU clusters.
  • Experience deploying and optimizing high-speed networks (InfiniBand/Ethernet) and understanding their impact on GPU cluster performance.
  • Familiarity with automation tools like Ansible, Salt, or Puppet for batch configuration and operational automation.
  • Knowledge and hands-on experience with Kubernetes for container orchestration, resource scheduling, and integration with HPC environments.

Technical Stack

  • Linux, SLURM, LSF, UGE
  • MPI (OpenMPI, MPICH), NCCL
  • InfiniBand, Ethernet
  • Ansible, Salt, Puppet, Kubernetes

Team & Environment

You will join NVIDIA's Infrastructure Specialist team, a diverse and supportive environment where everyone is inspired to do their best work.

NVIDIA is an equal opportunity employer and values diversity. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Required Skills
Linux, SLURM, LSF, UGE, MPI, NCCL, InfiniBand, Ethernet, Ansible, Salt, Cluster Management, Scripting, System Administration
About company
NVIDIA
NVIDIA builds accelerated computing platforms and AI technologies that power advancements in areas such as generative AI, data centers, robotics, and digital twins.
Job Details
Department: Information Technology
Category: Infrastructure
Posted: 2 months ago