NVIDIA is hiring a Principal Software Engineer – Large-Scale LLM Memory and Storage Systems

NVIDIA is hiring a Principal Software Engineer – Large-Scale LLM Memory and Storage Systems to define the vision and roadmap for memory management for large-scale LLM inference and storage systems within the Dynamo inference framework. You will design and evolve a unified memory layer spanning multiple tiers and architect deep integrations with leading LLM serving engines.

What You'll Do

  • Design and evolve a unified memory layer that spans GPU memory, pinned host memory, RDMA-accessible memory, SSD tiers, and remote storage to support large-scale LLM inference.
  • Architect and implement deep integrations with leading LLM serving engines (such as vLLM, SGLang, and TensorRT-LLM), focusing on KV-cache offload, reuse, and remote sharing across heterogeneous clusters.
  • Co-design interfaces and protocols enabling disaggregated prefill, peer-to-peer KV-cache sharing, and multi-tier KV-cache storage for high-throughput, low-latency inference.
  • Partner closely with GPU architecture, networking, and platform teams to exploit GPUDirect, RDMA, NVLink, and similar technologies for low-latency KV-cache access across heterogeneous accelerators.
  • Mentor engineers, set technical direction for memory and storage subsystems, and represent the team in internal reviews and external forums.

What We're Looking For

  • Master's degree, PhD, or equivalent experience.
  • 15+ years of experience building large-scale distributed systems, high-performance storage, or ML systems infrastructure in C/C++ and Python, with a track record of delivering production services.
  • Deep understanding of memory hierarchies (GPU HBM, host DRAM, SSD, remote storage) and experience designing systems spanning multiple tiers for performance and cost.
  • Expertise in distributed caching or key-value systems optimized for low latency and high concurrency.
  • Hands-on experience with networked I/O and RDMA/NVMe-oF/NVLink-style technologies, and familiarity with disaggregated and aggregated deployments for AI clusters.
  • Strong skills in profiling and optimizing systems across CPU, GPU, memory, and network, using metrics to drive architectural decisions and validate improvements.
  • Excellent communication skills and prior experience leading cross-functional efforts.

Nice to Have

  • Prior contributions to open-source LLM serving or systems projects focused on KV-cache optimization, compression, streaming, or reuse.
  • Experience designing unified memory or storage layers exposing a single logical KV or object model across GPU, host, SSD, and cloud tiers, especially in enterprise or hyperscale environments.
  • Publications or patents in areas like LLM systems, memory-disaggregated architectures, RDMA/NVLink-based data planes, or KV-cache systems for ML.

Technical Stack

  • Languages: Rust, Python, C/C++
  • Technologies: GPU, RDMA, NVLink, NVMe-oF
  • Frameworks: vLLM, SGLang, TensorRT-LLM

Benefits & Compensation

  • Compensation: $272,000 - $425,500 USD.
  • Equity package.
  • Comprehensive benefits package.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Required Skills
Rust, Python, C/C++, GPU, RDMA, NVLink, NVMe-oF, vLLM, SGLang, TensorRT-LLM, Distributed Systems, High-Performance Computing, Memory Systems, Storage Systems, LLM Inference
About NVIDIA

NVIDIA builds accelerated computing platforms and AI technologies that power advancements in areas such as generative AI, data centers, robotics, and digital twins.
Job Details

  • Category: infrastructure
  • Posted: 4 months ago