Lead the evolution of storage systems powering next-generation AI infrastructure. In this role, you'll define the strategic direction for storage architecture, ensuring scalability, performance, and reliability across multi-petabyte environments. Your expertise will directly influence how storage integrates with distributed GPU clusters running intensive machine learning workloads.
Key Responsibilities
- Lead vendor selection and request-for-proposal processes, using performance data and technical analysis to guide decisions
- Analyze AI and ML workload behaviors to shape storage design, tuning, and capacity planning
- Drive operational improvements and coordinate deployment strategies across engineering teams
- Collaborate with leadership to translate business requirements into technical specifications
- Oversee engineering execution by delegating complex tasks and maintaining alignment with senior stakeholders
Requirements
- Minimum of 8 years designing, deploying, and managing large-scale production storage systems
- Proven experience with enterprise storage platforms such as VAST Data, WEKA, DDN, NetApp, or equivalent
- Familiarity with file, block, and object storage architectures and use cases
- Working knowledge of NFS, SMB, and POSIX-compliant file access
- Hands-on experience with NVMe over Fabrics (NVMe-oF) over TCP, InfiniBand, or RoCE transports
- Understanding of RDMA, GPUDirect Storage, and parallel file systems for high-throughput environments
- Background in storage security, encryption, data reduction, and multi-tenancy models
- Experience with backup, disaster recovery, and data protection frameworks
- At least 5 years using Infrastructure as Code tools such as Terraform or Ansible
Preferred Qualifications
- Experience with Kubernetes storage interfaces, including CSI and COSI drivers
- Deep knowledge of storage performance analysis and optimization
- Familiarity with public cloud storage services and related areas such as software-defined networking (SDN), identity management, and distributed systems
- Track record of deploying and managing software-defined storage solutions
- Implementation experience with monitoring platforms for storage and related infrastructure