Palo Alto, United States of America (Remote)

Pathway is hiring a Machine Learning DevOps - Cloud and Compute Cluster - R&D Support

About the Role

Role Overview

We’re building advanced machine learning systems and need a DevOps engineer who specializes in ML infrastructure to support our research and development efforts. You will play a central role in shaping the backbone of our computational environment, ensuring that training and inference workflows run efficiently at scale.

Key Responsibilities

  • Design, optimize, and maintain cloud and distributed compute infrastructure tailored for machine learning workloads
  • Automate end-to-end ML pipelines including data processing, model training, validation, and deployment
  • Ensure reproducibility and traceability across experiments and model versions
  • Handle large-scale datasets, often in the terabyte range, with efficient storage and access patterns
  • Implement and refine CI/CD practices specific to machine learning systems
  • Monitor deployed models for performance degradation and data drift
  • Partner closely with machine learning engineers, software developers, and platform teams to align infrastructure with research goals

Required Qualifications

  • Proven experience managing cloud environments and compute clusters
  • Strong background in scaling infrastructure to support demanding ML workloads
  • Proficiency in Linux system administration
  • Hands-on experience supporting development, training, and production environments in the cloud
  • Track record of automating workflows across multiple cloud providers
  • Deep understanding of reliability, scalability, and automation throughout the machine learning lifecycle

Technology Environment

Our stack spans multiple cloud platforms, GPU-accelerated infrastructure, distributed computing frameworks, ML-focused CI/CD tooling, data ingestion pipelines, and model monitoring systems. The role requires fluency in Linux environments and a strong systems mindset.

Required Skills

  • Cloud providers
  • GPU infrastructure
  • Distributed computing
  • CI/CD for ML
  • Data ingestion systems
  • Model monitoring tools
  • Linux administration
  • Scaling infrastructures
  • Cloud environment automation
  • Compute cluster management

About company
Pathway

The first post-transformer frontier model that solved continual learning.

Pathway has developed BDH, a massively parallel post-Transformer reasoning architecture enabling generalization over time. Created by scientists and researchers, the company is led by CEO Zuzanna Stamirowska, CTO Jan Chorowski, and CSO Adrian Kosowski. The team has previously built AI tooling that has amassed 126k stars on GitHub.

Pathway is backed by prominent figures including Lukasz Kaiser (co-inventor of Transformers), Martin Farach-Colton (NYU), and Jacques Attali, with support from investors such as TQ, Kadmos, ID4 Ventures, RBV, Inovo, and Market One Capital.

Job Details
Category: Infrastructure
Posted: 6 months ago