Role Overview
We’re building advanced machine learning systems and need a DevOps engineer specializing in ML infrastructure to support our research and development. You will shape the backbone of our computational environment, ensuring that training and inference workflows run efficiently at scale.
Key Responsibilities
- Design, optimize, and maintain cloud and distributed compute infrastructure tailored for machine learning workloads
- Automate end-to-end ML pipelines including data processing, model training, validation, and deployment
- Ensure reproducibility and traceability across experiments and model versions
- Handle large-scale datasets, often in the terabyte range, with efficient storage and access patterns
- Implement and refine CI/CD practices specific to machine learning systems
- Monitor deployed models for performance degradation and data drift
- Partner closely with machine learning engineers, software developers, and platform teams to align infrastructure with research goals
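To give a concrete flavor of the drift-monitoring responsibility above, here is a minimal sketch of a Population Stability Index (PSI) check, a common way to flag data drift between a reference (training-time) sample and live traffic. This assumes NumPy; the function name and the PSI thresholds (0.1 / 0.25) are illustrative conventions, not part of this role's actual stack.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample and a live sample.

    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift,
    > 0.25 significant drift (thresholds are conventional).
    """
    # Bin edges come from the reference distribution's quantiles.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range live values

    exp_frac = np.histogram(expected, edges)[0] / len(expected)
    act_frac = np.histogram(actual, edges)[0] / len(actual)

    # Small floor avoids log(0) / division by zero in empty bins.
    eps = 1e-6
    exp_frac = np.clip(exp_frac, eps, None)
    act_frac = np.clip(act_frac, eps, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))
```

In practice a check like this would run on a schedule against production feature logs and page the on-call engineer when the score crosses the drift threshold.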
Required Qualifications
- Proven experience managing cloud environments and compute clusters
- Strong background in scaling infrastructure to support demanding ML workloads
- Proficiency in Linux system administration
- Hands-on experience supporting development, training, and production environments in the cloud
- Track record of automating workflows across multiple cloud providers
- Deep understanding of reliability, scalability, and automation throughout the machine learning lifecycle
Technology Environment
Our stack spans multiple cloud platforms, GPU-accelerated infrastructure, distributed computing frameworks, ML-focused CI/CD tooling, data ingestion pipelines, and model monitoring systems. The role requires fluency in Linux environments and a strong systems mindset.
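As an illustration of the reproducibility and traceability work described above, the sketch below fingerprints an experiment run by hashing its configuration and input data, plus capturing the runtime environment. It uses only the Python standard library; the function name and metadata fields are hypothetical examples, not a prescribed interface.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone

def run_fingerprint(config, data_files=None):
    """Capture enough metadata to trace an experiment back to its inputs."""
    # Canonical JSON (sorted keys) so the same config always hashes identically.
    config_hash = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode()
    ).hexdigest()

    # Hash input files in a stable (sorted) order, streaming in 1 MiB chunks.
    data_hash = hashlib.sha256()
    for path in sorted(data_files or []):
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                data_hash.update(chunk)

    return {
        "config_sha256": config_hash,
        "data_sha256": data_hash.hexdigest(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

A record like this, logged alongside each training run, lets an engineer later answer "which exact config and data produced this model?" without guesswork.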