We are seeking a Senior Machine Learning Engineer to lead the development of scalable deep learning infrastructure within our Computational Science division. In this role, you will design, implement, and refine distributed training pipelines that process large-scale genomic datasets, enabling breakthroughs in early cancer detection through artificial intelligence.
Key Responsibilities
- Develop and enhance deep learning workflows on distributed computing platforms to improve training speed, data throughput, and model reliability.
- Partner with machine learning scientists and software engineers to align pipeline architecture with scientific objectives and production requirements.
- Continuously profile and optimize model training systems for performance, scalability, and reproducibility across cloud environments.
- Implement efficient data handling strategies for high-volume datasets using modern storage formats and processing frameworks.
- Integrate monitoring, caching, and debugging tools to strengthen pipeline resilience and reduce iteration time.
- Document best practices and serve as a technical liaison between research and engineering teams to promote knowledge sharing and system transparency.
Qualifications
Applicants should hold an MS or equivalent in a quantitative field such as Computer Science, Statistics, or Engineering, with at least five years of industry experience building production-grade ML systems.
- Proficiency in Python or similar programming languages and hands-on experience with frameworks such as PyTorch, TensorFlow, JAX, or scikit-learn.
- Strong foundation in distributed training frameworks such as Ray or DeepSpeed, and familiarity with experiment-tracking tools such as MLflow, Weights & Biases, or TensorBoard.
- Experience deploying ML pipelines on AWS, Google Cloud, or Azure, with knowledge of containerization (Docker) and orchestration (Kubernetes).
- Proven ability to manage large datasets using storage formats and systems such as Parquet, HDFS, and S3, and processing libraries such as PyArrow or Spark.
- Skilled in version control (Git) and CI/CD practices to ensure code quality and deployment reliability.
- Ability to bridge scientific and engineering domains with clear communication and collaborative problem-solving.
Preferred Experience
- Working with genomics or biological data at scale.
- Handling multimodal datasets combining sequences, text, and images.
- GPU programming using CUDA, Triton, or XLA.
- Implementing infrastructure-as-code and MLOps best practices.
- Contributions to open-source machine learning projects.
Compensation & Benefits
This position offers a salary range of $161,925 to $227,325, eligibility for equity and cash bonuses, and a comprehensive benefits package covering medical, financial, and family leave. The role is hybrid, requiring 2–3 days per week on-site in Brisbane, California.
Our Commitment
We are an equal-opportunity employer dedicated to fostering a diverse, inclusive workplace. We do not discriminate based on race, religion, age, gender identity, sexual orientation, disability, veteran status, or any protected status under applicable law.
