About the Role
The engineer will bridge machine learning and operations by building reliable pipelines, automating training and deployment workflows, and keeping models performing well in production.
Responsibilities
- Design and manage CI/CD pipelines for machine learning models
- Develop infrastructure for automated model training and deployment
- Monitor system performance and model behavior in production
- Collaborate with data scientists to operationalize ML workflows
- Implement version control for models, data, and code
- Optimize resource allocation for training and inference workloads
- Ensure reproducibility across development and production environments
- Integrate logging, monitoring, and alerting for ML systems
- Support secure and compliant deployment practices
- Troubleshoot issues in distributed ML environments
- Maintain documentation for ML pipelines and systems
- Scale infrastructure to handle growing model demands
- Work with containerization and orchestration tools
- Apply infrastructure-as-code principles
- Improve model latency and throughput
- Enforce access controls and authentication for ML services
- Evaluate new tools and frameworks for MLOps efficiency
- Contribute to incident response for production outages
- Ensure data quality and consistency in training pipelines
- Automate testing for model performance and data drift
Nice to Have
- Experience with MLflow or similar MLOps platforms
- Knowledge of TensorFlow or PyTorch deployment
- Familiarity with feature stores
- Experience in regulated industries
- Contributions to open-source MLOps projects
- Advanced degree in computer science or related field
Compensation
Competitive salary with performance-based incentives
Work Arrangement
Fully remote with flexible hours
Team
Collaborative engineering team focused on machine learning systems
Tech Stack
- Primary languages: Python, Bash
- Cloud: AWS and GCP
- Containerization: Docker, Kubernetes
- CI/CD: GitLab CI
- Infrastructure as Code: Terraform
- Monitoring: Prometheus, Grafana
- Model Serving: TensorFlow Serving, TorchServe
- Data Pipeline Tools: Apache Airflow
Culture & Values
- Emphasis on transparency and open communication
- Commitment to continuous learning and improvement
- Support for remote collaboration and asynchronous workflows
- Focus on work-life balance and sustainable pace
- Inclusive environment that values diverse perspectives
Available for qualified candidates


