About the Role
The role involves building and managing the infrastructure that supports machine learning workflows, from training through deployment and monitoring, with a focus on high availability and performance.
Responsibilities
- Design and implement CI/CD pipelines for machine learning models
- Automate deployment processes for model serving environments
- Monitor system performance and model behavior in production
- Ensure reproducibility of machine learning experiments
- Maintain version control for models, data, and code
- Optimize infrastructure for cost and efficiency
- Collaborate with data scientists to productionize models
- Troubleshoot issues in model serving pipelines
- Implement model monitoring and alerting systems
- Manage containerized environments using Docker and Kubernetes
- Integrate machine learning models with backend services
- Support data validation and preprocessing workflows
- Enforce security and compliance standards in ML systems
- Document system architecture and operational procedures
- Scale infrastructure to handle increasing model loads
- Evaluate and adopt new MLOps tools and frameworks
- Ensure models can be rolled back if a deployment fails
- Work with cloud platforms for deployment and storage
- Improve model latency and throughput
- Support A/B testing and canary deployments
- Maintain logging and tracing for ML pipelines
- Coordinate with DevOps teams on infrastructure needs
- Implement automated testing for model quality
- Manage secrets and access controls in ML environments
- Assist in defining model lifecycle policies
Nice to Have
- Master’s degree in computer science or a related field
- Experience with large-scale ML systems
- Contributions to open-source MLOps projects
- Knowledge of feature store implementations
- Experience with model registry tools
- Familiarity with serverless architectures
- Background in data engineering
- Experience with real-time inference systems
- Understanding of data drift detection
- Exposure to regulatory environments in AI
- Knowledge of model explainability tools
- Experience with multi-cloud deployments
- Strong understanding of distributed computing
- Prior work in fintech or healthcare AI
- Leadership experience in technical projects
Compensation
Competitive salary with performance-based incentives
Work Arrangement
Hybrid remote setup with flexible hours
Team
Collaborative team focused on scalable machine learning systems
About the Role
This position plays a key role in bridging data science and engineering by creating robust systems that enable reliable model deployment and operation.
Technology Stack
The team uses Python, Kubernetes, Docker, Terraform, Prometheus, and cloud-native services to manage machine learning workflows at scale.
Growth Opportunities
Engineers are encouraged to lead initiatives, mentor peers, and contribute to architectural decisions as the platform evolves.


