Shape the future of voice AI by building and refining production-grade Text-to-Speech systems that deliver natural, expressive speech in real time. You'll take ownership of deploying and optimizing large-scale neural models, ensuring they run efficiently across distributed environments while maintaining high audio quality and low latency.
What You’ll Do
- Deploy and fine-tune TTS models in both cloud and on-prem setups, focusing on scalability and reliability under heavy load.
- Apply advanced post-training methods, including DPO, GRPO, and RLHF, to improve model behavior and the quality of speech produced at inference time.
- Optimize performance through quantization, kernel tuning, and smart batching strategies tailored for GPU-accelerated systems.
- Collaborate with research and engineering teams to integrate new capabilities, run A/B tests, and iterate on live models.
- Ensure system resilience and high availability for multi-speaker, expressive voice generation across enterprise use cases.
- Document and share best practices for inference efficiency, monitoring, and long-term maintainability.
What We’re Looking For
- Proven experience deploying neural TTS models in production environments at scale.
- Strong grasp of real-time audio processing constraints and low-latency system design.
- Familiarity with GPU acceleration, distributed computing, and infrastructure that supports high-throughput inference.
- Skill in diagnosing and resolving performance bottlenecks, quality regressions, and system failures in live voice pipelines.
- Adaptability to thrive in a fast-moving startup culture with end-to-end ownership of critical systems.
Nice to Have
- Contributions to open-source TTS or audio processing frameworks.
- Background in telephony, live communication platforms, or enterprise voice applications.
Compensation & Environment
Base salary ranges from $160,000 to $250,000, paired with meaningful equity in a rapidly growing AI company. Benefits include comprehensive healthcare, dental, and vision coverage. Work from our modern San Francisco office with panoramic rooftop views or remotely within the U.S. You’ll have access to all necessary tools and infrastructure to excel in your role.
Backed by leading Silicon Valley investors, the team focuses on technical excellence, innovation, and real-world impact as it builds the next generation of human-like voice agents.