What You'll Do

Design representative evaluation datasets by refining sampling methods and query diversity to reflect actual user behavior. Develop and manage large-scale evaluation frameworks that assess AI performance across thousands of real queries, ensuring reliable measurement of response quality.

Implement LLM-driven evaluation models to score key attributes such as factual accuracy, completeness, and clarity, calibrating them with human judgment data. Conduct pre-deployment assessments of new models and product updates, establishing quality thresholds that guide release decisions.

Construct observability systems that enhance traceability and data visibility for AI agents, enabling deeper analysis of behavior patterns. Use evaluation outcomes and user feedback to drive iterative improvements, including automated prompt refinement and model tuning.

Partner with engineering teams across the organization to integrate evaluation practices into development workflows, ensuring quality remains central to product evolution.

What We're Looking For

Minimum of two years of software engineering experience with strong programming skills
Proficiency in Go and Python, with experience in distributed data processing systems
Background in LLM evaluation, reinforcement learning from human feedback, or natural language processing
Ability to reason critically about how offline metrics correlate with real user outcomes
Commitment to product quality and experience working in collaborative, cross-functional environments
Strong ownership mindset with a focus on delivering measurable impact

Technology Environment

Go, Python, LLM evaluation frameworks, reinforcement learning from human feedback, natural language processing, distributed data pipelines

Benefits

Competitive salary and potential equity or variable compensation
Medical, vision, and dental insurance
Generous paid time off
401(k) contribution options
Stipends for home office setup, education, and wellness
Daily healthy lunches and regular team events

Work Environment

This role operates in a hybrid model, requiring 3–4 days per week in one of our offices in the San Francisco Bay Area. We foster a collaborative, inclusive culture that values diverse perspectives and customer-centric innovation.

Our Commitment

We are dedicated to building a diverse and equitable workplace. We do not discriminate on the basis of race, gender, age, religion, sexual orientation, disability, or any other protected status. All are welcome to contribute to our mission of creating intelligent, reliable AI systems.

Glean is hiring a Machine Learning Engineer
