Responsibilities

Lead the design of evaluation methodologies for AI agent integrations, establishing key metrics across quality, cost, and performance for both single-turn and multi-turn interactions.
Create and maintain evaluation datasets, reference execution traces, and automated regression testing systems to proactively detect quality regressions before customer impact.
Ensure evaluation tools are reusable and accessible to all teams contributing agent capabilities to the platform.
Collaborate with AI engineers to enhance retrieval accuracy, tool selection precision, and efficient use of context in agent workflows.
Conduct applied research on unresolved challenges in agent behavior, including tool selection across large tool catalogs, multi-turn interaction assessment, and reducing hallucinations when agents work with live telemetry data.
Enable consistent evaluation across first-party and third-party agents by working closely with internal product teams to share metrics and insights.
Support bidirectional knowledge transfer between platform and product teams to improve overall agent performance.
Provide technical guidance through design reviews, cross-team initiatives, and mentorship within the Agentic Interfaces team and organization-wide.
Represent the team externally through technical publications, conference talks, and contributions to open-source agent projects.
Analyze tradeoffs between operational cost and output quality at scale to inform product and infrastructure decisions.
Ensure evaluation systems support both offline benchmarking and online monitoring of agent performance.
Develop scalable methods to assess trajectory-level agent behavior beyond single-turn responses.
Improve mechanisms for grounding agent outputs in real-time telemetry data to reduce inaccuracies.
Establish shared standards for measuring agent effectiveness across diverse use cases.
Drive adoption of evaluation best practices across engineering teams building agent-integrated products.

Compensation

Competitive salary and equity package

Work Arrangement

Hybrid or remote options available

Team

Part of the Agentic Interfaces team focused on AI-driven automation and developer tools

Responsibilities

Own the evaluation strategy for Datadog's AI agent integrations. Define the metrics — offline and online, quality and cost, single-turn and trajectory-level — that the team and the broader organization optimize against.
Build the eval datasets, golden traces, and regression harnesses that catch quality changes before they hit customers, and make those assets reusable by every team contributing tools to the platform.
Drive measurable improvements to retrieval relevance, tool-selection accuracy, and context efficiency, partnering closely with the AI engineers on the team who build the underlying platform.
Run applied research on the open problems in agent–data interaction: tool selection under large catalogs, multi-turn agent evaluation, grounding and hallucination control on live telemetry, and cost/quality tradeoffs at scale.
Partner with the Bits SRE, Bits Assistant, and Bits Dev Agent teams so first-party agents benefit from the same measurement substrate as third-party integrations, and so learnings move freely in both directions.
Provide technical leadership across the Agentic Interfaces team and the broader organization through design reviews, working groups, and mentorship, and represent the team externally through talks, blog posts, and contributions to the open agent ecosystem.

Available for qualified candidates

Datadog is hiring a Staff Applied Scientist - Agentic Interfaces
