Germany Remote (Global)

Mindrift is hiring a Freelance AI Evaluation Engineer (Python/Full-Stack)

About the Role

You will help shape how AI coding systems are evaluated by creating meaningful programming challenges grounded in real-world development scenarios. Your work will directly influence how AI performance is measured, focusing on reasoning, implementation accuracy, and handling of complex requirements.

What You'll Do

Develop and refine coding tasks based on realistic production codebases, ensuring they reflect authentic development challenges
Write detailed functional tests that assess full behavior, including edge cases and integration points
Design problems that are fair but demanding, requiring AI to synthesize information across files and external sources
Examine AI-generated solutions to identify patterns in success and failure
Improve tasks based on structured feedback from expert reviewers using defined quality benchmarks

What We're Looking For

Computer Science or related degree
At least 5 years of hands-on software development with strong Python experience (pytest, async/await, subprocess, file operations)
Full-stack background with practical work in both React front-ends and backend systems
Proven experience writing tests, not just running them
Familiarity with Docker for running isolated evaluations
Working knowledge of CI/CD, particularly GitHub Actions (triggers, labels, interpreting results)
Functional English (B2 level or higher)

Work Environment

This is a freelance, project-based position open to candidates worldwide. You’ll have full control over your schedule as long as deadlines are met. Projects vary in scope and complexity, and compensation is adjusted accordingly, at the equivalent of a base rate of $50 per hour.

Technology Stack

Python, pytest, async/await, subprocess, file operations, React, Docker, GitHub Actions, CI/CD

Required Skills

Python, pytest, async/await, subprocess, file operations, React, Docker, GitHub Actions, CI/CD, Full-Stack Development, Backend Systems, Functional Testing, Integration Testing, Test Case Design

About company

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems.

Job Details

Category: fullstack

Posted: a month ago