Germany Remote (Global)

Mindrift is hiring a Freelance AI Evaluation Engineer (Python/Full-Stack)

About the Role

You will help shape how AI coding systems are evaluated by creating meaningful programming challenges grounded in real-world development scenarios. Your work will directly influence how AI performance is measured, focusing on reasoning, implementation accuracy, and handling of complex requirements.

What You'll Do

  • Develop and refine coding tasks based on realistic production codebases, ensuring they reflect authentic development challenges
  • Write detailed functional tests that assess full behavior, including edge cases and integration points
  • Design problems that are fair but demanding, requiring AI to synthesize information across files and external sources
  • Examine AI-generated solutions to identify patterns in success and failure
  • Improve tasks based on structured feedback from expert reviewers using defined quality benchmarks

What We're Looking For

  • Computer Science or related degree
  • At least 5 years of hands-on software development with strong Python experience (pytest, async/await, subprocess, file operations)
  • Full-stack background with practical work in both React front-ends and backend systems
  • Proven experience writing tests, not just running them
  • Familiarity with Docker for running isolated evaluations
  • Working knowledge of CI/CD, particularly GitHub Actions (triggers, labels, interpreting results)
  • Functional English (B2 level or higher)

Work Environment

This is a freelance, project-based position open to candidates worldwide. You'll have full control over your schedule as long as deadlines are met. Projects vary in scope and complexity, and compensation varies with them, at a base rate equivalent to $50 per hour.

Technology Stack

Python, pytest, async/await, subprocess, file operations, React, Docker, GitHub Actions, CI/CD

Required Skills

Python, pytest, async/await, subprocess, file operations, React, Docker, GitHub Actions, CI/CD, Full-Stack Development, Backend Systems, Functional Testing, Integration Testing, Test Case Design
About company
Mindrift
Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems.
Job Details
Category fullstack
Posted a month ago