Remote (Global)

LILT is hiring an AI Benchmark Engineer - Native Language Specialist | Japanese

LILT is seeking an experienced software engineer who is a native Japanese speaker to help evaluate AI capabilities. In this role, you will design, build, and validate benchmarks that test large language models on multilingual software challenges, with a focus on creating high-signal tasks in your native language. AI is changing how the world communicates, and LILT's mission is to make the world's information available to everyone, no matter the language they speak.

What You'll Do

  • Task Engineering: Evaluate and design tests for coding agents.
  • Asset Creation: Build realistic task environments using datasets and files in your native language.
  • Prompting & Translation: Identify the points where AI models fail, particularly in your native language.
  • Implementation & Verification: Support the development of robust solutions and write highly reliable, deterministic verifier scripts.
  • Calibration & Execution: Analyze execution logs and calibrate task difficulty using standard Terminal-Bench run configurations against various model tiers.
  • Quality Assurance: Participate in a rigorous, 4-layer human quality control process alongside automated LLM-based checks.

What We're Looking For

  • 5+ years of industry experience in software engineering.
  • A proven track record at leading technology companies and/or graduation from top-tier engineering universities.
  • Native or near-native fluency in Japanese, with a deep understanding of its grammar, register, and phrasing rules.
  • High proficiency in English.
  • Strong proficiency in Python, standard shell scripting, and data processing.
  • Extensive experience with Terminal/CLI-based development workflows and a working familiarity with coding agents.
  • Deep technical understanding of multilingual text processing pitfalls, including encoding/decoding robustness, Unicode normalization, locale-dependent conventions, text I/O, toolchain interoperability, and safe string operations.

Technical Stack

  • Python
  • Shell scripting

Benefits & Compensation

  • Earn money and get paid quickly and fairly.
  • Have fun and advance human knowledge.
  • Work on diverse projects from anywhere, any time you want.
  • Build your professional network in a supportive community.

Work Mode

This is a global, distributed role.

LILT is an equal opportunity employer. We extend equal opportunity to all individuals without regard to an individual’s race, religion, color, national origin, ancestry, sex, sexual orientation, gender identity, age, physical or mental disability, medical condition, genetic characteristics, veteran or marital status, pregnancy, or any other classification protected by applicable local, state or federal laws.

Required Skills

  • Python
  • Shell Scripting
  • Machine Learning
  • Natural Language Processing
  • Data Analysis
  • Benchmarking
  • Evaluation Metrics
  • Japanese Language Proficiency
  • Quality Assurance
  • Statistical Analysis
  • Data Collection
  • Data Annotation
  • Problem Solving
  • Communication
About company
LILT
LILT builds multilingual AI and human-verified services that make the world's information available to everyone, regardless of language. The company serves Enterprises, Governments, and AI Developers worldwide.
Job Details
Category data
Posted 2 months ago