Remote (Global)

EverAI is hiring a Mid/Senior LLM Engineer (Remote - Worldwide)

About the Role

As a Mid/Senior LLM Engineer at EverAI, you will be at the forefront of developing AI companionship technology that serves 30 million users and processes 5 million messages daily. You will fine-tune and optimize large language models to scale globally while maintaining personalized interactions.

What You'll Do

  • Work with stakeholders, including Co-founders, Web Engineers, and DevOps Engineers, to take projects from idea to production.
  • Oversee the creation and optimization of algorithms for LLM behavior adjustments via fine-tuning and prompt engineering.
  • Develop features to improve product richness, such as multi-character chats and gamification.
  • Collaborate with team members managing other modalities like audio, image, and video.
  • Adapt and fine-tune base models for multilingual support.
  • Manage the creation and maintenance of diverse datasets critical for training and improving LLM performance.
  • Assess and determine the best technological approaches, selecting between classifiers, fine-tuning, and other methods.

What We're Looking For

  • 5+ years building production-grade, modular, and maintainable Python codebases.
  • Deep expertise in LLM architecture, including transformers, attention mechanisms, positional encodings, samplers, tokenizers, and post-training.
  • Expert-level experience with inference optimization at scale using vLLM or TensorRT-LLM, and a proven record of reducing latency and memory via quantization or distillation.
  • Hands-on experience with distributed training using FSDP, DeepSpeed, or Hugging Face Accelerate on multi-GPU/multi-node setups, including mixed-precision training and gradient checkpointing.
  • Skilled at performance profiling and optimization, identifying compute or memory bottlenecks across CPU/GPU pipelines.

Nice to Have

  • Strong concurrency and runtime engineering skills with asyncio or multiprocessing.
  • Practical low-level systems experience with CUDA or Triton, including writing or debugging custom kernels.
  • Contributions to open-source LLM tooling such as vLLM, Hugging Face Transformers, or Triton.
  • Experience building or maintaining latency-critical, multi-user LLM services like RAG, streaming, agents, or chatbots.
  • Exposure to specialized generation use cases like multi-turn instruction tuning or non-English quality alignment.

Technical Stack

  • Python, vLLM, TensorRT-LLM, CUDA, Triton, FSDP, DeepSpeed, accelerate
  • GPT-4, Mistral, Hugging Face

Team & Environment

You will join a team of 55+ people and work directly with the Co-founders, Web Engineers, and DevOps Engineers.

Benefits & Compensation

  • 4 weeks of PTO.
  • Annual company gathering.
  • A wellbeing budget of up to $200.
  • Learning budget.
  • Company laptop.
  • Access to GPT-4, Mistral, and a Hugging Face Pro plan.

Work Mode

This role is fully remote and open to candidates worldwide.

EverAI is an equal opportunity employer.

Required Skills

  • Python, vLLM, TensorRT-LLM, CUDA, Triton, FSDP, DeepSpeed, accelerate, GPT-4, Mistral, LLM, NLP, Distributed Training, Model Optimization
About the Company
EverAI
EverAI builds the future of AI companionship through the world’s largest AI companionship platform, governed by its proprietary AI moderation system, EverGuard. The company focuses on safe, ethical, and human-first AI experiences and has reached 50 million users in two years.
Job Details
Category: Data
Posted 9 months ago