This position is no longer available
Remote (Country)

Innodata Inc was looking for a Language Data Scientist

Innodata Inc is seeking a Language Data Scientist to help customers advance their generative AI applications. In this role, you will work hands-on with multi-modal and multi-lingual datasets, collaborate cross-functionally, and leverage your experience with human and synthetic data workflows to drive innovation.

What You'll Do

  • Design and improve workflows to create data for AI/ML training and evaluation, including human annotation, data collection, and synthetic workflows.
  • Dive deep into existing workflows to gather data, make recommendations, and drive improvement through innovation and cross-functional collaboration.
  • Critically assess annotation tooling and workflows.
  • Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance.
  • Work closely with client stakeholders to understand goals, gather requirements, propose solutions, and execute them.

What We're Looking For

  • Master's degree in (computational) linguistics, data science, computer science (AI/ML/NLU), quantitative social sciences, or a related scientific/quantitative field; PhD strongly preferred.
  • Extensive experience working with human language data and designing human evaluation tasks, including multi-phase and complex workflows.
  • Deep understanding of language and its relationship with culture.
  • Ability to identify ambiguity and subjectivity in language.
  • Ability to work with multi-lingual and multi-modal projects.
  • Advanced knowledge of statistics, metrics (e.g., F1 score, inter-rater reliability), and data analysis methods such as sampling.
  • Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face.
  • Proficiency in Python to handle and transform large datasets (e.g., pandas), perform quantitative analyses, and visualize data (e.g., matplotlib, seaborn).
  • Deep understanding of data pipelines to support ML and NLP workflows.
  • Knowledge of efficient data collection, transformation, and storage.
  • Knowledge of data structures, algorithms, and data engineering principles.
  • Excellent interpersonal skills for effective cross-functional stakeholder engagement.
  • Excellent problem-solving skills, with the ability to think critically and creatively to develop innovative AI solutions.
  • Ability to work independently and collaborate as part of a team.
  • Adaptable to changing technologies and methodologies.
  • Ability to apply prior experience and research and development knowledge to understanding client products and services.

Nice to Have

  • Conducting research to stay up-to-date with the latest advancements in generative AI, machine learning, and deep learning techniques.
  • Knowledge of optimizing existing generative AI models for improved performance, scalability, and efficiency.
  • Experience developing and maintaining ML/AI pipelines, including data preprocessing, feature extraction, model training, and evaluation.
  • Knowledge of fine-tuning pre-trained models to adapt them to specific tasks and datasets.
  • Developing clear and concise documentation to communicate complex AI concepts to both technical and non-technical stakeholders.
  • Contributing to establishing best practices and standards for generative AI development with customers and within the organization.
  • Providing technical mentorship and guidance to junior team members.
  • Understanding of generative model families such as GPT, VAEs, and GANs.

Technical Stack

  • Python
  • spaCy
  • NLTK
  • Hugging Face
  • pandas
  • matplotlib
  • seaborn

Benefits & Compensation

  • Compensation: Up to $120k CAD

Work Mode

This is a local-country position. The role is remote within Canada (excluding Quebec).

Required Skills

Python, NLTK, pandas, Data Science, Machine Learning, NLP, Data Analysis, Statistics, Data Visualization

About company
Innodata Inc
Innodata (NASDAQ: INOD) is a leading data engineering company and AI technology solutions provider, serving over 2,000 customers including 4 out of 5 of the world’s biggest technology companies. By combining advanced ML/AI technologies, a global workforce of subject matter experts, and a high-security infrastructure, they provide clean and optimized digital data solutions across multiple industries.
Job Details

Category: Data
Posted: 4 months ago