LanceDB is hiring an Open Source Engineer

About the Role

LanceDB is hiring an Open Source Engineer to advance high-performance multimodal databases. You will leverage Java/Scala and Rust to expand the reach of Lance and LanceDB within the broader data infrastructure ecosystem.

What You'll Do

  • Drive OSS community efforts to integrate the Lance format into Spark, Hive Metastore, Presto, Trino, Ray, and other data infrastructure systems.
  • Promote the Lance format at big data conferences and meetups.
  • Design and maintain efficient distributed Lance dataset operations.
  • Design efficient indices to power predicate pushdown in Spark, Ray, or Trino.
  • Work on table format, data encodings, and various aspects of the Lance format in Rust.
  • Operate in-house data processing infrastructure.

What We're Looking For

  • At least five years of experience building high-performance databases, big data systems, or web-scale data services.
  • Experience with the internals of open source big data or AI training systems, such as Hadoop, Spark, Flink, Ray, Iceberg, Delta Lake, Hudi, ClickHouse, Trino, Presto, PyTorch, or JAX.
  • Hands-on experience with high-performance computing in Java or Scala.
  • You thrive in a small, high-caliber team with autonomy, drive, and the ability to iterate fast.

Nice to Have

  • You are an open-source veteran, committer, or PMC member of large open source systems in the Apache community.
  • You fearlessly challenge the status quo and dismiss mediocre engineering as unacceptable.
  • You have a proven record of driving large features in Apache projects.
  • You are familiar with Java, Rust, C++, Apache Arrow, Apache DataFusion, Apache Parquet, Apache Iceberg, and Delta Lake.

Technical Stack

  • Languages: Java, Scala, Rust, C++
  • Big Data Systems: Hadoop, Spark, Flink, Ray, Iceberg, Delta Lake, Hudi, ClickHouse, Trino, Presto
  • AI Frameworks: PyTorch, JAX
  • Apache Ecosystem: Apache Arrow, Apache DataFusion, Apache Parquet

Team & Environment

You'll work with a small, high-caliber team where autonomy, drive, and fast iteration are the standard.

Required Skills
Java, Scala, Rust, Hadoop, Spark, Flink, Ray, Iceberg, Delta Lake, Hudi, Open Source, Data Engineering, Distributed Systems, Data Lake, Data Processing
About company
LanceDB
LanceDB is a developer-friendly, open-source database for multimodal AI, providing a foundation for AI applications from vector search to AI dataset exploration.
Job Details
Category: data
Posted: 9 months ago