Shape the foundation of data intelligence in biomedicine by leading the design and implementation of robust, scalable data systems. As a Senior Data Engineer, you will own the architecture of end-to-end pipelines that transform fragmented, heterogeneous scientific data—spanning genomics, chemical assays, clinical findings, and more—into structured, reliable inputs for machine learning and research.
What You’ll Do
- Design and evolve schema-first data models that unify noisy, semi-structured sources into coherent, versioned, and interoperable datasets.
- Build and maintain cloud-native infrastructure across storage, processing, and streaming layers, prioritizing scalability, correctness, and operational resilience.
- Develop pipelines supporting both batch and real-time data access, tailored to the needs of ML training, evaluation, and inference workflows.
- Define and enforce standards for data quality, lineage, validation, and provenance to support scientific reproducibility.
- Collaborate closely with ML engineers, scientists, and product leads to translate research questions into durable, reusable data abstractions.
- Provide technical leadership through architecture reviews, mentorship, and cross-functional guidance, balancing performance, cost, and long-term maintainability.
- Proactively identify and mitigate systemic risks in data integrity, scalability bottlenecks, and operational complexity.
What We’re Looking For
- 5+ years of experience building and operating production data systems, with clear ownership of platform-level decisions.
- Strong command of Python and modern data engineering practices, with experience in distributed processing frameworks such as Spark, Beam, or Ray.
- Deep familiarity with cloud platforms (AWS, GCP, or Azure), including storage, compute, networking, and security primitives.
- Proven track record of designing large-scale pipelines that support ML workloads, from feature generation to reproducible training environments.
- Experience with orchestration tools like Airflow or Dagster, and streaming platforms such as Kafka, Pub/Sub, or Kinesis.
- Ability to make sound architectural trade-offs and communicate them effectively across technical and scientific domains.
Nice to Have
- Background working with biomedical or life science data—including omics, molecular representations, toxicology, or clinical datasets.
- Experience with ontology-driven modeling or schema evolution in scientific contexts.
- Proficiency in infrastructure-as-code tools like Terraform and in container and orchestration systems such as Docker and Kubernetes.
- Experience in fast-moving, research-intensive, or early-stage environments.
- Contributions to open-source projects or technical publications.
Environment & Culture
This role offers substantial autonomy and ownership in a technically deep, low-ego environment. You’ll work in a culture centered on learning, precision, and long-term thinking. The team supports flexible remote or hybrid arrangements, with a focus on sustainable, impactful engineering. You’ll help build the data backbone of an AI-powered biomedical platform, directly influencing how raw scientific data becomes machine-understandable knowledge.
Compensation includes competitive pay, meaningful equity participation, and a benefits package designed to support long-term contributors.
