Senior Data Engineer: Data Lake (Remote)
Role Overview
We build and maintain large-scale data infrastructure that processes over 2 TB of compressed event data each day and manages more than 6 petabytes in storage. As a Senior Data Engineer on the Data Lake team, you'll play a central role in evolving this platform, ensuring data reliability, scalability, and performance for both internal teams and external clients.
Key Responsibilities
- Own and enhance the core data pipeline framework used across batch and streaming workflows
- Design and expand a comprehensive data quality system to validate internal and external data sources
- Support and scale a public-facing data ingestion service handling over 17,000 requests per second
- Ensure stability and performance of data processing jobs across distributed systems
- Act as a technical escalation point for platform users and troubleshoot critical data issues
- Participate in a shared on-call rotation to respond to platform incidents and maintain system health
Technology Environment
You'll work with modern data tools and cloud-native services: Python, Spark, Databricks, and AWS Lambda for processing; ClickHouse and Delta for storage; S3 and Kinesis for data movement; and FastAPI for service development. Observability is powered by Prometheus, OpenTelemetry, Sentry, and PagerDuty.
Work Environment
This is a fully remote role with team members distributed globally. We value flexibility, clear communication, and asynchronous collaboration.
Team Culture
Our team is guided by empathy, openness, and a drive to solve real problems. We focus on metrics that reflect impact, support continuous learning, and aim to improve the experience for both customers and colleagues. We believe in empowering individuals to lead and contribute meaningfully to the systems they own.
