Responsibilities
- Operate and maintain core data systems, including event streaming platforms, relational databases, and batch processing frameworks.
- Build robust, observable, and scalable data architectures that support both real-time and batch workflows.
- Create automated systems and policies for data governance, retention, compliance, and cross-service consistency.
- Collaborate with application, platform, and site reliability teams to enhance data reliability, access, and recovery.
- Define standards for monitoring, alerting, and capacity planning across data infrastructure components.
- Advance operational excellence by increasing system resilience and reducing manual effort through automation.
- Optimize data pipelines that serve analytics, identity validation, and machine learning use cases.
- Evaluate, deploy, and manage event-driven and batch data technologies such as Kafka, Google Cloud Pub/Sub, Dataflow, or Temporal.
- Lead responses to production incidents, conduct root cause analyses, and contribute to postmortem reviews.
- Guide engineering teams and promote a culture centered on system reliability and operational discipline.
- Design and implement data lake architecture, including storage layout, partitioning, lifecycle policies, and tiering for mixed workloads.
Work Arrangement
On-site — Mountain View, CA