United States Remote (Global)

Ford is hiring an SRE Sr Engineer/Specialist

About the Role

As a Senior Site Reliability Engineer, you will shape the future of system reliability and observability across a global cloud infrastructure. Your work will center on building and refining scalable solutions that ensure high availability, performance, and proactive incident response across distributed services.

Key Responsibilities

  • Design, write, and deploy code that enhances system reliability, setting benchmarks for quality and maintainability.
  • Review code and production changes with actionable insights to strengthen system integrity.
  • Lead root cause analysis and resolution of complex architectural and operational issues.
  • Develop and manage AI-powered monitoring systems using Python and observability platforms to detect and prevent outages.
  • Use Terraform and Infrastructure as Code practices to automate visibility and issue detection across environments.
  • Operate and optimize workloads on Google Cloud Platform, ensuring efficient scaling, cost control, and performance.
  • Partner with development teams to embed reliability into the software lifecycle using platform engineering principles.
  • Create automated solutions for monitoring, tuning, and disaster recovery, leveraging AI to reduce toil.
  • Diagnose and resolve issues across development, testing, and production environments.
  • Conduct post-incident reviews and implement safeguards to prevent recurrence.
  • Enforce security standards across infrastructure, supporting audits and compliance efforts.
  • Contribute to capacity planning by analyzing trends and advising on resource needs.
  • Optimize system performance through profiling, tuning, and metric monitoring.
  • Maintain and test disaster recovery protocols to ensure business continuity.
  • Document system designs, analyses, and procedures, and help improve team-wide design practices.

Qualifications

A bachelor’s degree in Computer Science, Engineering, Mathematics, or equivalent experience is required. You should have at least three years of experience in roles such as SRE, DevOps, or Software Engineering.

Essential skills include proficiency with observability tools and cloud platforms (especially GCP and Kubernetes), strong Python programming, experience with both relational and document databases, and the ability to debug, optimize, and automate systems. You must also demonstrate strong problem-solving abilities and effective communication in high-pressure settings.

Preferred qualifications include experience with agentic AI and MCP development, Terraform Provider development, and working with Dynatrace SaaS.

Technology Environment

Python, Terraform, Google Cloud Platform (GCP), Kubernetes, Dynatrace SaaS, AI-driven observability, Infrastructure as Code (IaC), relational and document databases.

Work Environment

This position is remote and supports a global team structure. You’ll have the flexibility to work from any location while contributing to a large-scale, distributed engineering ecosystem.

Required Skills
Python, Terraform, Kubernetes, Google Cloud Platform (GCP), Dynatrace SaaS, AI/Observability tools, Monitoring, Observability, DevOps, SRE, Infrastructure as Code (IaC), Relational databases, Document databases
About company
Ford
Ford Motor Company is an established global automotive manufacturer building a better world through innovative, exciting, and sustainable products and services. The company advances technologies in autonomy, electrification, and smart mobility.
Job Details
Department: Information Technology
Category: Infrastructure
Posted: 2 months ago