United States of America | Remote (Country) | USD 125,000 - 180,000 Yearly

Nebius is hiring a Hardware Support Engineer

About the Role

Role Overview

We're looking for a Senior Hardware Support Engineer to safeguard the reliability of our production hardware infrastructure at scale. In this role, you'll serve as a technical leader for resolving complex hardware and firmware issues, bridging operations, engineering, and vendor teams to maintain fleet stability and system uptime.

Key Responsibilities

  • Lead in-depth investigations into hardware and firmware failures, tracing issues from symptom to root cause
  • Identify systemic reliability trends by analyzing recurring errors and field data
  • Serve as the primary technical contact during critical hardware incidents affecting performance or availability
  • Collaborate with hardware vendors to expedite diagnostics, replacements, and corrective actions
  • Work closely with internal engineering groups to test and validate fixes before broad deployment
  • Conduct pre-deployment validation of server hardware and firmware updates
  • Apply structured problem-solving frameworks to manage and document incident resolution
  • Provide technical guidance to on-site teams during urgent hardware interventions
  • Enhance tools and processes for monitoring, tracking, and reporting hardware failures
  • Help shape long-term strategies to improve fleet-wide hardware resilience

Required Qualifications

  • Proven experience supporting server hardware in large-scale or data center environments
  • Track record of diagnosing and resolving complex hardware and firmware issues
  • Solid knowledge of server subsystems including CPU, memory, storage, power, networking, and BMC
  • Experience engaging with hardware vendors to resolve production escalations
  • Proficiency in structured troubleshooting using formal incident or problem management practices
  • Strong analytical skills with the ability to interpret system logs, telemetry, and error patterns
  • Experience coordinating with field operations teams during hardware events
  • Ability to manage multiple high-priority investigations simultaneously
  • Clear and effective communication skills in cross-team settings

Preferred Qualifications

  • Background in GPU-intensive, AI, or high-performance computing environments
  • Experience managing firmware lifecycle and validating large-scale rollouts
  • Familiarity with Linux-based infrastructure and operational tooling
  • History of improving hardware reliability metrics across large server fleets

Benefits

  • Comprehensive medical, dental, and vision insurance
  • 401(k) plan with company match
  • Flexible paid time off policy
  • Paid parental leave
  • Support for professional development and learning

Compensation

Annual salary range: $125,000 – $180,000. Additional compensation includes an annual performance-based bonus.

Work Mode

This position is remote within the United States. Occasional travel may be required for on-site coordination or critical hardware events. The role includes participation in incident escalation coverage for production-impacting issues.

Company Culture

Our environment encourages initiative and innovation, with a strong emphasis on collaboration and technical excellence. We support professional growth and work at the forefront of AI cloud infrastructure, building systems that power next-generation applications.

Required Skills
server hardware, data center operations, root cause analysis, hardware failure diagnosis, firmware debugging, CPU architecture, memory subsystems, storage systems, networking hardware, power systems, BMC, incident management, ITIL, vendor coordination, problem-solving
About company
Nebius
Nebius is leading a new era in cloud computing to serve the global AI economy. It creates tools and resources for customers to solve real-world challenges without massive infrastructure costs.
Job Details
Department: Engineering
Category: Infrastructure
Posted: 2 months ago