Germany Remote (Global)

Albatross is hiring a Site Reliability Engineer

Join Albatross as a Site Reliability Engineer to take ownership of the reliability and observability of our platform. This is a hands-on leadership role where you will design, build, and maintain our observability stack, lead incident response, oversee releases, and establish processes and standards.

What You'll Do

  • Own and evolve our observability stack, including Prometheus, Grafana, Loki, and Jaeger, along with dashboards, alerts, and SLOs.
  • Instrument services for meaningful metrics and tracing, reducing noise and improving signal.
  • Lead incident response and establish blameless postmortems, runbooks, and automated remediation.
  • Define, track, and improve SLIs and SLOs to proactively reduce reliability risk.
  • Own the release process end-to-end, improving deployment speed, safety, and recovery.
  • Implement progressive rollouts, feature flags, and rollback strategies.
  • Embed observability into the development lifecycle in close collaboration with engineering teams.
  • Maintain and evolve our Kubernetes-based platform, adopting new tools when they add real value.

What We're Looking For

  • 5–7+ years in SRE, platform engineering, DevOps, or a similar hands-on role.
  • Strong production experience with Kubernetes and modern observability stacks like Prometheus, Grafana, Loki, and Jaeger/OpenTelemetry.
  • Proven track record leading incident response and building monitoring systems teams actually use.
  • Deep distributed systems knowledge and production debugging experience.
  • A pragmatic approach to tooling and alerting that teams trust.
  • Clear communicator across engineering, product, and leadership.
  • A STEM degree (Computer Science, Engineering, Mathematics, or similar).

Nice to Have

  • Contributions to open-source observability projects.
  • A background in high-scale or high-availability environments.

Technical Stack

  • Prometheus
  • Grafana
  • Loki
  • Jaeger
  • OpenTelemetry
  • Kubernetes

Benefits & Compensation

  • Remote-first, async-friendly culture.
  • Ownership and autonomy; you'll shape how we do reliability.
  • A team that cares about building things right.

Work Mode

This is a global, remote position open to candidates based in Europe.

Required Skills

  • Prometheus
  • Grafana
  • Loki
  • Jaeger
  • OpenTelemetry
  • Kubernetes
  • Site Reliability Engineering
  • Monitoring
  • Observability
  • Distributed Tracing
  • Infrastructure as Code
  • Cloud Platforms
  • Incident Management
  • Automation
About company
Albatross
We’re building the second pillar of AI: a perception layer that understands how users actually experience content, in real time. Trained on live user interactions, Albatross learns and reasons on the fly. Our technology powers in-session discovery by adapting to evolving user interests as they change.
Job Details
Category infrastructure
Posted 3 months ago