Germany Remote (Global)

Albatross is hiring a Site Reliability Engineer

Join Albatross as a Site Reliability Engineer to take ownership of the reliability and observability of our platform. This is a hands-on leadership role where you will design, build, and maintain our observability stack, lead incident response, oversee releases, and establish processes and standards.

What You'll Do

Own and evolve our observability stack, including Prometheus, Grafana, Loki, and Jaeger, along with dashboards, alerts, and SLOs.
Instrument services for meaningful metrics and tracing, reducing noise and improving signal.
Lead incident response and establish blameless postmortems, runbooks, and automated remediation.
Define, track, and improve SLIs and SLOs to proactively reduce reliability risk.
Own the release process end-to-end, improving deployment speed, safety, and recovery.
Implement progressive rollouts, feature flags, and rollback strategies.
Embed observability into the development lifecycle in close collaboration with engineering teams.
Maintain and evolve our Kubernetes-based platform, adopting new tools when they add real value.

What We're Looking For

5–7+ years in SRE, platform engineering, DevOps, or a similar hands-on role.
Strong production experience with Kubernetes and modern observability stacks like Prometheus, Grafana, Loki, and Jaeger/OpenTelemetry.
Proven track record of leading incident response and building monitoring systems that teams actually use.
Deep distributed systems knowledge and production debugging experience.
A pragmatic approach to tooling and alerting that teams trust.
Clear communicator across engineering, product, and leadership.
A STEM degree (Computer Science, Engineering, Mathematics, or similar).

Nice to Have

Contributions to open-source observability projects.
A background in high-scale or high-availability environments.

Technical Stack

Prometheus
Grafana
Loki
Jaeger
OpenTelemetry
Kubernetes

Benefits & Compensation

Remote-first, async-friendly culture.
Ownership and autonomy; you'll shape how we do reliability.
A team that cares about building things right.

Work Mode

This is a global, remote position open to candidates based in Europe.

Required Skills

Prometheus
Grafana
Loki
Jaeger
OpenTelemetry
Kubernetes
Site Reliability Engineering
Monitoring
Observability
Distributed Tracing
Infrastructure as Code
Cloud Platforms
Incident Management
Automation

About company

We’re building the second pillar of AI: a perception layer that understands, in real time, how users actually experience content. Trained on live user interactions, Albatross learns and reasons on the fly. Our technology powers in-session discovery by adapting to users' evolving interests as they change.

Job Details

Category: Infrastructure
Posted: 3 months ago