Senior Site Reliability Engineer (Azure & OpenShift)
At Cyclad, we work with top international IT companies to boost their potential in delivering outstanding, cutting-edge technologies that shape the world of the future. Currently, we are looking for an experienced Senior Site Reliability Engineer (Azure & OpenShift) to join our team.
Project information:
Location: Warsaw (hybrid)
Type of employment: B2B contract or standard employment contract
Project language: English
Key Responsibilities:
Own and operate staging and production environments in Microsoft Azure
Manage and support application deployments on OpenShift (on-prem and Azure)
Support and optimize CI/CD pipelines and enable GitOps practices (e.g., ArgoCD)
Ensure system reliability through SLIs, SLOs, and continuous improvement of service health
Design, implement, and maintain observability solutions (monitoring, logging, alerting) using tools such as Prometheus, Grafana, Azure Monitor, and ELK/EFK
Troubleshoot issues across infrastructure, platform (Azure/OpenShift), applications, and deployments
Lead incident management, including root cause analysis (RCA), MTTR reduction, and prevention of recurring issues
Build and maintain Infrastructure as Code using Terraform and drive automation to reduce operational toil
Improve deployment reliability, release processes, and overall system resilience
Collaborate with development teams to embed reliability into design, delivery, and operational practices
Maintain and improve operational documentation, including runbooks and procedures
Ensure performance, scalability, cost efficiency, security, and compliance of cloud infrastructure
Advocate for SRE best practices and a DevOps culture across engineering teams
Requirements:
3+ years of experience in SRE, DevOps, or Platform Engineering roles
Strong experience with Microsoft Azure in production environments
Strong experience with OpenShift Container Platform (OCP) and Kubernetes
Experience with CI/CD pipelines (e.g., ArgoCD, Jenkins, GitHub Actions) and container-based deployments
Strong understanding of observability, incident management, and reliability engineering principles
Hands-on experience with Infrastructure as Code (Terraform)
Scripting experience (Bash or similar)
Experience with monitoring and logging tools (Prometheus, Grafana, ELK/EFK)
Strong focus on automation, system stability, and continuous improvement
We offer:
Private medical care with dental care (covering 70% of costs), with a family package option available
Multisport card (also for an accompanying person)
Life insurance
Work with talented engineers on large-scale, technically challenging projects