Job Description
Senior Site Reliability / DevOps Engineer – Observability & Automation
Role Overview
The Senior Site Reliability / DevOps Engineer will be responsible for building resilient, scalable, and observable platforms through strong automation, infrastructure engineering, and SRE best practices. This role blends SRE, DevOps, and platform engineering with hands‑on programming and production ownership in complex, distributed environments.
Key Responsibilities
- Design, build, and operate high‑reliability production platforms following SRE and DevOps principles.
- Develop automation and tooling using Python and Go to reduce operational toil and improve system reliability.
- Implement and maintain Ansible‑based automation for configuration management and infrastructure operations.
- Design and operate CI/CD pipelines using Jenkins, GitHub Actions, GitLab CI/CD, and Azure DevOps.
- Implement Infrastructure as Code using Terraform, and manage Kubernetes application configuration using Helm and Kustomize.
- Support and operate containerized and cloud‑native workloads on Docker and Kubernetes.
- Build, operate, and optimize observability platforms (metrics, logs, traces) using Prometheus, Grafana, ELK, Splunk, or similar tools.
- Ensure deep visibility into system health, performance, and availability across distributed environments.
- Troubleshoot and resolve critical production issues, performing root cause analysis and driving permanent fixes.
- Partner with infrastructure, platform, and application teams to improve system reliability, scalability, and operability.
Required Skills & Experience
- 8+ years of experience in SRE, DevOps, Platform Engineering, or Production Engineering roles.
- Strong programming expertise in:
  - Python (automation, scripting, internal tooling)
  - Go (systems programming, microservices, CLIs)
- Hands‑on experience with Ansible for automation and configuration management.
- Strong understanding of Linux internals, networking, and distributed systems.
- Proven experience with CI/CD pipelines and Git‑based workflows.
- Hands‑on experience with Infrastructure as Code (Terraform) and configuration tooling (Helm, Kustomize).
- Solid experience running containerized environments using Docker and Kubernetes.
- Strong background in observability engineering (metrics, logs, traces).
- Experience working with at least one cloud platform: AWS, Azure, or GCP.
- Excellent troubleshooting skills and experience managing high‑severity production incidents.
Good to Have
- Experience applying SRE concepts such as SLIs, SLOs, and error budgets.
- Exposure to large‑scale, multi‑region distributed systems.
- Experience building internal developer platforms or reliability tooling.
Experience: 5-8 years.
Reinvent your world. We are building a modern Wipro. We are an end-to-end digital transformation partner with the boldest ambitions. To realize them, we need people inspired by reinvention. Of yourself, your career, and your skills. We want to see the constant evolution of our business and our industry. It has always been in our DNA - as the world around us changes, so do we. Join a business powered by purpose and a place that empowers you to design your own reinvention.