Job Description
Role Overview
We are seeking a mid-level Site Reliability Engineer (SRE) with strong SQL skills to improve reliability, performance, and availability of Kubernetes-based, data-intensive services on Google Cloud Platform (GCP). You will operate and scale production workloads on GKE, define SLOs/SLIs, and partner with Engineering and Data teams to reduce toil, strengthen observability, and ship changes safely.
Key Responsibilities
Experience: 8+ Years
- Own reliability of GKE-hosted services; participate in on-call, triage, incident response, and blameless postmortems.
- Operate and improve GKE clusters (Standard/Autopilot as applicable): upgrades, node pools, autoscaling, cluster hardening, and capacity planning.
- Manage Kubernetes deployments using Helm/Kustomize and CI/CD; implement progressive delivery (canary/blue-green) and safe rollbacks.
- Build observability on GCP using Cloud Monitoring/Logging (and Prometheus/Grafana where needed): dashboards, SLOs, and actionable alerting.
- Troubleshoot issues across app/network/cluster layers: GKE networking, VPC, Cloud Load Balancing/Ingress, DNS, IAM, quotas, and resource limits.
- Use SQL to investigate incidents and performance issues: query tuning, indexing strategy, schema improvements, and execution plan analysis.
- Operate relational data services (e.g., Cloud SQL) including backups/restores, HA/DR, maintenance windows, and performance monitoring.
- Harden platform access and secrets: IAM least privilege, Workload Identity, Secret Manager/KMS, and secure service-to-service auth.
- Reduce toil through automation (runbooks, self-healing, ChatOps), and improve release safety and operational readiness.
Required Qualifications
- Mid-level experience (typically 2–5 years) supporting production systems with high availability and performance requirements.
- Hands-on experience with Kubernetes in production; ability to debug issues across pods, nodes, networking, and autoscaling.
- Working experience on GCP with core services such as IAM, VPC networking, Cloud Load Balancing, and Cloud DNS.
- Strong SQL skills (complex joins, window functions, indexing strategy, query tuning, and troubleshooting).
- Experience operating relational databases (MySQL/PostgreSQL/SQL Server) and/or Cloud SQL (backups, maintenance, performance analysis).
- Proficiency with Linux and scripting (Python/Bash) for automation and operational tooling.
- Practical observability and incident management skills: alert triage, root-cause analysis, and postmortem-driven improvements.
Preferred Qualifications
- Strong hands-on experience with GKE (Standard/Autopilot), including multi-environment operations and upgrade strategies.
- GitOps experience (Argo CD/Flux) and standardized release practices across multiple clusters/environments.
- GCP security and governance: Workload Identity, Organization Policies, least-privilege IAM, and audit/logging best practices.
- Supply chain & runtime security: Artifact Registry, image scanning, Binary Authorization, and policy-as-code (OPA/Gatekeeper).
- Networking and traffic protection: Ingress controllers, Cloud Load Balancing, Cloud Armor, TLS, and (optional) service mesh (Istio/Anthos Service Mesh).
- Infrastructure as Code (Terraform) and configuration management/automation (Ansible); cost optimization on GCP.
Key Tools & Technologies
- GCP: GKE (Standard/Autopilot), IAM, VPC, Cloud Load Balancing, Cloud DNS
- Observability: Cloud Monitoring & Cloud Logging, Managed Service for Prometheus, Grafana, OpenTelemetry
- CI/CD & GitOps: Cloud Build/GitHub Actions/Jenkins, Argo CD/Flux
- Data: Cloud SQL (MySQL/PostgreSQL), SQL, (optional) BigQuery
- Security: Secret Manager, Cloud KMS, Artifact Registry, Binary Authorization, Cloud Armor
- IaC: Terraform
- Scripting: Python/Bash
Deliver
| No. | Performance Parameter | Measure |
| 1. | Continuous Integration, Deployment & Monitoring | 100% error-free onboarding & implementation |
| 2. | CSAT | Manage service tools, troubleshoot queries, customer experience |
| 3. | Capability Building & Team Management | % trained on new age skills, Team attrition %, Employee satisfaction score |
Experience: 5-8 Years.
Reinvent your world. We are building a modern Wipro. We are an end-to-end digital transformation partner with the boldest ambitions. To realize them, we need people inspired by reinvention. Of yourself, your career, and your skills. We want to see the constant evolution of our business and our industry. It has always been in our DNA - as the world around us changes, so do we. Join a business powered by purpose and a place that empowers you to design your own reinvention.