Job Description
Dublin, 3 days a week / 1 year
Job Description – SRE (Observability & Database Reliability Engineer)
Role Summary
We are seeking a Site Reliability Engineer (SRE) with strong Database Reliability and Observability expertise to ensure high availability, performance, and operational visibility of business‑critical platforms. This role has a strong emphasis on dashboards, observability, Splunk, and operational reporting, along with hands‑on database operations in complex production environments.
Key Responsibilities
SRE & Reliability Engineering
- Own end‑to‑end reliability, availability, and performance of applications and database platforms.
- Define, implement, and track SLIs, SLOs, and error budgets.
- Proactively identify reliability risks using metrics, trends, and capacity analysis.
- Lead production incident management, root cause analysis (RCA), and post‑incident reviews.
- Drive automation to reduce operational toil and improve MTTR.
- Participate in on‑call rotations and support 24x7 production environments.
Observability, Dashboards & Reporting (Primary Focus)
- Design and maintain end‑to‑end observability covering metrics, logs, alerts, and traces.
- Build and manage real‑time operational and executive dashboards for system health, availability, latency, and database performance.
- Work hands‑on with Splunk, including log ingestion, SPL queries, dashboards, alerts, and reports.
- Correlate application, infrastructure, and database events to detect issues proactively.
- Create and publish operational reports (daily / weekly / monthly) covering availability, incidents, SLO compliance, performance KPIs, and capacity trends.
- Translate technical metrics into actionable insights for engineering and leadership teams.
Database Reliability & Operations
- Support and operate enterprise databases such as PostgreSQL or Oracle (mandatory experience in at least one).
- Monitor and tune database performance including queries, indexes, and resource utilization.
- Design and support high availability, replication, backup, and disaster recovery solutions.
- Perform database upgrades, patching, migrations, and routine health checks.
- Integrate database monitoring and logs with observability platforms.
Required Skills & Experience
- 10+ years of experience in SRE, Production Support, DevOps, or Reliability Engineering roles.
- Strong expertise in observability and monitoring tools, with mandatory hands‑on experience in Splunk.
- Proven experience in dashboard building and operational reporting.
- Strong hands‑on experience with PostgreSQL or Oracle databases.
- Solid Linux/Unix administration and troubleshooting skills.
- Experience with incident response, RCA, and production on‑call support.
- Proficiency in scripting using Python, Shell, or Bash.
- Strong analytical and communication skills.
Preferred Skills
- Experience with cloud platforms such as AWS or Azure.
- Exposure to Kubernetes, Docker, and containerized environments.
- Experience with Infrastructure as Code tools such as Terraform or Ansible.
- Knowledge of capacity planning, forecasting, and performance baselining.
- Experience supporting regulated or high‑availability systems.
Deliver
| No | Performance Parameter | Measure |
| 1 | Operations of the tower | SLA adherence; knowledge management; CSAT / customer experience; identification of risk issues and mitigation plans |
| 2 | New projects | Timely delivery; avoid unauthorised changes; no formal escalations |
Experience: 5-8 years.
Reinvent your world. We are building a modern Wipro. We are an end-to-end digital transformation partner with the boldest ambitions. To realize them, we need people inspired by reinvention. Of yourself, your career, and your skills. We want to see the constant evolution of our business and our industry. It has always been in our DNA - as the world around us changes, so do we. Join a business powered by purpose and a place that empowers you to design your own reinvention.