Job Description
Location: Gurgaon or Pan-India
Rate: 180K/M
Role Overview
We are seeking a highly skilled Senior Observability Engineer to help with the optimization and standardization of our Grafana Cloud ecosystem. This role is critical for reducing operational expenditure through efficient platform configuration and establishing the observability framework for the TSA separation programme.
The ideal candidate is a subject matter expert in the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir/Prometheus) with a proven track record of implementing cost-effective, high-performance monitoring solutions.
Key Responsibilities
1. Grafana Cloud Optimization (Cost & Performance)
· Cost Optimization: Implement strategies to decrease monthly expenditure, including query optimization, refining data retention policies, and eliminating redundant data ingestion.
· Dashboard & Alerting: Enhance monitoring quality by refining existing alerts and creating high-impact, insightful dashboards that improve system stability.
2. Strategy & Best Practices
· Standardization: Develop and document organization-wide best practices for configuring Grafana Cloud, Prometheus, and Loki.
· Governance & Security: Implement robust Role-Based Access Controls (RBAC) to mitigate security vulnerabilities and prevent unauthorized access to sensitive log data.
· Migration Roadmap: Establish foundational observability guidelines to ensure the TSA separation is launched with consistent and effective monitoring.
3. Platform Architecture & Collaboration
· Foundation Project: Collaborate with the Osttra Platform teams to define core observability components, including logging, metrics, and tracing standards.
· Architectural Design: Contribute to the design of scalable observability solutions that will be integrated into the core platform architecture.
· Knowledge Transfer: Mentor internal teams to foster long-term observability expertise and ensure the sustainability of the new standards.
Technical Qualifications (L3 Requirements)
· Expert-level Grafana Cloud: Extensive experience managing Grafana Cloud at scale, specifically focusing on cost management and performance tuning.
· Observability Stack: Deep technical proficiency in Prometheus (metrics), Loki (logging), and Tempo (tracing).
· Data Strategy: Proven ability to manage complex data ingestion pipelines and reduce cardinality to lower cloud costs.
· Security Mindset: Practical experience implementing secure access controls and compliance standards within observability platforms.
· Infrastructure as Code: Experience defining observability components as code to support automated platform foundations.
| Areas of responsibility | Description |
| Triaging and resolution of tickets | Establish incident monitoring protocols and provide technical guidance to enable permanent fixes for recurring issues. |
| SLA Monitoring | Review SLA performance across the team, represent the function in meetings, and ensure consistent delivery quality. |
| Performance tuning and optimization | Collaborate with technical teams in managing production support activities regarding incident resolution, efficient deployments, and implementation of changes. |
| System improvements | Collaborate with technical teams to review mitigation measures and proposed improvement actions. |
| Handling disaster recovery | Develop disaster recovery and data backup plans to ensure system resilience and quick restoration in case of disruptions. |
| Reporting and stakeholder engagement | Participate in Quarterly Business Reviews (QBRs), manage client expectations, and ensure alignment with business objectives. |
| Team Management | Supervise team members by overseeing daily operations, conducting performance appraisals, and providing feedback. |
| Knowledge base and continuous learning | Update and maintain documentation for complex issues, share best practices within the team, and support ongoing knowledge enhancement to improve service efficiency. |
Experience: 8-10 years
Reinvent your world. We are building a modern Wipro. We are an end-to-end digital transformation partner with the boldest ambitions. To realize them, we need people inspired by reinvention. Of yourself, your career, and your skills. We want to see the constant evolution of our business and our industry. It has always been in our DNA - as the world around us changes, so do we. Join a business powered by purpose and a place that empowers you to design your own reinvention.