Job Description
AWS DevOps Operations
Rate: 170K/M
Location: Gurugram
Role Level: Senior / L3 (10+ years)
Role Overview
We are seeking a highly skilled Senior Observability Engineer to help with the optimization and standardization of our Grafana Cloud ecosystem. This role is critical for reducing operational expenditure through efficient platform configuration and establishing the observability framework for the TSA separation programme.
The ideal candidate is a subject matter expert in the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir/Prometheus) with a proven track record of implementing cost-effective, high-performance monitoring solutions.
Key Responsibilities
1. Grafana Cloud Optimization (Cost & Performance)
- Cost Optimization: Implement strategies to decrease monthly expenditure, including query optimization, refining data retention policies, and eliminating redundant data ingestion.
- Dashboard & Alerting: Enhance monitoring quality by refining existing alerts and creating high-impact, insightful dashboards that improve system stability.
2. Strategy & Best Practices
- Standardization: Develop and document organization-wide best practices for configuring Grafana Cloud, Prometheus, and Loki.
- Governance & Security: Implement robust Role-Based Access Controls (RBAC) to mitigate security vulnerabilities and prevent unauthorized access to sensitive log data.
- Migration Roadmap: Establish foundational observability guidelines to ensure the TSA separation is launched with consistent and effective monitoring.
3. Platform Architecture & Collaboration
- Foundation Project: Collaborate with the Osttra Platform teams to define core observability components, including logging, metrics, and tracing standards.
- Architectural Design: Contribute to the design of scalable observability solutions that will be integrated into the core platform architecture.
- Knowledge Transfer: Mentor internal teams to foster long-term observability expertise and ensure the sustainability of the new standards.
Technical Qualifications (L3 Requirements)
- Expert-level Grafana Cloud: Extensive experience managing Grafana Cloud at scale, specifically focusing on cost management and performance tuning.
- Observability Stack: Deep technical proficiency in Prometheus (metrics), Loki (logging), and Tempo (tracing).
- Data Strategy: Proven ability to manage complex data ingestion pipelines and optimize "cardinality" to reduce cloud costs.
- Security Mindset: Practical experience implementing secure access controls and compliance standards within observability platforms.
- Infrastructure as Code: Experience defining observability components as code to support automated platform foundations.
Do
- Support architecture planning, migration, and installation for new projects in own tower (platform/database/middleware/backup)
- Lead the architectural design of the platform, middleware, database, backup, etc. according to system requirements to ensure a highly scalable and extensible solution
- Conduct technology capacity planning by reviewing the current and future requirements
- Leverage new features of the underlying technologies to ensure smooth functioning of the installed databases, applications, and platforms, as applicable
- Define and implement disaster recovery plans and backup and recovery plans
- Manage the day-to-day operations of the tower
- Manage day-to-day operations by troubleshooting any issues, conducting root cause analysis (RCA) and developing fixes to avoid similar issues.
- Plan for and manage upgrades, migration, maintenance, backup, installation, and configuration functions for own tower
- Review the technical performance of own tower and find ways to improve efficiency, fine-tune performance, and reduce performance challenges
- Develop a shift roster for the team to ensure no disruption in the tower
- Create and update SOPs, Data Responsibility Matrices, operations manuals, daily test plans, data architecture guidance etc.
- Provide weekly status reports to the client leadership team and internal stakeholders on database activities, covering progress, updates, status, and next steps
- Leverage technology to develop a Service Improvement Plan (SIP) through automation and other initiatives for higher efficiency and effectiveness
Team Management
- Resourcing
  - Forecast talent requirements as per the current and future business needs
  - Hire adequate and suitable resources for the team
  - Train direct reportees to make the right recruitment and selection decisions
- Talent Management
  - Ensure 100% compliance with Wipro's standards of adequate onboarding and training for team members to enhance capability & effectiveness
  - Build an internal talent pool of HiPos and ensure their career progression within the organization
  - Promote diversity in leadership positions
- Performance Management
  - Set goals for direct reportees, conduct timely performance reviews and appraisals, and give constructive feedback to direct reports
  - Ensure that organizational programs like Performance Nxt are well understood and that the team makes use of the opportunities such programs offer, both for themselves and for the levels below them
- Employee Satisfaction and Engagement
  - Lead and drive engagement initiatives for the team
  - Track team satisfaction scores and identify initiatives to build engagement within the team
  - Proactively challenge the team with larger and enriching projects/initiatives for the organization or team
  - Exercise employee recognition and appreciation
Deliver
| No | Performance Parameter | Measure |
| 1 | Operations of the tower | SLA adherence; Knowledge management; CSAT/Customer Experience; Identification of risk issues and mitigation plans |
| 2 | New projects | Timely delivery; Avoid unauthorised changes; No formal escalations |
Experience: 8-10 Years
Reinvent your world. We are building a modern Wipro. We are an end-to-end digital transformation partner with the boldest ambitions. To realize them, we need people inspired by reinvention. Of yourself, your career, and your skills. We want to see the constant evolution of our business and our industry. It has always been in our DNA - as the world around us changes, so do we. Join a business powered by purpose and a place that empowers you to design your own reinvention.