Job Description
Role Summary
The Site Reliability Engineer (SRE) – NOC (Associate) will support the AWS cloud team in designing, implementing, and maintaining cloud infrastructure. The role focuses on ensuring availability, performance, and scalability of services, while identifying opportunities for automation and process improvement. The SRE NOC will collaborate across teams, contribute to technical initiatives, and follow best practices in incident and problem management to help ensure the reliability and growth of the organization.
Required Skills:
- Good working knowledge of AWS services, including EC2, S3, RDS, ELB, VPC, EKS, MSK, ES, EMR, CloudFormation, CloudWatch, Route 53, CloudFront, SNS, IAM, and API Gateway.
- Experience working with public cloud environments, VPC configuration, monitoring, and basic cloud security.
- Hands-on experience configuring and maintaining VPCs and network resources within AWS.
- Ability to create technical documentation (runbooks/playbooks) and assist in training team members.
- Understanding of network fundamentals, including routers, switches, firewalls, and load balancers.
- Familiarity with virtualization, provisioning, and configuration management tools such as Terraform and AWS CloudFormation.
- Experience with or knowledge of database technologies, including Oracle, RDS (MySQL and PostgreSQL), and Elasticsearch.
- Exposure to monitoring tools such as CloudWatch, AppDynamics, and Splunk; awareness of Nagios and Grafana is a plus.
- Understanding of containerization, microservices architecture, and application management in Kubernetes environments.
- Good Linux administration and server management skills.
- Experience with IT Service Management (ITSM) platforms, such as ServiceNow and Jira, for incident, problem, and change management.
- Participation in change management processes for servers supporting SaaS products.
- Scripting and troubleshooting skills in Shell and Python.
- Exposure to GenAI tools is an added advantage.
Qualifications & Responsibilities:
- 5+ years of overall experience, with 3+ years of relevant exposure to site reliability engineering and application monitoring, including incident management.
- Expected to assist the AWS cloud team, following best practices and maintaining high standards.
- Ability to help develop, document (playbooks/runbooks), integrate, and implement new technologies and solutions to improve operational efficiency.
- Experience in performance monitoring, infrastructure support, and automation, with an understanding of optimizing cloud environments.
- Willingness to analyze environments and provide recommendations to support organizational growth and scalability.
- Ability to work in a fast-paced, high-demand setting, while maintaining a collaborative and team-oriented approach.
- Responsible for following incident and problem management policies, ensuring compliance and effective resolution of issues.
- AWS Certified Cloud Practitioner or equivalent AWS certification is preferred.
Experience: 3-5 Years
Reinvent your world. We are building a modern Wipro. We are an end-to-end digital transformation partner with the boldest ambitions. To realize them, we need people inspired by reinvention. Of yourself, your career, and your skills. We want to see the constant evolution of our business and our industry. It has always been in our DNA - as the world around us changes, so do we. Join a business powered by purpose and a place that empowers you to design your own reinvention.