Title: Site Reliability Engineer(Contract)
Role Purpose
Required Skills: · 5+Years of experience in system administration, application development, infrastructure development or related areas · 5+ years of experience with programming in languages like Javascript, Python, PHP, Go, Java or Ruby · 3+ years of in reading, understanding and writing code in the same · 3+years Mastery of infrastructure automation technologies (like Terraform, Code Deploy, Puppet, Ansible, Chef) · 3+years expertise in container/container-fleet-orchestration technologies (like Kubernetes, Openshift, AKS, EKS, Docker, Vagrant, etcd, zookeeper) · 5+ years Cloud and container native Linux administration /build/ management skills Key Responsibilities: · Hands-on design, analysis, development and troubleshooting of highly-distributed large-scale production systems and event-driven, cloud-based services · Primarily Linux Administration, managing a fleet of Linux and Windows VMs as part of the application solutions · Involved in Pull Requests for site reliability goals · Advocate IaC (Infrastructure as Code) and CaC (Configuration as Code) practices within Honeywell HCE · Ownership of reliability, up time, system security, cost, operations, capacity and performance-analysis · Ensuring the repeatability, traceability, and transparency of our infrastructure automation · Support on-call rotations for operational duties that have not been addressed with automation · Support healthy software development practices, including complying with the chosen software development methodology (Agile, or alternatives), building standards for code reviews, work packaging, etc. · Create and maintain monitoring technologies and processes that improve the visibility to our applications’ performance and business metrics and keep operational workload in-check. · Partnering with security engineers and developing plans and automation to aggressively and safely respond to new risks and vulnerabilities. · Develop, communicate, collaborate, and monitor standard processes to promote the long-term health and sustainability of operational development tasks. |