Location: Dallas or SF Bay Area
The Site Reliability Engineer (SRE) is accountable for ensuring very high availability of the entire application environment. The SRE is responsible for taking appropriate action to minimize downtime and for resolving performance and related issues.
The SRE is expected to have a very good understanding of application deployment processes and the application technology stack, and to be highly familiar with the application hosting infrastructure, connectivity issues, and security.
SREs are expected to know the entire development lifecycle of the applications they support, as well as the infrastructure and other components that can affect site performance or availability. They will interact with DevOps personnel, developers, infrastructure operations staff, and NOC and command center staff as needed.
SREs are expected to oversee the operations metrics that reflect the health of the environment and to take proactive corrective action when needed. For the environments they support, SREs will build or gather any metrics that standard monitoring tools do not provide.
SREs will automate the remediation and recovery processes needed to minimize or eliminate downtime.
SREs will analyze site, release, and module failures and put corrective processes and automation in place to recover from them. SREs will continually test the environment to ensure very high availability of all application components.
The SRE function rolls up to the head of operations.
Candidates should be self-motivated and collaborative IT professionals with a strong background in software development, systems administration and IT automation.
Ideal Job Requirements:
- 10+ years of overall IT experience
- 6 to 10 years of experience with Linux/cloud operations/AWS/private cloud
- 2 to 4 years of experience in a large or moderately large environment
- Must have very good scripting skills and automation experience
- Must have experience managing large application environments, including scaling those environments up and down and handling deployment processes
- Past experience with CI/CD processes
- Past experience with DevOps processes
- Must have experience in 24x7 operations
- Good hands-on experience within the datacenter
- Virtualization and storage technologies
- Providing datacenter and operations reports
- Must have very good communication skills
- Past experience automating SOPs, runbooks, etc.
- Development experience or Developer support experience
- Production support of business-critical (mission-critical) applications
- Familiarity and experience with automated release management processes
NetEnrich is a next-gen IT infrastructure and operations management, automation, cloud, DevOps, and cybersecurity services provider for enterprises. Our services and products span IT infrastructure, cloud, and applications, and enable agile DevOps and intelligent business operations in IT environments. We combine elastic industrialized services with automation technologies, products, and proprietary analytics to deliver a new-world approach to IT operations. We mitigate risks in IT operations, drive innovation, and transform IT teams and solution providers into true service providers to their businesses. We also enable them to unlock the potential of new-world technologies such as cloud, virtualization, and mobility.
Because we hire only the best and brightest, we nourish that talent with an environment where people can innovate, thrive, and pursue their passions.
Please send your profiles to USjobs@netenrich.com