DOP-008 Site Reliability Engineer - Containerization & Kubernetes Infrastructure
Vancouver, Toronto (BC or ON only)
Our Client is searching for an experienced Site Reliability Engineer (SRE) to join our Containerization & Kube Infra team. As a member of this team, you will be focused on enabling reliable and efficient service runtime across our Engineering organization. We partner closely with contributors responsible for our Build & Delivery systems, our VMware-based infrastructure, and our Observability systems.
Your Role: In this role, you will be expected to work on continuously improving the ability of engineers to develop, test, release, and maintain their production services. You will participate in managing the systems and processes that ensure a flexible and reliable container ecosystem, including Kubernetes cluster stability, deployment tooling, ecosystem security, and service integration support. To be successful, you will need to work with teams across the Engineering organization to understand their needs, and you will need to work closely with our internal Platform and Infrastructure teams to build and maintain the services that provide for those needs.
Who YOU Are: You are an active participant in a culture of sharing and learning. You believe that we succeed or fail as a team, and you confront problems (not people) when things are difficult. You are an experienced technologist with a passion for DevOps, and you have spent a few years dealing with complex automation problems in a Linux/Unix ecosystem. We expect experience with most of the tools and concepts outlined in the skills section (or comparable) -- but we know that nobody knows everything, and you are a growth-oriented engineer, right?
Your Skills:
- Extensive experience with Linux/Unix, particularly programming to automate tasks
- Moderate experience with distributed systems
- Some experience with the low-level Linux ecosystem (e.g. kernel, cgroups)
- Familiarity with self-managed container orchestration (e.g. Kubernetes)
- Familiarity with Release Orchestration (Ansible, Capistrano)
- Familiarity with Build Automation (Jenkins, GitHub Actions)
- Familiarity with Configuration Management (Puppet)
- Programming with at least one OOP language (Python preferred; you may also encounter Ruby, etc.)
- Scripting (distinct from mid-sized software development: as an SRE, you aren’t going to be able to avoid hacking on Bash)