Targeted Talent
Halifax, NS
We are looking for an experienced Site Reliability Engineer or Platform Operations Engineer for our client. This is a permanent position that starts remote, with later relocation to Calgary or Winnipeg. Our client is a global enterprise company with a product you've likely used.
You Will:
- Own development projects, providing technical guidance and delivering against the Platform & Service Operations Engineering roadmap.
- Design and implement wargames to test our operational response and identify areas of weakness in our platforms.
- Act as the technical and management escalation point for Service Operations Centre (SOC) engineers and during major incidents.
- Troubleshoot, reproduce and mitigate issues in our production environments.
- Mentor other team members.
- Operate global AWS platforms at scale.
You Have:
- Evidence of strong troubleshooting, problem-solving and investigative skills
- Experience with AWS or other cloud providers
- Experience developing in Java
- Experience with major incident management and operating production platforms at scale
- Experience working with distributed web applications
- Experience automating operational tasks and processes using other languages
- Understanding of relational and/or NoSQL data structures
- Experience mentoring/influencing peers
- Experience identifying improvements, weighing risks against benefits, and translating them into technical requirements
- Experience with Ansible, Terraform and Python
- Experience working with serverless and containers
- Experience with ELK and/or Graphite/Prometheus/Grafana
- Experience using tracing tools in production
- Experience with chaos engineering / failure injection testing
- Experience working in an Agile environment
- Experience working in a similar site reliability role