Site Reliability Engineer (SRE)

Wisedocs AI

Toronto, ON

Full-time

Experienced

Company benefits

Paid time off

Stock options

Posted 9 days ago

Wisedocs is on a mission to make it easy and accessible for any company in the insurance, legal and medical space to understand medical documents quickly using AI (Artificial Intelligence). Every week, we process hundreds of thousands of pages of documents, saving our customers hours and hours of manual processing time, and helping them process medical claims much more quickly.

Join Wisedocs AI as a Site Reliability Engineer, where you will be responsible for designing, implementing, and maintaining our cloud-based infrastructure and operational processes. You will work closely with our development and operations teams to build reliable, automated systems that support our rapidly growing user base and mission-critical applications. This role combines software engineering, system administration, and operational expertise to optimize our service reliability and performance.

This is a hybrid position requiring on-site presence two to three days per week in downtown Toronto.

Responsibilities

As a member of our Engineering team, your primary responsibilities will include:

Infrastructure Reliability & Monitoring:

Design, build, and maintain scalable, resilient, and secure cloud infrastructure on AWS.
Implement robust monitoring, alerting, and logging systems to ensure high availability and performance.
Develop and maintain automated processes for infrastructure deployment, updates, and recovery.

Incident Response & Troubleshooting:

Serve as the first line of defence during incidents; the majority of your time will be spent on initial incident response, including rapid detection, diagnosis, and remediation of outages or performance issues.
Lead escalation procedures as needed, conduct root cause analyses, and implement long-term fixes to prevent recurrence of incidents.

Performance & Capacity Planning:

Continuously evaluate system performance, identify bottlenecks, and proactively plan for future growth.
Develop and maintain tools to measure and optimize system performance.

Automation & DevOps:

Collaborate with software development teams to integrate SRE best practices into the development lifecycle.
Automate repetitive tasks and implement CI/CD pipelines to streamline deployments and operational workflows.

Security & Compliance:

Ensure systems are secure, compliant with industry standards, and follow best practices for data protection and privacy.
Work with cross-functional teams to address security vulnerabilities and maintain system integrity.

Documentation & Collaboration:

Create clear, detailed documentation for infrastructure, processes, and operational procedures.
Serve as a key resource for reliability and performance insights across the organization.
Other duties and responsibilities will be assigned as projects develop, adjust and mature.

What to expect from our Recruitment Process:

Round #1 - HR (Quick Prescreen)
Duration: 20-30 minutes
Focus: High-level get-to-know-you
Round #2 - Technical Assessment
Duration: 30-45 minutes
Focus: Practical Coding Evaluation
Round #3 - Hiring Manager Interview
Duration: 1-1.5 hours
Focus: Experience, technical skills (conceptual assessment), team integration.
Round #4 - Meet with our CTO!
Duration: 45 minutes
Focus: Culture fit, strategic alignment

Requirements

Technical Expertise:

Proven experience in a Site Reliability Engineer, DevOps, or similar role in a cloud environment (AWS preferred).
Experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation) and container orchestration (e.g., Kubernetes).
Solid programming/scripting skills in languages such as Python or Bash.

Operational Skills:

Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging solutions.
Familiarity with CI/CD pipelines and version control systems (e.g., Git).
Knowledge of networking concepts, load balancing, and high availability architectures.

Soft Skills:

Excellent problem-solving skills and the ability to work under pressure during incident resolution.
Strong communication skills with the ability to collaborate effectively across technical teams.
A proactive, detail-oriented mindset with a passion for continuous improvement.

Preferred Qualifications:

Experience working in a SaaS or high-growth startup environment.
Familiarity with agile methodologies and collaborative cross-functional team environments.
Familiarity with browser developer tools, including the developer console.
Relevant certifications in cloud technologies (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator).
Experience with ticketing systems.

Benefits

What We Offer

A hybrid work model
Modern employee benefits, including health and dental coverage
Competitive compensation, with valuable stock options, as we're still a young company growing very quickly.
An opportunity to develop very rapidly in your career. We offer a highly immersive learning environment, and if you thrive in it, you will have the chance to grow quickly into senior practitioner or management roles, as you choose.
Access to a learning and professional development fund to help you level up your career while you're working with us. We hope to be an incredible step up for your career if you decide to come and work with us.
Company events
Generous Paid Time Off
Paid Sick Days
Casual Dress code
Employee Referral Bonus
Tuition Assistance
Plus many other Recognition Programs!

Join our team and be part of a company committed to making a positive impact on the InsureTech and HealthTech industries.

*Wisedocs AI is an equal opportunity employer and is committed to providing employment accommodation in accordance with AODA. If you require an accommodation, please notify us and we will work with you to meet your needs.
