Site Reliability Engineer

Atlantis IT group
Halifax, NS
Posted 4 days ago
Job Details:
Full-time
Experienced

Role - Site Reliability Engineer
Location - Halifax
Hybrid position

We are seeking a highly skilled Kubernetes Administrator - Site Reliability Engineer (SRE) to manage, optimize, and ensure the reliability of Kubernetes-based infrastructure. The ideal candidate will have deep expertise in Kubernetes, container orchestration, cloud infrastructure, and automation, along with a strong focus on reliability, scalability, and performance.

Key Responsibilities:
Kubernetes Administration: Deploy, configure, and manage Kubernetes clusters in cloud and on-prem environments.
Reliability & Performance: Implement best practices to ensure high availability, scalability, and performance of containerized applications.
Monitoring & Incident Response: Set up monitoring (Prometheus, Grafana, ELK, etc.), troubleshoot issues, and lead incident resolution.
Automation & Infrastructure as Code (IaC): Develop and maintain Terraform, Helm charts, and Kubernetes manifests for automation.
CI/CD & DevOps Integration: Work with DevOps teams to optimize CI/CD pipelines for Kubernetes deployments (Jenkins, ArgoCD, FluxCD, etc.).
Security & Compliance: Implement security best practices for containerized workloads, RBAC, network policies, and vulnerability scanning.
Capacity Planning & Optimization: Analyze resource usage and optimize infrastructure costs and performance.
Disaster Recovery & Backup: Implement backup and disaster recovery strategies for Kubernetes.
