Tekvaly is looking for a Site Reliability Engineer (SRE) in Canada for its client.
Who We Are
Tekvaly is a diversified global software development and IT consulting company that provides both offshore and onshore technical solutions to business enterprises. Our mission is to enable superior returns on clients’ technology investments through best-in-class industry solutions, domain expertise, and global scale. We feel deeply connected to our customers, so our success isn’t just a matter of our bottom line but a reflection of how our customers flourish and how their communities thrive. We strive to understand our customers’ individual needs so that we can develop products and services that enhance their livelihoods. Our customers are our partners, and when we rise, we rise together.
As an SRE, you will design, build, and maintain the automation, infrastructure, and observability that keep services reliable, scalable, and easy to deploy.
Responsibilities
- Design, implement, and maintain CI/CD pipelines and automation that ensure reliable, scalable, and safe deployments for development and production environments.
- Automate infrastructure provisioning and configuration using Infrastructure-as-Code (Terraform, CloudFormation, or similar).
- Work with Kubernetes and container orchestration to manage and scale containerised applications.
- Build and maintain internal developer platforms and self-service tooling that reduce friction for engineering teams.
- Implement and tune monitoring, logging, and alerting to ensure high availability and fast incident response.
- Define and track reliability metrics (SLIs/SLOs) and drive improvements to reduce downtime and improve user experience.
- Collaborate with developers, QA, and security teams to integrate security and quality checks into the delivery pipeline.
- Continuously improve deployment processes, rollback strategies, and release workflows to make releases fast and low-risk.
- Stay current on DevOps, SRE, and cloud-native trends and recommend improvements to tools and practices.
Requirements
- Professional experience in Site Reliability Engineering (SRE), DevOps, or Platform Engineering, with a focus on automation and system reliability.
- Strong understanding of at least one major cloud platform (AWS, Azure, or GCP).
- Experience with containers and orchestration (Docker, Kubernetes, Helm).
- Experience with Infrastructure-as-Code tools (Terraform, CloudFormation, Ansible, or similar).
- Hands-on experience with CI/CD platforms such as GitHub Actions, GitLab CI, Jenkins, or Azure DevOps.
- Solid scripting and programming skills, with strong experience in Linux environments and Bash (experience with Python, PowerShell, or similar is a plus).
- Familiarity with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, ELK stack, or similar).
- Understanding of reliability engineering concepts such as SLIs, SLOs, error budgets, and incident management.
- Strong communication skills and ability to collaborate with cross-functional teams.
- Bachelor’s degree in Computer Science, Software Engineering, or a related field, or equivalent experience.
Soft Skills We Like to See:
- Excellent communication skills.
- Adaptability and willingness to learn.
- Problem-solving mindset.
- Analytical skills.
- Ability to work in a team environment and collaborate effectively with others.
********************************************************************************************************************************************************************************************
Accommodations will be provided on request for candidates taking part in all aspects of our recruitment and selection process.
We thank all candidates for their interest; however, only those selected for an interview will be contacted.
********************************************************************************************************************************************************************************************