RQ10729 - DevOps/Cloud Engineer - Senior

Toronto, ON, Canada - On-Site


Job Description:

Security Level: CRJMC

Must Have:

· Design, provision, and manage AWS infrastructure including VPCs, subnets, security groups, IAM policies, EC2, ECS, EKS, RDS, S3, Route 53, and CloudFront.
· Architect multi-account AWS environments following AWS Well-Architected Framework principles.
· Manage AWS cost optimization strategies including Reserved Instances, Savings Plans, and rightsizing.
· Develop, maintain, and refactor Terraform modules and configurations for all cloud infrastructure.
· Author and maintain Ansible playbooks, roles, and collections for server configuration, application deployment, and compliance enforcement.
· Operate and administer Red Hat OpenShift Service on AWS (ROSA) clusters, including cluster upgrades, node scaling, and add-on management.
· Design and maintain CI/CD pipelines (GitLab CI, Azure DevOps Services) for infrastructure and application delivery.

Experience and Skill Set Requirements

1. Cloud Infrastructure & AWS

· Design, provision, and manage AWS infrastructure including VPCs, subnets, security groups, IAM policies, EC2, ECS, EKS, RDS, S3, Route 53, and CloudFront.

· Architect multi-account AWS environments following AWS Well-Architected Framework principles.

· Manage AWS cost optimization strategies including Reserved Instances, Savings Plans, and rightsizing.

· Implement and maintain AWS CloudTrail, AWS Config, Amazon GuardDuty, AWS Security Hub, and AWS Organizations service control policies (SCPs).

2. Infrastructure as Code — Terraform/Terraform Cloud

· Develop, maintain, and refactor Terraform modules and configurations for all cloud infrastructure.

· Manage Terraform Cloud workspaces, remote state backends, variable sets, and team access policies.

· Enforce IaC standards including module versioning, input/output conventions, and documentation.

· Implement drift detection and remediation workflows using Terraform Cloud run tasks and policy-as-code (Sentinel or OPA).

· Lead Terraform code review processes and mentor junior team members on best practices.

3. Configuration Management — Ansible

· Author and maintain Ansible playbooks, roles, and collections for server configuration, application deployment, and compliance enforcement.

· Manage Ansible inventories across dynamic cloud environments using AWS dynamic inventory plugins.

· Integrate Ansible automation with CI/CD pipelines for repeatable and auditable deployments.

· Use Ansible Vault for secrets management and ensure credentials are handled securely at all times.

· Develop idempotent, well-tested automation that reduces manual toil and configuration drift.

4. Container Platform — OpenShift ROSA

· Operate and administer Red Hat OpenShift Service on AWS (ROSA) clusters, including cluster upgrades, node scaling, and add-on management.

· Define and enforce OpenShift RBAC, NetworkPolicies, and SecurityContextConstraints (SCCs).

· Manage Operators, Helm charts, and Kustomize overlays for workload deployment on ROSA.

· Ensure cluster hardening against CIS benchmarks and organizational security policies.

5. CI/CD Pipelines

· Design and maintain CI/CD pipelines (GitLab CI, Azure DevOps Services) for infrastructure and application delivery.

· Implement GitOps workflows using Argo CD for declarative, auditable deployments to OpenShift ROSA.

· Integrate security scanning tooling (SAST, container scanning, dependency auditing) into pipeline gates.

· Champion shift-left testing principles, ensuring infrastructure changes are validated before promotion to production.

· Maintain pipeline-as-code standards with versioned, peer-reviewed pipeline definitions.

6. Security & Compliance

· Serve as a key contributor to the team's security posture, embedding security controls throughout the infrastructure and CI/CD lifecycle.

· Implement secrets management solutions (AWS Secrets Manager) and enforce least-privilege access.

· Support vulnerability management processes by triaging findings from infrastructure and container scanning tools.

· Participate in incident response and post-mortem processes, ensuring remediation actions are tracked and resolved.

7. Observability & Reliability

· Build and maintain end-to-end observability solutions using Amazon CloudWatch.

· Define and track SLOs and SLIs for critical platform services and workloads.

· Lead on-call incident response for platform-level issues, conducting root cause analyses (RCAs) and driving permanent fixes.

· Produce and maintain runbooks and architectural decision records (ADRs).