RQ10729 - DevOps/Cloud Engineer - Senior

  • Toronto, ON, Canada
  • On-Site

Job Description:

Security Level: CRJMC

Must Have:

  • Design, provision, and manage AWS infrastructure including VPCs, subnets, security groups, IAM policies, EC2, ECS, EKS, RDS, S3, Route 53, and CloudFront.
  • Architect multi-account AWS environments following AWS Well-Architected Framework principles.
  • Manage AWS cost optimization strategies including Reserved Instances, Savings Plans, and rightsizing.
  • Develop, maintain, and refactor Terraform modules and configurations for all cloud infrastructure.
  • Author and maintain Ansible playbooks, roles, and collections for server configuration, application deployment, and compliance enforcement.
  • Operate and administer Red Hat OpenShift Service on AWS (ROSA) clusters, including cluster upgrades, node scaling, and add-on management.
  • Design and maintain CI/CD pipelines (GitLab CI, Azure DevOps Services) for infrastructure and application delivery.

Experience and Skill Set Requirements:

1. Cloud Infrastructure & AWS

  • Design, provision, and manage AWS infrastructure including VPCs, subnets, security groups, IAM policies, EC2, ECS, EKS, RDS, S3, Route 53, and CloudFront.
  • Architect multi-account AWS environments following AWS Well-Architected Framework principles.
  • Manage AWS cost optimization strategies including Reserved Instances, Savings Plans, and rightsizing.
  • Implement and maintain CloudTrail, Config, GuardDuty, Security Hub, and AWS Organizations service control policies (SCPs).

2. Infrastructure as Code — Terraform/Terraform Cloud

  • Develop, maintain, and refactor Terraform modules and configurations for all cloud infrastructure.
  • Manage Terraform Cloud workspaces, remote state backends, variable sets, and team access policies.
  • Enforce IaC standards including module versioning, input/output conventions, and documentation.
  • Implement drift detection and remediation workflows using Terraform Cloud run tasks and policy-as-code (Sentinel or OPA).
  • Lead Terraform code review processes and mentor junior team members on best practices.

3. Configuration Management — Ansible

  • Author and maintain Ansible playbooks, roles, and collections for server configuration, application deployment, and compliance enforcement.
  • Manage Ansible inventories across dynamic cloud environments using AWS dynamic inventory plugins.
  • Integrate Ansible automation with CI/CD pipelines for repeatable and auditable deployments.
  • Use Ansible Vault for secrets management and ensure credentials are handled securely.
  • Develop idempotent, well-tested automation that reduces manual toil and configuration drift.

4. Container Platform — OpenShift ROSA

  • Operate and administer Red Hat OpenShift Service on AWS (ROSA) clusters, including cluster upgrades, node scaling, and add-on management.
  • Define and enforce OpenShift RBAC, NetworkPolicies, and SecurityContextConstraints (SCCs).
  • Manage Operators, Helm charts, and Kustomize overlays for workload deployment on ROSA.
  • Ensure cluster hardening against CIS benchmarks and organizational security policies.

5. CI/CD Pipelines

  • Design and maintain CI/CD pipelines (GitLab CI, Azure DevOps Services) for infrastructure and application delivery.
  • Implement GitOps workflows using Argo CD for declarative, auditable deployments to OpenShift ROSA.
  • Integrate security scanning tooling (SAST, container scanning, dependency auditing) into pipeline gates.
  • Champion shift-left testing principles, ensuring infrastructure changes are validated before promotion to production.
  • Maintain pipeline-as-code standards with versioned, peer-reviewed pipeline definitions.

6. Security & Compliance

  • Serve as a key contributor to the team's security posture, embedding security controls throughout the infrastructure and CI/CD lifecycle.
  • Implement secrets management solutions (AWS Secrets Manager) and enforce least-privilege access.
  • Support vulnerability management processes by triaging findings from infrastructure and container scanning tools.
  • Participate in incident response and post-mortem processes, ensuring remediation actions are tracked and resolved.

7. Observability & Reliability

  • Build and maintain end-to-end observability solutions using Amazon CloudWatch.
  • Define and track SLOs and SLIs for critical platform services and workloads.
  • Lead on-call incident response for platform-level issues, conducting root cause analyses (RCAs) and driving permanent fixes.
  • Produce and maintain runbooks and architectural decision records (ADRs).