Pipeline

© 2026 Pipeline. All rights reserved.

Kaseya

Site Reliability Engineer at Kaseya

Markham, Ontario · Full-time · Infrastructure · Posted 24 days ago

About the Role

<div class="content-intro"><p><strong>About Kaseya</strong></p> <p data-start="226" data-end="568">Kaseya is the leading provider of AI-powered IT management and cybersecurity software, serving Managed Service Providers (MSPs) and internal IT organizations worldwide. Our comprehensive platform helps organizations efficiently manage, secure, and automate their IT environments, driving operational efficiency and long-term business success.</p> <p data-start="570" data-end="898">Backed by <a class="decorated-link" href="https://www.insightpartners.com?utm_source=chatgpt.com" target="_new" data-start="580" data-end="654">Insight Partners</a>, a leading global software investor, Kaseya has experienced sustained double-digit growth and continues to expand its global footprint. Today, Kaseya supports customers in more than 20 countries and manages over 15 million endpoints worldwide.</p> <p data-start="900" data-end="1191">Founded in 2000, Kaseya has built a culture centered on innovation, accountability, and results. We are a high-growth, high-performance organization that values individuals who are driven, adaptable, and committed to delivering exceptional outcomes for our customers and teammates alike.</p> <p data-start="1193" data-end="1468">At Kaseya, success comes from embracing challenges, moving with urgency, and continuously raising the bar.</p></div> <p class="font-claude-response-body break-words whitespace-normal leading-[1.7]">Kaseya is hiring a Site Reliability Engineer to keep our production systems healthy as we scale. You'll own the reliability of services that thousands of MSPs depend on every day. That means defining the SLOs we hold ourselves to, leading incidents when they happen, and building the automation that keeps things stable as we ship. The work is hands-on, the on-call rotation is real, and the environment runs heavily on AWS. 
If you treat reliability as a product instead of a chore, you'll fit in well here.</p> <p class="font-claude-response-body break-words whitespace-normal leading-[1.7]"><strong>What You'll Do</strong></p> <ul class="[li_&amp;]:mb-0 [li_&amp;]:mt-1 [li_&amp;]:gap-1 [&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc flex flex-col gap-1 pl-8 mb-3"> <li class="font-claude-response-body whitespace-normal break-words pl-2">Set, monitor, and enforce SLOs, SLIs, and error budgets that keep our systems reliable</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">Lead incident response, troubleshooting, and blameless postmortems that produce real fixes</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">Build and maintain automated deployment, configuration management, and infrastructure provisioning using Infrastructure as Code</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">Manage cloud and hybrid infrastructure with Terraform or CloudFormation, balancing cost, scalability, and resilience</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">Improve observability across systems through proactive monitoring, alerting, and dashboards that surface issues early</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">Partner with development teams to bake reliability into the SDLC, including deployment automation, capacity planning, and chaos engineering</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">Cut operational toil through automation, systems that recover themselves, and engineering solutions that scale</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">Support containerized and serverless workloads so they stay highly available and fault tolerant in production</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">Stay current on SRE, cloud, 
and observability practices and bring what works back to the team</li> </ul> <p class="font-claude-response-body break-words whitespace-normal leading-[1.7]"><strong>Required Qualifications</strong></p> <ul class="[li_&amp;]:mb-0 [li_&amp;]:mt-1 [li_&amp;]:gap-1 [&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc flex flex-col gap-1 pl-8 mb-3"> <li class="font-claude-response-body whitespace-normal break-words pl-2">4 to 5 years of AWS production experience</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">IaC ownership with Terraform or CloudFormation, including state management</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">AWS ECS production experience (or a strong Kubernetes background and willingness to ramp up)</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">Active on-call rotation experience, including leading incidents and writing postmortems</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">Working fluency with SLOs, SLIs, and error budgets in production</li> </ul> <p class="font-claude-response-body break-words whitespace-normal leading-[1.7]"><strong>Preferred Qualifications</strong></p> <ul class="[li_&amp;]:mb-0 [li_&amp;]:mt-1 [li_&amp;]:gap-1 [&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc flex flex-col gap-1 pl-8 mb-3"> <li class="font-claude-response-body whitespace-normal break-words pl-2">Kubernetes production experience</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">Broader observability tooling (Datadog, Dynatrace, CloudWatch, Elasticsearch/Kibana)</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">Chaos engineering</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">AWS Lambda or serverless workloads</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">Ansible, Chef, or Puppet</li> <li
class="font-claude-response-body whitespace-normal break-words pl-2">DevSecOps work (vulnerability scanning, secrets management, SOC 2 or ISO 27001)</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">Production database support (RDS, PostgreSQL, MySQL)</li> <li class="font-claude-response-body whitespace-normal break-words pl-2">Open-source contributions or a public technical portfolio</li> </ul> <p class="font-claude-response-body break-words whitespace-normal leading-[1.7]">The expected annual base salary for this role is CAD $115,000 to CAD $130,000. The final offer will depend on experience, skills, and internal equity. This posting is for an existing vacancy.</p> <div class="content-conclusion"><p><strong>Additional information</strong><br><em>Kaseya provides equal employment opportunity to all employees and applicants without regard to race, religion, age, ancestry, gender, sex, sexual orientation, national origin, citizenship status, physical or mental disability, veteran status, marital status, or any other characteristic protected by applicable law.</em></p></div>

Related Roles

  • Data Center Technician II

    Kaseya

    Pennsylvania, US
  • Staff Network Engineer

    Kaseya

    United States - Remote
  • Sr. Network Security Engineer

    Kaseya

    United States - Remote
  • Sr. Network Engineer

    Kaseya

    United States - Remote
  • Senior Account Manager, Mid-Market Accounts

    Kaseya

    Miami, FL
  • Product Manager

    Kaseya

    Red Bank, NJ