
Manager, Site Reliability Engineering at Navan
Tel-Aviv, Israel | Full-time | Engineering | Posted 1 day ago
About the Role
<p>At Navan, we’re committed to creating the best experience for business travelers, ensuring that our systems are always reliable, scalable, and efficient. As we continue to grow, we’re looking for a <strong>Site Reliability Engineering (SRE) Manager</strong> to join our team at our headquarters in Palo Alto, California. In this role, you will lead a team of SREs, drive innovation in infrastructure design and automation, and ensure our systems run seamlessly at scale, serving thousands of travelers every day.</p>
<h3><strong>What You’ll Do</strong></h3>
<ul>
<li><strong>Lead & Mentor the SRE Team: </strong>Guide and develop a high-performing team of SREs, fostering a culture of collaboration, reliability, and continuous improvement.</li>
<li><strong>Drive Infrastructure Reliability & Automation:</strong> Collaborate with Engineering and Product teams to design and implement scalable, fault-tolerant systems. Leverage IaC tools (e.g., Terraform, CloudFormation) and microservices architectures to automate and improve infrastructure.</li>
<li><strong>Incident Management:</strong> Improve incident response processes, reduce MTTR, and proactively mitigate risks. Apply resiliency patterns to ensure systems are fault-tolerant and highly available.</li>
<li><strong>Define & Measure SLOs:</strong> Develop service-level objectives (SLOs) and KPIs to track and improve system reliability, using tools like New Relic or Datadog for observability.</li>
<li><strong>24x7 Production Support:</strong> Ensure system availability in a 24x7 environment, applying expertise in AWS (e.g., ECS, Lambda, DynamoDB) and database management for optimal performance.</li>
<li><strong>Optimize CI/CD Pipelines:</strong> Automate and streamline deployment workflows using tools like Jenkins or GitHub Actions to ensure faster and more reliable deployments.</li>
<li><strong>Resource Management:</strong> Manage team resources, including capacity planning, hiring, and upskilling, to meet evolving business needs.</li>
</ul>
<h3><strong>What We’re Looking For</strong></h3>
<ul>
<li>8+ years in Site Reliability Engineering, DevOps, or Infrastructure roles, with at least 3 years in a leadership position.</li>
<li>Proven ability to lead and mentor teams, fostering a culture of collaboration and reliability.</li>
<li>Hands-on experience with AWS cloud technologies, Infrastructure as Code (Terraform/CloudFormation), microservices architectures, deployment automation (Jenkins/GitHub Actions), and observability tools (New Relic/Datadog).</li>
<li>Strong background in designing scalable, fault-tolerant systems, improving incident response, and driving operational improvements.</li>
<li>Excellent interpersonal and communication skills, with the ability to work effectively across cross-functional teams.</li>
</ul>