Tessell

SRE Lead – DBaaS Platform at Tessell

Hyderabad | Full-time | Customer Success | Posted 3 months ago
About the Role

Job Title: SRE Lead – DBaaS Platform

Role Overview

We are seeking an experienced Site Reliability Engineering (SRE) Lead to strengthen production reliability ownership for our Database-as-a-Service (DBaaS) platform. This role will bring hyperscaler-grade (RDS-level) operational expertise to drive deep product debugging, reliability engineering, and Dev collaboration across cloud-native database services.

The SRE Lead will own platform stability, availability, performance, and incident excellence across Azure/AWS/GCP-hosted database workloads.

Location: Hyderabad
Department: Customer Success
Reporting: Senior Director Customer Success/SRE

Key Responsibilities

1. Production Reliability Ownership

  • Own end-to-end reliability, availability, and performance of the DBaaS platform.
  • Define and enforce SLIs, SLOs, and SLAs across all supported database engines.
  • Lead production incident response (P1/P2), RCAs, and long-term resilience improvements.
  • Drive error budget governance with Engineering and Product teams.

2. Hyperscaler-Level Operational Excellence

  • Bring RDS/Cloud SQL/Azure SQL Managed Instance operational patterns into the platform.
  • Implement automation-first operations (self-healing, auto-remediation, failover orchestration).
  • Standardize HA/DR architectures across multi-region deployments.
  • Improve backup reliability, replication integrity, and failover predictability.

3. Deep Product Debugging & Dev Collaboration

  • Partner with Product Engineering for deep database engine-level debugging.
  • Troubleshoot complex performance bottlenecks (IO, CPU, locking, replication lag).
  • Support root cause analysis involving cloud infrastructure, storage, networking, and database internals.
  • Influence platform architecture for operability and reliability.

4. Observability & Reliability Engineering

  • Build unified observability across DBaaS (metrics, logs, traces).
  • Define golden signals for database reliability.
  • Improve proactive anomaly detection and capacity forecasting.
  • Drive chaos testing and resilience validation practices.

5. Automation & Platform Hardening

  • Lead reliability automation (runbooks → code).
  • Improve provisioning, patching, upgrade, and scaling reliability.
  • Standardize configuration management and drift detection.
  • Enhance security posture aligned to enterprise compliance needs.

6. DevOps & Platform Governance

  • Champion SRE best practices across engineering teams.
  • Establish production readiness review frameworks.
  • Define release reliability gates for DBaaS components.
  • Mentor junior SREs and build a reliability-first culture.

Technical Requirements

Cloud Platforms (Mandatory – Multi-Cloud Preferred)

  • Deep hands-on experience with:
      ◦ AWS RDS / Aurora
      ◦ Azure SQL MI / Azure Database Services
      ◦ GCP Cloud SQL / AlloyDB
  • Strong understanding of cloud networking, storage, IAM, HA architectures.

Database Expertise

  • Strong operational knowledge of:
      ◦ Oracle
      ◦ PostgreSQL
      ◦ MySQL
      ◦ SQL Server
  • Experience handling large-scale production databases (TB+ workloads).
  • Performance tuning, replication troubleshooting, and backup recovery validation.

SRE & Platform Skills

  • Strong scripting: Python / Bash / Go.
  • Infrastructure as Code (Terraform / ARM / CloudFormation).
  • CI/CD pipelines and release automation.
  • Observability stack (Prometheus, Grafana, ELK, Datadog, etc.).
  • Kubernetes exposure preferred.

Leadership Expectations

  • 10+ years overall experience, 5+ in SRE/Platform roles.
  • Prior experience in hyperscaler environments or cloud-native SaaS products.
  • Strong incident leadership and executive communication skills.
  • Ability to influence cross-functional stakeholders.
  • Experience building and leading SRE teams preferred.

Success Metrics (First 12 Months)

  • Reduction in P1/P2 incidents by X%.
  • Improved MTTR by X%.
  • Defined SLO framework implemented across all DBaaS services.
  • Automation coverage >70% of repeat operational tasks.
  • Zero critical audit non-compliance findings.

Why Join Us

  • Opportunity to build hyperscaler-grade DBaaS reliability.
  • Direct impact on mission-critical enterprise workloads.
  • Multi-cloud platform engineering exposure.
  • High visibility role working with Product, Engineering, and Leadership.

Related Roles

  • Solution Architect – Multi-Cloud DBaaS Platform (Azure / AWS / GCP)

    Tessell

    Bangalore
  • DBRE Lead – Multi-Cloud DBaaS Platform

    Tessell

    Hyderabad
  • Software Development Engineer (SDE 3)

    Tessell

    Bangalore
  • Staff / Sr. Staff Software Engineer (Backend)

    Tessell

    San Francisco Bay Area, California, United States
  • Enterprise Account Executive – UKI

    Tessell

    London
  • Senior Product Designer

    Tessell

    San Francisco Bay Area, California, United States