SRE Lead – DBaaS Platform at Tessell
Hyderabad | Full-time | Customer Success | Posted 3 months ago
About the Role
<p>Job Title: SRE Lead – DBaaS Platform<br>Role Overview<br>We are seeking an experienced Site Reliability Engineering (SRE) Lead to strengthen production reliability ownership for our Database-as-a-Service (DBaaS) platform. This role will bring hyperscaler-grade (RDS-level) operational expertise to drive deep product debugging, reliability engineering, and collaboration with development teams across cloud-native database services.<br>The SRE Lead will own platform stability, availability, performance, and incident excellence across database workloads hosted on Azure, AWS, and GCP.<br>Location: Hyderabad<br>Department: Customer Success<br>Reports to: Senior Director, Customer Success/SRE</p>
<p>Key Responsibilities<br>1. Production Reliability Ownership<br>• Own end-to-end reliability, availability, and performance of the DBaaS platform.<br>• Define and enforce SLIs, SLOs, and SLAs across all supported database engines.<br>• Lead production incident response (P1/P2), root cause analyses (RCAs), and long-term resilience improvements.<br>• Drive error budget governance with Engineering and Product teams.<br>2. Hyperscaler-Level Operational Excellence<br>• Bring RDS, Cloud SQL, and Azure SQL Managed Instance operational patterns into the platform.<br>• Implement automation-first operations (self-healing, auto-remediation, failover orchestration).<br>• Standardize HA/DR architectures across multi-region deployments.<br>• Improve backup reliability, replication integrity, and failover predictability.<br>3. Deep Product Debugging &amp; Dev Collaboration<br>• Partner with Product Engineering on deep, engine-level database debugging.<br>• Troubleshoot complex performance bottlenecks (I/O, CPU, locking, replication lag).</p>
<p>• Support root cause analysis spanning cloud infrastructure, storage, networking, and database internals.<br>• Influence platform architecture for operability and reliability.<br>4. Observability &amp; Reliability Engineering<br>• Build unified observability (metrics, logs, traces) across the DBaaS platform.<br>• Define golden signals for database reliability.<br>• Improve proactive anomaly detection and capacity forecasting.<br>• Drive chaos testing and resilience validation practices.<br>5. Automation &amp; Platform Hardening<br>• Lead reliability automation (runbooks → code).<br>• Improve the reliability of provisioning, patching, upgrades, and scaling.<br>• Standardize configuration management and drift detection.<br>• Strengthen security posture in line with enterprise compliance requirements.<br>6. DevOps &amp; Platform Governance<br>• Champion SRE best practices across engineering teams.<br>• Establish production readiness review frameworks.<br>• Define release reliability gates for DBaaS components.<br>• Mentor junior SREs and build a reliability-first culture.</p>
<p>Technical Requirements<br>Cloud Platforms (Mandatory – Multi-Cloud Preferred)<br>• Deep hands-on experience with:<br>◦ AWS RDS / Aurora<br>◦ Azure SQL MI / Azure Database Services<br>◦ GCP Cloud SQL / AlloyDB<br>• Strong understanding of cloud networking, storage, IAM, and HA architectures.<br>Database Expertise<br>• Strong operational knowledge of:<br>◦ Oracle<br>◦ PostgreSQL<br>◦ MySQL<br>◦ SQL Server<br>• Experience handling large-scale production databases (TB+ workloads).<br>• Performance tuning, replication troubleshooting, and backup recovery validation.<br>SRE &amp; Platform Skills</p>
<p>• Strong scripting skills: Python / Bash / Go.<br>• Infrastructure as Code (Terraform / ARM / CloudFormation).<br>• CI/CD pipelines and release automation.<br>• Observability stack (Prometheus, Grafana, ELK, Datadog, etc.).<br>• Kubernetes exposure preferred.</p>
<p>Leadership Expectations<br>• 10+ years of overall experience, including 5+ years in SRE/platform roles.<br>• Prior experience in hyperscaler environments or with cloud-native SaaS products.<br>• Strong incident leadership and executive communication skills.<br>• Ability to influence cross-functional stakeholders.<br>• Experience building and leading SRE teams preferred.</p>
<p>Success Metrics (First 12 Months)<br>• Reduction in P1/P2 incidents by X%.<br>• MTTR improved by X%.<br>• SLO framework defined and implemented across all DBaaS services.<br>• Automation coverage of &gt;70% of repeat operational tasks.<br>• Zero critical audit non-compliance findings.</p>
<p>Why Join Us<br>• Opportunity to build hyperscaler-grade DBaaS reliability.<br>• Direct impact on mission-critical enterprise workloads.<br>• Exposure to multi-cloud platform engineering.<br>• High-visibility role working with Product, Engineering, and Leadership.</p>