Staff Platform Engineer, Voice AI at Together AI
San Francisco · Full-time · Engineering · Posted 23 days ago
<h3><strong>About the Role</strong></h3>
<p><span style="font-size: 12pt;">Together AI is defining the infrastructure layer for the next generation of voice applications. Our Voice AI platform powers production-grade, real-time voice agents at scale — and we're looking for a Staff Platform Engineer to own the architecture that makes it possible.</span></p>
<p><span style="font-size: 12pt;">This isn't a role about maintaining what exists. You'll set the technical direction for how developers interact with Together's voice platform: from the real-time API primitives they build on, to the autoscaling systems that keep latency SLOs intact under unpredictable load, to the multi-provider abstraction layer that gives developers consistent API behavior across model providers. Voice infrastructure is categorically harder than text inference: bidirectional audio streams, stateful long-lived connections, millisecond latency requirements, and complex multi-model routing don't forgive architectural shortcuts. You'll bring the judgment to get this right the first time, at scale.</span></p>
<p><span style="font-size: 12pt;">This is a foundational hire on a small, high-conviction team. The decisions you make in this role will define the platform architecture for years.</span></p>
<h3><strong>Responsibilities</strong></h3>
<ul>
<li><span style="font-size: 12pt;">Own the architecture and reliability of Together's real-time API layer: set the technical direction for the WebSocket and HTTP streaming APIs that power STT and TTS at scale, and establish the reliability bar (connection lifecycle, backpressure, graceful degradation, reconnection) that production voice workloads (contact centers, AI agents, communication platforms) depend on.</span></li>
<li><span style="font-size: 12pt;">Lead autoscaling architecture for latency-sensitive voice workloads: design and ship orchestration systems that handle bursty, real-time traffic across tens of thousands of GPUs, and solve the hard problems at the intersection of concurrent connection limits, streaming state, and hard latency ceilings that generic autoscalers weren't built for.</span></li>
<li><span style="font-size: 12pt;">Define the voice API feature surface: make the architectural calls on word-level alignment, real-time speaker diarization, audio format and transport support (G.711 μ-law, PCM, WebRTC), pronunciation controls, and multi-context WebSockets, with a clear view of which capabilities unlock the next category of developer use cases.</span></li>
<li><span style="font-size: 12pt;">Build the observability platform for voice infrastructure: design the latency breakdown pipelines, audio quality signal collection, and customer-facing dashboards that give both the team and developers the instrumentation they need to operate at production quality, and make debugging voice issues fast and systematic.</span></li>
<li><span style="font-size: 12pt;">Own the multi-provider abstraction layer: architect the normalization layer across model partners (Cartesia, Deepgram, Rime, and others) that delivers consistent, provider-agnostic API behavior; your design should absorb upstream variability without exposing it to developers.</span></li>
<li><span style="font-size: 12pt;">Drive the interface between API and ML serving: partner closely with ML engineering leadership to define the contract between the API layer and the model serving stack; your decisions here directly affect end-to-end latency and reliability SLAs.</span></li>
<li><span style="font-size: 12pt;">Raise the bar for developer experience across the platform: lead API design reviews, shape documentation strategy, and define integration patterns and cookbooks; the voice developer experience should be an industry reference point, not merely adequate.</span></li>
<li><span style="font-size: 12pt;">Architect for product surfaces that don't exist yet: build systems designed to become the foundation for multiple new voice products; your platform decisions should expand what's possible, not constrain it.</span></li>
</ul>
<h3><strong>Requirements</strong></h3>
<ul>
<li><span style="font-size: 12pt; font-family: helvetica, arial, sans-serif;">8+ years of experience building large-scale, real-time distributed systems — with clear ownership of systems that carried production traffic at meaningful scale; you can speak to the architectural decisions you made and defend the tradeoffs.</span></li>
<li><span style="font-size: 12pt; font-family: helvetica, arial, sans-serif;">Deep, battle-tested expertise in real-time streaming infrastructure — WebSocket server architecture, SSE, bidirectional streaming, connection multiplexing, stateful protocol design — you've debugged production failures in these systems and come out with durable architectural improvements.</span></li>
<li><span style="font-size: 12pt; font-family: helvetica, arial, sans-serif;">Expert-level TypeScript and Python, plus strong systems-level thinking; Rust experience is a meaningful advantage given where voice infrastructure is heading.</span></li>
<li><span style="font-size: 12pt; font-family: helvetica, arial, sans-serif;">Senior distributed systems judgment — load balancing, autoscaling, rate limiting, and traffic shaping for latency-sensitive workloads aren't concepts you reference, they're problems you've solved under pressure.</span></li>
<li><span style="font-size: 12pt; font-family: helvetica, arial, sans-serif;">Deep Kubernetes expertise — custom autoscalers, resource management, and health checking for stateful, streaming services; you've built Kubernetes automation that handled edge cases the off-the-shelf tooling couldn't.</span></li>
<li><span style="font-size: 12pt; font-family: helvetica, arial, sans-serif;">Strong technical leadership — you set direction, influence across teams without authority, bring clarity to ambiguous problems, and leave systems and teams meaningfully better than you found them.</span></li>
<li><span style="font-size: 12pt; font-family: helvetica, arial, sans-serif;">Sharp product intuition for developer platforms — you have genuine opinions about API ergonomics, you think from the developer's perspective first, and you've shipped tooling that developers actually praised.</span></li>
<li><span style="font-size: 12pt; font-family: helvetica, arial, sans-serif;">Proven ability to operate with autonomy on high-ambiguity, high-stakes problems — you define the right problem before optimizing the solution, and you've done it on teams where the roadmap wasn't handed to you.</span></li>
<li><span style="font-size: 12pt; font-family: helvetica, arial, sans-serif;">Experience with audio and media protocols and codecs (WebRTC, G.711, PCM encoding) is strongly preferred at this level; domain-specific knowledge matters in this role.</span></li>
<li><span style="font-size: 12pt; font-family: helvetica, arial, sans-serif;">Familiarity with ML model serving infrastructure and how inference engines work is a significant advantage — you'll be a key partner to the ML engineering side of the team.</span></li>
<li><span style="font-size: 12pt; font-family: helvetica, arial, sans-serif;">Full-stack experience (React, Next.js) for developer-facing tooling contributions is a plus.</span></li>
<li><span style="font-size: 12pt; font-family: helvetica, arial, sans-serif;">Bachelor's or Master's in Computer Science, Computer Engineering, or related field — or equivalent depth demonstrated through your work.</span></li>
</ul>
<h3><strong>About Together AI</strong></h3>
<p>Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers on our journey to build the next generation of AI infrastructure.</p>
<h3><strong>Compensation</strong></h3>
<p>We offer competitive compensation, startup equity, health insurance and other competitive benefits. The US base salary range for this full-time position is: $220,000 - $280,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.</p>
<h3><strong>Equal Opportunity</strong></h3>
<p>Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.</p>
<p>Please see our privacy policy at <a href="https://www.together.ai/privacy">https://www.together.ai/privacy</a>.</p>