
Coupang

Director - Backend Engineering - AI Infra at Coupang

Bengaluru | Full-time | Coupang Intelligent Cloud (CIC) | Posted 6 days ago

About the Role

Job Description: Director of Backend Engineering (AI Infrastructure)

Company Introduction

We exist to wow our customers. We know we’re doing the right thing when we hear our customers say, “How did I ever live without Coupang?” Born out of an obsession to make shopping, eating, and living easier than ever, we are collectively disrupting the multi-billion-dollar commerce industry from the ground up and establishing an unparalleled reputation as a leading and reliable force in South Korean commerce.

We are proud to have the best of both worlds: a startup culture with the resources of a large global public company. This fuels us to continue our growth and launch new services at the pace we have kept since our inception. We are all entrepreneurs surrounded by opportunities to drive new initiatives and innovations. At our core, we are bold and ambitious people who like to get our hands dirty and make a hands-on impact. At Coupang, you will see yourself, your colleagues, your team, and the company grow every day.

Role Overview

We are seeking a visionary Director of Backend Engineering to lead the teams responsible for the software "brain" that manages our global AI Physical Infrastructure. You will oversee the development of the SDN orchestrators, automated fleet management systems, and the high-performance storage backends that power our AI training and inference clusters.

Your mission is to abstract the complexity of specialized hardware (NVIDIA/HPC) into a seamless, automated, and hyper-reliable cloud platform.

Key Responsibilities

1. Strategic Leadership & Fleet Orchestration

  • Software-Defined Infrastructure: Lead the design and delivery of an SDN Orchestrator to automate complex GPU networking (InfiniBand/RoCE/NVLink) and core DC routing.
  • Fleet Health Automation: Oversee the development of backend services for GPU Health & Fault Detection, automating the lifecycle from burn-in and diagnostics to global RMA workflows.
  • Capacity & Traffic Engineering: Drive the backend logic for global traffic routing, load balancing (NGINX/Kong), and IPAM to ensure zero-bottleneck training environments.

2. Data & Storage Systems

  • HPC Data Pipelines: Collaborate with storage engineers to build backend interfaces for Parallel File Systems (Lustre, Weka, VAST, etc.), ensuring high-throughput data delivery to compute nodes.
  • Storage Durability: Direct the backend strategy for AI Object Storage, focusing on high durability and low-latency retrieval for massive datasets.

3. Engineering Excellence

  • Scalable Architecture: Act as the final technical authority for AI Infra Architecture, ensuring systems are resilient, multi-region, and capable of sub-millisecond coordination.
  • DevOps & IaC Culture: Champion a "Hardware-as-Code" mindset, utilizing Python, Ansible, and Terraform to eliminate manual intervention in DC operations.

4. Team Development

  • Lead a multi-disciplinary organization spanning Backend Developers, SDN Engineers, Infra Ops, and AI Infra Engineering teams.
  • Establish 24/7 L1/L2/L3 operational standards to maintain greater than 99.99% availability of the AI fleet.

Required Qualifications

  • Experience: 15+ years in Backend Engineering, with at least 5 years in a leadership role managing complex infrastructure (Cloud, FinTech, or HPC).
  • Deep Infrastructure Knowledge: Proven experience with Linux internals, hardware-software interfaces (drivers/firmware), and distributed systems.
  • Networking Mastery: Solid understanding of L2/L3 networking and, ideally, specialized fabrics like InfiniBand or RoCE.
  • The Stack: Professional proficiency in Python, Go, or C++, and deep experience with Terraform, Kubernetes, and Ansible.
  • Large-Scale Data: Experience managing high-performance storage backends (GPFS, Lustre, or equivalent parallel systems).
  • Hardware Savvy: You don't just write code; you understand power envelopes, liquid cooling constraints, and GPU architecture (NVIDIA/HPE/Dell).

Preferred Skills

  • Experience building custom SDN controllers or orchestration layers from scratch.
  • Direct experience with NVIDIA or GPUDirect technologies.
  • Previous success in a "Hyper-scale" environment (AWS, Azure, GCP, Meta, AI clouds, etc.).

Recruitment Process

  • Application Review - Phone Interview - Onsite (or Virtual Onsite) Interview - Offer

The exact nature of the recruitment process may vary according to the specific job and may be changed due to scheduling or other circumstances.

  • Applicants will be informed of interview schedules and results via the e-mail address submitted at the application stage.

Details to Consider

  • This job posting may be closed prior to the stated end date for application if all openings are filled.
  • Coupang has the right to rescind an offer of employment if a candidate is found to have submitted false information as part of the application process.
  • Those eligible for employment protection (recipients of veteran’s benefits, the disabled, etc.) may receive preferential treatment for employment in accordance with applicable laws.

Privacy Notice

Your personal information will be collected and managed by Coupang as stated in the Application Privacy Notice located below.

https://privacy.coupang.com/en/land/jobs/

Related Roles

  • Sr. Staff Observability Engineer (GPU Cloud & Telemetry Platform)

    Coupang

    Seoul, South Korea
  • Senior Staff System Engineer, GPU Fleet

    Coupang

    Bengaluru
  • Group Product Manager, Compute Platform

    Coupang

    Mountain View, USA
  • Staff Backend Engineer - IAM

    Coupang

    Bengaluru
  • Group Product Manager, Compute Platform

    Coupang

    Seattle, USA
  • Senior Staff Back-end Engineer

    Coupang

    Bengaluru; Mountain View, USA; Seattle, USA