Armada

Lead Software Engineer at Armada

Bangalore Office, AEDGE AICC India Pvt Ltd | Full-time | R&D - Platform Engineering | Posted 23 days ago

<div class="content-intro"><p>&nbsp;</p> <p><strong>About the Company</strong></p> <div class="x_x_Paragraph x_x_SCXW198602660 x_x_BCX0">Armada is a full-stack edge infrastructure company delivering compute, connectivity, and sovereign AI/ML to some of the world’s most remote places. Named one of Fast Company’s Most Innovative Companies, Armada has deployed its solutions in over 60 countries for organizations in sectors ranging from energy to defense.</div> <div class="x_x_Paragraph x_x_SCXW198602660 x_x_BCX0">&nbsp;</div> <div class="x_x_Paragraph x_x_SCXW198602660 x_x_BCX0">With over $200 million in funding, Armada is backed by top investors such as Microsoft (M12) and Founders Fund, and has strategic partnerships with companies including Starlink, Skydio, and NVIDIA. We are looking for the most brilliant minds in the world to join us.</div> <div class="x_x_Paragraph x_x_SCXW198602660 x_x_BCX0">&nbsp;</div> <div class="x_x_Paragraph x_x_SCXW198602660 x_x_BCX0">Working at Armada means taking ownership, driving autonomy, and delivering impact. You’ll tackle challenges that haven’t been solved before and help build something transformative from the ground up. What you do here will not only define your career but also help further Armada’s mission to bridge the digital divide for customers around the world.</div> <p>&nbsp;</p></div><div> <h2 data-start="275" data-end="292"><span style="font-size: 10pt;">About the Role</span></h2> <p data-start="294" data-end="661"><span style="font-size: 10pt;">We are seeking a highly experienced <strong data-start="330" data-end="384">Lead Software Engineer / Lead AI Platform Engineer</strong> to architect and lead the development of our <strong data-start="430" data-end="468">GPU-as-a-Service (GPUaaS) platform</strong>. 
In this role, you will define the core abstractions that transform complex GPU fabrics, storage systems, and networking into a seamless, self-service experience for researchers and engineers.</span></p> <p data-start="663" data-end="1027"><span style="font-size: 10pt;">You will operate at the intersection of <strong data-start="703" data-end="772">distributed systems, Kubernetes internals, and GPU infrastructure</strong>, setting the technical direction of the platform while mentoring engineers and driving cross-functional collaboration. This role is ideal for leaders who enjoy hands-on architecture, deep technical ownership, and building infrastructure at massive scale.</span></p> <hr data-start="1029" data-end="1032"> <h2 data-start="1034" data-end="1074"><span style="font-size: 10pt;">What You’ll Do (Key Responsibilities)</span></h2> <h3 data-start="1076" data-end="1120"><span style="font-size: 10pt;">Architectural Strategy &amp; Platform Design</span></h3> <ul data-start="1121" data-end="1439"> <li style="font-size: 10pt;" data-start="1121" data-end="1227"> <p data-start="1123" data-end="1227"><span style="font-size: 10pt;">Lead the design of a <strong data-start="1144" data-end="1182">globally scalable AI control plane</strong> for GPU, storage, and network orchestration.</span></p> </li> <li style="font-size: 10pt;" data-start="1228" data-end="1349"> <p data-start="1230" data-end="1349"><span style="font-size: 10pt;">Define architectural patterns for <strong data-start="1264" data-end="1295">custom Kubernetes operators</strong> managing complex AI training and inference workloads.</span></p> </li> <li style="font-size: 10pt;" data-start="1350" data-end="1439"> <p data-start="1352" data-end="1439"><span style="font-size: 10pt;">Own the long-term <strong data-start="1370" data-end="1415">scalability, extensibility, and evolution</strong> of the GPUaaS platform.</span></p> </li> </ul> <h3 data-start="1441" data-end="1478"><span style="font-size: 10pt;">Systemic 
Multi-Tenancy &amp; Security</span></h3> <ul data-start="1479" data-end="1783"> <li style="font-size: 10pt;" data-start="1479" data-end="1602"> <p data-start="1481" data-end="1602"><span style="font-size: 10pt;">Architect <strong data-start="1491" data-end="1520">hard isolation strategies</strong> across kernel, hypervisor, and hardware layers (IOMMU, SR-IOV, device isolation).</span></p> </li> <li style="font-size: 10pt;" data-start="1603" data-end="1699"> <p data-start="1605" data-end="1699"><span style="font-size: 10pt;">Design secure multi-tenant execution models aligned with <strong data-start="1662" data-end="1698">zero-trust networking principles</strong>.</span></p> </li> <li style="font-size: 10pt;" data-start="1700" data-end="1783"> <p data-start="1702" data-end="1783"><span style="font-size: 10pt;">Ensure strong isolation without compromising performance in a shared environment.</span></p> </li> </ul> <h3 data-start="1785" data-end="1818"><span style="font-size: 10pt;">Storage &amp; Networking Strategy</span></h3> <ul data-start="1819" data-end="2085"> <li style="font-size: 10pt;" data-start="1819" data-end="1896"> <p data-start="1821" data-end="1896"><span style="font-size: 10pt;">Drive integration strategies for <strong data-start="1854" data-end="1877">VAST, Weka, and DDN</strong> storage platforms.</span></p> </li> <li style="font-size: 10pt;" data-start="1897" data-end="2010"> <p data-start="1899" data-end="2010"><span style="font-size: 10pt;">Collaborate with hardware and networking vendors to optimize <strong data-start="1960" data-end="1992">RDMA, GPUDirect, and RoCE v2</strong> traffic patterns.</span></p> </li> <li style="font-size: 10pt;" data-start="2011" data-end="2085"> <p data-start="2013" data-end="2085"><span style="font-size: 10pt;">Design and evolve <strong data-start="2031" data-end="2084">VXLAN and BGP-EVPN–based networking architectures</strong>.</span></p> </li> </ul> <h3 data-start="2087" data-end="2110"><span style="font-size: 
10pt;">Feature Development</span></h3> <ul data-start="2111" data-end="2402"> <li style="font-size: 10pt;" data-start="2111" data-end="2223"> <p data-start="2113" data-end="2223"><span style="font-size: 10pt;">Design, develop, and maintain <strong data-start="2143" data-end="2174">custom Kubernetes operators</strong> for GPU, storage, and infrastructure automation.</span></p> </li> <li style="font-size: 10pt;" data-start="2224" data-end="2310"> <p data-start="2226" data-end="2310"><span style="font-size: 10pt;">Implement <strong data-start="2236" data-end="2292">CRDs, reconciliation logic, and lifecycle management</strong> for AI workloads.</span></p> </li> <li style="font-size: 10pt;" data-start="2311" data-end="2402"> <p data-start="2313" data-end="2402"><span style="font-size: 10pt;">Guide implementation patterns while remaining hands-on with critical platform components.</span></p> </li> </ul> <h3 data-start="2404" data-end="2440"><span style="font-size: 10pt;">Reliability, Performance &amp; Scale</span></h3> <ul data-start="2441" data-end="2721"> <li style="font-size: 10pt;" data-start="2441" data-end="2524"> <p data-start="2443" data-end="2524"><span style="font-size: 10pt;">Define platform <strong data-start="2459" data-end="2467">SLOs</strong>, capacity planning models, and GPU availability targets.</span></p> </li> <li style="font-size: 10pt;" data-start="2525" data-end="2624"> <p data-start="2527" data-end="2624"><span style="font-size: 10pt;">Establish benchmarking standards including <strong data-start="2570" data-end="2580">MLPerf</strong> and custom training/inference stress tests.</span></p> </li> <li style="font-size: 10pt;" data-start="2625" data-end="2721"> <p data-start="2627" data-end="2721"><span style="font-size: 10pt;">Lead <strong data-start="2632" data-end="2657">post-incident reviews</strong>, root-cause analysis, and performance optimization initiatives.</span></p> </li> </ul> <h3 data-start="2723" data-end="2760"><span style="font-size: 
10pt;">Technical Leadership &amp; Mentorship</span></h3> <ul data-start="2761" data-end="3015"> <li style="font-size: 10pt;" data-start="2761" data-end="2864"> <p data-start="2763" data-end="2864"><span style="font-size: 10pt;">Set engineering standards through <strong data-start="2797" data-end="2863">design reviews, architecture documentation, and technical RFCs</strong>.</span></p> </li> <li style="font-size: 10pt;" data-start="2865" data-end="2931"> <p data-start="2867" data-end="2931"><span style="font-size: 10pt;">Mentor and grow <strong data-start="2883" data-end="2902">L3/L4 engineers</strong> into strong platform owners.</span></p> </li> <li style="font-size: 10pt;" data-start="2932" data-end="3015"> <p data-start="2934" data-end="3015"><span style="font-size: 10pt;">Influence and collaborate across <strong data-start="2967" data-end="3014">infrastructure, security, and product teams</strong>.</span></p> </li> </ul> <hr data-start="3017" data-end="3020"> <h2 data-start="3022" data-end="3048"><span style="font-size: 10pt;">Required Qualifications</span></h2> <ul data-start="3050" data-end="4034"> <li style="font-size: 10pt;" data-start="3050" data-end="3141"> <p data-start="3052" data-end="3141"><span style="font-size: 10pt;"><strong data-start="3052" data-end="3081">10–15 years of experience</strong> in software, platform, or infrastructure engineering roles.</span></p> </li> <li style="font-size: 10pt;" data-start="3142" data-end="3271"> <p data-start="3144" data-end="3271"><span style="font-size: 10pt;">Demonstrated expertise designing and operating <strong data-start="3191" data-end="3232">production-grade Kubernetes operators</strong> using Go (Kubebuilder / Operator SDK).</span></p> </li> <li style="font-size: 10pt;" data-start="3272" data-end="3399"> <p data-start="3274" data-end="3399"><span style="font-size: 10pt;">Deep understanding of <strong data-start="3296" data-end="3320">Kubernetes internals</strong>, including etcd performance, API 
machinery, CRDs, controllers, and scheduling.</span></p> </li> <li style="font-size: 10pt;" data-start="3400" data-end="3512"> <p data-start="3402" data-end="3512"><span style="font-size: 10pt;">Proven experience building <strong data-start="3429" data-end="3463">secure, multi-tenant platforms</strong> with strong isolation and zero-trust networking.</span></p> </li> <li style="font-size: 10pt;" data-start="3513" data-end="3654"> <p data-start="3515" data-end="3654"><span style="font-size: 10pt;">Strong hands-on knowledge of <strong data-start="3544" data-end="3587">high-performance storage and networking</strong>, including POSIX semantics, CSI drivers, and InfiniBand / RoCE v2.</span></p> </li> <li style="font-size: 10pt;" data-start="3655" data-end="3772"> <p data-start="3657" data-end="3772"><span style="font-size: 10pt;">Experience designing <strong data-start="3678" data-end="3717">infrastructure automation workflows</strong> using tools such as Ansible, Terraform, or equivalent.</span></p> </li> <li style="font-size: 10pt;" data-start="3773" data-end="3909"> <p data-start="3775" data-end="3909"><span style="font-size: 10pt;">Hands-on experience with <strong data-start="3800" data-end="3838">observability and monitoring tools</strong> such as Prometheus, OpenTelemetry (OTEL), Grafana, Splunk, or similar.</span></p> </li> <li style="font-size: 10pt;" data-start="3910" data-end="3952"> <p data-start="3912" data-end="3952"><span style="font-size: 10pt;">Strong proficiency in <strong data-start="3934" data-end="3951">Go and Python</strong>.</span></p> </li> <li style="font-size: 10pt;" data-start="3953" data-end="4034"> <p data-start="3955" data-end="4034"><span style="font-size: 10pt;">Excellent leadership, communication, and cross-functional collaboration skills.</span></p> </li> </ul> <hr data-start="4036" data-end="4039"> <h2 data-start="4041" data-end="4083"><span style="font-size: 10pt;">Preferred / Nice-to-Have Qualifications</span></h2> <ul 
data-start="4085" data-end="4573"> <li style="font-size: 10pt;" data-start="4085" data-end="4190"> <p data-start="4087" data-end="4190"><span style="font-size: 10pt;">Experience with <strong data-start="4103" data-end="4128">AI serving frameworks</strong> such as vLLM, Ray Serve, Triton Inference Server, or similar.</span></p> </li> <li style="font-size: 10pt;" data-start="4191" data-end="4322"> <p data-start="4193" data-end="4322"><span style="font-size: 10pt;">Familiarity with <strong data-start="4210" data-end="4252">virtualization and lower-layer systems</strong> including VMware vSphere, OpenStack, KVM, or bare-metal provisioning.</span></p> </li> <li style="font-size: 10pt;" data-start="4323" data-end="4458"> <p data-start="4325" data-end="4458"><span style="font-size: 10pt;">Experience with <strong data-start="4341" data-end="4363">GPU infrastructure</strong>, including NVIDIA DGX/HGX systems, GPU Operator, DCGM, Nsight, or performance profiling tools.</span></p> </li> <li style="font-size: 10pt;" data-start="4459" data-end="4573"> <p data-start="4461" data-end="4573"><span style="font-size: 10pt;">Exposure to <strong data-start="4473" data-end="4505">distributed training systems</strong> such as PyTorch DDP, DeepSpeed, or large-scale training frameworks.</span></p> </li> </ul> <hr data-start="4575" data-end="4578"> <h2 data-start="4580" data-end="4606"><span style="font-size: 10pt;">Compensation &amp; Benefits</span></h2> <p data-start="4608" data-end="4776"><span style="font-size: 10pt;">For India-based candidates, we offer a <strong data-start="4647" data-end="4700">competitive base salary along with equity options</strong>, providing an opportunity to share in the success and growth of <strong data-start="4765" data-end="4775">Armada</strong>.</span></p> <p>&nbsp;</p> </div><div class="content-conclusion"><p>&nbsp;</p> <p><strong>You're a Great Fit if You're</strong></p> <ul data-pattern="discCircleSquare" data-depth="1"> <li value="1">A go-getter with a 
growth mindset. You're intellectually curious, have strong business acumen, and actively seek opportunities to build relevant skills and knowledge.</li> <li value="2">A detail-oriented problem-solver. You can independently gather information, solve problems efficiently, and deliver results with a "get-it-done" attitude.</li> <li value="3">Someone who thrives in a fast-paced environment. You're energized by an entrepreneurial spirit, capable of working quickly, and excited to contribute to a growing company.</li> <li value="4">A collaborative team player. You focus on business success and are motivated by team accomplishment rather than a personal agenda.</li> <li value="5">Highly organized and results-driven. You have strong prioritization skills and a dedicated work ethic.</li> </ul> <p>&nbsp;</p> <p><strong>Equal Opportunity Statement</strong></p> <p>At Armada, we are committed to fostering a work environment where everyone is given equal opportunities to thrive. As an equal opportunity employer, we strictly prohibit discrimination or harassment based on race, color, gender, religion, sexual orientation, national origin, disability, genetic information, pregnancy, or any other characteristic protected by law. This policy applies to all employment decisions, including hiring, promotions, and compensation. Our hiring is guided by qualifications, merit, and the business needs at the time.</p> <p>&nbsp;</p> <p><strong>Unsolicited Resumes and Candidates</strong></p> <p><span data-teams="true">Armada does not accept unsolicited resumes or candidate submissions from external agencies or recruiters. All candidates must apply directly through our careers page. Any resumes submitted by agencies without a prior signed agreement will be considered unsolicited, and Armada will not be obligated to pay any fees.</span></p> <p>&nbsp;</p></div>