Data Operations Engineer at Abaka AI
Mountain View, CA | Full-time | Engineering | Posted 22 days ago
<div data-page-id="DdtRdAirRoCBFTxPTMXlNe1Igld" data-lark-html-role="root" data-docx-has-block-data="false">
<div class="ace-line ace-line old-record-id-P8jEdmLKEoOEI8xH4zylUT8cg8e">
<div data-page-id="ZwJIdmQPto6zYrxEXZelaBm0gSe" data-lark-html-role="root" data-docx-has-block-data="false">
<div class="ace-line ace-line old-record-id-SfEkdKOSHokzIRxGZ1kl0vtNgQc"><strong>About Abaka AI</strong></div>
<div class="ace-line ace-line old-record-id-YKk6d5itnoY38lx1r79lHGNcgVc">Abaka AI is built on one mission: to be the world’s most trusted data partner for AI companies. More than 1,000 industry leaders across Generative AI, Embodied AI, and Automotive AI rely on us to power their data pipelines. With headquarters in Silicon Valley and teams in Paris, Singapore, and Tokyo, we support global partners with fast, reliable, and scalable data solutions.</div>
<div class="ace-line ace-line old-record-id-HRyVd2BgBopnePxGDPLlhQbQg7b">Our offerings include a diverse catalog of off-the-shelf datasets (image, video, multimodal, reasoning, 3D, and beyond) as well as comprehensive data collection and annotation services. Whether teams need raw data, curated datasets, or full-cycle data engineering, Abaka AI provides the foundation for building high-performance AI systems.</div>
<div class="ace-line ace-line old-record-id-JmAWdB6ULog7BkxojQxlnHwZgjg"> </div>
<div class="ace-line ace-line old-record-id-NoUwdU0peoqlYLxKr0El1boKgTc"><strong>About the Role</strong></div>
<div class="ace-line ace-line old-record-id-A2oed27l0oeE97xpHlTla8u5gQe">We are hiring a Data Operations Engineer to own and operate Abaka AI’s internal dataset library. This role will serve as the central point of knowledge for all datasets across the company, working closely with engineering, product, and business teams to ensure fast, accurate, and scalable access to data.</div>
<div class="ace-line ace-line old-record-id-Pi6ydJav6o0FjTxqNHPlHtg8gTb">You will develop a deep understanding of our dataset inventory, including structure, quality, and use cases, and act as the primary point of contact for internal data-related questions. You will translate ambiguous requests into clear solutions, validate dataset quality, and coordinate across global teams to resolve issues efficiently.</div>
<div class="ace-line ace-line old-record-id-XeBId6RxjoGfFKxdJN6lzUoxg8e">This role is highly cross-functional and requires strong problem-solving ability, technical fluency, and a high level of ownership. You will play a critical role in improving how datasets are organized, accessed, and utilized across the company.</div>
<div class="ace-line ace-line old-record-id-ALZsd2M6GoNLjoxjPNdlVmTrgId"> </div>
<div class="ace-line ace-line old-record-id-BRCFdNTrFomWhix4tDXlKE0HgAd"><strong>Responsibilities</strong></div>
<ul class="list-bullet1">
<li class="ace-line ace-line old-record-id-OnuKdHVJ4oWFkDx9gDolCOpKgnr" data-list="bullet">
<div>Develop and maintain a comprehensive understanding of Abaka AI’s dataset library, including data structure, quality, and applicable use cases across modalities (text, image, video, audio, 3D).</div>
</li>
<li class="ace-line ace-line old-record-id-J2cpdmJTtoduB5xMUlflfLCNgKe" data-list="bullet">
<div>Serve as the internal point of contact for dataset-related inquiries, providing clear and timely responses to questions from engineering, product, and business teams.</div>
</li>
<li class="ace-line ace-line old-record-id-VCT6dT1JkoQX2yxKBFPlRIc7gEe" data-list="bullet">
<div>Translate ambiguous or high-level requests into concrete dataset solutions, identifying appropriate data sources or gaps.</div>
</li>
<li class="ace-line ace-line old-record-id-AQNpdsUIVo2IGdxILWQl8dwGgef" data-list="bullet">
<div>Inspect and validate datasets for quality, completeness, and consistency using SQL, Python, or other tools as needed.</div>
</li>
<li class="ace-line ace-line old-record-id-FB7CdGkcloTen3xmzvalIr0ggcc" data-list="bullet">
<div>Coordinate with global data teams, including teams in China, to resolve data issues, clarify requirements, and ensure timely delivery without unnecessary escalation.</div>
</li>
<li class="ace-line ace-line old-record-id-W1Hldw6ZKo1LLKxFoKJli3IhgAe" data-list="bullet">
<div>Maintain and improve internal documentation, organization, and accessibility of datasets.</div>
</li>
<li class="ace-line ace-line old-record-id-Vde2dGE36oNUgIx2f8olc3m3g8c" data-list="bullet">
<div>Identify inefficiencies in current workflows and propose improvements to systems, tooling, and processes that support dataset management and usage.</div>
</li>
<li class="ace-line ace-line old-record-id-QBhld7dEAotaDqx1AySlKMo1g9b" data-list="bullet">
<div>Support cross-functional initiatives by providing dataset insights, technical context, and operational guidance.</div>
</li>
</ul>
<div class="ace-line ace-line old-record-id-DcysdnU21oz5vHx6cOEljih4gjc"> </div>
<div class="ace-line ace-line old-record-id-HscOdeLsqowcjXxdAlolVVhcgAg"><strong>Qualifications</strong></div>
<ul class="list-bullet1">
<li class="ace-line ace-line old-record-id-QcmTdn2whogrsNxbosol3YgOgbh" data-list="bullet">
<div>Bachelor’s degree in Computer Science, Data Engineering, or a related field, or equivalent practical experience.</div>
</li>
<li class="ace-line ace-line old-record-id-FK57d3cDuoxtJIxJR92lF6oWg3d" data-list="bullet">
<div>1–4 years of experience in data operations, data engineering, or a related role involving direct interaction with datasets.</div>
</li>
<li class="ace-line ace-line old-record-id-AqARdj54AoKniNxTaXslUKz0gUc" data-list="bullet">
<div>Professional proficiency in Mandarin Chinese and English is required, as this role involves frequent collaboration with China-based vendors and external partners.</div>
</li>
<li class="ace-line ace-line old-record-id-Qe2JdgpJQoZQbfxXXqrlFyGtgEh" data-list="bullet">
<div>Strong problem-solving skills and ability to operate effectively in ambiguous, fast-paced environments.</div>
</li>
<li class="ace-line ace-line old-record-id-OXYTdOXQYokO8MxeUwAlaHsFg2c" data-list="bullet">
<div>Proficiency in SQL and/or Python for data inspection, validation, and basic analysis.</div>
</li>
<li class="ace-line ace-line old-record-id-M88ZdI2sioPsfexGziilEPYbgRb" data-list="bullet">
<div>Experience working with real-world datasets, including handling data quality issues, inconsistencies, and edge cases.</div>
</li>
<li class="ace-line ace-line old-record-id-UbINdaTapoEE9YxTvOklTcGIgbf" data-list="bullet">
<div>Strong communication skills, with the ability to work across technical and non-technical teams.</div>
</li>
<li class="ace-line ace-line old-record-id-NnYgd6tDmo73mBxjt4mllpy5gCg" data-list="bullet">
<div>High level of ownership and accountability, with the ability to manage multiple requests and priorities simultaneously.</div>
</li>
</ul>
<div class="ace-line ace-line old-record-id-TOQod12znoOmwBxB32KlwCgLgjg"> </div>
<div class="ace-line ace-line old-record-id-VH8gd1i6ooXuH7xw5eUlWGGMgLe"><strong>Preferred Qualifications</strong></div>
<ul class="list-bullet1">
<li class="ace-line ace-line old-record-id-CDkpdN3hwoFecjxFLGolHisXg9b" data-list="bullet">
<div>Experience with multimodal datasets (text, image, video, audio, or 3D).</div>
</li>
<li class="ace-line ace-line old-record-id-TWtbdnwNkoR0GoxWivhl0Vrqgqh" data-list="bullet">
<div>Familiarity with data annotation, labeling workflows, or dataset preparation for machine learning.</div>
</li>
<li class="ace-line ace-line old-record-id-We0edgFY9oRqTixgomIl6wkFgNg" data-list="bullet">
<div>Experience working with international teams, particularly in cross-border environments.</div>
</li>
<li class="ace-line ace-line old-record-id-Z79Ldfw88oqsDJxR84rl3LZiglf" data-list="bullet">
<div>Exposure to AI/ML workflows, including training, fine-tuning, or evaluation datasets.</div>
</li>
</ul>
<div class="ace-line ace-line old-record-id-MHk6dFSTYo6RfFxROYllaOfVgne"> </div>
<div class="ace-line ace-line old-record-id-doxlg2Bkhk0hRffgV9FWosk04qf"><strong>Compensation & Benefits</strong></div>
<div class="ace-line ace-line old-record-id-doxlgeFwYb7ixcxuQZZfBYJjtxe">The base salary range for this position is $110,000–$160,000 USD annually.</div>
<div class="ace-line ace-line old-record-id-doxlgqvgsgtIsAGvQrX0ghgSHEc">Compensation may fall outside this range depending on a number of factors, including a candidate’s qualifications, skills, competencies, and experience. Base pay is one part of the total package that Abaka AI provides to compensate and recognize employees for their work. This role is eligible for equity, as well as a comprehensive benefits package (health, dental, vision, PTO, flexible work schedule).</div>
</div>
</div>
</div>