
Staff/Senior Staff AI Engineer, Model Post-Training and Alignment at OKX
San Jose, California, United States | Full-time | Engineering | Posted about 2 months ago
About the Role
<h3 class="heading-3 ace-line old-record-id-LgPLdbHBpoytXax9JbZlBfsZgCY"><strong>Who We Are</strong></h3>
<div class="ace-line ace-line old-record-id-XaCddqdpToiZjgxDzTflWQnngk5">At OKX, we believe that crypto will reshape the future and ultimately contribute to every individual's freedom.<br><br>OKX is a leading crypto exchange and the developer of OKX Wallet, giving millions access to crypto trading and decentralized crypto applications (dApps). OKX is also a brand trusted by hundreds of large institutions seeking access to crypto markets. We are safe and reliable, backed by our Proof of Reserves.<br><br>Across our offices worldwide, we are united by our core principles: <em>We Before Me</em>, <em>Do the Right Thing</em>, and <em>Get Things Done</em>. These shared values drive our culture, shape our processes, and foster a friendly, rewarding, and diverse environment for every OK-er.</div>
<div data-page-id="KgDDd1AYaowscRxaIZ1unda3sdb" data-lark-html-role="root" data-docx-has-block-data="false">
<div data-page-id="BkiUdblkMoBquYxnc9uutD1Ystc" data-lark-html-role="root" data-docx-has-block-data="false">
<div class="ace-line ace-line old-record-id-RJ7Td5cQ0oxO8rxtZnuu9zJ6sTc">
<div data-page-id="P9o8df9IXo4SHcxfhjclyoUqgld" data-lark-html-role="root" data-docx-has-block-data="false">
<h3 data-lark-html-role="root"><strong>About the Opportunity</strong></h3>
<div data-lark-html-role="root">
<div data-page-id="YnvfdojQpornTcxhh7FlZ37ugze" data-lark-html-role="root" data-docx-has-block-data="false">
<div class="ace-line ace-line old-record-id-Foo4dvQATotKkuxbUEglUO6sg2J">
<p data-start="176" data-end="486">We are seeking a highly skilled and hands-on Machine Learning Engineer specializing in <strong>large model post-training and alignment</strong>. This role focuses on designing, executing, and optimizing post-training pipelines to improve model performance, controllability, domain adaptation, and reasoning capabilities.</p>
<p data-start="488" data-end="671">You will work across the full lifecycle of post-training—from data strategy and reward modeling to reinforcement learning–based optimization and production-grade inference deployment.</p>
<h3 data-start="488" data-end="671"><strong>What You’ll Be Doing </strong></h3>
</div>
</div>
<div data-page-id="YnvfdojQpornTcxhh7FlZ37ugze" data-lark-html-role="root" data-docx-has-block-data="false">
<ul class="list-bullet1">
<li class="ace-line ace-line old-record-id-JLKOdU6cwoyIjbxLKZxlEttugxc" data-list="bullet">Lead and execute the full post-training pipeline for large language models (LLMs), including supervised fine-tuning, preference optimization, and reinforcement learning–based methods.</li>
<li class="ace-line ace-line old-record-id-AFvSdoB44oXpewxukzelPublgCe" data-list="bullet">Design and implement advanced training paradigms such as <strong data-start="947" data-end="987">DPO (Direct Preference Optimization)</strong> and <strong data-start="992" data-end="1041">GRPO (Group Relative Policy Optimization)</strong>.</li>
<li class="ace-line ace-line old-record-id-JXHrdNTcUoZBvWxvj1HlVENig3d" data-list="bullet">Develop domain-specific data recipes, curation strategies, and augmentation pipelines to optimize task performance.</li>
<li class="ace-line ace-line old-record-id-WnzhdfL28oLSntxM2KilxZrBgCg" data-list="bullet">Train and post-train specialized small models from scratch, including architecture selection, dataset construction, and optimization strategy.</li>
<li class="ace-line ace-line old-record-id-PKU2dkSMhoPMEuxuUiclXKKAgmg" data-list="bullet">Build and refine <strong data-start="1329" data-end="1346">Reward Models</strong> to support alignment and downstream optimization.</li>
<li class="ace-line ace-line old-record-id-PKU2dkSMhoPMEuxuUiclXKKAgmg" data-list="bullet">Design and implement <strong data-start="1420" data-end="1471">RLAIF (Reinforcement Learning from AI Feedback)</strong> closed-loop systems.</li>
<li class="ace-line ace-line old-record-id-PKU2dkSMhoPMEuxuUiclXKKAgmg" data-list="bullet">Optimize inference efficiency and deploy models using low-latency serving frameworks such as <strong data-start="1588" data-end="1596">vLLM</strong> and <strong data-start="1601" data-end="1611">SGLang</strong>.</li>
<li class="ace-line ace-line old-record-id-PKU2dkSMhoPMEuxuUiclXKKAgmg" data-list="bullet">Evaluate model performance using both automated benchmarks and human/AI feedback loops.</li>
<li class="ace-line ace-line old-record-id-PKU2dkSMhoPMEuxuUiclXKKAgmg" data-list="bullet">Collaborate with research and infrastructure teams to productionize training and deployment workflows.</li>
</ul>
</div>
<h3 class="heading-2 ace-line old-record-id-doxusA9TF9jUkqvydDVZ26era1g"><strong>What We Look For In You </strong></h3>
<div data-page-id="YnvfdojQpornTcxhh7FlZ37ugze" data-lark-html-role="root" data-docx-has-block-data="false">
<ul class="list-bullet1">
<li class="ace-line ace-line old-record-id-VoBZdvOhNomtYkxWILjl8wUVgQ1" data-list="bullet">Bachelor's degree in Computer Science, AI, Machine Learning, or a related field, with at least <strong>8 years of industry experience</strong>.</li>
<li class="ace-line ace-line old-record-id-VoBZdvOhNomtYkxWILjl8wUVgQ1" data-list="bullet">Strong hands-on experience across the full <strong data-start="1886" data-end="1912">post-training pipeline</strong> for large models.</li>
<li class="ace-line ace-line old-record-id-Q4K1dT1Ynof7ttxDIHVln7Wjg1c" data-list="bullet">Deep familiarity with preference learning and alignment techniques, including <strong data-start="2011" data-end="2066">DPO, GRPO, and RL-based post-training methodologies</strong>.</li>
<li class="ace-line ace-line old-record-id-OIrUdcJaCo83dfx32OflI47Sgnc" data-list="bullet">
<div>Proven experience designing <strong data-start="2098" data-end="2133">domain-specific data strategies</strong> and training methodologies.</div>
</li>
<li class="ace-line ace-line old-record-id-OIrUdcJaCo83dfx32OflI47Sgnc" data-list="bullet">Experience training and post-training <strong data-start="2202" data-end="2243">specialized small models from scratch</strong>.</li>
<li class="ace-line ace-line old-record-id-OIrUdcJaCo83dfx32OflI47Sgnc" data-list="bullet">Solid understanding of reinforcement learning fundamentals and their application to model alignment.</li>
<li data-start="2348" data-end="2471">Experience deploying models in low-latency production environments using frameworks such as <strong data-start="2442" data-end="2470">vLLM, SGLang, or similar</strong>.</li>
</ul>
</div>
<h3 class="heading-2 ace-line old-record-id-doxusXHTxE1ng5NXSR8cKaY4vhf"><strong>Perks & Benefits</strong></h3>
<ul>
<li>Competitive total compensation package</li>
<li>L&D programs and education subsidies to support employees' growth and development</li>
<li>Various team building programs and company events</li>
<li>Wellness and meal allowances</li>
<li>Comprehensive healthcare schemes for employees and dependants</li>
<li>More that we'd love to tell you about along the way!</li>
</ul>
</div>
<div data-lark-html-role="root">
<h3 class="heading-3 ace-line old-record-id-Xfu6dMaUkoTGqGx6arYu3QhKsZe"><strong>OKX Statement</strong></h3>
<div class="ace-line ace-line old-record-id-BDVHdvrnCoy11xx8RoVuyc2As3g">The salary range for this position is <span class="text-only text-font-italic" data-eleid="22">$313,055.00 to $450,000.00</span>. The salary offered depends on a variety of factors, including job-related knowledge, skills, experience, and market location. In addition to the salary, a performance bonus and long-term incentives may be provided as part of the compensation package, as well as a full range of medical, financial, and/or other benefits, dependent on the position offered. Applicants should apply via the Okcoin and OKX internal or external careers sites.</div>
<div class="ace-line ace-line old-record-id-ZfHydXbNIo2qmdxmQCru8ZDksBd"> </div>
<div class="ace-line ace-line old-record-id-BWwndhjsUozjW6xZoNxuILk6s9e">OKX is committed to equal employment opportunities regardless of race, color, genetic information, creed, religion, sex, sexual orientation, gender identity, lawful alien status, national origin, age, marital status, non-job-related physical or mental disability, or protected veteran status. Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.</div>
<div class="ace-line ace-line old-record-id-BWwndhjsUozjW6xZoNxuILk6s9e"> </div>
</div>
</div>
</div>
</div>
</div><div class="content-conclusion"><div data-lark-html-role="root"><span class="text-only" data-eleid="18"><span class="text-only"><span class="text-only" data-eleid="6">Notice:<br></span></span></span>
<div data-lark-html-role="root"><span class="text-only" data-eleid="26"><span class="text-only">All official </span><span class="text-only text-with-abbreviation text-with-abbreviation-bottomline">OKX</span><span class="text-only"> vacancies are published on this website.</span></span> <span class="text-only" data-eleid="28"><span class="text-only">While roles may appear on selected third-party platforms from time to time, information on other sites may be inaccurate or outdated. </span></span><strong><span class="text-only" data-eleid="29"><span class="text-only">If in doubt, please apply directly through our official careers website.</span></span></strong></div>
</div>
<div data-lark-html-role="root"><span class="text-only" data-eleid="18"><span class="text-only">Information collected and processed as part of the recruitment process of any job application you choose to submit is subject to </span><span class="text-only text-with-abbreviation text-with-abbreviation-bottomline">OKX</span><span class="text-only">'s </span></span><a class="link rich-text-anchor __anchor-intercept-flag__ text-content-link" href="https://www.okx.com/en-eu/help/okx-candidate-privacy-notice" target="_blank" data-eleid="19" data-lark-is-custom="true" data-lark-link="true">Candidate Privacy Notice</a><span class="text-only" data-eleid="20"><span class="text-only">.</span></span></div></div>