OKX

Staff/Senior Staff AI Engineer, Model Post-Training and Alignment at OKX

San Jose, California, United States | Full-time | Engineering | Posted about 2 months ago

About the Role

<h3 class="heading-3 ace-line old-record-id-LgPLdbHBpoytXax9JbZlBfsZgCY"><strong>Who We Are</strong></h3> <div class="ace-line ace-line old-record-id-XaCddqdpToiZjgxDzTflWQnngk5">At OKX, we believe that crypto will reshape the future and ultimately contribute to every individual's freedom.<br><br>OKX is a leading crypto exchange and the developer of OKX Wallet, giving millions access to crypto trading and decentralized crypto applications (dApps). OKX is also trusted by hundreds of large institutions seeking access to crypto markets. We are safe and reliable, backed by our Proof of Reserves.<br><br>Across our multiple offices globally, we are united by our core principles:&nbsp;<em>We Before Me</em>,&nbsp;<em>Do the Right Thing</em>, and&nbsp;<em>Get Things Done</em>. These shared values drive our culture, shape our processes, and foster a friendly, rewarding, and diverse environment for every OK-er.</div> <div data-page-id="KgDDd1AYaowscRxaIZ1unda3sdb" data-lark-html-role="root" data-docx-has-block-data="false"> <div data-page-id="BkiUdblkMoBquYxnc9uutD1Ystc" data-lark-html-role="root" data-docx-has-block-data="false"> <div class="ace-line ace-line old-record-id-RJ7Td5cQ0oxO8rxtZnuu9zJ6sTc"> <div data-page-id="P9o8df9IXo4SHcxfhjclyoUqgld" data-lark-html-role="root" data-docx-has-block-data="false"> <h3 data-lark-html-role="root"><strong>About the Opportunity</strong></h3> <div data-lark-html-role="root"> <div data-page-id="YnvfdojQpornTcxhh7FlZ37ugze" data-lark-html-role="root" data-docx-has-block-data="false"> <div class="ace-line ace-line old-record-id-Foo4dvQATotKkuxbUEglUO6sg2J"> <p data-start="176" data-end="486">We are seeking a highly skilled and hands-on Machine Learning Engineer specializing in&nbsp;<strong>large model post-training and alignment</strong>. 
This role focuses on designing, executing, and optimizing post-training pipelines to improve model performance, controllability, domain adaptation, and reasoning capabilities.</p> <p data-start="488" data-end="671">You will work across the full lifecycle of post-training, from data strategy and reward modeling to reinforcement learning–based optimization and production-grade inference deployment.</p> <h3 data-start="488" data-end="671"><strong>What You’ll Be Doing&nbsp;</strong></h3> </div> </div> <div data-page-id="YnvfdojQpornTcxhh7FlZ37ugze" data-lark-html-role="root" data-docx-has-block-data="false"> <ul class="list-bullet1"> <li class="ace-line ace-line old-record-id-JLKOdU6cwoyIjbxLKZxlEttugxc" data-list="bullet">Lead and execute the full post-training pipeline for large language models (LLMs), including supervised fine-tuning, preference optimization, and reinforcement learning–based methods.</li> <li class="ace-line ace-line old-record-id-AFvSdoB44oXpewxukzelPublgCe" data-list="bullet">Design and implement advanced training paradigms such as&nbsp;<strong data-start="947" data-end="987">DPO (Direct Preference Optimization)</strong>&nbsp;and&nbsp;<strong data-start="992" data-end="1041">GRPO (Group Relative Policy Optimization)</strong>.</li> <li class="ace-line ace-line old-record-id-JXHrdNTcUoZBvWxvj1HlVENig3d" data-list="bullet">Develop domain-specific data recipes, curation strategies, and augmentation pipelines to optimize task performance.</li> <li class="ace-line ace-line old-record-id-WnzhdfL28oLSntxM2KilxZrBgCg" data-list="bullet">Conduct post-training of specialized small models from scratch, including architecture selection, dataset construction, and optimization strategy.</li> <li class="ace-line ace-line old-record-id-PKU2dkSMhoPMEuxuUiclXKKAgmg" data-list="bullet">Build and refine&nbsp;<strong data-start="1329" data-end="1346">Reward Models</strong>&nbsp;to support alignment and downstream optimization.</li> <li class="ace-line ace-line 
old-record-id-PKU2dkSMhoPMEuxuUiclXKKAgmg" data-list="bullet">Design and implement&nbsp;<strong data-start="1420" data-end="1471">RLAIF (Reinforcement Learning from AI Feedback)</strong>&nbsp;closed-loop systems.</li> <li class="ace-line ace-line old-record-id-PKU2dkSMhoPMEuxuUiclXKKAgmg" data-list="bullet">Optimize inference efficiency and deploy models using low-latency serving frameworks such as&nbsp;<strong data-start="1588" data-end="1596">vLLM</strong>&nbsp;and&nbsp;<strong data-start="1601" data-end="1611">SGLang</strong>.</li> <li class="ace-line ace-line old-record-id-PKU2dkSMhoPMEuxuUiclXKKAgmg" data-list="bullet">Evaluate model performance using both automated benchmarks and human/AI feedback loops.</li> <li class="ace-line ace-line old-record-id-PKU2dkSMhoPMEuxuUiclXKKAgmg" data-list="bullet">Collaborate with research and infrastructure teams to productionize training and deployment workflows.</li> </ul> </div> <h3 class="heading-2 ace-line old-record-id-doxusA9TF9jUkqvydDVZ26era1g"><strong>What We Look For In You&nbsp;</strong></h3> <div data-page-id="YnvfdojQpornTcxhh7FlZ37ugze" data-lark-html-role="root" data-docx-has-block-data="false"> <ul class="list-bullet1"> <li class="ace-line ace-line old-record-id-VoBZdvOhNomtYkxWILjl8wUVgQ1" data-list="bullet">Bachelor's in Computer Science, AI, Machine Learning, or related fields with at least&nbsp;<strong>8 years of industry experience</strong>.</li> <li class="ace-line ace-line old-record-id-VoBZdvOhNomtYkxWILjl8wUVgQ1" data-list="bullet">Strong hands-on experience across the full&nbsp;<strong data-start="1886" data-end="1912">post-training pipeline</strong>&nbsp;for large models.</li> <li class="ace-line ace-line old-record-id-Q4K1dT1Ynof7ttxDIHVln7Wjg1c" data-list="bullet">Deep familiarity with preference learning and alignment techniques, including&nbsp;<strong data-start="2011" data-end="2066">DPO, GRPO, and RL-based post-training methodologies</strong>.</li> <li class="ace-line ace-line 
old-record-id-OIrUdcJaCo83dfx32OflI47Sgnc" data-list="bullet"> <div>Proven experience designing&nbsp;<strong data-start="2098" data-end="2133">domain-specific data strategies</strong>&nbsp;and training methodologies.</div> </li> <li class="ace-line ace-line old-record-id-OIrUdcJaCo83dfx32OflI47Sgnc" data-list="bullet">Experience training and post-training&nbsp;<strong data-start="2202" data-end="2243">specialized small models from scratch</strong>.</li> <li class="ace-line ace-line old-record-id-OIrUdcJaCo83dfx32OflI47Sgnc" data-list="bullet">Solid understanding of reinforcement learning fundamentals and their application to model alignment.</li> <li data-start="2348" data-end="2471">Experience deploying models in low-latency production environments using frameworks such as&nbsp;<strong data-start="2442" data-end="2470">vLLM, SGLang, or similar</strong>.</li> </ul> </div> <h3 class="heading-2 ace-line old-record-id-doxusXHTxE1ng5NXSR8cKaY4vhf"><strong>Perks &amp; Benefits</strong></h3> <ul> <li>Competitive total compensation package</li> <li>L&amp;D programs and education subsidies for employees' growth and development</li> <li>Various team-building programs and company events</li> <li>Wellness and meal allowances</li> <li>Comprehensive healthcare schemes for employees and dependants</li> <li>More that we would love to tell you about during the process!</li> </ul> </div> <div data-lark-html-role="root"> <h3 class="heading-3 ace-line old-record-id-Xfu6dMaUkoTGqGx6arYu3QhKsZe"><strong>OKX Statement</strong></h3> <div class="ace-line ace-line old-record-id-BDVHdvrnCoy11xx8RoVuyc2As3g">The salary range for this position is&nbsp;<span class="text-only text-font-italic" data-eleid="22">$313,055.00 to $450,000.00</span>. The salary offered depends on a variety of factors, including job-related knowledge, skills, experience, and market location. 
In addition to the salary, a performance bonus and long-term incentives may be provided as part of the compensation package, as well as a full range of medical, financial, and/or other benefits, dependent on the position offered. Applicants should apply via Okcoin and OKX internal or external careers site.</div> <div class="ace-line ace-line old-record-id-ZfHydXbNIo2qmdxmQCru8ZDksBd">&nbsp;</div> <div class="ace-line ace-line old-record-id-BWwndhjsUozjW6xZoNxuILk6s9e">OKX is committed to equal employment opportunities regardless of race, color, genetic information, creed, religion, sex, sexual orientation, gender identity, lawful alien status, national origin, age, marital status, and non-job related physical or mental disability, or protected veteran status. Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.</div> <div class="ace-line ace-line old-record-id-BWwndhjsUozjW6xZoNxuILk6s9e">&nbsp;</div> </div> </div> </div> </div> </div><div class="content-conclusion"><div data-lark-html-role="root"><span class="text-only" data-eleid="18"><span class="text-only"><span class="text-only" data-eleid="6">Notice:<br></span></span></span> <div data-lark-html-role="root"><span class="text-only" data-eleid="26"><span class="text-only">All official </span><span class="text-only text-with-abbreviation text-with-abbreviation-bottomline">OKX</span><span class="text-only"> vacancies are published on this website.</span></span> <span class="text-only" data-eleid="28"><span class="text-only">While roles may appear on selected third-party platforms from time to time, information on other sites may be inaccurate or outdated. 
</span></span><strong><span class="text-only" data-eleid="29"><span class="text-only">If in doubt, please apply directly through our official careers website.</span></span></strong></div> </div> <div data-lark-html-role="root"><span class="text-only" data-eleid="18"><span class="text-only">Information collected and processed as part of the recruitment process of any job application you choose to submit is subject to&nbsp;</span><span class="text-only text-with-abbreviation text-with-abbreviation-bottomline">OKX</span><span class="text-only">'s </span></span><a class="link rich-text-anchor __anchor-intercept-flag__ text-content-link" href="https://www.okx.com/en-eu/help/okx-candidate-privacy-notice" target="_blank" data-eleid="19" data-lark-is-custom="true" data-lark-link="true">Candidate Privacy Notice</a><span class="text-only" data-eleid="20"><span class="text-only">.</span></span></div></div>