© 2026 Pipeline. All rights reserved.

Cerebras Systems

LLM Inference Performance & Evals Engineer at Cerebras Systems

Toronto, Ontario, Canada | Full-time | Software | Posted about 2 months ago

About the Role

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.

Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras (https://openai.com/index/cerebras-partnership/) to deploy 750 megawatts of scale, transforming key workloads with ultra high-speed inference.

Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

About The Role

Join the inference model team dedicated to bringing up state-of-the-art models and to numerically validating and accelerating new model ideas on wafer-scale hardware. You will prototype architectural tweaks, build performance-eval pipelines, and turn hard numbers into changes that land in production.

Key Responsibilities

  • Prototype and benchmark cutting-edge ideas: new attention mechanisms, MoE, speculative decoding, and further innovations as they emerge.
  • Develop agent-driven automation that designs experiments, schedules runs, triages regressions, and drafts pull requests.
  • Work closely with compiler, runtime, and silicon teams, a unique opportunity to experience the full stack of software and hardware innovation.
  • Keep pace with the latest open- and closed-source models; run them first on wafer scale to expose new optimization opportunities.

Skills And Qualifications

  • 3+ years building high-performance ML or systems software.
  • Solid grounding in Transformer math (attention scaling, KV-cache, quantisation), or clear evidence that you learn this material rapidly.
  • Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
  • Strong debugging skills across performance, numerical accuracy, and runtime integration.
  • Prior experience in modeling, compilers, or crafting benchmarks and performance studies, not just black-box QA tests.
  • Strong passion for leveraging AI agents or workflow orchestration tools to boost personal productivity.

Assets

  • Hands-on experience with flash-attention, Triton kernels, linear attention, or sparsity research.
  • Performance-tuning experience on custom silicon, GPUs, or FPGAs.
  • Proficiency in C/C++ programming and experience with low-level optimization.
  • Proven experience in compiler development, particularly with LLVM and/or MLIR.
  • Publications, repos, or blog posts dissecting model speed-ups.
  • Contributions to open-source agent frameworks.

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we've reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:

  1. Build a breakthrough AI platform beyond the constraints of the GPU.
  2. Publish and open source their cutting-edge AI research.
  3. Work on one of the fastest AI supercomputers in the world.
  4. Enjoy job stability with startup vitality.
  5. Work in a simple, non-corporate culture that respects individual beliefs.

Read our blog: Five Reasons to Join Cerebras in 2026 (https://www.cerebras.net/blog/5-reasons-to-join-cerebras).

Apply today and join the forefront of groundbreaking advancements in AI!

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.

This website or its third-party tools process personal data. For more details, review our CCPA disclosure notice (https://www.cerebras.net/privacy/).

Related Roles

  • Network Architect

    Cerebras Systems

    Sunnyvale, CA
  • Staff Kernel Optimization Engineer

    Cerebras Systems

    Remote, California, United States (Remote)
  • AI Engineer, Model Quality and Performance

    Cerebras Systems

    Sunnyvale, CA
  • Software Development Engineer in Test (Cloud)

    Cerebras Systems

    Bengaluru, Karnataka, India
  • ML Systems Performance Engineer

    Cerebras Systems

    Sunnyvale, CA or Toronto, Canada
  • Member of Technical Staff (Software Engineer)

    Cerebras Systems

    Sunnyvale, CA