
© 2026 Pipeline. All rights reserved.

BrightAI

Staff MLOps Engineer – ML Platform at BrightAI

Palo Alto, CA · Full-time · Engineering · Posted 24 days ago

About the Role

Bright.AI is a high-growth Physical AI company transforming how infrastructure businesses interact with the physical world through intelligent automation. Our AI platform processes visual, spatial, and temporal data from billions of real-world events, captured across edge devices, mobile sensors, and cloud infrastructure, to enable intelligent decision-making at scale.

We are now hiring a Staff MLOps Engineer to lead the build-out of our cloud-native ML developer platform and production pipelines. This role is pivotal to building an integrated ML/AI development platform on AWS, with programmatic data analysis and algorithm development capabilities, so that teams can move quickly from notebook to secure, reliable, and cost-efficient production services.

You'll work at the intersection of ML engineering, cloud infrastructure, and developer experience, designing scalable data/model workflows, CI/CD for ML, observability, and governance that turn ideas into durable, monitored ML services.

Key Responsibilities

● Design, build, and operate our ML/AI development platform on AWS, including Amazon SageMaker AI (Studio/Notebooks, Training/Processing/Batch Transform, Real-Time & Async Inference, Pipelines, Feature Store) and supporting services.
● Establish golden-path project templates, base Docker images, and internal Python libraries to standardize experiments, data processing, training, and deployment workflows.
● Implement Infrastructure-as-Code (e.g., Terraform) and workflow orchestration (Step Functions, Airflow); optionally support EKS for training/inference.
● Build automated data pipelines with S3, Glue, EMR/Spark (PySpark), and Athena/Redshift; add data quality checks (Great Expectations/Deequ) and lineage.
● Stand up experiment tracking and a model registry (SageMaker Experiments & Model Registry or MLflow); enforce versioning for data, code, and models.
● Implement CI/CD for ML (CodeBuild/CodePipeline or GitHub Actions): unit/integration tests, data contracts, model tests, canary/shadow deployments, and safe rollback.
● Ship real-time endpoints (SageMaker endpoints/FastAPI on Lambda/ECS/EKS) and batch jobs; set SLOs and autoscaling, and optimize for cost/performance.
● Build monitoring & observability for production models and services (drift, performance, and bias with SageMaker Model Monitor; service telemetry with CloudWatch/Prometheus/Grafana).
● Enforce security & governance: least-privilege IAM, VPC isolation/PrivateLink, encryption, and secret management.
● Partner with backend engineers to productionize notebooks and prototypes.
● Help integrate GenAI/Bedrock services where appropriate; support RAG pipelines with vector stores (OpenSearch) and evaluation harnesses.

Educational Background

● B.S. or M.S. in Computer Science, Electrical/Computer Engineering, or a related field; advanced degree a plus.
● Strong foundation in machine learning systems, distributed computing, and data engineering; applied experience building production-grade ML platforms.

Required Skills & Expertise

● 8+ years in software/ML engineering, including 4+ years in MLOps or a similar role.
● Strong programming skills (proficient in Python); fluent with Docker and either Terraform or AWS CDK.
● Hands-on with AWS: SageMaker, S3, IAM, CloudWatch, ECR, and ECS/EKS/Lambda.
● Built and operated CI/CD for ML (tests for code/data/models; automated deploys) and shipped real-time & batch ML workloads to production.
● Experience with experiment tracking & model registry (e.g., SageMaker Experiments/Model Registry or MLflow) and data versioning.
● Implemented monitoring & quality (SageMaker Model Monitor, EvidentlyAI, Great Expectations/Deequ) and created on-call/runbooks for model & service incidents.
● Solid grasp of security & compliance in cloud ML (IAM policy design, VPC/private networking, KMS encryption, secrets management, audit logging).

Bonus Qualifications

● Distributed training at scale (SageMaker Training, PyTorch DDP, Hugging Face on SageMaker).
● Data engineering at scale (e.g., Spark/EMR, Glue, Redshift).
● Observability stacks (e.g., Grafana), performance tuning, and capacity planning for ML services.
● LLMOps/RAG (Bedrock, vector databases, evals) as optional capabilities.
● Prior startup experience building ML platforms and products from the ground up.

Related Roles

  • Staff Cloud Engineer (Architect)

    BrightAI

    Palo Alto, California
  • Technical Program Manager

    BrightAI

    Palo Alto, CA
  • Supply Chain Manager

    BrightAI

    Palo Alto, CA
  • Staff Engineer, Cloud

    BrightAI

    Palo Alto, CA
  • Senior Computer Vision/AI Engineer

    BrightAI

    Palo Alto, CA
  • Senior / Staff Technical Program Manager

    BrightAI

    Palo Alto, CA