


Senior/Staff DevOps Engineer at Ethos

Remote · Full-time · Engineering · Posted 23 days ago

About the Role

About Ethos

Ethos is on a mission to bridge the human readiness gap by transforming how training is developed, consumed, and aligned with strategic business outcomes. As a well-funded Series A startup ($40M+ raised), we’re a trusted partner to 150+ enterprise customers across the U.S. military, life sciences, manufacturing, supply chain, and professional sports.

We’re expanding our engineering team to deliver a best-in-class learning platform: smarter, faster, and more optimized. We’ve gone all-in on AI tooling in our development process, and we’re adopting and extending the best new practices for creating software in this era.

About the Role

You’ll lead the deployment and operationalization of our SaaS products across Commercial Cloud, government networks, and bespoke/air-gapped customer environments. As a Senior engineer, you’ll own end-to-end infrastructure delivery, elevate DevOps practices, and collaborate closely with Software and Product. As a Staff engineer, you’ll additionally shape platform engineering strategy, set technical direction for distributed systems at scale, and influence design patterns that enable AI workloads and complex data pipelines.

You’ll treat AI tooling as core to your daily workflow (for IaC, pipelines, incident response, and toil reduction) and help shape the agentic operations patterns and AI workloads our platform runs.

If you love solving hard deployment problems, care deeply about security and reliability, can scale modern cloud platforms with rigor, and embrace AI-augmented operations as the way forward, this role is for you.

What You’ll Do

  • Design & Operate the Platform: Architect, implement, and run secure, scalable, multi-tenant infrastructure (infra as code, immutable artifacts, GitOps).
  • AI-Augmented Operations & Platform Work: Use AI coding and agentic tools (Claude Code, Cursor, Copilot, MCP-based ops agents) for IaC authoring, pipeline development, log/trace analysis, postmortem drafting, and toil reduction; build and improve agentic workflows for the team.
  • CI/CD & Release Engineering: Build and harden pipelines (build, test, scan, sign, promote, deploy) for multi-environment delivery, including disconnected/air-gapped workflows.
  • Observability & Reliability: Establish SLOs; instrument systems for metrics/logs/traces; drive incident response and postmortems; reduce MTTR and change failure rate.
  • Security & Compliance by Design: Integrate supply-chain security (SBOMs, signing, provenance), secrets management, and baseline hardening (CIS/STIG-aligned).
  • Cost & Performance: Optimize infrastructure spend and performance (capacity planning, autoscaling, right-sizing, storage/egress strategies).
  • Technical Leadership: Lead design reviews, author RFCs, mentor engineers, and raise the quality bar for platform changes.
  • Gov/Constrained Deployments: Support IL-4/IL-5-aligned patterns, RMF documentation support, and offline artifact promotion processes where needed.
  • (Staff) Strategy & Standards: Define platform roadmaps, establish consistent deployment and infrastructure patterns, and guide cross-team adoption of best practices.

Measures of Success (First 6–12 Months)

  • Availability & Reliability: Meet or exceed service SLOs; reduce MTTR by ≥30%.
  • Delivery Velocity: Increase deployment frequency by ≥2× while keeping change failure rate ≤15%.
  • Pipeline Efficiency: Cut CI pipeline duration by ≥25% and reduce flaky tests significantly.
  • Security Posture: Achieve ≥95% pass rate for supply-chain/security gates (image signing, SBOM scans, vulnerability thresholds); reduce mean time to remediate high-severity CVEs to ≤14 days.
  • Cost & Drift: Deliver ≥15% infra cost savings without performance regressions; keep infra drift near zero via GitOps and policy as code.
  • Gov/Offline Readiness: Stand up an artifact promotion flow (build → scan → sign → export) suitable for disconnected deployments, with documented runbooks.

30/60/90 Day Plan

First 30 Days: Map & Baseline

  • Deep-dive on current cloud topology, CI/CD, observability, security controls, and on-call.
  • Inventory build and runtime artifacts; document deployment environments and promotion paths.
  • Baseline reliability and delivery metrics (SLOs, MTTR, deploy frequency, change failure rate, pipeline timing).
  • Establish and prove the effectiveness of your personal workflow with AI tooling.

60 Days: Design & Deliver

  • Harden CI/CD: add SBOM generation, signing (e.g., Cosign/Sigstore), and policy gates.
  • Implement or refine infrastructure modules (Terraform) and Helm charts or Kustomize overlays with GitOps flows.
  • Establish service SLOs and golden signals; wire alerts and dashboards for top services.
  • Pilot artifact export/import flow for air-gapped/disconnected deployments; write runbooks.

90 Days: Scale & Standardize

  • Standardize CI/CD pipelines and infrastructure modules across existing services.
  • Migrate priority services to hardened delivery paths; deprecate legacy workflows.
  • Land cost/performance wins (e.g., autoscaling policies, instance/storage class right-sizing).

Basic Qualifications

  • 5+ years building and operating cloud platforms; 3+ years deploying SaaS in production.
  • Strong with Terraform, Helm/Kustomize, and containers (Docker, Kubernetes).
  • Deep AWS experience (e.g., VPC, EKS, EC2, S3, RDS, ECR, IAM/KMS, Route 53; CloudFront desirable).
  • CI/CD expertise (e.g., GitHub Actions, CircleCI, or Argo Workflows) and GitOps (Argo CD or Flux).
  • Observability across metrics, logs, and traces (e.g., Prometheus/Grafana, OpenTelemetry, ELK).
  • Proven track record in IaC, scalable system design, and quality tooling (automated tests, canaries/blue-green, feature flags).
  • Excellent communication; comfortable partnering with Product, Security, and Customer teams.
  • Thrives in a startup environment that values ownership, autonomy, and pragmatic delivery.
  • Active, fluent use of AI development/operations tools as part of your daily workflow.
  • Secret Clearance, or eligibility and willingness to obtain one.

Preferred Qualifications

  • Supply-chain security (SBOMs, SLSA concepts, image signing, provenance) and vulnerability management (e.g., Trivy/Grype, Snyk; Chainguard experience a plus).
  • Experience identifying/mitigating CVEs and setting policy thresholds.
  • Background with DoD/regulated customers; familiarity with IL-4/IL-5, Platform One patterns, and RMF documentation workflows.
  • Knowledge of STIG/CIS hardening, air-gapped architectures, and offline update mechanisms.
  • Experience operating AI/ML workloads in production (GPU scheduling, model artifact management, inference serving, vector DBs, queuing/streaming) or building agentic ops workflows / MCP-based integrations (alert triage, runbook automation, IaC review agents).

Tooling you might touch

Our stack includes some of the following technologies, along with similar ones:

  • AI development tools (Claude Code, Cursor, GitHub Copilot, MCP servers)
  • Terraform modules
  • Helm/Kustomize
  • Kubernetes (EKS)
  • GitHub Actions/Workflows
  • Argo CD/Flux
  • Docker/OCI
  • Prometheus/Grafana, Datadog, OpenTelemetry
  • Loki/ELK
  • LaunchDarkly/Flagsmith
  • Cosign/Sigstore, Trivy/Grype/Snyk
  • AWS (VPC, EKS, EC2, S3, RDS, ECR, IAM/KMS, Route 53, CloudFront)
  • HashiCorp Vault/Parameter Store/Secrets Manager

Compensation & Benefits

  • Competitive base salary (Senior: $150k-$190k; Staff: $170k-$210k), based on location and experience, with significant equity upside.
  • Subsidized health insurance, 401(k), life insurance, and cell phone stipend.
  • Remote-first culture with up to 10% travel for offsites.
  • Work eligibility: Applicants must be authorized to work in the U.S.

One Final Note

We’re committed to building a diverse, inclusive, and authentic workplace. If you’re excited about this role but your experience doesn’t perfectly align with every qualification, please apply; you may be just the right candidate.

EEO & accommodations: Ethos is an Equal Opportunity Employer. We welcome applicants of all backgrounds and provide reasonable accommodations throughout the hiring process.
