Work with Autolab

Autolab builds autoresearch infrastructure: agents that accelerate model training by automating complex ML experiments and decision-making. We are hiring for our first roles.

Every role offers competitive salary and meaningful equity.

Infrastructure Engineer

San Francisco, California · Full-time

Build the distributed systems that run thousands of concurrent ML experiments: job orchestration, GPU scheduling, artifact storage, observability.

WHAT YOU’LL DO

Design and operate the orchestration layer that schedules thousands of concurrent experiments across heterogeneous GPU fleets.
Build GPU scheduling that keeps clusters near full utilization, from a single lab 3090 to multi-node H100 clusters.
Own artifact storage for checkpoints, datasets, and logs: versioned, deduplicated, and fast to fetch.
Build the observability pipeline that ingests training telemetry at high volume without slowing runs down.
Make experiment launches reproducible: environments, dependencies, and data pinned by default.
Set the engineering practices (CI, deployment, on-call) that the team grows into.

WHAT WE’RE LOOKING FOR

You have built and operated distributed systems in production: schedulers, queues, or storage.
Strong Python plus a systems language (Go, Rust, or C++).
Hands-on experience with cluster tooling such as Kubernetes, Slurm, or Ray.
You have run GPU or HPC workloads and know where they break.
Pragmatic by default: ship the simple version, measure, then harden.
Comfortable owning large areas with little process in an early-stage team.

Apply

Harness & Agents Engineer

San Francisco, California · Full-time

Design the agent harness: tool interfaces, sandboxing, evaluation loops, and the control logic that lets agents plan and execute research autonomously.

WHAT YOU’LL DO

Design the tool interfaces agents use to write code, launch experiments, and read results.
Build sandboxing that lets agents execute generated code safely against real training infrastructure.
Implement evaluation loops that measure whether agents make good research decisions, not just whether code runs.
Develop the control logic for long-horizon work: when to branch an experiment, kill a run, or escalate to a human.
Instrument agent trajectories so failures are debuggable and successes are repeatable.
Iterate on prompts, context assembly, and memory alongside the post-training team.

WHAT WE’RE LOOKING FOR

You have built LLM agent systems or harnesses that ran against real workloads, not just demos.
Strong software engineering fundamentals and good taste in API design.
Experience with isolation and sandboxing: containers, VMs, or permission models.
You have written evals and know how easily they mislead.
Enough ML background to understand the experiments your agents are running.
You treat agent reliability as an engineering problem, not a prompting trick.

Apply

Post-training ML Engineer

San Francisco, California · Full-time

Own post-training for our agent models: RL and SFT pipelines, reward design, evals, and data engines built from experiment trajectories.

WHAT YOU’LL DO

Own RL and SFT pipelines end to end: data, training, evaluation, and deployment.
Design reward signals for research decisions, where ground truth arrives hours or days after the action.
Build data engines that turn experiment trajectories into training data.
Run post-training experiments with proper baselines and ablations, and write up what you learn.
Build the eval suites that catch capability regressions before they ship.
Work with the harness team to close the loop between agent behavior and model updates.

WHAT WE’RE LOOKING FOR

Hands-on post-training experience with LLMs: SFT, RLHF/RLAIF, DPO, or similar.
Strong PyTorch or JAX, and comfort working inside distributed training stacks.
Empirical rigor: you reach for a baseline and an ablation before an opinion.
You have run into reward hacking and eval overfitting firsthand and design against them.
Able to build the data tooling around training, not just the training loop.
Publications and shipped models are both welcome; neither is required.

Apply

Forward Deployed Engineer

San Francisco, California · Remote (US) · Full-time

Embed with design partners (self-driving, robotics, CV teams), integrate Autolab into their training stacks, and feed what you learn back into the product.

WHAT YOU’LL DO

Embed with design partners in self-driving, robotics, and computer vision, and get Autolab running in their training stacks.
Debug across the boundary: their training code, our agents, and the infrastructure in between.
Build the integrations, adapters, and examples that make the next deployment faster than the last.
Turn partner pain points into concrete, prioritized product feedback.
Be the partner’s technical point of contact from first call to production use.
Travel to partner sites when the problem is easier to solve in the same room.

WHAT WE’RE LOOKING FOR

A strong generalist engineer who is productive in an unfamiliar codebase within days.
Real ML training experience: you have trained models, not just called APIs.
You communicate clearly with customers, including when the news is bad.
Bias to ownership: when something is broken at a partner, you fix it or find someone who can.
Comfortable with ambiguity, context switching, and occasional travel.
You enjoy being the person the product gets judged by.

Apply

Chief of Staff

San Francisco, California · Full-time

Work directly with the founders on operations, hiring, partner relationships, and everything that keeps a fast-moving research company running.

WHAT YOU’LL DO

Run hiring end to end: sourcing, pipelines, scheduling, references, and closing.
Own day-to-day operations: legal, finance, vendors, and workspace.
Manage partner and investor relationships alongside the founders.
Build the lightweight internal processes the team needs, and kill the ones it doesn’t.
Take whole problems off the founders’ plates and return them solved.
Make sure the company’s commitments are tracked and met as the pace increases.

WHAT WE’RE LOOKING FOR

Experience at an early-stage startup or another environment where you owned outcomes, not tasks.
Excellent writing: you can turn a messy discussion into a clear one-pager.
Organized under load: many threads, none dropped.
Comfortable with contracts, budgets, and people in the same afternoon.
Enough technical fluency to follow what an ML research company builds and why.
Discretion and judgment: you will see everything.

Apply

Don’t see your role? Tell us what you’d build: apply here.