The role
You will own agents end to end: the prompting and tool design, the retrieval, the guardrails, and the evals that tell us whether any of it is actually working once real traffic arrives. These systems run inside the customer’s own cloud on GCP, AWS, or Azure, take real actions, and stay live, so the bar is production, not a notebook.
What you'll do
- Build agentic systems that reason, call tools, and take real actions (answering calls, booking appointments, updating records) with guardrails you design.
- Stand up the evaluation loop: golden sets, replay of real transcripts, online sampling, and the metrics that actually predict quality in production.
- Deploy and operate inside the client’s cloud under least-privilege access, with full logging and a kill-switch.
- Tune prompts, tools, and retrieval against real traffic, and stay on call for the systems you ship.
What we're looking for
- Strong software engineering plus real experience building LLM or agent systems that went to production, not just demos.
- A point of view on evaluation: you have measured an AI system you could not fully script.
- Comfort owning a system end to end, from design through the pager.
- Clear writing and sound judgment. We are async and senior, and we rely on both.
Bonus points
- Experience deploying inside customer cloud accounts (GCP, AWS, or Azure) under tight access controls.
- Voice, telephony, or real-time pipelines.
- Healthcare or other regulated environments.
How we work
Senior, remote-first, and async. Whoever designs a system also builds and operates it: no layers, no handoffs. You own your work in front of the client, and we back you with real ownership, meaningful equity, and the time and tools to do it well. The full list of what you get is on the careers page.
