
AI Agent Insights

Notes from the field on AI Agent reliability

Ryan Brandt - Author
Ryan Brandt
November 12, 2025·8 min read

Cursor: The Everything App

I wanted to interview at OpenAI but didn't know anyone there. So I built an agent in Cursor to optimize my cold outreach. That worked. Then I kept building. Now Cursor runs my entire life.

AI · Cursor · Automation · Productivity
Ryan Brandt
October 28, 2025·22 min read

Testing LangSmith's Insights Agent: 87.92% Coverage in 35 Minutes

We spent 20 hours with domain experts manually annotating 207 production agent traces to understand failure patterns. Then we tested whether LangSmith's Insights Agent could automate this process. It found 87.92% of our failure patterns in 35 minutes.

AI · Evals · LangSmith · Testing · Agent Engineering
Ryan Brandt
October 13, 2025·14 min read

The Unknown Unknowns Problem in AI Evaluation

Why automated tests miss the failures that matter most, and how manual error analysis discovers the bugs you never imagined existed.

AI · Evals · Testing · Error Analysis · Engineering
Ryan Brandt
October 10, 2025·7 min read

The $500 AI That Just Beat Gemini at Abstract Reasoning

Samsung's 7-million-parameter model outperforms giants on ARC-AGI 2. As the lead contributor to that benchmark, here's why this matters and what it means for the future of AI.

AI · Machine Learning · Reasoning · Efficiency · Research
Ryan Brandt
October 8, 2025·13 min read

How to Actually Evaluate Your LLM (And Stop Guessing)

A methodological walkthrough using a hypothetical customer service bot to show how to move from vibes-based evaluation to systematic, measurable improvements.

AI · Evals · LLM · Product Design · Engineering
Ryan Brandt
July 29, 2025·5 min read

Prompting 101: How to Make a Good Prompt

A practical guide to writing clear, effective prompts that get consistent results from LLMs.

Prompts · AI Development · LLM · Tutorial
Ryan Brandt
July 25, 2025·8 min read

The Most Valuable Part of Evals Cannot Be Automated

A simple, non-technical guide to fixing AI agents by analyzing what went wrong, measuring the impact, and improving systematically.

Evals · AI Development · Agentic Workflows · Debugging Agents
Ryan Brandt
July 22, 2025·7 min read

Application-Centric Evals: Stop Playing Whack-a-Mole

How to ship something people trust, come back to, and pay for. Inspired by Hamel Husain and Shreya Shankar's course.

Evals · AI Development · LLM · Product
Ryan Brandt
July 3, 2025·9 min read

How MCP actually works and why FastMCP is the easiest way to use it

Breaking down how the Model Context Protocol works, why it's structured the way it is, and why FastMCP is the best way to implement it in practice.

MCP · AI Development · Protocol · FastMCP · AI Agents · LangChain
Ryan Brandt
January 20, 2025·18 min read

Building High-Quality LLM Judges: A Data-Driven Approach with Claude Code

How we achieved 82% recall with only a 2% generalization gap through 10 iterations of systematic prompt engineering in a single afternoon.

AI · Evals · LLM · Prompt Engineering · Claude Code