Blog
Insights on AI safety, agent testing, and building reliable LLM applications.

Introducing Agent Probe: Automated AI Agent Testing
We're launching Agent Probe — a framework that automatically tests LLM-based agents for hallucinations, security vulnerabilities, PII leaks, bias, and toxicity. Here's why we built it and how it works.
Announcement
The 6-Layer AI Test Pyramid Explained
Traditional software has the testing pyramid. We adapted it for AI agents. Learn how our 6-layer approach — from accuracy to ethics — provides comprehensive coverage for LLM-based applications.
Technical
Why Your AI Agent Needs a Security Test
Prompt injection and jailbreak attacks are real threats to production AI. We analyze common attack vectors and show how Agent Probe's security evaluator catches them using JailbreakBench datasets.
Security
Detecting Hallucinations in LLM Agents
When your AI agent confidently gives a customer false information, the damage is already done. We explore how TruthfulQA-based evaluation catches hallucinations before they reach production.
Evaluation
Bias Testing with BBQ: 20 Categories, Zero Tolerance
Does your agent respond differently based on gender, race, or age? We explain how the BBQ dataset enables systematic bias detection across 20 demographic categories.
Ethics
EU AI Act: What It Means for AI Agent Testing
The EU AI Act is in force, and it requires systematic testing of high-risk AI systems. We break down the regulation's risk tiering and show how Agent Probe helps you stay compliant.
Compliance