Video Tutorials

Step-by-step video guides for every feature of Agent Probe.

Product Demo

See Agent Probe in Action

Demo 1

Agent Probe — Full Product Demo

A complete walkthrough from login to results — model selection, batch testing, live chat evaluation, and CI/CD integration.

~5 min
Demo 2

Agent Probe — Advanced Features

Deep dive into evaluator configuration, custom datasets, webhook setup, and multi-model comparison.

~5 min
Tutorial Series

Step-by-Step Tutorials

6-part series covering everything from testing fundamentals to custom datasets.

Video 1

Introduction — Why Test AI Agents?

What problems Agent Probe solves. Hallucination, prompt injection, PII leaks, bias, toxicity — and why traditional testing isn't enough.

~4 min
Video 2

Test Pyramid & Evaluators

6-layer test pyramid walkthrough. All 16 evaluators explained. Judge model concept — what it is and why it matters.

~5 min
Video 3

Running Your First Test & Manual Chat

Login → model selection → configuration → batch run → real-time results. Live bias, security (jailbreak), and PII detection demo.

~5 min, 5 parts
Video 4

Reading Results & Model Comparison

Reading test cards (score, pass/fail, judge reasoning). Test history. Side-by-side model comparison: GPT-4o-mini vs Claude.

~5 min, 3 parts
Video 5

CI/CD — Webhooks & API Keys

Creating webhooks with cron scheduling. API key generation. cURL integration. User management and approval workflow.

~5 min
Video 6

Custom Datasets

JSON format explained. Creating accuracy, security, and PII test data. Upload via drag & drop. Running domain-specific evaluations.

~4 min
Advanced

Technical Deep Dives

For developers and tech leads who want to understand exactly how Agent Probe works under the hood.

Chapter A

Architecture & Pipeline

FastAPI internals, ThreadPoolExecutor parallelism, asyncio.gather, BaseEvaluator class, scoring conventions, LLM-as-judge vs rule-based strategies.

~4 min, 2 parts
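
As a rough illustration of the pattern this chapter names (a shared BaseEvaluator interface, blocking evaluators run on a ThreadPoolExecutor, results collected with asyncio.gather), here is a minimal Python sketch. It is not Agent Probe's actual code: the class names other than BaseEvaluator, the run_evaluators helper, the regex, and the scoring convention (1.0 = pass, 0.0 = fail) are all assumptions made for the example.

```python
import asyncio
import re
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class BaseEvaluator(ABC):
    """Hypothetical base class; assumes every evaluator returns a score in [0.0, 1.0]."""

    name = "base"

    @abstractmethod
    def evaluate(self, response: str) -> float:
        ...


class EmailLeakEvaluator(BaseEvaluator):
    """Rule-based strategy: a plain regex check, no judge model involved."""

    name = "email_leak"
    _pattern = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

    def evaluate(self, response: str) -> float:
        # Assumed convention: 1.0 means pass (no email found), 0.0 means fail.
        return 0.0 if self._pattern.search(response) else 1.0


class NonEmptyEvaluator(BaseEvaluator):
    """Trivial second evaluator so there is something to run in parallel."""

    name = "non_empty"

    def evaluate(self, response: str) -> float:
        return 1.0 if response.strip() else 0.0


async def run_evaluators(response, evaluators, max_workers=4):
    """Run each blocking evaluate() call on a worker thread and gather the scores."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        tasks = [
            loop.run_in_executor(pool, ev.evaluate, response) for ev in evaluators
        ]
        # gather preserves the order of the tasks it was given.
        scores = await asyncio.gather(*tasks)
    return {ev.name: score for ev, score in zip(evaluators, scores)}


if __name__ == "__main__":
    evaluators = [EmailLeakEvaluator(), NonEmptyEvaluator()]
    results = asyncio.run(
        run_evaluators("Contact me at jane@example.com", evaluators)
    )
    print(results)  # {'email_leak': 0.0, 'non_empty': 1.0}
```

An LLM-as-judge evaluator would fit the same interface by calling a judge model inside evaluate() instead of applying a rule; the thread pool keeps such slow, blocking calls from stalling the event loop.
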
Chapter B

Evaluators In Depth

Bias (BBQ + DeepEval), Security (Garak 156 patterns), Hallucination (TruthfulQA + context), PII (Presidio NER), Accuracy (MMLU exact match + LLM judge).

~4 min, 5 parts
Chapter C

Dataset Architecture & Data Flow

Golden Datasets (BBQ, ToxiGen, TruthfulQA, MMLU, JailbreakBench), JSON schema, end-to-end request → evaluator → score pipeline.

~2 min, 2 parts