Bias Testing with BBQ: 20 Categories, Zero Tolerance
2025-02-20 · Ethics
What Is AI Bias?
AI bias occurs when a language model treats people differently based on their demographic characteristics. This is not a theoretical concern — it is a well-documented phenomenon with measurable real-world consequences. A biased AI agent might provide more detailed medical advice when the patient is described as male, recommend lower salaries when evaluating a female candidate, express more skepticism toward individuals from certain ethnic backgrounds, or use different language tones based on the perceived socioeconomic status of the user. These biases are typically not intentional but are inherited from the training data, which reflects the historical prejudices and stereotypes present in human-generated text.
The BBQ Dataset: A Systematic Approach
The Bias Benchmark for QA (BBQ) is an academic dataset specifically designed to measure social biases in question-answering models. Developed by researchers studying fairness in AI, BBQ provides carefully constructed question templates where the only variable is the demographic attribute. Each question set includes an ambiguous context followed by a question that can be answered in a biased or unbiased way. The dataset is structured so that an unbiased model should either select the correct answer regardless of demographics or explicitly state that the information is insufficient to determine the answer. Any systematic deviation from this pattern indicates bias.
20 Categories of Bias Detection
Agent Probe's bias evaluator tests across 20 comprehensive demographic categories using the BBQ dataset: age, disability status, gender identity, nationality, physical appearance, race/ethnicity, religion, socioeconomic status (SES), sexual orientation, and sentiment perturbation, among others. Each category contains multiple question variants that probe different aspects of potential bias. For example, the gender identity category tests not just binary gender assumptions but also responses related to non-binary and transgender individuals. The race/ethnicity category covers a wide range of ethnic groups and tests for both overt stereotyping and subtle differential treatment.
How Testing Works: Differential Response Analysis
The core methodology behind bias testing is differential response analysis. Agent Probe presents the same question to your AI agent multiple times, changing only the demographic attribute each time. For example, the prompt might describe "A female engineer working on a complex algorithm" in one test case and "A male engineer working on a complex algorithm" in another. If the agent's response changes significantly — perhaps expressing more confidence in the male engineer's ability or providing more detailed technical guidance to one gender — bias is detected. The evaluator quantifies the degree of bias by measuring the semantic distance between responses across demographic variants of the same question.
Zero Tolerance: Why This Matters
Agent Probe adopts a zero-tolerance approach to bias for good reason. Even small biases, when deployed at scale, have a compounding effect that can reinforce societal inequalities. An AI agent used by thousands of users per day that is slightly more helpful to one demographic group effectively discriminates against others thousands of times daily. The BBQ-based evaluation provides concrete, quantifiable evidence of bias that teams can act on. Results include specific examples of biased responses, the demographic dimensions where bias was detected, the magnitude of differential treatment, and clear before-and-after metrics when model adjustments are made. This data-driven approach transforms bias from an abstract concern into a measurable, improvable metric.