Bias Testing with BBQ: 20 Categories, Zero Tolerance

2025-02-20 · Ethics

What Is AI Bias?

AI bias occurs when a language model treats people differently based on their demographic characteristics. This is not a theoretical concern — it is a well-documented phenomenon with measurable real-world consequences. A biased AI agent might provide more detailed medical advice when the patient is described as male, recommend lower salaries when evaluating a female candidate, express more skepticism toward individuals from certain ethnic backgrounds, or use different language tones based on the perceived socioeconomic status of the user. These biases are typically not intentional but are inherited from the training data, which reflects the historical prejudices and stereotypes present in human-generated text.

The BBQ Dataset: A Systematic Approach

The Bias Benchmark for QA (BBQ) is an academic dataset specifically designed to measure social biases in question-answering models. Developed by researchers studying fairness in AI, BBQ provides carefully constructed question templates where the only variable is the demographic attribute. Each question set includes an ambiguous context followed by a question that can be answered in a biased or unbiased way. The dataset is structured so that an unbiased model should either select the correct answer regardless of demographics or explicitly state that the information is insufficient to determine the answer. Any systematic deviation from this pattern indicates bias.

20 Categories of Bias Detection

Agent Probe's bias evaluator tests across 20 comprehensive demographic categories using the BBQ dataset: age, disability status, gender identity, nationality, physical appearance, race/ethnicity, religion, socioeconomic status (SES), sexual orientation, and sentiment perturbation, among others. Each category contains multiple question variants that probe different aspects of potential bias. For example, the gender identity category tests not just binary gender assumptions but also responses related to non-binary and transgender individuals. The race/ethnicity category covers a wide range of ethnic groups and tests for both overt stereotyping and subtle differential treatment.

How Testing Works: Differential Response Analysis

The core methodology behind bias testing is differential response analysis. Agent Probe presents the same question to your AI agent multiple times, changing only the demographic attribute each time. For example, the prompt might describe "A female engineer working on a complex algorithm" in one test case and "A male engineer working on a complex algorithm" in another. If the agent's response changes significantly — perhaps expressing more confidence in the male engineer's ability or providing more detailed technical guidance to one gender — bias is detected. The evaluator quantifies the degree of bias by measuring the semantic distance between responses across demographic variants of the same question.

Zero Tolerance: Why This Matters

Agent Probe adopts a zero-tolerance approach to bias for good reason. Even small biases, when deployed at scale, have a compounding effect that can reinforce societal inequalities. An AI agent used by thousands of users per day that is slightly more helpful to one demographic group effectively discriminates against others thousands of times daily. The BBQ-based evaluation provides concrete, quantifiable evidence of bias that teams can act on. Results include specific examples of biased responses, the demographic dimensions where bias was detected, the magnitude of differential treatment, and clear before-and-after metrics when model adjustments are made. This data-driven approach transforms bias from an abstract concern into a measurable, improvable metric.

AI Onyargisi Nedir?

AI onyargisi, bir dil modelinin insanlara demografik ozelliklerine gore farkli davranmasi durumunda ortaya cikar. Bu teorik bir endise degildir — olculebilir gercek dunya sonuclari olan, iyi belgelenmis bir olgudur. Onyargili bir AI agent'i, hasta erkek olarak tanimlandiginda daha ayrintili tibbi tavsiye saglayabilir, kadın bir aday degerlendirilirken daha dusuk maaslar onerebilir, belirli etnik kokenlerden bireylere karsi daha fazla suphecilik ifade edebilir veya kullanicinin algilanan sosyoekonomik durumuna gore farkli dil tonları kullanabilir. Bu onyargilar genellikle kasitli degildir ancak insan tarafindan uretilen metinde bulunan tarihsel onyargilari ve stereotipleri yansitan egitim verilerinden miras alinir.

BBQ Veri Seti: Sistematik Bir Yaklasim

Soru-Cevap icin Onyargi Benchmark'i (BBQ), soru-cevap modellerindeki sosyal onyargilari olcmek icin ozel olarak tasarlanmis akademik bir veri setidir. AI'da adalet uzerine calisan arastirmacilar tarafindan gelistirilen BBQ, tek degiskenin demografik ozellik oldugu ozenle olusturulmus soru sablonlari saglar. Her soru seti, belirsiz bir baglam ve ardindan onyargili veya onyargisiz bir sekilde cevaplanabilecek bir soru icerir. Veri seti, onyargisiz bir modelin demografik ozelliklere bakilmaksizin dogru cevabi secmesi veya cevabi belirlemek icin bilginin yetersiz oldugunu acikca belirtmesi gerekecek sekilde yapilandirilmistir. Bu kaliptan herhangi bir sistematik sapma onyargi gosterir.

20 Kategori Onyargi Tespiti

Agent Probe'un onyargi degerlendiricisi, BBQ veri setini kullanarak 20 kapsamli demografik kategoride test yapar: yas, engellilik durumu, cinsiyet kimligi, milliyet, fiziksel gorunum, irk/etnisite, din, sosyoekonomik durum (SES), cinsel yonelim ve duygu perturbasyonu ve diger kategoriler. Her kategori, potansiyel onyarginin farkli yonlerini arastiran birden fazla soru varyanti icerir. Ornegin, cinsiyet kimligi kategorisi yalnizca ikili cinsiyet varsayimlarini degil, ayni zamanda ikili olmayan ve transgender bireylerle ilgili yanitlari da test eder. Irk/etnisite kategorisi genis bir etnik grup yelpazesini kapsar ve hem acik stereotipleme hem de ince farkli muamele icin test yapar.

Test Nasil Calisir: Farklilasmis Yanit Analizi

Onyargi testinin arkasindaki temel metodoloji, farklilasmis yanit analizidir. Agent Probe, aynı soruyu AI agent'iniza birden fazla kez sunar ve her seferinde yalnizca demografik ozelligi degistirir. Ornegin, prompt bir test senaryosunda "Karmasik bir algoritma uzerinde calisan kadin muhendis"i, digerinde ise "Karmasik bir algoritma uzerinde calisan erkek muhendis"i tanimlayabilir. Agent'in yaniti onemli olcude degisirse — belki erkek muhendisin yetenegi konusunda daha fazla guven ifade ederse veya bir cinsiyete daha ayrintili teknik rehberlik sagliyorsa — onyargi tespit edilir. Degerlendirici, aynı sorunun demografik varyantlari arasindaki yanitlar arasindaki anlamsal mesafeyi olcerek onyargi derecesini nicelendirir.

Sifir Tolerans: Bunun Onemi

Agent Probe, iyi bir nedenle onyargiya sifir tolerans yaklasimini benimser. Kucuk onyargilar bile, olcekte konuslandirildiginda, toplumsal esitsizlikleri pekistirebilen birlesen bir etkiye sahiptir. Gunde binlerce kullanici tarafindan kullanilan ve bir demografik gruba biraz daha yardimci olan bir AI agent'i, diger gruplara karsı gunde binlerce kez ayrimcilik yapar. BBQ tabanli degerlendirme, ekiplerin uzerinde hareket edebilecegi somut, olculebilir onyargi kaniti saglar. Sonuclar, onyargili yanitlarin belirli orneklerini, onyarginin tespit edildigi demografik boyutlari, farkli muamelenin buyuklugunu ve model ayarlamalari yapildiginda net oncesi-sonrasi metriklerini icerir. Bu veri odakli yaklasim, onyargiyi soyut bir endiseden olculebilir, gelistirilebilir bir metrige donusturur.