Frameworks for fairness evaluation

Several frameworks offer structured approaches to assess AI systems for fair treatment across groups.

  • Demographic parity measures whether different groups receive the same proportion of positive outcomes. For example, a loan approval AI achieves demographic parity when it approves 30% of applications from every demographic group. This criterion ensures equal representation but can ignore meaningful differences in qualifications between groups.
  • Equality of opportunity focuses on whether qualified individuals have equal chances of receiving positive predictions from the AI. A hiring AI demonstrates this when qualified candidates from all demographic groups have the same chance of being recommended for interviews.
  • Counterfactual fairness evaluates whether changing only protected attributes (like gender or race) would alter the AI's decision. This means a resume screening AI should make the same recommendation for identical qualifications regardless of the applicant's demographic information.[1]
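The three criteria above can each be expressed as a simple measurement. The sketch below is a minimal, hypothetical illustration: the toy model, the applicant data, and the group names are all invented for the example, not drawn from any real system.

```python
# Hypothetical loan-approval example illustrating the three fairness
# criteria: demographic parity, equality of opportunity, and
# counterfactual fairness. All data, names, and thresholds are made up.

def demographic_parity(outcomes_by_group):
    """Rate of positive outcomes per group; parity means equal rates."""
    return {g: sum(y) / len(y) for g, y in outcomes_by_group.items()}

def equal_opportunity(preds_by_group, labels_by_group):
    """True-positive rate per group: among qualified individuals
    (label == 1), the share who received a positive prediction."""
    rates = {}
    for g in preds_by_group:
        qualified = [p for p, y in zip(preds_by_group[g], labels_by_group[g])
                     if y == 1]
        rates[g] = sum(qualified) / len(qualified)
    return rates

def counterfactually_fair(model, applicant, protected_key, values):
    """The decision must not change when only the protected attribute does."""
    decisions = {model({**applicant, protected_key: v}) for v in values}
    return len(decisions) == 1

# Toy model that (correctly) ignores the protected attribute.
model = lambda applicant: applicant["income"] >= 50_000

# Demographic parity check: 60% vs 40% approval -> no parity.
outcomes = {"group_a": [1, 1, 0, 0, 1], "group_b": [1, 0, 0, 1, 0]}
print(demographic_parity(outcomes))  # {'group_a': 0.6, 'group_b': 0.4}

# Counterfactual check: flipping gender alone leaves the decision unchanged.
applicant = {"income": 62_000, "gender": "A"}
print(counterfactually_fair(model, applicant, "gender", ["A", "B"]))  # True
```

Each function returns a per-group rate (or a boolean for the counterfactual test), so a designer can compare the numbers directly and flag any gap between groups as a disparity to investigate.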

These frameworks help designers move beyond vague notions of "unbiased AI" toward measurable criteria. By applying specific evaluation methods, such as comparing approval rates across groups, measuring qualified candidates' success rates, or testing paired examples that differ only in protected attributes, designers can identify precisely where an AI system treats groups inequitably and take targeted action to address those disparities.
