Fails federal fair lending standards

This loan approval AI is 98.8% accurate.
It also breaks federal law.

A forensic audit of algorithmic bias in US mortgage lending, across 50,000 real HMDA applications. XGBoost, SHAP, and threshold debiasing. What the model sees, what it hides, and what the law says about it.

12 min read · 6 chapters · Interactive tool


Signal N° 01 · 3/4 · Failing 4/5 rule

Signal N° 02 · 0.707 · Pacific Islander DIR

Signal N° 03 · 10/15 · Intersectional violations


Chapter 01 · The 4/5 rule

A rule written in 1978, still the line between bias and crime.

If a protected group is approved at less than 80% of the most-approved group's rate, US regulators treat it as evidence of discrimination. Simple math, serious consequences.

Imagine 10 applicants from each group

White applicants · 85.1% · 9 of 10 approved

Black applicants · 67.5% · 7 of 10 approved

Disparate Impact Ratio · 0.793

Federal threshold · 0.800

67.5% / 85.1% ≈ 0.793, below the 0.80 legal floor.
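The arithmetic is small enough to check by hand, or in a few lines of Python. This is a minimal sketch, not the audit's own code: the function name `disparate_impact_ratio` and the constant `FOUR_FIFTHS_FLOOR` are illustrative, and the inputs are the rounded approval rates quoted above.

```python
FOUR_FIFTHS_FLOOR = 0.80  # the 4/5 rule threshold described in this chapter


def disparate_impact_ratio(group_rate: float, reference_rate: float) -> float:
    """Approval rate of a group divided by the reference group's approval rate."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return group_rate / reference_rate


# Rounded approval rates from the example above (Black vs. White applicants).
dir_black = disparate_impact_ratio(0.675, 0.851)
fails_rule = dir_black < FOUR_FIFTHS_FLOOR

print(f"DIR = {dir_black:.3f}, fails 4/5 rule: {fails_rule}")
```

Any group whose ratio lands under the floor is flagged; the reference group scores 1.000 against itself.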

Chapter 03 · The smoking gun

The model doesn’t see race.

It sees ZIP code.

And ZIP code, in America, is race.

On the rediscovery of redlining by machine

Exhibit A · n = 50,000 HMDA applications, 2024. Density-binned correlation between tract-level minority population and individual approval. Each hexagon represents the local concentration of applications.

−0.416

Pearson r · Tract minority % ↔ race

The XGBoost model was trained without race as an input. It learned to use something else.

The 5th most important feature in the model is the percentage of minority population in the census tract where the property sits. Its correlation with the applicant’s actual race is −0.416, stronger than any other feature in the dataset.

The model never saw race. It saw geography. In America, that distinction collapses. This is redlining, rediscovered by a machine.
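A proxy check of this kind can be sketched in plain Python. The sketch below is an assumption-laden illustration, not the audit's pipeline: the helper names `pearson_r` and `is_proxy` are hypothetical, the toy numbers are invented, and race is encoded as a 0/1 "applicant is White" indicator purely for the example. The 0.15 cutoff is the proxy detection threshold listed in the methodology.

```python
from math import sqrt

PROXY_THRESHOLD = 0.15  # |Pearson r| cutoff listed in the methodology


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def is_proxy(feature, race_indicator, threshold=PROXY_THRESHOLD):
    """Flag a feature whose absolute correlation with race exceeds the cutoff."""
    return abs(pearson_r(feature, race_indicator)) > threshold


# Toy data, invented for illustration only (not HMDA values):
# tract minority share (%) and a 0/1 indicator for "applicant is White".
tract_minority_pct = [5, 10, 15, 20, 60, 70, 80, 90]
is_white = [1, 1, 1, 0, 1, 0, 0, 0]

r = pearson_r(tract_minority_pct, is_white)
print(f"r = {r:.3f}, proxy: {is_proxy(tract_minority_pct, is_white)}")
```

The sign of r depends on how race is encoded; the test is on its absolute value, which is why a feature the model was allowed to see can still be flagged.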

Exhibit B · SHAP feature importance, top 5

01 · interest_rate · 0.545
02 · purchaser_type · 0.382
03 · applicant_credit_score_type · 0.045
04 · income · 0.003
05 · tract_minority_population_percent (proxy) · 0.011
Unlike the top 4 features, which encode borrower risk signals, this feature encodes where the property is. Geography is not a credit attribute. It is a demographic one.

Chapter 04 · The fix

Three lines of code, four groups restored to legal compliance.

Microsoft’s Fairlearn library includes a ThresholdOptimizer. Apply it once, with the demographic_parity constraint, and the model that was violating federal law is no longer violating federal law. The accuracy cost: 3.39 percentage points. The discrimination, it turns out, was a choice.

Exhibit A · Disparate Impact Ratio, before and after ThresholdOptimizer (legal floor: 0.80)

Asian · 0.962 → 0.976
Black · 0.793 → 1.005
Native American · 0.766 → 0.946
Pacific Islander · 0.707 → 1.014

White applicants are the reference group (DIR = 1.000 by definition) and are omitted from this view.

The accuracy cost, in plain English.

For every 100 loan applications the original model evaluated, it assigned the correct label (approve or deny) 98.78 times. The debiased model gets 95.39 correct. The tradeoff is 3.39 applications per 100. The gain: full legal compliance across every demographic group the federal government protects.

Original model · 98.78% correct

Debiased model · 95.39% correct + legal

The implementation

from fairlearn.postprocessing import ThresholdOptimizer

# Three lines to go from violation to compliance
optimizer = ThresholdOptimizer(
    estimator=xgb_model,
    constraints="demographic_parity",
    objective="accuracy_score",
)

# Sensitive features must be row-aligned with the data they accompany:
# race_train and race_test are the race column split the same way as X.
optimizer.fit(X_train, y_train, sensitive_features=race_train)
y_pred = optimizer.predict(X_test, sensitive_features=race_test)

Fairlearn v0.12.0 · Microsoft Research · Open source

The implication

The tools exist.

The data is public.

The law is clear.

Every biased lending model in production is a decision,

not a mistake.

End of chapter 04

Chapter 05 · Try it

The audit, as a tool you can use.

This calibrator runs a simplified version of our audited model directly in your browser. It uses the actual per-race thresholds produced by the ThresholdOptimizer. Change the applicant’s race below and watch the threshold shift. The decision logic is the exact calibration that brings all groups into compliance with federal lending law.

NOTE · This is a calibrated approximation of the production XGBoost model, not the full tree ensemble. It preserves the threshold logic for demonstration purposes.

The default values are calibrated to demonstrate the threshold contrast. Run them as-is, then change the race to see the calibration in action.
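The threshold logic itself fits in a few lines. The sketch below uses the per-group thresholds listed in the methodology; everything else is an assumption for illustration: the function name `decide`, the `>=` comparison, and the fallback to 0.5 for an unlisted group are hypothetical and may differ from what the calibrator does internally.

```python
# Per-group thresholds as listed in the methodology exhibits.
THRESHOLDS = {
    "White": 0.996,
    "Asian": 0.963,
    "Black": 0.007,
    "Native American": 0.004,
    "Pacific Islander": 0.002,
}
DEFAULT_THRESHOLD = 0.5  # conventional cutoff, used here only as a fallback


def decide(raw_score: float, race: str) -> str:
    """Compare the model's raw approval score with the group's threshold."""
    threshold = THRESHOLDS.get(race, DEFAULT_THRESHOLD)
    return "APPROVED" if raw_score >= threshold else "DENIED"


# Same raw score, different group, different outcome.
for race in ("White", "Black"):
    print(race, decide(0.23, race))
```

A raw score of 0.23 clears the 0.007 threshold and falls well short of 0.996, so the identical application is approved under one group's threshold and denied under another's.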

Live prediction (default values)

Decision · DENIED
Confidence · 22.7% (raw score 0.23, threshold 0.996)

Fairness adjustment

Per-race threshold · 0.996 (default 0.5)
More selective than the default to prevent over-approval relative to baseline.

Disparate impact ratio

Reference · 1.000
White is the reference group. DIR = 1.000 by construction; every other group is measured relative to this baseline.

Top 3 decision factors

Interest rate · 6.50%
Debt-to-income ratio · 36.0%
Annual income · $95,000

Chapter 06 · The receipts

Don’t believe me? Here’s every number, every formula, every model.

Radical transparency is the opposite of how most lending audits are conducted. Most live behind NDAs. This one doesn’t. Below is every dataset, every preprocessing decision, every hyperparameter, every fairness metric formula, and every library version used to produce the findings on this page. If you find a flaw, the data and code are public. Reproduce it. Challenge it. That is the point.

Exhibits · Chapter 06 · Methodology of audit

Exhibit A

Data source

Training data

Dataset·Home Mortgage Disclosure Act (HMDA) Loan Application Register (LAR)
Year·2024 (full disclosure year)
Data release schedule·2024 represents the most recent full-year disclosure available; 2025 release scheduled Q3 2026
Geographic scope·National
Sample size analyzed·50,000 applications
Sampling·Stratified by loan_type, action_taken, and applicant_race to preserve original distribution
Held-out test set·10,000 applications (20% of total)

Source · U.S. Consumer Financial Protection Bureau, ffiec.cfpb.gov

Exhibit B

Model design

Preprocessing

Missing value handling·Median imputation for continuous, mode for categorical
Categorical encoding·One-hot encoding for low-cardinality, target encoding for high-cardinality
Feature scaling·StandardScaler on continuous features
Sensitive attributes (race, sex) excluded from feature matrix
Train/test split·80/20 stratified by action_taken

Source · scikit-learn v1.5.2
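To make the stratified 80/20 split concrete, here is a dependency-free stand-in. It is not the scikit-learn call used in the audit; the function `stratified_split` is a hypothetical helper and the labels are toy values invented for illustration. The idea is the same: shuffle within each `action_taken` class, then carve off 20% of each class for the test set.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with each label's share preserved."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels, invented for illustration: 1 = approved, 0 = denied.
action_taken = [1] * 80 + [0] * 20
train_idx, test_idx = stratified_split(action_taken)
print(len(train_idx), len(test_idx))
```

Because the split is done class by class, the approval share in the test set matches the share in the full sample, which keeps the held-out accuracy comparable to the training distribution.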

Exhibit C

Model training

Models trained

Logistic Regression·max_iter=1000, C=1.0, penalty=l2 (baseline)
Random Forest·n_estimators=200, max_depth=None, class_weight=balanced
XGBoost (selected)·n_estimators=200, max_depth=6, learning_rate=0.1, subsample=0.85
Test set accuracy·0.9878 · F1: 0.9897 · AUC-ROC: 0.9987

Source · xgboost v2.1.0

Exhibit D

Validation

Fairness metrics computed

Disparate Impact Ratio (DIR)·P(approved | minority) / P(approved | reference)
Demographic Parity Difference (DPD)·P(approved | minority) − P(approved | reference)
Equalized Odds Difference (EOD)·max(|TPR difference|, |FPR difference|) across groups
Reference group·White (largest group, set as baseline per industry convention)
Legal threshold for DIR·0.80 per EEOC Uniform Guidelines, 29 CFR 1607.4(D)

Source · fairlearn v0.12.0
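The three formulas above translate directly into code. The sketch below is a plain-Python illustration, not the Fairlearn implementation: the function `fairness_metrics` is hypothetical, the data is a toy example, and the Equalized Odds Difference is computed for each group against the reference group, which is one reading of "across groups" and an assumption here.

```python
def _rate(values):
    """Mean of a list of 0/1 values; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def fairness_metrics(y_true, y_pred, groups, reference):
    """Per-group DIR, DPD and EOD, each measured against `reference`."""
    by_group = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        by_group.setdefault(group, []).append((truth, pred))

    stats = {
        group: {
            "approval": _rate([p for _, p in pairs]),
            "tpr": _rate([p for t, p in pairs if t == 1]),
            "fpr": _rate([p for t, p in pairs if t == 0]),
        }
        for group, pairs in by_group.items()
    }

    ref = stats[reference]
    if ref["approval"] == 0:
        raise ValueError("reference group has no approvals; DIR is undefined")

    return {
        group: {
            "DIR": s["approval"] / ref["approval"],
            "DPD": s["approval"] - ref["approval"],
            "EOD": max(abs(s["tpr"] - ref["tpr"]), abs(s["fpr"] - ref["fpr"])),
        }
        for group, s in stats.items()
        if group != reference
    }


# Toy labels and predictions, invented for illustration (1 = approved).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["ref"] * 4 + ["other"] * 4

metrics = fairness_metrics(y_true, y_pred, groups, reference="ref")
print(metrics["other"])
```

DIR and DPD need only predictions; EOD also needs the true labels, because it compares error rates rather than approval rates.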

Exhibit E

Data analysis

Intersectional analysis

Groups computed·5 races × 3 sex categories (Male, Female, Joint) = 15 cells
Joint = applications with two co-applicants of different recorded sex
Per-cell metrics·approval rate, DIR, n
Reference cell·White Male (largest cell, n=4,238 in test set)
Cells failing 4/5 rule·10 of 15 (66.7%)

Source · Custom implementation
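The audit's intersectional code is custom, so what follows is only a sketch of how such a grid could be computed. The function `intersectional_table`, the helper `make_cell`, and the records are hypothetical and invented for illustration; three toy cells stand in for the full 15.

```python
from collections import defaultdict

FOUR_FIFTHS_FLOOR = 0.80


def intersectional_table(records, reference_cell):
    """Per-cell approval rate, DIR against the reference cell, and n."""
    counts = defaultdict(lambda: [0, 0])  # (race, sex) -> [approved, total]
    for race, sex, approved in records:
        cell_counts = counts[(race, sex)]
        cell_counts[0] += int(approved)
        cell_counts[1] += 1

    if reference_cell not in counts:
        raise ValueError("reference cell has no applications")
    ref_rate = counts[reference_cell][0] / counts[reference_cell][1]
    if ref_rate == 0:
        raise ValueError("reference cell has no approvals; DIR is undefined")

    table = {}
    for cell, (n_approved, n) in counts.items():
        rate = n_approved / n
        table[cell] = {"rate": rate, "dir": rate / ref_rate, "n": n}
    return table


def make_cell(race, sex, n_approved, n):
    """Build n toy records for one cell, the first n_approved of them approved."""
    return [(race, sex, i < n_approved) for i in range(n)]


# Toy records, invented for illustration only.
records = (
    make_cell("White", "Male", 9, 10)
    + make_cell("White", "Female", 8, 10)
    + make_cell("Black", "Female", 6, 10)
)

table = intersectional_table(records, reference_cell=("White", "Male"))
failing = sorted(c for c, row in table.items() if row["dir"] < FOUR_FIFTHS_FLOOR)
print(failing)
```

Reporting n alongside each DIR matters: small cells produce noisy rates, and a ratio from a handful of applications deserves less weight than one from thousands.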

Exhibit F

Validation

SHAP explainability

Method·TreeExplainer on the XGBoost model
Sample size for explanation·5,000 test observations
Top features by mean absolute SHAP value: interest_rate (0.545), purchaser_type (0.382), applicant_credit_score_type (0.045), tract_minority_population_percent (0.011)
Proxy detection threshold·|Pearson r| > 0.15 with race

Source · shap v0.46.0

Exhibit G

Intervention

Debiasing methods tested

Reweighting·balanced sample weights via fairlearn.preprocessing.Reweighing
ExponentiatedGradient·with DemographicParity constraint, eps=0.01
ThresholdOptimizer (selected)·per-group thresholds optimized for demographic_parity, objective=accuracy_score
Selected because·only method achieving 4/4 group compliance with 4/5 rule
Per-group thresholds applied·White 0.996, Asian 0.963, Black 0.007, Native American 0.004, Pacific Islander 0.002

Source · fairlearn v0.12.0

Exhibit H

Infrastructure

Compute

Hardware·MacBook Pro M3 Pro, 18GB RAM (XGBoost training)
Platform·Zerve cloud notebooks (full pipeline reproduction for ZerveHack 2026)
Total runtime·~12 minutes for full audit pipeline

Source · Zerve, zerve.ai

Statement of provenance

This audit was conducted as a single-author research artifact for the ZerveHack 2026 hackathon. No vendor relationships influenced the methodology. The dataset is publicly available; the libraries are open source; the methodology follows established conventions in the algorithmic fairness literature. The findings are reproducible from the raw HMDA disclosure files.

For corrections, methodological challenges, or replication assistance, contact the author.

Audit metadata

Audit period: April 24, 2026
Data version: HMDA 2024 LAR
Compute environment: Zerve cloud
Code language: Python 3.12
Random seeds: numpy 42, sklearn 42, xgb 42
Reproducibility: full
License: MIT (code), public (data)

How to cite this work

Singh, S. (2026). FairLens: A Forensic Audit of Algorithmic Bias in US Mortgage Lending [Hackathon submission]. ZerveHack 2026. https://fairlensweb.vercel.app/


Author

Shauryaditya Singh

M.S. Applied Artificial Intelligence (Data Engineering)

Stevens Institute of Technology, Hoboken, NJ

Submitted to ZerveHack 2026

End of investigation

Last updated · April 25, 2026

Approval rates diverge by ~25 points

Three of four minority groups fail the legal floor

Native American women sit at DIR 0.544

Interest rate dominates, geography quietly leaks race
