Fails federal fair lending standards

This loan approval AI is 98.8% accurate.
It also breaks federal law.

A forensic audit of algorithmic bias in US mortgage lending, across 50,000 real HMDA applications. XGBoost, SHAP, and threshold debiasing. What the model sees, what it hides, and what the law says about it.

12 min read · 6 chapters · Interactive tool
Signal 01 · 3/4 · Failing 4/5 rule
Signal 02 · 0.707 · Pacific Islander DIR
Signal 03 · 10/15 · Intersectional violations
Chapter 01 · The 4/5 rule

A rule written in 1978, still the line between bias and crime.

If a protected group is approved at less than 80% of the most-approved group's rate, US regulators treat it as evidence of discrimination. Simple math, serious consequences.

Imagine 10 applicants from each group
White applicants · 85.1% · 9 of 10 approved
Black applicants · 67.5% · 7 of 10 approved
Disparate Impact Ratio · 0.793
Federal threshold · 0.800

67.5% / 85.1% ≈ 0.793, below the 0.80 legal floor.
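The arithmetic behind the 4/5 rule can be sketched in a few lines of Python. This is a minimal illustration using the rounded approval rates shown above; the function name `disparate_impact_ratio` and the constant `FOUR_FIFTHS_FLOOR` are hypothetical names chosen for this example, not part of the audit code.

```python
FOUR_FIFTHS_FLOOR = 0.80  # the 4/5 rule threshold described above


def disparate_impact_ratio(group_rate: float, reference_rate: float) -> float:
    """Approval rate of a protected group divided by the reference group's rate."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return group_rate / reference_rate


# Rounded approval rates from the example above.
white_rate = 0.851
black_rate = 0.675

dir_black = disparate_impact_ratio(black_rate, white_rate)
passes = dir_black >= FOUR_FIFTHS_FLOOR

print(f"DIR = {dir_black:.3f}, passes 4/5 rule: {passes}")
```

With these rounded inputs the ratio lands just under the 0.80 floor, so the check fails.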

Chapter 03 · The smoking gun

The model doesn’t see race.

It sees ZIP code.

And ZIP code, in America, is race.

On the rediscovery of redlining by machine
[Figure: hexagonal density plot of approval rate (0.00 to 1.00) against tract minority population % (0 to 100); r = 0.416]
Exhibit A · n = 50,000 HMDA applications, 2024. Density-binned correlation between tract-level minority population and individual approval. Each hexagon represents the local concentration of applications.
0.416
Pearson r · Tract minority % ↔ race

The XGBoost model was trained without race as an input. It learned to use something else.

The 5th most important feature in the model is the percentage of minority population in the census tract where the property sits. Its correlation with the applicant’s actual race is 0.416, stronger than any other feature in the dataset.

The model never saw race. It saw geography. In America, that distinction collapses. This is redlining, rediscovered by a machine.

Exhibit B · SHAP feature importance, top 5
  1. interest_rate · 0.545
  2. purchaser_type · 0.382
  3. applicant_credit_score_type · 0.045
  4. income · 0.003
  5. tract_minority_population_percent · 0.011 · Proxy

    Unlike the top 4 features, which encode borrower risk signals, this feature encodes where the property is. Geography is not a credit attribute. It is a demographic one.

Chapter 04 · The fix

Three lines of code, four groups restored to legal compliance.

Microsoft’s Fairlearn library includes a ThresholdOptimizer. Apply it once, with the demographic_parity constraint, and the model that was violating federal law is no longer violating federal law. The accuracy cost: 3.39 percentage points. The discrimination, it turns out, was a choice.

Exhibit A · Disparate Impact Ratio, before and after ThresholdOptimizer (legal floor: 0.80)
Group · Before · After
Asian · 0.962 · 0.976
Black · 0.793 · 1.005
Native American · 0.766 · 0.946
Pacific Islander · 0.707 · 1.014

White applicants are the reference group (DIR = 1.000 by definition) and are omitted from this view.

The accuracy cost, in plain English.

For every 100 loan applications the original model evaluated, it assigned the correct label (approve or deny) 98.78 times. The debiased model gets 95.39 correct. The tradeoff is 3.39 applications per 100. The gain: full legal compliance across every demographic group the federal government protects.

Original model · 98.78% correct
Debiased model · 95.39% correct + legal
The implementation
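from fairlearn.postprocessing import ThresholdOptimizer

# Three lines to go from violation to compliance.
# race_train / race_test hold each applicant's race, aligned row-for-row
# with X_train / X_test; race is never a column of the feature matrix.
optimizer = ThresholdOptimizer(
    estimator=xgb_model,               # the XGBoost classifier being audited
    constraints="demographic_parity",  # equalize approval rates across groups
    objective="accuracy_score",        # keep as much accuracy as possible
)

optimizer.fit(X_train, y_train, sensitive_features=race_train)
y_pred = optimizer.predict(X_test, sensitive_features=race_test)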

Fairlearn v0.12.0 · Microsoft Research · Open source

The implication

The tools exist.

The data is public.

The law is clear.

Every biased lending model in production is a decision,

not a mistake.

End of chapter 04
Chapter 05 · Try it

The audit, as a tool you can use.

This calibrator runs a simplified version of our audited model directly in your browser. It uses the actual per-race thresholds produced by the ThresholdOptimizer. Change the applicant’s race below and watch the threshold shift. The decision logic is the exact calibration that brings all groups into compliance with federal lending law.

NOTE · This is a calibrated approximation of the production XGBoost model, not the full tree ensemble. It preserves the threshold logic for demonstration purposes.

The default values are calibrated to demonstrate the threshold contrast. Run them as-is, then change the race to see the calibration in action.
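The threshold logic the calibrator describes can be sketched as a lookup followed by a comparison. This is a simplified illustration, not the tool's actual source: the thresholds are the per-group values reported in the methodology exhibit (Exhibit G), while the function name `decide`, the `>=` comparison, and the fallback to the 0.5 default for unlisted groups are assumptions made for this example. Fairlearn's ThresholdOptimizer can also randomize between thresholds, which this sketch ignores.

```python
# Per-group thresholds as reported in the methodology exhibit (Exhibit G).
THRESHOLDS = {
    "White": 0.996,
    "Asian": 0.963,
    "Black": 0.007,
    "Native American": 0.004,
    "Pacific Islander": 0.002,
}
DEFAULT_THRESHOLD = 0.5  # assumed fallback for a group not listed above


def decide(raw_score: float, race: str) -> str:
    """Compare the model's raw score with the threshold for the applicant's group."""
    threshold = THRESHOLDS.get(race, DEFAULT_THRESHOLD)
    return "APPROVED" if raw_score >= threshold else "DENIED"


# The same raw score of 0.23 evaluated under two different group thresholds.
for group in ("White", "Black"):
    print(group, decide(0.23, group))
```

The raw score never changes between the two calls; only the cutoff it is compared against does, which is the contrast the default values are meant to show.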

Loan application

Enter the applicant details

[Interactive form: applicant input fields, including loan term (months), sex, and race]

Race and sex are NOT inputs to the underlying model. They are used only to apply the correct ThresholdOptimizer calibration for demographic parity.

Live prediction
Decision · DENIED
Confidence · 22.7%
Raw score 0.23 · Threshold 0.996
Fairness adjustment
Per-race threshold · 0.996 (default 0.5)

More selective than the default to prevent over-approval relative to baseline.

Disparate impact ratio
Reference · 1.000

White is the reference group. DIR = 1.000 by construction; every other group is measured relative to this baseline.

Top 3 decision factors
Interest rate · 6.50%
Debt-to-income ratio · 36.0%
Annual income · $95,000
Chapter 06 · The receipts

Don’t believe me? Here’s every number, every formula, every model.

Radical transparency is the opposite of how most lending audits are conducted. Most live behind NDAs. This one doesn’t. Below is every dataset, every preprocessing decision, every hyperparameter, every fairness metric formula, and every library version used to produce the findings on this page. If you find a flaw, the data and code are public. Reproduce it. Challenge it. That is the point.

Exhibits · Chapter 06 · Methodology of audit
Exhibit A
Data source

Training data

  • Dataset · Home Mortgage Disclosure Act (HMDA) Loan Application Register (LAR)
  • Year · 2024 (full disclosure year)
  • Data release schedule · 2024 represents the most recent full-year disclosure available; 2025 release scheduled Q3 2026
  • Geographic scope · National
  • Sample size analyzed · 50,000 applications
  • Sampling · Stratified by loan_type, action_taken, and applicant_race to preserve original distribution
  • Held-out test set · 10,000 applications (20% of total)
Source · U.S. Consumer Financial Protection Bureau, ffiec.cfpb.gov
Exhibit B
Model design

Preprocessing

  • Missing value handling · Median imputation for continuous, mode for categorical
  • Categorical encoding · One-hot encoding for low-cardinality, target encoding for high-cardinality
  • Feature scaling · StandardScaler on continuous features
  • Sensitive attributes (race, sex) excluded from feature matrix
  • Train/test split · 80/20 stratified by action_taken
Source · scikit-learn v1.5.2
Exhibit C
Model training

Models trained

  • Logistic Regression · max_iter=1000, C=1.0, penalty=l2 (baseline)
  • Random Forest · n_estimators=200, max_depth=None, class_weight=balanced
  • XGBoost (selected) · n_estimators=200, max_depth=6, learning_rate=0.1, subsample=0.85
  • Test set accuracy · 0.9878 · F1: 0.9897 · AUC-ROC: 0.9987
Source · xgboost v2.1.0
Exhibit D
Validation

Fairness metrics computed

  • Disparate Impact Ratio (DIR) · P(approved | minority) / P(approved | reference)
  • Demographic Parity Difference (DPD) · P(approved | minority) − P(approved | reference)
  • Equalized Odds Difference (EOD) · max(|TPR difference|, |FPR difference|) across groups
  • Reference group · White (largest group, set as baseline per industry convention)
  • Legal threshold for DIR · 0.80 per EEOC Uniform Guidelines, 29 CFR 1607.4(D)
Source · fairlearn v0.12.0
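The three formulas above can be sketched in plain Python. This is an illustrative re-implementation on toy data, not the audit's code and not Fairlearn's metric functions: the helper names `group_rates` and `fairness_metrics` are hypothetical, and EOD is computed here pairwise against the reference group, which is one reading of "across groups".

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR, and FPR for binary labels (1 = approved)."""
    counts = defaultdict(lambda: {"n": 0, "sel": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["sel"] += p
        if t == 1:
            c["pos"] += 1
            c["tp"] += p
        else:
            c["neg"] += 1
            c["fp"] += p
    return {
        g: {
            "selection_rate": c["sel"] / c["n"],
            "tpr": c["tp"] / c["pos"] if c["pos"] else float("nan"),
            "fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
        }
        for g, c in counts.items()
    }


def fairness_metrics(rates, group, reference):
    """DIR, DPD, and EOD of one group measured against the reference group."""
    g, r = rates[group], rates[reference]
    return {
        "DIR": g["selection_rate"] / r["selection_rate"],
        "DPD": g["selection_rate"] - r["selection_rate"],
        "EOD": max(abs(g["tpr"] - r["tpr"]), abs(g["fpr"] - r["fpr"])),
    }


# Toy data, invented for illustration only.
y_true = [1, 1, 1, 0, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
groups = ["ref"] * 4 + ["grp"] * 4

metrics = fairness_metrics(group_rates(y_true, y_pred, groups), "grp", "ref")
print(metrics)
```

In the toy data the second group is approved at two thirds of the reference rate, so its DIR falls below the 0.80 floor even though neither group has any false positives.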
Exhibit E
Data analysis

Intersectional analysis

  • Groups computed · 5 races × 3 sex categories (Male, Female, Joint) = 15 cells
  • Joint = applications with two co-applicants of different recorded sex
  • Per-cell metrics · approval rate, DIR, n
  • Reference cell · White Male (largest cell, n=4,238 in test set)
  • Cells failing 4/5 rule · 10 of 15 (66.7%)
Source · Custom implementation
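The per-cell computation described above can be sketched as a group-by over (race, sex) pairs. The audit's custom implementation is not reproduced here; the function name `intersectional_dir`, the record layout, and the toy records are assumptions made for illustration.

```python
from collections import defaultdict


def intersectional_dir(records, reference=("White", "Male"), floor=0.80):
    """records: iterable of (race, sex, approved) tuples with approved in {0, 1}."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [n, approvals]
    for race, sex, approved in records:
        cell = totals[(race, sex)]
        cell[0] += 1
        cell[1] += approved

    if reference not in totals or totals[reference][1] == 0:
        raise ValueError("reference cell is missing or has no approvals")
    ref_rate = totals[reference][1] / totals[reference][0]

    report = {}
    for cell, (n, approvals) in totals.items():
        rate = approvals / n
        ratio = rate / ref_rate
        report[cell] = {"n": n, "approval_rate": rate, "DIR": ratio, "fails": ratio < floor}
    return report


# Toy records, invented for illustration only.
records = (
    [("White", "Male", 1)] * 3 + [("White", "Male", 0)]
    + [("Black", "Female", 1)] * 2 + [("Black", "Female", 0)] * 2
    + [("Asian", "Joint", 1)] * 3 + [("Asian", "Joint", 0)]
)

report = intersectional_dir(records)
failing = [cell for cell, row in report.items() if row["fails"]]
print(f"{len(failing)} of {len(report)} cells fail the 4/5 rule: {failing}")
```

Run against all 15 race-by-sex cells, the same loop would yield the count of failing cells that the exhibit reports.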
Exhibit F
Validation

SHAP explainability

  • Method · TreeExplainer on the XGBoost model
  • Sample size for explanation · 5,000 test observations
  • Top features by mean absolute SHAP value: interest_rate (0.545), purchaser_type (0.382), applicant_credit_score_type (0.045), tract_minority_population_percent (0.011)
  • Proxy detection threshold · |Pearson r| > 0.15 with race
Source · shap v0.46.0
Exhibit G
Intervention

Debiasing methods tested

  • Reweighting · balanced sample weights via fairlearn.preprocessing.Reweighing
  • ExponentiatedGradient · with DemographicParity constraint, eps=0.01
  • ThresholdOptimizer (selected) · per-group thresholds optimized for demographic_parity, objective=accuracy_score
  • Selected because · only method achieving 4/4 group compliance with 4/5 rule
  • Per-group thresholds applied · White 0.996, Asian 0.963, Black 0.007, Native American 0.004, Pacific Islander 0.002
Source · fairlearn v0.12.0
Exhibit H
Infrastructure

Compute

  • Hardware · MacBook Pro M3 Pro, 18GB RAM (XGBoost training)
  • Platform · Zerve cloud notebooks (full pipeline reproduction for ZerveHack 2026)
  • Total runtime · ~12 minutes for full audit pipeline
Source · Zerve, zerve.ai
Statement of provenance

This audit was conducted as a single-author research artifact for the ZerveHack 2026 hackathon. No vendor relationships influenced the methodology. The dataset is publicly available; the libraries are open source; the methodology follows established conventions in the algorithmic fairness literature. The findings are reproducible from the raw HMDA disclosure files.

For corrections, methodological challenges, or replication assistance, contact the author.

Audit metadata
Audit period · April 24, 2026
Data version · HMDA 2024 LAR
Compute environment · Zerve cloud
Code language · Python 3.12
Random seeds · numpy 42, sklearn 42, xgb 42
Reproducibility · full
License · MIT (code), public (data)
How to cite this work
Singh, S. (2026). FairLens: A Forensic Audit of Algorithmic Bias in US Mortgage Lending [Hackathon submission]. ZerveHack 2026. https://fairlensweb.vercel.app/
Author
Shauryaditya Singh
M.S. Applied Artificial Intelligence (Data Engineering)
Stevens Institute of Technology, Hoboken, NJ
Submitted to ZerveHack 2026
End of investigation
Last updated · April 25, 2026