Collections
The collections module provides predefined collections of fairness scenarios, developed to maximize the diversity in how existing fair ML methods perform on them.
Prespecified Collections
fairml_datasets.collections.Corpus
Bases: Collection
The full corpus including all scenarios and datasets.
This collection contains all available datasets and their associated scenarios, providing a comprehensive set for fairness analysis across the entire corpus.
Source code in fairml_datasets/collections.py
Functions
__init__(inclue_large_datasets=True)
Initialize the Corpus with all available datasets and scenarios.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
inclue_large_datasets | | Whether to include datasets marked as 'large'. | True |
fairml_datasets.collections.DecorrelatedSmall
Bases: PrespecifiedCollection
Collection of De-Correlated Datasets with k = 5.
This corresponds to Scenarios described in Table 3 identified with a k.
fairml_datasets.collections.DecorrelatedLarge
Bases: PrespecifiedCollection
Collection of De-Correlated Datasets with tau = 0.
This corresponds to Scenarios described in Table 3 identified with a tau.
fairml_datasets.collections.PermissivelyLicensedSmall
Bases: PrespecifiedCollection
Collection of Permissively Licensed Datasets with k = 5.
This corresponds to Scenarios described in Table 4 identified with a k.
fairml_datasets.collections.PermissivelyLicensedLarge
Bases: PrespecifiedCollection
Collection of Permissively Licensed Datasets with tau = 0.
This corresponds to Scenarios described in Table 4 identified with a tau.
fairml_datasets.collections.PermissivelyLicensedFull
Bases: PrespecifiedCollection
Full collection of Permissively Licensed Datasets.
This corresponds to all Scenarios described in Table 4.
fairml_datasets.collections.GeographicSmall
Bases: PrespecifiedCollection
Collection of Geographically Diverse Datasets with k = 5.
This corresponds to Scenarios described in Table 5 identified with a k.
fairml_datasets.collections.GeographicLarge
Bases: PrespecifiedCollection
Collection of Geographically Diverse Datasets with tau = 0.
This corresponds to Scenarios described in Table 5 identified with a tau.
fairml_datasets.collections.GeographicFull
Bases: PrespecifiedCollection
Full collection of Geographically Diverse Datasets.
This corresponds to all Scenarios described in Table 5.
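The prespecified collections listed above follow a regular naming pattern: a family (Decorrelated, PermissivelyLicensed, Geographic) plus a size suffix (Small for k = 5, Large for tau = 0, Full for every scenario in the corresponding table). The sketch below builds a class name from that pattern. `FAMILIES` and `collection_class_name` are hypothetical names introduced for illustration and are not part of fairml_datasets; only the classes documented above are included, so no DecorrelatedFull appears.

```python
# Hypothetical helper, not part of fairml_datasets: builds the name of a
# prespecified collection class from its family and size suffix.
# Only combinations documented above are listed.
FAMILIES = {
    "Decorrelated": ("Small", "Large"),
    "PermissivelyLicensed": ("Small", "Large", "Full"),
    "Geographic": ("Small", "Large", "Full"),
}


def collection_class_name(family: str, size: str) -> str:
    """Return the class name, e.g. 'GeographicLarge', or raise ValueError."""
    sizes = FAMILIES.get(family)
    if sizes is None:
        raise ValueError(f"Unknown family: {family!r}")
    if size not in sizes:
        raise ValueError(f"{family} has no {size!r} collection listed")
    return f"{family}{size}"


print(collection_class_name("Geographic", "Large"))
```

With the library installed, the resulting name could presumably be resolved on the `fairml_datasets.collections` module with `getattr` and then instantiated without arguments, as in the usage examples.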
Usage Examples
Using the Complete Corpus
from fairml_datasets.collections import Corpus
# Create the corpus (all available datasets and their scenarios)
corpus = Corpus(inclue_large_datasets=True)
# Iterate through all scenarios in the corpus
for scenario in corpus:
    print(f"Dataset: {scenario.dataset_id}")
    print(f"Sensitive columns: {scenario.sensitive_columns}")
    # Load the data
    df = scenario.load(stage="prepared")
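When iterating over the whole corpus, one failing dataset need not abort the loop. The sketch below collects loaded data and records failures instead of raising. `load_all` and the `_DemoScenario` stand-in are hypothetical and exist only so the example runs without the library; the stand-in mimics just the `dataset_id` attribute and the `load(stage=...)` call shown above, and returns a placeholder rather than real data.

```python
def load_all(collection, stage="prepared"):
    """Load every scenario; return (loaded, failed) lists instead of raising."""
    loaded, failed = [], []
    for scenario in collection:
        try:
            loaded.append((scenario.dataset_id, scenario.load(stage=stage)))
        except Exception as exc:  # keep going if one dataset fails
            failed.append((scenario.dataset_id, exc))
    return loaded, failed


class _DemoScenario:
    """Hypothetical stand-in mimicking only the attributes used above."""

    def __init__(self, dataset_id, fail=False):
        self.dataset_id = dataset_id
        self._fail = fail

    def load(self, stage="prepared"):
        if self._fail:
            raise RuntimeError("download failed")
        return [{"stage": stage}]  # placeholder for the loaded data


demo = [_DemoScenario("a"), _DemoScenario("b", fail=True)]
loaded, failed = load_all(demo)
print(len(loaded), len(failed))
```

Results are kept as lists of pairs rather than a dictionary keyed by `dataset_id`, since one dataset may appear in several scenarios.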
Using Predefined Collections
from fairml_datasets.collections import DecorrelatedSmall, PermissivelyLicensedFull, GeographicLarge
# Use a small collection of decorrelated datasets
collection = DecorrelatedSmall()
print(f"Collection contains {len(collection)} scenarios")
# Or use the full collection of permissively licensed datasets
full_collection = PermissivelyLicensedFull()
print(f"Full collection contains {len(full_collection)} scenarios")
# Load and analyze datasets from the geographic collection
geo_collection = GeographicLarge()
for scenario in geo_collection:
    print(f"Dataset: {scenario.dataset_id}")
    print(f"Sensitive columns: {scenario.sensitive_columns}")
    # Load the data
    df = scenario.load(stage="prepared")
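Because a collection is iterated scenario by scenario, it can help to see which datasets it covers before loading anything. The sketch below groups the sensitive-column sets by dataset. `sensitive_columns_by_dataset` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real scenarios so the example runs without the library; it assumes `sensitive_columns` is an iterable of column names.

```python
from collections import defaultdict
from types import SimpleNamespace


def sensitive_columns_by_dataset(collection):
    """Map each dataset_id to the sensitive-column sets of its scenarios."""
    result = defaultdict(list)
    for scenario in collection:
        result[scenario.dataset_id].append(tuple(scenario.sensitive_columns))
    return dict(result)


# Stand-in scenarios with made-up ids and column names (illustration only)
demo = [
    SimpleNamespace(dataset_id="d1", sensitive_columns=["sex"]),
    SimpleNamespace(dataset_id="d1", sensitive_columns=["sex", "race"]),
    SimpleNamespace(dataset_id="d2", sensitive_columns=["age"]),
]
summary = sensitive_columns_by_dataset(demo)
for dataset_id, column_sets in summary.items():
    print(dataset_id, column_sets)
```

The same function should accept any of the collections above in place of `demo`, since it relies only on iteration and the two attributes used in the usage examples.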