Command-Line Interface

FairML Datasets provides a command-line interface (CLI) for exporting datasets, generating dataset metadata, and exporting citations. This page documents the available commands and their options.

Overview

The CLI is accessible via the fairml_datasets module:

python -m fairml_datasets [OPTIONS] COMMAND [ARGS]...

Available Commands

python -m fairml_datasets

Command-line interface for the fairml datasets package.

Usage:

python -m fairml_datasets [OPTIONS] COMMAND [ARGS]...

Options:

  --debug  Enable debug logging.
  --help   Show this message and exit.
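
When the CLI is driven from a script, the position of --debug matters: the usage line above places group-level options before the command name. The following is a minimal Python sketch of that ordering, assuming the CLI is launched through the standard subprocess module. The helper names build_cli_args and run_cli are hypothetical and not part of fairml_datasets.

```python
import subprocess
import sys


def build_cli_args(command, *args, debug=False):
    """Build the argument list for one CLI call (hypothetical helper)."""
    argv = [sys.executable, "-m", "fairml_datasets"]
    if debug:
        # Group-level options come before the command, per the usage line.
        argv.append("--debug")
    argv.append(command)
    argv.extend(args)
    return argv


def run_cli(command, *args, debug=False):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_cli_args(command, *args, debug=debug), check=True)
```

Under these assumptions, run_cli("metadata", "--id", "adult", debug=True) would correspond to python -m fairml_datasets --debug metadata --id adult.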

export-citations

Export dataset citations as a .bib file.

This command collects all citations from either all datasets or the specified datasets and exports them to a .bib file, ensuring duplicate citations are only included once.

Usage:

python -m fairml_datasets export-citations [OPTIONS]

Options:

  -o, --output TEXT  Output file for the citations in .bib format.
  --ids TEXT         Comma-separated list of dataset IDs to export citations
                     for. If not provided, exports citations for all datasets.
  --help             Show this message and exit.

export-datasets

Export datasets as files.

Usage:

python -m fairml_datasets export-datasets [OPTIONS]

Options:

  -s, --stage [downloaded|loaded|prepared|binarized|transformed|split]
                                  At which stage of processing to export the
                                  data.
  --id TEXT                       Export only a single dataset.
  --collection [DecorrelatedSmall|DecorrelatedLarge|PermissivelyLicensedSmall|PermissivelyLicensedLarge|PermissivelyLicensedFull|GeographicSmall|GeographicLarge|GeographicFull]
                                  Export all datasets from a specific
                                  collection.
  --include-large-datasets        Include large datasets in the export.
  --include-usage-info            Whether to also export information regarding
                                  the role of different columns e.g. which
                                  ones are features, sensitive and target.
  -o, --output-path TEXT          Directory path where the exported datasets
                                  will be saved.
  -f, --format [csv|parquet]      Output format for the exported datasets.
  --help                          Show this message and exit.

metadata

Generate and save metadata for the datasets.

Usage:

python -m fairml_datasets metadata [OPTIONS]

Options:

  -f, --file TEXT                 Which file to write the metadata to, the
                                  ending will determine the format (csv and
                                  json supported).
  --id TEXT                       Generate metadata for only a single dataset.
  --include-large-datasets        Include large datasets in the metadata
                                  generation (only used if descriptives are
                                  computed).
  --type [annotations|descriptives|all]
                                  Type of metadata to generate (annotations,
                                  descriptives, or both).
  --help                          Show this message and exit.

Examples

Generating Metadata

Generate and save metadata for all datasets:

python -m fairml_datasets metadata

Write metadata to a JSON file (the format is determined by the file ending):

python -m fairml_datasets metadata -f metadata.json

Generate metadata for a specific dataset:

python -m fairml_datasets metadata --id adult
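
Because the output format depends on the file ending, a script that calls the metadata command may want to check the ending before launching it. The sketch below builds the argument list in Python using only the options documented above. The helper name metadata_args and the file name are illustrative assumptions, and the up-front check reflects only the csv and json endings mentioned in the option help.

```python
import sys
from pathlib import Path


def metadata_args(file, dataset_id=None, kind="all"):
    """Build a metadata call (hypothetical helper, not part of the package)."""
    # Only the csv and json endings are documented for --file.
    if Path(file).suffix not in (".csv", ".json"):
        raise ValueError(f"unsupported metadata file ending: {file!r}")
    # Values documented for --type.
    if kind not in ("annotations", "descriptives", "all"):
        raise ValueError(f"unknown metadata type: {kind!r}")
    argv = [sys.executable, "-m", "fairml_datasets", "metadata",
            "--file", file, "--type", kind]
    if dataset_id:
        argv += ["--id", dataset_id]
    return argv


# Annotations only, for a single dataset, written as JSON.
args = metadata_args("adult_annotations.json", dataset_id="adult", kind="annotations")
```

The resulting list can be passed to subprocess.run(args, check=True) to execute the command.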

Exporting Datasets

Export all datasets at the prepared processing stage:

python -m fairml_datasets export-datasets --stage prepared

Export a specific dataset with train/test/validation splits:

python -m fairml_datasets export-datasets --id adult --stage split

Also export usage information describing the role of each column (feature, sensitive, or target):

python -m fairml_datasets export-datasets --include-usage-info
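
The export options can be combined, for example to export one dataset at several processing stages into separate directories. The following Python sketch builds one argument list per stage from the documented options. The helper name export_args and the exports directory are assumptions made for illustration.

```python
import sys

# Stage names as listed for --stage in the option reference.
STAGES = ("downloaded", "loaded", "prepared", "binarized", "transformed", "split")


def export_args(dataset_id, stage, output_dir, fmt="csv"):
    """Build an export-datasets call for one dataset (hypothetical helper)."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if fmt not in ("csv", "parquet"):
        raise ValueError(f"unknown format: {fmt!r}")
    return [
        sys.executable, "-m", "fairml_datasets", "export-datasets",
        "--id", dataset_id,
        "--stage", stage,
        "--format", fmt,
        # Each stage gets its own sub-directory so exports do not mix.
        "--output-path", f"{output_dir}/{stage}",
    ]


# One command per stage; run each with subprocess.run(cmd, check=True).
commands = [export_args("adult", stage, "exports", fmt="parquet")
            for stage in ("prepared", "split")]
```

Validating the stage and format in the script is a design choice that surfaces typos before any export starts. The CLI itself also restricts these options to the listed values.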

Exporting Citations

Export citations for all datasets:

python -m fairml_datasets export-citations

Export citations for specific datasets:

python -m fairml_datasets export-citations --ids adult,compas
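
When the dataset IDs come from a list in a script, they need to be joined into the single comma-separated value that --ids expects. This is a minimal Python sketch of that step, also using the documented --output option. The helper name citation_args and the file name references.bib are illustrative assumptions.

```python
import sys


def citation_args(ids=None, output=None):
    """Build an export-citations call (hypothetical helper)."""
    argv = [sys.executable, "-m", "fairml_datasets", "export-citations"]
    if ids:
        # --ids takes one comma-separated value rather than repeated options.
        argv += ["--ids", ",".join(ids)]
    if output:
        argv += ["--output", output]
    return argv


# Citations for two datasets, written to a chosen .bib file.
args = citation_args(["adult", "compas"], output="references.bib")
```

Passing args to subprocess.run(args, check=True) would run the equivalent of python -m fairml_datasets export-citations --ids adult,compas --output references.bib.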