Datasets
The Datasets class provides a collection of datasets with methods for filtering and batch operations. It serves as an alternative entry point for accessing individual datasets in the package.
Class Documentation
fairml_datasets.datasets.Datasets
Helper class to easily work with multiple datasets.
This class provides interfaces for accessing and working with collections of Dataset objects, allowing for batch operations like metadata generation across multiple datasets.
Source code in fairml_datasets/datasets.py
Functions
__getitem__(index)
Get a dataset by index or ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | Union[int, str] | Integer index or string ID of the dataset | required |
Returns:
Name | Type | Description |
---|---|---|
Dataset | Dataset | The requested dataset object |
Raises:
Type | Description |
---|---|
AssertionError | If index is not an integer or string |
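The lookup rule documented above (integers select by position, strings select by ID, anything else fails an assertion) can be sketched with a small stand-in class. `MiniDatasets` and `MiniDataset` are hypothetical names invented for this illustration, the IDs are placeholders, and the behaviour for an unknown string ID is an assumption; the actual implementation in `fairml_datasets` may differ.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class MiniDataset:
    """Hypothetical stand-in for a Dataset; it only carries an ID."""
    dataset_id: str


class MiniDatasets:
    """Hypothetical stand-in collection, not the fairml_datasets class."""

    def __init__(self, ids: List[str]) -> None:
        self._ids = list(ids)

    def __getitem__(self, index: Union[int, str]) -> MiniDataset:
        # Mirror the documented contract: only int or str is accepted.
        assert isinstance(index, (int, str)), "index must be an int or a str"
        if isinstance(index, int):
            # Integer: positional lookup in the stored ID list.
            return MiniDataset(self._ids[index])
        # String: lookup by ID (assumption: unknown IDs raise KeyError).
        if index not in self._ids:
            raise KeyError(index)
        return MiniDataset(index)


# Placeholder IDs chosen for illustration only.
collection = MiniDatasets(["dataset_a", "dataset_b"])
first = collection[0]            # positional access
by_id = collection["dataset_b"]  # access by ID
```

Accepting both key types in a single `__getitem__` lets callers use the same bracket syntax whether they are looping by position or fetching a known dataset by name.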
__init__(ids=None, inclue_large_datasets=False, df_info=None)
Initialize a Datasets collection.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ids | Optional[List[str]] | Optional list of dataset IDs to include | None |
inclue_large_datasets | bool | Whether to include datasets marked as 'large' | False |
df_info | Optional[DataFrame] | Optional DataFrame containing dataset annotations (will be loaded if None) | None |
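One way to picture how the three constructor arguments might interact is the sketch below. It uses a plain list of dicts in place of the annotations DataFrame. The `select_ids` helper, the `dataset_id` and `is_large` field names, and the assumption that both filters are applied together are all made up for this illustration and are not taken from the library.

```python
from typing import Dict, List, Optional

# Hypothetical annotation rows standing in for df_info.
ANNOTATIONS: List[Dict[str, object]] = [
    {"dataset_id": "dataset_a", "is_large": False},
    {"dataset_id": "dataset_b", "is_large": True},
    {"dataset_id": "dataset_c", "is_large": False},
]


def select_ids(
    ids: Optional[List[str]] = None,
    include_large: bool = False,
    info: Optional[List[Dict[str, object]]] = None,
) -> List[str]:
    """Return the dataset IDs that survive both filters (hypothetical helper)."""
    # Fall back to the default annotations when none are passed in.
    rows = ANNOTATIONS if info is None else info
    if not include_large:
        # Drop rows flagged as large unless they are explicitly requested.
        rows = [row for row in rows if not row["is_large"]]
    if ids is not None:
        # Keep only the explicitly requested IDs.
        rows = [row for row in rows if row["dataset_id"] in ids]
    return [str(row["dataset_id"]) for row in rows]


default_ids = select_ids()
all_ids = select_ids(include_large=True)
subset_ids = select_ids(ids=["dataset_b", "dataset_c"])
```

Under these assumptions, a large dataset stays excluded even when it is named in `ids` unless the large-dataset flag is also set; whether the real class behaves this way should be checked against its source.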
__iter__()
Make the Datasets collection iterable, yielding Dataset objects.
Yields:
Name | Type | Description |
---|---|---|
Dataset | Dataset | The next dataset in the collection |
__len__()
Get the number of datasets in the collection.
Returns:
Name | Type | Description |
---|---|---|
int | int | The number of datasets |
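Together, `__iter__` and `__len__` let a collection be used in `for` loops and with `len()`. The following is a minimal sketch of that protocol using a hypothetical `MiniCollection` class that yields ID strings rather than Dataset objects; it illustrates the Python mechanics only and is not the library's code.

```python
from typing import Iterator, List


class MiniCollection:
    """Hypothetical stand-in; yields ID strings instead of Dataset objects."""

    def __init__(self, ids: List[str]) -> None:
        self._ids = list(ids)

    def __iter__(self) -> Iterator[str]:
        # A generator method makes the object usable in for-loops.
        for dataset_id in self._ids:
            yield dataset_id

    def __len__(self) -> int:
        # len(collection) reports how many entries are stored.
        return len(self._ids)


# Placeholder IDs chosen for illustration only.
collection = MiniCollection(["dataset_a", "dataset_b", "dataset_c"])
seen = [dataset_id for dataset_id in collection]
count = len(collection)
```

Because `__iter__` creates a fresh generator on each call, the collection can be iterated more than once.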
generate_metadata(progress_bar=True)
Generate a dataframe of metadata for all datasets in the collection.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
progress_bar | bool | Whether to display a progress bar during generation | True |
Returns:
Type | Description |
---|---|
DataFrame | DataFrame containing metadata for all datasets |
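The general shape of such a batch operation is one metadata row per dataset, collected into a table. The sketch below shows it with standard-library types only. `collect_metadata` is a hypothetical helper, the `id_length` field is placeholder metadata, and the line printed to stderr merely stands in for a progress bar; none of this is taken from the library.

```python
import sys
from typing import Dict, List


def collect_metadata(ids: List[str], progress_bar: bool = True) -> List[Dict[str, object]]:
    """Build one metadata row per dataset ID (hypothetical helper)."""
    rows: List[Dict[str, object]] = []
    for position, dataset_id in enumerate(ids, start=1):
        if progress_bar:
            # Minimal stand-in for a progress bar: report progress on stderr.
            print(f"[{position}/{len(ids)}] {dataset_id}", file=sys.stderr)
        # Placeholder metadata; a real collection would inspect the dataset.
        rows.append({"dataset_id": dataset_id, "id_length": len(dataset_id)})
    return rows


# Placeholder IDs chosen for illustration only.
rows = collect_metadata(["dataset_a", "dataset_b"], progress_bar=False)
```

A list of dicts like `rows` can be passed directly to `pandas.DataFrame(rows)` to obtain a table with one column per key.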
get_ids()
Get a list of all dataset IDs in the collection.
Returns:
Type | Description |
---|---|
List[str] | List of dataset IDs |