datasetops

Package Contents

Classes

Dataset – Contains information on how to access the raw data, and performs sampling and splitting related operations.
Loader – Contains information on how to access the raw data, and performs sampling and splitting related operations.

Functions

allow_unique – Predicate used for filtering/sampling a dataset classwise.
custom – Create a user defined transform.
categorical – Transform data into a categorical int label.
one_hot – Transform data into a one-hot encoded label.
categorical_template – Creates a template mapping function to be used with one_hot.
from_pytorch – Create a dataset from a PyTorch dataset.
from_folder_data – Load data from a folder of samples.
from_folder_class_data – Load data from a folder with one subfolder per class.
from_folder_dataset_class_data – Load data from a folder with one subfolder per dataset.
from_mat_single_mult_data – Load data from a .mat file consisting of multiple data.
- class datasetops.Dataset(downstream_getter: datasetops.types.Union[datasetops.abstract.ItemGetter, Dataset], name: str = None, ids: datasetops.types.Ids = None, item_transform_fn: datasetops.types.ItemTransformFn = lambda x: ..., item_names: datasetops.types.Dict[str, int] = None)
Bases:
datasetops.abstract.AbstractDataset
Contains information on how to access the raw data, and performs sampling and splitting related operations.
- property shape: datasetops.types.Sequence[int]
Get the shape of a dataset item.
- Returns:
Sequence[int] – Item shapes
- property names: datasetops.types.List[str]
Get the names of the elements in an item.
- Returns:
List[str] – A list of element names
- __len__()
Return the total number of elements in the dataset.
- __getitem__(i: int) datasetops.types.Tuple
Return the element at the specified index.
- Arguments:
i {int} – the index from which to read the sample
- counts(*itemkeys: datasetops.types.Key) datasetops.types.List[datasetops.types.Tuple[datasetops.types.Any, int]]
Compute the counts of each unique item in the dataset.
Warning: this operation may be expensive for large datasets
- Arguments:
itemkeys {Union[str, int]} – The item keys (str) or indexes (int) to be checked for uniqueness. If no key is given, all item-parts must match for them to be considered equal
- Returns:
List[Tuple[Any,int]] – List of tuples, each containing a unique value and its number of occurrences
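As an illustration of the semantics described above (not the datasetops implementation, and simplified to integer item keys only), counting can be sketched in pure Python:

```python
from collections import Counter

def counts(items, *itemkeys):
    # Compare only the selected parts of each item; with no keys,
    # the whole item must match for two items to be considered equal.
    if itemkeys:
        projected = [tuple(item[k] for k in itemkeys) for item in items]
    else:
        projected = [tuple(item) for item in items]
    return list(Counter(projected).items())

data = [("a", 0), ("b", 1), ("a", 0)]
whole = counts(data)       # unique whole items with their counts
labels = counts(data, 1)   # uniqueness judged on the label element only
```

The O(N) pass over all items is what makes this operation expensive for large datasets.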
- unique(*itemkeys: datasetops.types.Key) datasetops.types.List[datasetops.types.Any]
Compute a list of unique values in the dataset.
Warning: this operation may be expensive for large datasets
- Arguments:
itemkeys {str} – The item keys to be checked for uniqueness
- Returns:
List[Any] – List of the unique items
- sample(num: int, seed: int = None)
Sample data randomly from the dataset.
- Arguments:
num {int} – Number of samples. If the number of samples is larger than the dataset size, some samples may be sampled multiple times
- Keyword Arguments:
seed {int} – Random seed (default: {None})
- Returns:
[Dataset] – Sampled dataset
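A minimal sketch of this sampling behavior (an illustration with plain lists, not datasetops' own code): draw without replacement when possible, and with replacement once num exceeds the dataset size.

```python
import random

def sample(items, num, seed=None):
    # Draw with replacement when num exceeds the dataset size, mirroring
    # "some samples may be sampled multiple times"; otherwise without.
    rng = random.Random(seed)
    if num > len(items):
        return [rng.choice(items) for _ in range(num)]
    return rng.sample(items, num)
```

Passing the same seed twice yields the same sample, which is what makes seeded sampling reproducible.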
- filter(predicates: datasetops.types.Optional[datasetops.types.Union[datasetops.types.DataPredicate, datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.DataPredicate]]]] = None, **kwpredicates: datasetops.types.DataPredicate)
Filter a dataset using a predicate function.
- Keyword Arguments:
predicates {Union[DataPredicate, Sequence[Optional[DataPredicate]]]} – a single function or a list of functions, each taking a single dataset item and returning a bool. If a single function is passed, it is applied to the whole item; if a list is passed, the functions are applied element-wise (a None entry accepts any value).
kwpredicates {DataPredicate} – element-wise predicates passed by keyword; requires that the item elements have been named
- Returns:
[Dataset] – A filtered Dataset
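The element-wise form can be sketched as follows (a plain-list illustration of the behavior, not the library's implementation; `filter_elementwise` is a hypothetical name):

```python
def filter_elementwise(items, predicates):
    # One predicate per element position; None accepts any value.
    def keep(item):
        return all(p(e) for p, e in zip(predicates, item) if p is not None)
    return [item for item in items if keep(item)]

data = [("img0", 0), ("img1", 1), ("img2", 0)]
# Keep items whose label element equals 0; the data element is unchecked.
zeros = filter_elementwise(data, [None, lambda label: label == 0])
```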
- split_filter(predicates: datasetops.types.Optional[datasetops.types.Union[datasetops.types.DataPredicate, datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.DataPredicate]]]] = None, **kwpredicates: datasetops.types.DataPredicate)
Split a dataset using a predicate function.
- Keyword Arguments:
predicates {Union[DataPredicate, Sequence[Optional[DataPredicate]]]} – a single function or a list of functions, each taking a single dataset item and returning a bool. If a single function is passed, it is applied to the whole item; if a list is passed, the functions are applied element-wise. Element-wise predicates can also be passed by keyword if the item elements have been named.
- Returns:
[Dataset] – Two datasets, one that passed the predicate and one that didn’t
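Conceptually this is a single-pass partition; a minimal sketch with plain lists (an illustration, not datasetops' code):

```python
def split_filter(items, predicate):
    # Partition into (passed, failed) in one pass over the items.
    passed, failed = [], []
    for item in items:
        (passed if predicate(item) else failed).append(item)
    return passed, failed
```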
- shuffle(seed: int = None)
Shuffle the items in a dataset.
- Keyword Arguments:
seed {[int]} – Random seed (default: {None})
- Returns:
[Dataset] – Dataset with shuffled items
- split(fractions: datasetops.types.List[float], seed: int = None)
Split dataset into multiple datasets, determined by the fractions given.
A wildcard (-1) may be given at a single position, to fill in the rest. If the fractions don't add up, the last fraction in the list receives the remaining data.
- Arguments:
fractions {List[float]} – a list or tuple of floats in the interval ]0,1[. One of the items may be a -1 wildcard.
- Keyword Arguments:
seed {int} – Random seed (default: {None})
- Returns:
List[Dataset] – Datasets with the number of samples corresponding to the fractions given
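How the wildcard and remainder rules resolve into split sizes can be sketched like this (an illustration of the rules stated above under the assumption that rounding remainders go to the last split; not the library's actual arithmetic):

```python
def resolve_split(fractions, total):
    # Replace a single -1 wildcard with the remaining fraction, then
    # give the last split whatever samples are left after rounding.
    if -1 in fractions:
        rest = 1.0 - sum(f for f in fractions if f != -1)
        fractions = [rest if f == -1 else f for f in fractions]
    sizes = [round(f * total) for f in fractions]
    sizes[-1] = total - sum(sizes[:-1])
    return sizes
```

For example, `resolve_split([0.7, -1], 10)` yields a 7/3 split, and the sizes always sum to the dataset length.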
- take(num: int)
Take the first elements of a dataset.
- Arguments:
num {int} – number of elements to take
- Returns:
Dataset – A dataset with only the first num elements
- repeat(times=1, mode='itemwise')
Repeat the dataset elements.
- Keyword Arguments:
times {int} – Number of times an element is repeated (default: {1})
mode {str} – Repeat ‘itemwise’ (i.e. [1,1,2,2,3,3]) or as a ‘whole’ (i.e. [1,2,3,1,2,3]) (default: {‘itemwise’})
- Returns:
[Dataset] – Dataset with repeated elements
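The two modes can be sketched with plain lists (an illustration of the ordering shown in the examples above; the exact count semantics of `times` are an assumption here, not taken from the source):

```python
def repeat(items, times=1, mode="itemwise"):
    if mode == "itemwise":
        # [1, 2, 3] with times=2 -> [1, 1, 2, 2, 3, 3]
        return [item for item in items for _ in range(times)]
    # mode == "whole": [1, 2, 3] with times=2 -> [1, 2, 3, 1, 2, 3]
    return list(items) * times
```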
- reorder(*keys: datasetops.types.Key)
Reorder items in the dataset (similar to numpy.transpose).
- Arguments:
keys {Union[int,str]} – item indices or keys (if item names were previously set), given in the desired order
- Returns:
[Dataset] – Dataset with items whose elements have been reordered
- named(first: datasetops.types.Union[str, datasetops.types.Sequence[str]], *rest: str)
Set the names associated with the elements of an item.
- Arguments:
first {Union[str, Sequence[str]]} – The new item name(s)
- Returns:
[Dataset] – A Dataset whose item elements can be accessed by name
- transform(fns: datasetops.types.Optional[datasetops.types.Union[datasetops.types.ItemTransformFn, datasetops.types.Sequence[datasetops.types.Union[datasetops.types.ItemTransformFn, datasetops.types.DatasetTransformFn]]]] = None, **kwfns: datasetops.types.DatasetTransformFn)
Transform the items of a dataset according to some function (passed as argument).
- Arguments:
If a single function taking one input is given, e.g. transform(lambda x: x), it is applied to the whole item. If a list of functions is given, e.g. transform([image(), one_hot()]), each is applied to the element of the item at the corresponding position. If a key is used, e.g. transform(data=lambda x: -x), the element associated with that key is transformed.
- Raises:
ValueError – If more functions are passed than there are elements in an item.
KeyError – If a key doesn’t match
- Returns:
[Dataset] – Dataset whose items are transformed
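The positional, element-wise application can be sketched on a single item tuple (a plain-Python illustration, not the library's internals; `transform_item` is a hypothetical name):

```python
def transform_item(item, fns):
    # Apply one function per element position; None leaves it unchanged.
    return tuple(e if fn is None else fn(e) for fn, e in zip(fns, item))

negated = transform_item((2, "a"), [lambda x: -x, None])
```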
- categorical(key: datasetops.types.Key, mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None)
Transform elements into categorical labels (int).
- Arguments:
key {Key} – Index or name of the element to be transformed
- Keyword Arguments:
mapping_fn {Callable[[Any], int]} – User defined mapping function (default: {None})
- Returns:
[Dataset] – Dataset with items that have been transformed to categorical labels
- one_hot(key: datasetops.types.Key, encoding_size: int = None, mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None, dtype='bool')
Transform elements into a categorical one-hot encoding.
- Arguments:
key {Key} – Index or name of the element to be transformed
- Keyword Arguments:
encoding_size {int} – The number of positions in the one-hot vector. If the size is not provided, it will be automatically inferred (with an O(N) runtime cost) (default: {None})
mapping_fn {Callable[[Any], int]} – User defined mapping function (default: {None})
dtype {str} – Numpy datatype for the one-hot encoded data (default: {‘bool’})
- Returns:
[Dataset] – Dataset with items that have been transformed to categorical labels
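What the encoding itself does can be sketched with numpy (an illustration of one-hot encoding in general, not datasetops' implementation):

```python
import numpy as np

def one_hot(label, encoding_size, dtype="bool"):
    # A categorical int label becomes a vector with a single 1
    # at the label's position.
    vec = np.zeros(encoding_size, dtype=dtype)
    vec[label] = 1
    return vec
```

For example, label 1 with encoding_size 3 becomes `[False, True, False]`.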
- image(*positional_flags: datasetops.types.Any)
Transforms item elements that are either numpy arrays or path strings into a PIL.Image.Image.
- Arguments:
positional flags, e.g. (True, False), denoting which elements should be converted. If no flags are supplied, all data that can be converted will be converted.
- Returns:
[Dataset] – Dataset with PIL.Image.Image elements
- numpy(*positional_flags: datasetops.types.Any)
Transforms elements into numpy.ndarray.
- Arguments:
positional flags, e.g. (True, False), denoting which elements should be converted. If no flags are supplied, all data that can be converted will be converted.
- Returns:
[Dataset] – Dataset with np.ndarray elements
- zip(*datasets)
- cartesian_product(*datasets)
- concat(*datasets)
- reshape(*new_shapes: datasetops.types.Optional[datasetops.types.Shape], **kwshapes: datasetops.types.Optional[datasetops.types.Shape])
- image_resize(*new_sizes: datasetops.types.Optional[datasetops.types.Shape], **kwsizes: datasetops.types.Optional[datasetops.types.Shape])
- to_tensorflow()
- to_pytorch()
- datasetops.allow_unique(max_num_duplicates=1) datasetops.types.Callable[[datasetops.types.Any], bool]
Predicate used for filtering/sampling a dataset classwise.
- Keyword Arguments:
max_num_duplicates {int} – max number of samples to take that share the same value (default: {1})
- Returns:
Callable[[Any], bool] – Predicate function
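The key point is that the returned predicate is stateful: it remembers how often each value has been seen across calls. A sketch of that behavior (an illustration consistent with the description above, not necessarily the library's exact code):

```python
def allow_unique(max_num_duplicates=1):
    # Stateful predicate: accepts a value only until it has been
    # seen max_num_duplicates times.
    seen = {}
    def predicate(value):
        seen[value] = seen.get(value, 0) + 1
        return seen[value] <= max_num_duplicates
    return predicate

pred = allow_unique(2)
kept = [x for x in [0, 0, 0, 1, 1] if pred(x)]  # at most two per value
```

Because the counter lives in the closure, a fresh predicate must be created for each filtering pass.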
- datasetops.custom(elem_transform_fn: datasetops.types.Callable[[datasetops.types.Any], datasetops.types.Any], elem_check_fn: datasetops.types.Callable[[datasetops.types.Any], None] = None) datasetops.types.DatasetTransformFn
Create a user defined transform.
- Arguments:
elem_transform_fn {Callable[[Any], Any]} – A user defined function, which takes the element as its only argument
- Keyword Arguments:
elem_check_fn {Callable[[Any], None]} – A function that raises an Exception if the element is incompatible (default: {None})
- Returns:
DatasetTransformFn – A function to be passed to Dataset.transform()
- datasetops.reshape(new_shape: datasetops.types.Shape) datasetops.types.DatasetTransformFn
- datasetops.categorical(mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None) datasetops.types.DatasetTransformFn
Transform data into a categorical int label.
- Arguments:
mapping_fn {Callable[[Any], int]} – A function transforming the input data to the integer label. If not specified, labels are automatically inferred from the data.
- Returns:
DatasetTransformFn – A function to be passed to the Dataset.transform()
- datasetops.one_hot(encoding_size: int, mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None, dtype='bool') datasetops.types.DatasetTransformFn
Transform data into a one-hot encoded label.
- Arguments:
encoding_size {int} – The size of the encoding
mapping_fn {Callable[[Any], int]} – A function transforming the input data to an integer label. If not specified, labels are automatically inferred from the data.
- Returns:
DatasetTransformFn – A function to be passed to the Dataset.transform()
- datasetops.categorical_template(ds: Dataset, key: datasetops.types.Key) datasetops.types.Callable[[datasetops.types.Any], int]
Creates a template mapping function to be used with one_hot.
- Arguments:
ds {Dataset} – Dataset from which to create a template for the one_hot coding
key {Key} – Dataset key (name or item index) on which the one_hot coding is made
- Returns:
{Callable[[Any],int]} – mapping_fn for one_hot
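The idea of such a mapping function can be sketched as assigning consecutive integers to unique values in first-seen order (an illustration over a plain list of labels, not the library's implementation; `make_mapping_fn` is a hypothetical name):

```python
def make_mapping_fn(values):
    # Assign consecutive integers to unique values in first-seen order.
    table = {}
    for v in values:
        table.setdefault(v, len(table))
    return lambda value: table[value]

labels = ["cat", "dog", "cat", "bird"]
to_int = make_mapping_fn(labels)  # stable int label for each value
```

Building the table once from a reference dataset is what keeps labels consistent when the same mapping is reused across splits.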
- datasetops.numpy() datasetops.types.DatasetTransformFn
- datasetops.image() datasetops.types.DatasetTransformFn
- datasetops.image_resize(new_size: datasetops.types.Shape, resample=Image.NEAREST) datasetops.types.DatasetTransformFn
- datasetops.zipped(*datasets: datasetops.abstract.AbstractDataset)
- datasetops.cartesian_product(*datasets: datasetops.abstract.AbstractDataset)
- datasetops.concat(*datasets: datasetops.abstract.AbstractDataset)
- class datasetops.Loader(getdata: datasetops.types.Callable[[datasetops.types.Any], datasetops.types.Any], name: str = None)
Bases:
datasetops.dataset.Dataset
Contains information on how to access the raw data, and performs sampling and splitting related operations.
- append(identifier: datasetops.types.Data)
- extend(ids: datasetops.types.Union[datasetops.types.List[datasetops.types.Data], numpy.ndarray])
- datasetops.from_pytorch(pytorch_dataset)
Create a dataset from a PyTorch dataset.
- Arguments:
pytorch_dataset {torch.utils.data.Dataset} – A PyTorch dataset to load from
- Returns:
[Dataset] – A datasetops.Dataset
- datasetops.from_folder_data(path: datasetops.types.AnyPath) datasetops.dataset.Dataset
Load data from a folder with the data structure:
folder
|- sample1.jpg
|- sample2.jpg
- Arguments:
path {AnyPath} – path to folder
- Returns:
- Dataset – A dataset of data paths,
e.g. (‘folder/sample1.jpg’)
- datasetops.from_folder_class_data(path: datasetops.types.AnyPath) datasetops.dataset.Dataset
Load data from a folder with the data structure:
nested_folder
|- class1
|  |- sample1.jpg
|- class2
|  |- sample2.jpg
- datasetops.from_folder_dataset_class_data(path: datasetops.types.AnyPath) datasetops.types.List[datasetops.dataset.Dataset]
Load data from a folder with the data structure:
nested_folder
|- dataset1
|  |- class1
|  |  |- sample1.jpg
|  |- class2
|- dataset2
- datasetops.from_mat_single_mult_data(path: datasetops.types.AnyPath) datasetops.types.List[datasetops.dataset.Dataset]
Load data from .mat file consisting of multiple data.
E.g. a .mat file with keys [‘X_src’, ‘Y_src’, ‘X_tgt’, ‘Y_tgt’]
- Arguments:
path {AnyPath} – path to .mat file
- Returns:
- List[Dataset] – A list of datasets, where a dataset was created for each suffix
e.g. a dataset with data from the keys (‘X_src’, ‘Y_src’) and one from (‘X_tgt’, ‘Y_tgt’)