datasetops

Submodules

Package Contents

Classes

Dataset

Contains information on how to access the raw data, and performs sampling and splitting related operations.

Loader

Contains information on how to access the raw data, and performs sampling and splitting related operations.

Functions

allow_unique(...)

Predicate used for filtering/sampling a dataset classwise.

custom(→ datasetops.types.DatasetTransformFn)

Create a user defined transform.

reshape(→ datasetops.types.DatasetTransformFn)

categorical(→ datasetops.types.DatasetTransformFn)

Transform data into a categorical int label.

one_hot(→ datasetops.types.DatasetTransformFn)

Transform data into a one-hot encoded label.

categorical_template(...)

Creates a template mapping function to be used with one_hot.

numpy(→ datasetops.types.DatasetTransformFn)

image(→ datasetops.types.DatasetTransformFn)

image_resize(→ datasetops.types.DatasetTransformFn)

zipped(*datasets)

cartesian_product(*datasets)

concat(*datasets)

to_tensorflow(dataset)

to_pytorch(dataset)

from_pytorch(pytorch_dataset)

Create a dataset from a PyTorch dataset.

from_folder_data(→ datasetops.dataset.Dataset)

Load data from a folder with the data structure:

from_folder_class_data(→ datasetops.dataset.Dataset)

Load data from a folder with the data structure:

from_folder_dataset_class_data(...)

Load data from a folder with the data structure:

from_mat_single_mult_data(...)

Load data from .mat file consisting of multiple data.

class datasetops.Dataset(downstream_getter: datasetops.types.Union[datasetops.abstract.ItemGetter, Dataset], name: str = None, ids: datasetops.types.Ids = None, item_transform_fn: datasetops.types.ItemTransformFn = lambda x: ..., item_names: datasetops.types.Dict[str, int] = None)

Bases: datasetops.abstract.AbstractDataset

Contains information on how to access the raw data, and performs sampling and splitting related operations.

property shape: datasetops.types.Sequence[int]

Get the shape of a dataset item.

Returns:

Sequence[int] – Item shapes

property names: datasetops.types.List[str]

Get the names of the elements in an item.

Returns:

List[str] – A list of element names

__len__()

Return the total number of elements in the dataset.

__getitem__(i: int) datasetops.types.Tuple

Returns the element at the specified index.

Parameters

i {int} – the index from which to read the sample

counts(*itemkeys: datasetops.types.Key) datasetops.types.List[datasetops.types.Tuple[datasetops.types.Any, int]]

Compute the counts of each unique item in the dataset.

Warning: this operation may be expensive for large datasets

Arguments:

itemkeys {Union[str, int]} – The item keys (str) or indexes (int) to be checked for uniqueness. If no key is given, all item-parts must match for them to be considered equal

Returns:

List[Tuple[Any,int]] – List of tuples, each containing the unique value and its number of occurrences
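The counting semantics can be sketched in plain Python (an illustrative stand-in, not the library’s implementation; Dataset.counts additionally restricts the comparison to the given item keys):

```python
from collections import Counter

def counts(values):
    # Count occurrences of each unique value, returning
    # (value, count) tuples in first-seen order.
    return list(Counter(values).items())

labels = ["cat", "dog", "cat", "bird", "cat"]
counts(labels)  # [('cat', 3), ('dog', 1), ('bird', 1)]
```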

unique(*itemkeys: datasetops.types.Key) datasetops.types.List[datasetops.types.Any]

Compute a list of unique values in the dataset.

Warning: this operation may be expensive for large datasets

Arguments:

itemkeys {str} – The item keys to be checked for uniqueness

Returns:

List[Any] – List of the unique items

sample(num: int, seed: int = None)

Sample data randomly from the dataset.

Arguments:

num {int} – Number of samples. If the number of samples is larger than the dataset size, some samples may be sampled multiple times

Keyword Arguments:

seed {int} – Random seed (default: {None})

Returns:

[Dataset] – Sampled dataset
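A minimal sketch of these sampling semantics, using plain Python lists (the with-replacement fallback for an oversized num is an assumption based on the note above, not the library’s source):

```python
import random

def sample(data, num, seed=None):
    # Draw `num` items; sample with replacement only when `num`
    # exceeds the dataset size, otherwise without replacement.
    rng = random.Random(seed)
    if num > len(data):
        return [rng.choice(data) for _ in range(num)]
    return rng.sample(data, num)

subset = sample(list(range(10)), num=3, seed=42)  # reproducible for a fixed seed
```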

filter(predicates: datasetops.types.Optional[datasetops.types.Union[datasetops.types.DataPredicate, datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.DataPredicate]]]] = None, **kwpredicates: datasetops.types.DataPredicate)

Filter a dataset using a predicate function.

Keyword Arguments:

predicates {Union[DataPredicate, Sequence[Optional[DataPredicate]]]} – either a single function or a list of functions, each taking a single dataset item and returning a bool. If a single function is passed, it is applied to the whole item; if a list is passed, the functions are applied element-wise. Named predicates can also be passed as keyword arguments if item names have been set.

kwpredicates {DataPredicate} – predicates passed by keyword, each applied to the correspondingly named item element

Returns:

[Dataset] – A filtered Dataset
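The element-wise predicate behaviour can be illustrated with a small stand-alone sketch (list-based, not the library itself; treating None as “always passes” is an assumption):

```python
def filter_items(items, predicates):
    # Keep an item only if every non-None predicate accepts
    # the element at its position.
    def keep(item):
        return all(
            pred(elem)
            for pred, elem in zip(predicates, item)
            if pred is not None
        )
    return [item for item in items if keep(item)]

data = [(1, "a"), (2, "b"), (3, "a")]
filter_items(data, [None, lambda s: s == "a"])  # [(1, 'a'), (3, 'a')]
```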

split_filter(predicates: datasetops.types.Optional[datasetops.types.Union[datasetops.types.DataPredicate, datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.DataPredicate]]]] = None, **kwpredicates: datasetops.types.DataPredicate)

Split a dataset using a predicate function.

Keyword Arguments:

predicates {Union[DataPredicate, Sequence[Optional[DataPredicate]]]} – either a single function or a list of functions, each taking a single dataset item and returning a bool. If a single function is passed, it is applied to the whole item; if a list is passed, the functions are applied element-wise. Named predicates can also be passed as keyword arguments if item names have been set.

Returns:

[Dataset] – Two datasets, one that passed the predicate and one that didn’t

shuffle(seed: int = None)

Shuffle the items in a dataset.

Keyword Arguments:

seed {[int]} – Random seed (default: {None})

Returns:

[Dataset] – Dataset with shuffled items

split(fractions: datasetops.types.List[float], seed: int = None)

Split dataset into multiple datasets, determined by the fractions given.

A wildcard (-1) may be given at a single position, to fill in the rest. If the fractions don’t add up, the last fraction in the list receives the remaining data.

Arguments:

fractions {List[float]} – a list or tuple of floats in the interval ]0,1[. One of the items may be a -1 wildcard.

Keyword Arguments:

seed {int} – Random seed (default: {None})

Returns:

List[Dataset] – Datasets with the number of samples corresponding to the fractions given
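How the fractions and the -1 wildcard might resolve into subset sizes can be sketched as follows (an illustrative helper, not the library’s code; the rounding behaviour is an assumption):

```python
def split_sizes(n, fractions):
    # Resolve fractions into integer sizes; a single -1 wildcard
    # absorbs whatever items remain after the other fractions.
    sizes = [None if f == -1 else int(n * f) for f in fractions]
    used = sum(s for s in sizes if s is not None)
    return [n - used if s is None else s for s in sizes]

split_sizes(100, [0.7, -1])  # [70, 30]
```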

take(num: int)

Take the first elements of a dataset.

Arguments:

num {int} – number of elements to take

Returns:

Dataset – A dataset with only the first num elements

repeat(times=1, mode='itemwise')

Repeat the dataset elements.

Keyword Arguments:

times {int} – Number of times an element is repeated (default: {1})

mode {str} – Repeat ‘itemwise’ (i.e. [1,1,2,2,3,3]) or as a ‘whole’ (i.e. [1,2,3,1,2,3]) (default: {‘itemwise’})

Returns:

[Dataset] – Dataset with repeated items
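The two repeat modes described above can be sketched with plain lists (illustrative only, not the library’s implementation):

```python
def repeat(items, times=1, mode="itemwise"):
    if mode == "itemwise":  # each element duplicated in place: [1, 1, 2, 2, 3, 3]
        return [x for x in items for _ in range(times)]
    if mode == "whole":     # the whole sequence duplicated: [1, 2, 3, 1, 2, 3]
        return list(items) * times
    raise ValueError(f"unknown mode: {mode!r}")

repeat([1, 2, 3], times=2, mode="itemwise")  # [1, 1, 2, 2, 3, 3]
```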

reorder(*keys: datasetops.types.Key)

Reorder items in the dataset (similar to numpy.transpose).

Arguments:

keys {Union[int, str]} – positional item indexes or keys (if item names were previously set) defining the new element order

Returns:

[Dataset] – Dataset with items whose elements have been reordered

named(first: datasetops.types.Union[str, datasetops.types.Sequence[str]], *rest: str)

Set the names associated with the elements of an item.

Arguments:

first {Union[str, Sequence[str]]} – The new item name(s)

Returns:

[Dataset] – A Dataset whose item elements can be accessed by name

transform(fns: datasetops.types.Optional[datasetops.types.Union[datasetops.types.ItemTransformFn, datasetops.types.Sequence[datasetops.types.Union[datasetops.types.ItemTransformFn, datasetops.types.DatasetTransformFn]]]] = None, **kwfns: datasetops.types.DatasetTransformFn)

Transform the items of a dataset according to some function (passed as argument).

Arguments:

If a single function taking one input is given, e.g. transform(lambda x: x), it is applied to the whole item. If a list of functions is given, e.g. transform([image(), one_hot()]), they are applied to the elements of the item at the corresponding positions. If a key is used, e.g. transform(data=lambda x: -x), the element associated with the key is transformed.

Raises:

ValueError: If more functions are passed than there are elements in an item.

KeyError: If a key doesn’t match

Returns:

[Dataset] – Dataset whose items are transformed
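The positional, element-wise application can be sketched like this (a plain-Python stand-in for the list-of-functions case; treating None as a no-op is an assumption):

```python
def transform(items, fns):
    # Apply fns[i] to element i of every item; None leaves
    # the element unchanged.
    return [
        tuple(elem if fn is None else fn(elem) for fn, elem in zip(fns, item))
        for item in items
    ]

data = [(1, "a"), (2, "b")]
transform(data, [lambda x: -x, None])  # [(-1, 'a'), (-2, 'b')]
```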

categorical(key: datasetops.types.Key, mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None)

Transform elements into categorical labels (int).

Arguments:

key {Key} – Index or name of the element to be transformed

Keyword Arguments:

mapping_fn {Callable[[Any], int]} – User defined mapping function (default: {None})

Returns:

[Dataset] – Dataset with items that have been transformed to categorical labels

one_hot(key: datasetops.types.Key, encoding_size: int = None, mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None, dtype='bool')

Transform elements into a categorical one-hot encoding.

Arguments:

key {Key} – Index or name of the element to be transformed

Keyword Arguments:

encoding_size {int} – The number of positions in the one-hot vector. If the size is not provided, it will be automatically inferred (with an O(N) runtime cost) (default: {None})

mapping_fn {Callable[[Any], int]} – User defined mapping function (default: {None})

dtype {str} – Numpy datatype for the one-hot encoded data (default: {‘bool’})

Returns:

[Dataset] – Dataset with items that have been transformed to categorical labels
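The underlying encoding is standard one-hot; a minimal sketch with plain lists (the library presumably returns numpy arrays with the requested dtype, so this is only the idea):

```python
def one_hot_vector(index, encoding_size):
    # Build a vector of zeros with a single 1 at `index`.
    vec = [0] * encoding_size
    vec[index] = 1
    return vec

one_hot_vector(2, encoding_size=4)  # [0, 0, 1, 0]
```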

image(*positional_flags: datasetops.types.Any)

Transforms item elements that are either numpy arrays or path strings into a PIL.Image.Image.

Arguments:

positional flags, e.g. (True, False) denoting which element should be converted. If no flags are supplied, all data that can be converted will be converted.

Returns:

[Dataset] – Dataset with PIL.Image.Image elements

numpy(*positional_flags: datasetops.types.Any)

Transforms elements into numpy.ndarray.

Arguments:

positional flags, e.g. (True, False) denoting which element should be converted. If no flags are supplied, all data that can be converted will be converted.

Returns:

[Dataset] – Dataset with np.ndarray elements

zip(*datasets)
cartesian_product(*datasets)
concat(*datasets)
reshape(*new_shapes: datasetops.types.Optional[datasetops.types.Shape], **kwshapes: datasetops.types.Optional[datasetops.types.Shape])
image_resize(*new_sizes: datasetops.types.Optional[datasetops.types.Shape], **kwsizes: datasetops.types.Optional[datasetops.types.Shape])
to_tensorflow()
to_pytorch()
datasetops.allow_unique(max_num_duplicates=1) datasetops.types.Callable[[datasetops.types.Any], bool]

Predicate used for filtering/sampling a dataset classwise.

Keyword Arguments:

max_num_duplicates {int} – max number of samples to take that share the same value (default: {1})

Returns:

Callable[[Any], bool] – Predicate function
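The description suggests a stateful predicate; a sketch of how it might work (an assumption, not the library’s source):

```python
from collections import defaultdict

def allow_unique(max_num_duplicates=1):
    # Return a predicate that accepts a value only until it has
    # been seen `max_num_duplicates` times.
    seen = defaultdict(int)

    def predicate(value):
        seen[value] += 1
        return seen[value] <= max_num_duplicates

    return predicate

pred = allow_unique(max_num_duplicates=1)
[pred(v) for v in ["a", "a", "b"]]  # [True, False, True]
```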

datasetops.custom(elem_transform_fn: datasetops.types.Callable[[datasetops.types.Any], datasetops.types.Any], elem_check_fn: datasetops.types.Callable[[datasetops.types.Any], None] = None) datasetops.types.DatasetTransformFn

Create a user defined transform.

Arguments:

elem_transform_fn {Callable[[Any], Any]} – A user defined function, which takes the element as its only argument

Keyword Arguments:

elem_check_fn {Callable[[Any], None]} – A function that raises an Exception if the element is incompatible (default: {None})

Returns:

DatasetTransformFn – A function to be passed to Dataset.transform()

datasetops.reshape(new_shape: datasetops.types.Shape) datasetops.types.DatasetTransformFn
datasetops.categorical(mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None) datasetops.types.DatasetTransformFn

Transform data into a categorical int label.

Arguments:

mapping_fn {Callable[[Any], int]} – A function transforming the input data to the integer label. If not specified, labels are automatically inferred from the data.

Returns:

DatasetTransformFn – A function to be passed to the Dataset.transform()

datasetops.one_hot(encoding_size: int, mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None, dtype='bool') datasetops.types.DatasetTransformFn

Transform data into a one-hot encoded label.

Arguments:

encoding_size {int} – The size of the encoding

mapping_fn {Callable[[Any], int]} – A function transforming the input data to an integer label. If not specified, labels are automatically inferred from the data.

Returns:

DatasetTransformFn – A function to be passed to the Dataset.transform()

datasetops.categorical_template(ds: Dataset, key: datasetops.types.Key) datasetops.types.Callable[[datasetops.types.Any], int]

Creates a template mapping function to be used with one_hot.

Arguments:

ds {Dataset} – Dataset from which to create a template for one-hot coding

key {Key} – Dataset key (name or item index) on which the one-hot coding is made

Returns:

{Callable[[Any],int]} – mapping_fn for one_hot
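The mapping it produces can be sketched from first principles (illustrative only; built here from a plain list of values rather than a Dataset and key):

```python
def categorical_template(values):
    # Assign each unique value an integer in first-seen order
    # and return a mapping function usable as mapping_fn.
    mapping = {v: i for i, v in enumerate(dict.fromkeys(values))}
    return lambda v: mapping[v]

fn = categorical_template(["cat", "dog", "cat"])
(fn("cat"), fn("dog"))  # (0, 1)
```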

datasetops.numpy() datasetops.types.DatasetTransformFn
datasetops.image() datasetops.types.DatasetTransformFn
datasetops.image_resize(new_size: datasetops.types.Shape, resample=Image.NEAREST) datasetops.types.DatasetTransformFn
datasetops.zipped(*datasets: datasetops.abstract.AbstractDataset)
datasetops.cartesian_product(*datasets: datasetops.abstract.AbstractDataset)
datasetops.concat(*datasets: datasetops.abstract.AbstractDataset)
datasetops.to_tensorflow(dataset: Dataset)
datasetops.to_pytorch(dataset: Dataset)
class datasetops.Loader(getdata: datasetops.types.Callable[[datasetops.types.Any], datasetops.types.Any], name: str = None)

Bases: datasetops.dataset.Dataset

Contains information on how to access the raw data, and performs sampling and splitting related operations.

append(identifier: datasetops.types.Data)
extend(ids: datasetops.types.Union[datasetops.types.List[datasetops.types.Data], numpy.ndarray])
datasetops.from_pytorch(pytorch_dataset)

Create a dataset from a PyTorch dataset.

Arguments:

pytorch_dataset {torch.utils.data.Dataset} – A PyTorch dataset to load from

Returns:

[Dataset] – A datasetops.Dataset

datasetops.from_folder_data(path: datasetops.types.AnyPath) datasetops.dataset.Dataset

Load data from a folder with the data structure:

folder
|- sample1.jpg
|- sample2.jpg

Arguments:

path {AnyPath} – path to folder

Returns:
Dataset – A dataset of data paths, e.g. (‘folder/sample1.jpg’)

datasetops.from_folder_class_data(path: datasetops.types.AnyPath) datasetops.dataset.Dataset

Load data from a folder with the data structure:

nested_folder
|- class1
   |- sample1.jpg
   |- sample2.jpg
|- class2
   |- sample3.jpg

Arguments:

path {AnyPath} – path to nested folder

Returns:
Dataset – A labelled dataset of data paths and corresponding class labels,

e.g. (‘nested_folder/class1/sample1.jpg’, ‘class1’)
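The documented behaviour amounts to pairing each file path with the name of its parent folder; a stand-alone sketch (pathlib-based, not the library’s loader):

```python
from pathlib import Path

def folder_class_pairs(root):
    # Pair every file one level below `root` with the name of
    # the class folder that contains it.
    return sorted(
        (str(path), path.parent.name)
        for path in Path(root).glob("*/*")
        if path.is_file()
    )
```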

datasetops.from_folder_dataset_class_data(path: datasetops.types.AnyPath) datasetops.types.List[datasetops.dataset.Dataset]

Load data from a folder with the data structure:

nested_folder
|- dataset1
   |- class1
      |- sample1.jpg
      |- sample2.jpg
   |- class2
      |- sample3.jpg
|- dataset2
   |- …

Arguments:

path {AnyPath} – path to nested folder

Returns:
List[Dataset] – A list of labelled datasets, each with data paths and corresponding class labels, e.g. (‘nested_folder/dataset1/class1/sample1.jpg’, ‘class1’)

datasetops.from_mat_single_mult_data(path: datasetops.types.AnyPath) datasetops.types.List[datasetops.dataset.Dataset]

Load data from .mat file consisting of multiple data.

E.g. a .mat file with keys [‘X_src’, ‘Y_src’, ‘X_tgt’, ‘Y_tgt’]

Arguments:

path {AnyPath} – path to .mat file

Returns:
List[Dataset] – A list of datasets, one created for each suffix, e.g. one dataset with data from the keys (‘X_src’, ‘Y_src’) and another from (‘X_tgt’, ‘Y_tgt’)