datasetops.dataset

Module Contents

Classes

Dataset

Contains information on how to access the raw data, and performs sampling and splitting related operations.

Functions

_warn_no_args([skip])

_raise_no_args([skip])

_dummy_arg_receiving(fn)

_key_index(→ int)

_split_bulk_itemwise(...)

_combine_conditions(→ datasetops.types.DataPredicate)

_optional_argument_indexed_transform(shape, ...)

_keywise(item_names, l, d)

_itemwise(item_names, l, d)

_dataset_element_transforming(fn[, check])

Applies the function to dataset item elements.

_check_shape_compatibility(shape)

convert2img(→ PIL.Image.Image)

_check_image_compatibility(elem)

_check_numpy_compatibility(elem)

allow_unique(...)

Predicate used for filtering/sampling a dataset classwise.

custom(→ datasetops.types.DatasetTransformFn)

Create a user defined transform.

reshape(→ datasetops.types.DatasetTransformFn)

categorical(→ datasetops.types.DatasetTransformFn)

Transform data into a categorical int label.

categorical_template(...)

Creates a template mapping function to be used with one_hot.

one_hot(→ datasetops.types.DatasetTransformFn)

Transform data into a one-hot encoded label.

numpy(→ datasetops.types.DatasetTransformFn)

image(→ datasetops.types.DatasetTransformFn)

image_resize(→ datasetops.types.DatasetTransformFn)

zipped(*datasets)

cartesian_product(*datasets)

concat(*datasets)

_tf_compute_type(item)

_tf_compute_shape(item)

_tf_item_conversion(item)

to_tensorflow(dataset)

to_pytorch(dataset)

Attributes

_DEFAULT_SHAPE

datasetops.dataset._DEFAULT_SHAPE
datasetops.dataset._warn_no_args(skip=0)
datasetops.dataset._raise_no_args(skip=0)
datasetops.dataset._dummy_arg_receiving(fn)
datasetops.dataset._key_index(item_names: datasetops.types.ItemNames, key: datasetops.types.Key) int
datasetops.dataset._split_bulk_itemwise(l: datasetops.types.Union[datasetops.types.Optional[datasetops.types.Callable], datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.Callable]]]) datasetops.types.Tuple[datasetops.types.Optional[datasetops.types.Callable], datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.Callable]]]
datasetops.dataset._combine_conditions(item_names: datasetops.types.ItemNames, shape: datasetops.types.Shape, predicates: datasetops.types.Optional[datasetops.types.Union[datasetops.types.DataPredicate, datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.DataPredicate]]]] = None, **kwpredicates: datasetops.types.DataPredicate) datasetops.types.DataPredicate
datasetops.dataset._optional_argument_indexed_transform(shape: datasetops.types.Shape, ds_transform: datasetops.types.Callable, transform_fn: datasetops.types.DatasetTransformFnCreator, args: datasetops.types.Sequence[datasetops.types.Any])
datasetops.dataset._keywise(item_names: datasetops.types.Dict[str, int], l: datasetops.types.Sequence, d: datasetops.types.Dict)
datasetops.dataset._itemwise(item_names: datasetops.types.Dict[str, int], l: datasetops.types.Sequence, d: datasetops.types.Dict)
class datasetops.dataset.Dataset(downstream_getter: datasetops.types.Union[datasetops.abstract.ItemGetter, Dataset], name: str = None, ids: datasetops.types.Ids = None, item_transform_fn: datasetops.types.ItemTransformFn = lambda x: ..., item_names: datasetops.types.Dict[str, int] = None)

Bases: datasetops.abstract.AbstractDataset

Contains information on how to access the raw data, and performs sampling and splitting related operations.

property shape: datasetops.types.Sequence[int]

Get the shape of a dataset item.

Returns:

Sequence[int] – Item shapes

property names: datasetops.types.List[str]

Get the names of the elements in an item.

Returns:

List[str] – A list of element names

__len__()

Return the total number of elements in the dataset.

__getitem__(i: int) datasetops.types.Tuple

Returns the element at the specified index.

Arguments:

i {int} – The index from which to read the sample.

counts(*itemkeys: datasetops.types.Key) datasetops.types.List[datasetops.types.Tuple[datasetops.types.Any, int]]

Compute the counts of each unique item in the dataset.

Warning: this operation may be expensive for large datasets

Arguments:

itemkeys {Union[str, int]} – The item keys (str) or indexes (int) to be checked for uniqueness. If no key is given, all item-parts must match for them to be considered equal

Returns:

List[Tuple[Any,int]] – List of tuples, each containing the unique value and its number of occurrences
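The counting semantics can be sketched in plain Python (a conceptual illustration, not the library's internals; `items` is a hypothetical list of (data, label) dataset items):

```python
from collections import Counter

# Hypothetical dataset items: (data, label) pairs
items = [("a", 0), ("b", 0), ("a", 1), ("a", 0)]

# Counting on the label element (index 1) only
label_counts = Counter(item[1] for item in items).most_common()

# With no keys given, whole items must match to count as equal
item_counts = Counter(items)
```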

unique(*itemkeys: datasetops.types.Key) datasetops.types.List[datasetops.types.Any]

Compute a list of unique values in the dataset.

Warning: this operation may be expensive for large datasets

Arguments:

itemkeys {str} – The item keys to be checked for uniqueness

Returns:

List[Any] – List of the unique items

sample(num: int, seed: int = None)

Sample data randomly from the dataset.

Arguments:

num {int} – Number of samples. If the number of samples is larger than the dataset size, some samples may be sampled multiple times

Keyword Arguments:

seed {int} – Random seed (default: {None})

Returns:

[Dataset] – Sampled dataset
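The index-selection logic can be sketched as follows (a conceptual illustration under the assumption, stated above, that sampling falls back to drawing with replacement when num exceeds the dataset size; not the library source):

```python
import random

def sample_ids(n_total, num, seed=None):
    rng = random.Random(seed)
    if num <= n_total:
        # Enough data available: sample without replacement
        return rng.sample(range(n_total), num)
    # num exceeds the dataset size: sample with replacement
    return rng.choices(range(n_total), k=num)
```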

filter(predicates: datasetops.types.Optional[datasetops.types.Union[datasetops.types.DataPredicate, datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.DataPredicate]]]] = None, **kwpredicates: datasetops.types.DataPredicate)

Filter a dataset using a predicate function.

Keyword Arguments:

predicates {Union[DataPredicate, Sequence[Optional[DataPredicate]]]} – Either a single function or a list of functions, each taking a single dataset item and returning a bool. If a single function is passed, it is applied to the whole item; if a list is passed, the functions are applied element-wise. Element-wise predicates can also be passed as keyword arguments if item names have been set.

kwpredicates {DataPredicate} – Predicates applied to the item elements matching the keyword. Requires that item names have been set.

Returns:

[Dataset] – A filtered Dataset
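The element-wise predicate combination described above can be sketched in plain Python (a conceptual illustration, not the library implementation; a None entry skips the element at that position):

```python
def combine(predicates):
    """Combine per-element predicates into one item-level predicate.
    A None entry means the element at that position is not checked."""
    def item_predicate(item):
        return all(
            p(elem) for p, elem in zip(predicates, item) if p is not None
        )
    return item_predicate

items = [(1, "cat"), (2, "dog"), (3, "cat")]
keep = combine([lambda x: x > 1, None])  # check only the first element
filtered = [it for it in items if keep(it)]
```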

split_filter(predicates: datasetops.types.Optional[datasetops.types.Union[datasetops.types.DataPredicate, datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.DataPredicate]]]] = None, **kwpredicates: datasetops.types.DataPredicate)

Split a dataset using a predicate function.

Keyword Arguments:

predicates {Union[DataPredicate, Sequence[Optional[DataPredicate]]]} – Either a single function or a list of functions, each taking a single dataset item and returning a bool. If a single function is passed, it is applied to the whole item; if a list is passed, the functions are applied element-wise. Element-wise predicates can also be passed as keyword arguments if item names have been set.

Returns:

[Dataset] – Two datasets, one that passed the predicate and one that didn’t

shuffle(seed: int = None)

Shuffle the items in a dataset.

Keyword Arguments:

seed {[int]} – Random seed (default: {None})

Returns:

[Dataset] – Dataset with shuffled items

split(fractions: datasetops.types.List[float], seed: int = None)

Split dataset into multiple datasets, determined by the fractions given.

A wildcard (-1) may be given at a single position, to fill in the rest. If the fractions don’t add up to one, the last fraction in the list receives the remaining data.

Arguments:

fractions {List[float]} – a list or tuple of floats in the interval ]0,1[. One of the items may be a -1 wildcard.

Keyword Arguments:

seed {int} – Random seed (default: {None})

Returns:

List[Dataset] – Datasets with the number of samples corresponding to the fractions given
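The fraction-to-size arithmetic can be sketched as follows (a conceptual illustration under the rules stated above; the exact rounding behaviour of the library may differ):

```python
def split_sizes(n, fractions):
    # A single -1 wildcard absorbs whatever the other fractions leave over
    if -1 in fractions:
        rest = 1.0 - sum(f for f in fractions if f != -1)
        fractions = [rest if f == -1 else f for f in fractions]
    sizes = [int(n * f) for f in fractions]
    sizes[-1] += n - sum(sizes)  # rounding leftover goes to the last split
    return sizes
```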

take(num: int)

Take the first elements of a dataset.

Arguments:

num {int} – number of elements to take

Returns:

Dataset – A dataset with only the first num elements

repeat(times=1, mode='itemwise')

Repeat the dataset elements.

Keyword Arguments:

times {int} – Number of times an element is repeated (default: {1})

mode {str} – Repeat ‘itemwise’ (i.e. [1,1,2,2,3,3]) or as a ‘whole’ (i.e. [1,2,3,1,2,3]) (default: {‘itemwise’})

Returns:

[Dataset] – Dataset with repeated items
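The two repeat modes can be illustrated with plain lists (a conceptual sketch of the ordering, not the library code):

```python
items = [1, 2, 3]

# 'itemwise': every element is repeated in place
itemwise = [x for x in items for _ in range(2)]

# 'whole': the entire sequence is repeated
whole = items * 2
```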

reorder(*keys: datasetops.types.Key)

Reorder items in the dataset (similar to numpy.transpose).

Arguments:

keys {Union[int, str]} – Positional item indexes or keys (if item names were previously set) specifying the new element order

Returns:

[Dataset] – Dataset with items whose elements have been reordered

named(first: datasetops.types.Union[str, datasetops.types.Sequence[str]], *rest: str)

Set the names associated with the elements of an item.

Arguments:

first {Union[str, Sequence[str]]} – The new item name(s)

Returns:

[Dataset] – A Dataset whose item elements can be accessed by name

transform(fns: datasetops.types.Optional[datasetops.types.Union[datasetops.types.ItemTransformFn, datasetops.types.Sequence[datasetops.types.Union[datasetops.types.ItemTransformFn, datasetops.types.DatasetTransformFn]]]] = None, **kwfns: datasetops.types.DatasetTransformFn)

Transform the items of a dataset according to some function (passed as argument).

Arguments:

If a single function taking one input is given, e.g. transform(lambda x: x), it is applied to the whole item. If a list of functions is given, e.g. transform([image(), one_hot()]), each is applied to the element of the item at the corresponding position. If a keyword is used, e.g. transform(data=lambda x: -x), the element associated with that key is transformed.

Raises:

ValueError: If more functions are passed than there are elements in an item.

KeyError: If a key doesn’t match any item name.

Returns:

[Dataset] – Dataset whose items are transformed

categorical(key: datasetops.types.Key, mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None)

Transform elements into categorical integer labels.

Arguments:

key {Key} – Index or name of the element to be transformed

Keyword Arguments:

mapping_fn {Callable[[Any], int]} – User defined mapping function (default: {None})

Returns:

[Dataset] – Dataset with items that have been transformed to categorical labels
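Automatic label inference can be sketched as a mapping built in order of first appearance (a conceptual illustration of the idea; the library's actual ordering may differ):

```python
def make_label_fn():
    mapping = {}
    def to_label(value):
        # First occurrence of a value gets the next free integer
        if value not in mapping:
            mapping[value] = len(mapping)
        return mapping[value]
    return to_label

to_label = make_label_fn()
labels = [to_label(v) for v in ["cat", "dog", "cat", "bird"]]
```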

one_hot(key: datasetops.types.Key, encoding_size: int = None, mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None, dtype='bool')

Transform elements into a categorical one-hot encoding.

Arguments:

key {Key} – Index or name of the element to be transformed

Keyword Arguments:

encoding_size {int} – The number of positions in the one-hot vector. If the size is not provided, it will be inferred automatically (with an O(N) runtime cost) (default: {None})

mapping_fn {Callable[[Any], int]} – User defined mapping function (default: {None})

dtype {str} – Numpy datatype for the one-hot encoded data (default: {‘bool’})

Returns:

[Dataset] – Dataset with items that have been transformed to categorical labels
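The encoding step itself can be sketched with numpy (a conceptual illustration; it assumes the integer label has already been produced by a mapping function):

```python
import numpy as np

def one_hot_encode(label, encoding_size, dtype="bool"):
    # A zero vector with a single position set at the label's index
    vec = np.zeros(encoding_size, dtype=dtype)
    vec[label] = 1
    return vec
```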

image(*positional_flags: datasetops.types.Any)

Transforms item elements that are either numpy arrays or path strings into a PIL.Image.Image.

Arguments:

positional_flags {Any} – Positional flags, e.g. (True, False), denoting which elements should be converted. If no flags are supplied, all data that can be converted will be converted.

Returns:

[Dataset] – Dataset with PIL.Image.Image elements

numpy(*positional_flags: datasetops.types.Any)

Transforms elements into numpy.ndarray.

Arguments:

positional_flags {Any} – Positional flags, e.g. (True, False), denoting which elements should be converted. If no flags are supplied, all data that can be converted will be converted.

Returns:

[Dataset] – Dataset with np.ndarray elements

zip(*datasets)
cartesian_product(*datasets)
concat(*datasets)
reshape(*new_shapes: datasetops.types.Optional[datasetops.types.Shape], **kwshapes: datasetops.types.Optional[datasetops.types.Shape])
image_resize(*new_sizes: datasetops.types.Optional[datasetops.types.Shape], **kwsizes: datasetops.types.Optional[datasetops.types.Shape])
to_tensorflow()
to_pytorch()
datasetops.dataset._dataset_element_transforming(fn: datasetops.types.Callable, check: datasetops.types.Callable = None)

Applies the function to dataset item elements.

datasetops.dataset._check_shape_compatibility(shape: datasetops.types.Shape)
datasetops.dataset.convert2img(elem: datasetops.types.Union[PIL.Image.Image, str, pathlib.Path, numpy.ndarray]) PIL.Image.Image
datasetops.dataset._check_image_compatibility(elem)
datasetops.dataset._check_numpy_compatibility(elem)
datasetops.dataset.allow_unique(max_num_duplicates=1) datasetops.types.Callable[[datasetops.types.Any], bool]

Predicate used for filtering/sampling a dataset classwise.

Keyword Arguments:

max_num_duplicates {int} – max number of samples to take that share the same value (default: {1})

Returns:

Callable[[Any], bool] – Predicate function
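The returned predicate is stateful: it remembers how many times each value has been seen across calls. A sketch of the idea (conceptual, not the library source; it assumes the inspected values are hashable):

```python
from collections import defaultdict

def allow_unique(max_num_duplicates=1):
    seen = defaultdict(int)
    def predicate(value):
        # Accept a value only until its duplicate budget is exhausted
        seen[value] += 1
        return seen[value] <= max_num_duplicates
    return predicate
```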

datasetops.dataset.custom(elem_transform_fn: datasetops.types.Callable[[datasetops.types.Any], datasetops.types.Any], elem_check_fn: datasetops.types.Callable[[datasetops.types.Any], None] = None) datasetops.types.DatasetTransformFn

Create a user defined transform.

Arguments:

elem_transform_fn {Callable[[Any], Any]} – A user defined function, which takes the element as its only argument

Keyword Arguments:

elem_check_fn {Callable[[Any], None]} – A function that raises an Exception if the element is incompatible (default: {None})

Returns:

DatasetTransformFn – A function to be passed to Dataset.transform()

datasetops.dataset.reshape(new_shape: datasetops.types.Shape) datasetops.types.DatasetTransformFn
datasetops.dataset.categorical(mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None) datasetops.types.DatasetTransformFn

Transform data into a categorical int label.

Arguments:

mapping_fn {Callable[[Any], int]} – A function transforming the input data to the integer label. If not specified, labels are automatically inferred from the data.

Returns:

DatasetTransformFn – A function to be passed to the Dataset.transform()

datasetops.dataset.categorical_template(ds: Dataset, key: datasetops.types.Key) datasetops.types.Callable[[datasetops.types.Any], int]

Creates a template mapping function to be used with one_hot.

Arguments:

ds {Dataset} – Dataset from which to create a template for the one-hot encoding

key {Key} – Dataset key (name or item index) on which the one-hot encoding is based

Returns:

{Callable[[Any],int]} – mapping_fn for one_hot

datasetops.dataset.one_hot(encoding_size: int, mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None, dtype='bool') datasetops.types.DatasetTransformFn

Transform data into a one-hot encoded label.

Arguments:

encoding_size {int} – The size of the encoding

mapping_fn {Callable[[Any], int]} – A function transforming the input data to an integer label. If not specified, labels are automatically inferred from the data.

Returns:

DatasetTransformFn – A function to be passed to the Dataset.transform()

datasetops.dataset.numpy() datasetops.types.DatasetTransformFn
datasetops.dataset.image() datasetops.types.DatasetTransformFn
datasetops.dataset.image_resize(new_size: datasetops.types.Shape, resample=Image.NEAREST) datasetops.types.DatasetTransformFn
datasetops.dataset.zipped(*datasets: datasetops.abstract.AbstractDataset)
datasetops.dataset.cartesian_product(*datasets: datasetops.abstract.AbstractDataset)
datasetops.dataset.concat(*datasets: datasetops.abstract.AbstractDataset)
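The combination semantics of these three functions can be sketched with itertools (a conceptual illustration using lists of item tuples; the assumed behaviour is that zipped and cartesian_product merge the items of each dataset into a single tuple, while concat appends the datasets):

```python
import itertools

a = [(1,), (2,)]
b = [("x",), ("y",)]

zipped_items = [ia + ib for ia, ib in zip(a, b)]
product_items = [ia + ib for ia, ib in itertools.product(a, b)]
concat_items = list(itertools.chain(a, b))
```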
datasetops.dataset._tf_compute_type(item: datasetops.types.Any)
datasetops.dataset._tf_compute_shape(item: datasetops.types.Any)
datasetops.dataset._tf_item_conversion(item: datasetops.types.Any)
datasetops.dataset.to_tensorflow(dataset: Dataset)
datasetops.dataset.to_pytorch(dataset: Dataset)