datasetops.dataset

Module Contents

Classes

Dataset

Contains information on how to access the raw data, and performs sampling and splitting related operations.

Functions

_warn_no_args([skip])

_raise_no_args([skip])

_dummy_arg_receiving(fn)

_key_index(→ int)

_split_bulk_itemwise(...)

_combine_conditions(→ datasetops.types.DataPredicate)

_optional_argument_indexed_transform(shape, ...)

_keywise(item_names, l, d)

_itemwise(item_names, l, d)

_dataset_element_transforming(fn[, check])

Applies the function to dataset item elements.

_check_shape_compatibility(shape)

convert2img(→ PIL.Image.Image)

_check_image_compatibility(elem)

_check_numpy_compatibility(elem)

allow_unique(...)

Predicate used for filtering/sampling a dataset classwise.

custom(→ datasetops.types.DatasetTransformFn)

Create a user defined transform.

reshape(→ datasetops.types.DatasetTransformFn)

categorical(→ datasetops.types.DatasetTransformFn)

Transform data into a categorical int label.

categorical_template(...)

Creates a template mapping function to be used with one_hot.

one_hot(→ datasetops.types.DatasetTransformFn)

Transform data into a one-hot encoded label.

numpy(→ datasetops.types.DatasetTransformFn)

image(→ datasetops.types.DatasetTransformFn)

image_resize(→ datasetops.types.DatasetTransformFn)

zipped(*datasets)

cartesian_product(*datasets)

concat(*datasets)

_tf_compute_type(item)

_tf_compute_shape(item)

_tf_item_conversion(item)

to_tensorflow(dataset)

to_pytorch(dataset)

Attributes

_DEFAULT_SHAPE

datasetops.dataset._DEFAULT_SHAPE
datasetops.dataset._warn_no_args(skip=0)
datasetops.dataset._raise_no_args(skip=0)
datasetops.dataset._dummy_arg_receiving(fn)
datasetops.dataset._key_index(item_names: datasetops.types.ItemNames, key: datasetops.types.Key) int
datasetops.dataset._split_bulk_itemwise(l: datasetops.types.Union[datasetops.types.Optional[datasetops.types.Callable], datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.Callable]]]) datasetops.types.Tuple[datasetops.types.Optional[datasetops.types.Callable], datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.Callable]]]
datasetops.dataset._combine_conditions(item_names: datasetops.types.ItemNames, shape: datasetops.types.Shape, predicates: datasetops.types.Optional[datasetops.types.Union[datasetops.types.DataPredicate, datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.DataPredicate]]]] = None, **kwpredicates: datasetops.types.DataPredicate) datasetops.types.DataPredicate
datasetops.dataset._optional_argument_indexed_transform(shape: datasetops.types.Shape, ds_transform: datasetops.types.Callable, transform_fn: datasetops.types.DatasetTransformFnCreator, args: datasetops.types.Sequence[datasetops.types.Any])
datasetops.dataset._keywise(item_names: datasetops.types.Dict[str, int], l: datasetops.types.Sequence, d: datasetops.types.Dict)
datasetops.dataset._itemwise(item_names: datasetops.types.Dict[str, int], l: datasetops.types.Sequence, d: datasetops.types.Dict)
class datasetops.dataset.Dataset(downstream_getter: datasetops.types.Union[datasetops.abstract.ItemGetter, Dataset], name: str = None, ids: datasetops.types.Ids = None, item_transform_fn: datasetops.types.ItemTransformFn = lambda x: ..., item_names: datasetops.types.Dict[str, int] = None)

Bases: datasetops.abstract.AbstractDataset

Contains information on how to access the raw data, and performs sampling and splitting related operations.

property shape: datasetops.types.Sequence[int]

Get the shape of a dataset item.

Returns:

Sequence[int] – Item shapes

property names: datasetops.types.List[str]

Get the names of the elements in an item.

Returns:

List[str] – A list of element names

__len__()

Return the total number of elements in the dataset.

__getitem__(i: int) datasetops.types.Tuple

Returns the element at the specified index.

Arguments:

i {int} – The index from which to read the sample.

counts(*itemkeys: datasetops.types.Key) datasetops.types.List[datasetops.types.Tuple[datasetops.types.Any, int]]

Compute the counts of each unique item in the dataset.

Warning: this operation may be expensive for large datasets

Arguments:

itemkeys {Union[str, int]} – The item keys (str) or indexes (int) to be checked for uniqueness. If no key is given, all item-parts must match for them to be considered equal

Returns:

List[Tuple[Any,int]] – List of tuples, each containing the unique value and its number of occurrences
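The counting semantics can be sketched in plain Python (a conceptual illustration, not the library's internals; `items` is a hypothetical list of (data, label) dataset items):

```python
from collections import Counter

# Hypothetical dataset items: (data, label) pairs
items = [("a", 0), ("b", 0), ("a", 1), ("a", 0)]

# Counting on the label element (index 1) only
label_counts = Counter(item[1] for item in items).most_common()

# With no keys given, whole items must match to count as equal
item_counts = Counter(items)
```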

unique(*itemkeys: datasetops.types.Key) datasetops.types.List[datasetops.types.Any]

Compute a list of unique values in the dataset.

Warning: this operation may be expensive for large datasets

Arguments:

itemkeys {str} – The item keys to be checked for uniqueness

Returns:

List[Any] – List of the unique items

sample(num: int, seed: int = None)

Sample data randomly from the dataset.

Arguments:

num {int} – Number of samples. If the number of samples is larger than the dataset size, some samples may be sampled multiple times

Keyword Arguments:

seed {int} – Random seed (default: {None})

Returns:

[Dataset] – Sampled dataset
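The index-selection logic can be sketched as follows (a conceptual illustration under the assumption, stated above, that sampling falls back to drawing with replacement when num exceeds the dataset size; not the library source):

```python
import random

def sample_ids(n_total, num, seed=None):
    rng = random.Random(seed)
    if num <= n_total:
        # Enough data available: sample without replacement
        return rng.sample(range(n_total), num)
    # num exceeds the dataset size: sample with replacement
    return rng.choices(range(n_total), k=num)
```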

filter(predicates: datasetops.types.Optional[datasetops.types.Union[datasetops.types.DataPredicate, datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.DataPredicate]]]] = None, **kwpredicates: datasetops.types.DataPredicate)

Filter a dataset using a predicate function.

Keyword Arguments:

predicates {Union[DataPredicate, Sequence[Optional[DataPredicate]]]} – Either a single function or a list of functions, each taking a single dataset item and returning a bool. If a single function is passed, it is applied to the whole item; if a list is passed, the functions are applied element-wise. Element-wise predicates can also be passed as keyword arguments if item names have been set.

kwpredicates {DataPredicate} – Predicates applied to the item elements matching the keyword. Requires that item names have been set.

Returns:

[Dataset] – A filtered Dataset
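The element-wise predicate combination described above can be sketched in plain Python (a conceptual illustration, not the library implementation; a None entry skips the element at that position):

```python
def combine(predicates):
    """Combine per-element predicates into one item-level predicate.
    A None entry means the element at that position is not checked."""
    def item_predicate(item):
        return all(
            p(elem) for p, elem in zip(predicates, item) if p is not None
        )
    return item_predicate

items = [(1, "cat"), (2, "dog"), (3, "cat")]
keep = combine([lambda x: x > 1, None])  # check only the first element
filtered = [it for it in items if keep(it)]
```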

split_filter(predicates: datasetops.types.Optional[datasetops.types.Union[datasetops.types.DataPredicate, datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.DataPredicate]]]] = None, **kwpredicates: datasetops.types.DataPredicate)

Split a dataset using a predicate function.

Keyword Arguments:

predicates {Union[DataPredicate, Sequence[Optional[DataPredicate]]]} – Either a single function or a list of functions, each taking a single dataset item and returning a bool. If a single function is passed, it is applied to the whole item; if a list is passed, the functions are applied element-wise. Element-wise predicates can also be passed as keyword arguments if item names have been set.

Returns:

[Dataset] – Two datasets, one that passed the predicate and one that didn’t

shuffle(seed: int = None)

Shuffle the items in a dataset.

Keyword Arguments:

seed {[int]} – Random seed (default: {None})

Returns:

[Dataset] – Dataset with shuffled items

split(fractions: datasetops.types.List[float], seed: int = None)

Split dataset into multiple datasets, determined by the fractions given.

A wildcard (-1) may be given at a single position, to fill in the rest. If the fractions don’t add up to one, the last fraction in the list receives the remaining data.

Arguments:

fractions {List[float]} – a list or tuple of floats in the interval ]0,1[. One of the items may be a -1 wildcard.

Keyword Arguments:

seed {int} – Random seed (default: {None})

Returns:

List[Dataset] – Datasets with the number of samples corresponding to the fractions given
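The fraction-to-size arithmetic can be sketched as follows (a conceptual illustration under the rules stated above; the exact rounding behaviour of the library may differ):

```python
def split_sizes(n, fractions):
    # A single -1 wildcard absorbs whatever the other fractions leave over
    if -1 in fractions:
        rest = 1.0 - sum(f for f in fractions if f != -1)
        fractions = [rest if f == -1 else f for f in fractions]
    sizes = [int(n * f) for f in fractions]
    sizes[-1] += n - sum(sizes)  # rounding leftover goes to the last split
    return sizes
```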

take(num: int)

Take the first elements of a dataset.

Arguments:

num {int} – number of elements to take

Returns:

Dataset – A dataset with only the first num elements

repeat(times=1, mode='itemwise')

Repeat the dataset elements.

Keyword Arguments:

times {int} – Number of times an element is repeated (default: {1})

mode {str} – Repeat ‘itemwise’ (i.e. [1,1,2,2,3,3]) or as a ‘whole’ (i.e. [1,2,3,1,2,3]) (default: {‘itemwise’})

Returns:

[Dataset] – Dataset with repeated items
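The two repeat modes can be illustrated with plain lists (a conceptual sketch of the ordering, not the library code):

```python
items = [1, 2, 3]

# 'itemwise': every element is repeated in place
itemwise = [x for x in items for _ in range(2)]

# 'whole': the entire sequence is repeated
whole = items * 2
```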

reorder(*keys: datasetops.types.Key)

Reorder items in the dataset (similar to numpy.transpose).

Arguments:

keys {Union[int, str]} – Positional item indexes or keys (if item names were previously set) specifying the new element order

Returns:

[Dataset] – Dataset with items whose elements have been reordered

named(first: datasetops.types.Union[str, datasetops.types.Sequence[str]], *rest: str)

Set the names associated with the elements of an item.

Arguments:

first {Union[str, Sequence[str]]} – The new item name(s)

Returns:

[Dataset] – A Dataset whose item elements can be accessed by name

transform(fns: datasetops.types.Optional[datasetops.types.Union[datasetops.types.ItemTransformFn, datasetops.types.Sequence[datasetops.types.Union[datasetops.types.ItemTransformFn, datasetops.types.DatasetTransformFn]]]] = None, **kwfns: datasetops.types.DatasetTransformFn)

Transform the items of a dataset according to some function (passed as argument).

Arguments:

If a single function taking one input is given, e.g. transform(lambda x: x), it is applied to the whole item. If a list of functions is given, e.g. transform([image(), one_hot()]), each is applied to the element of the item at the corresponding position. If a keyword is used, e.g. transform(data=lambda x: -x), the element associated with that key is transformed.

Raises:

ValueError: If more functions are passed than there are elements in an item.

KeyError: If a key doesn’t match any item name.

Returns:

[Dataset] – Dataset whose items are transformed

categorical(key: datasetops.types.Key, mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None)

Transform elements into categorical integer labels.

Arguments:

key {Key} – Index or name of the element to be transformed

Keyword Arguments:

mapping_fn {Callable[[Any], int]} – User defined mapping function (default: {None})

Returns:

[Dataset] – Dataset with items that have been transformed to categorical labels
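Automatic label inference can be sketched as a mapping built in order of first appearance (a conceptual illustration of the idea; the library's actual ordering may differ):

```python
def make_label_fn():
    mapping = {}
    def to_label(value):
        # First occurrence of a value gets the next free integer
        if value not in mapping:
            mapping[value] = len(mapping)
        return mapping[value]
    return to_label

to_label = make_label_fn()
labels = [to_label(v) for v in ["cat", "dog", "cat", "bird"]]
```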

one_hot(key: datasetops.types.Key, encoding_size: int = None, mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None, dtype='bool')

Transform elements into a categorical one-hot encoding.

Arguments:

key {Key} – Index or name of the element to be transformed

Keyword Arguments:

encoding_size {int} – The number of positions in the one-hot vector. If the size is not provided, it will be inferred automatically (with an O(N) runtime cost) (default: {None})

mapping_fn {Callable[[Any], int]} – User defined mapping function (default: {None})

dtype {str} – Numpy datatype for the one-hot encoded data (default: {‘bool’})

Returns:

[Dataset] – Dataset with items that have been transformed to categorical labels
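The encoding step itself can be sketched with numpy (a conceptual illustration; it assumes the integer label has already been produced by a mapping function):

```python
import numpy as np

def one_hot_encode(label, encoding_size, dtype="bool"):
    # A zero vector with a single position set at the label's index
    vec = np.zeros(encoding_size, dtype=dtype)
    vec[label] = 1
    return vec
```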

image(*positional_flags: datasetops.types.Any)

Transforms item elements that are either numpy arrays or path strings into a PIL.Image.Image.

Arguments:

positional_flags {Any} – Positional flags, e.g. (True, False), denoting which elements should be converted. If no flags are supplied, all data that can be converted will be converted.

Returns:

[Dataset] – Dataset with PIL.Image.Image elements

numpy(*positional_flags: datasetops.types.Any)

Transforms elements into numpy.ndarray.

Arguments:

positional_flags {Any} – Positional flags, e.g. (True, False), denoting which elements should be converted. If no flags are supplied, all data that can be converted will be converted.

Returns:

[Dataset] – Dataset with np.ndarray elements

zip(*datasets)
cartesian_product(*datasets)
concat(*datasets)
reshape(*new_shapes: datasetops.types.Optional[datasetops.types.Shape], **kwshapes: datasetops.types.Optional[datasetops.types.Shape])
image_resize(*new_sizes: datasetops.types.Optional[datasetops.types.Shape], **kwsizes: datasetops.types.Optional[datasetops.types.Shape])
to_tensorflow()
to_pytorch()
datasetops.dataset._dataset_element_transforming(fn: datasetops.types.Callable, check: datasetops.types.Callable = None)

Applies the function to dataset item elements.

datasetops.dataset._check_shape_compatibility(shape: datasetops.types.Shape)
datasetops.dataset.convert2img(elem: datasetops.types.Union[PIL.Image.Image, str, pathlib.Path, numpy.ndarray]) PIL.Image.Image
datasetops.dataset._check_image_compatibility(elem)
datasetops.dataset._check_numpy_compatibility(elem)
datasetops.dataset.allow_unique(max_num_duplicates=1) datasetops.types.Callable[[datasetops.types.Any], bool]

Predicate used for filtering/sampling a dataset classwise.

Keyword Arguments:

max_num_duplicates {int} – max number of samples to take that share the same value (default: {1})

Returns:

Callable[[Any], bool] – Predicate function
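The returned predicate is stateful: it remembers how many times each value has been seen across calls. A sketch of the idea (conceptual, not the library source; it assumes the inspected values are hashable):

```python
from collections import defaultdict

def allow_unique(max_num_duplicates=1):
    seen = defaultdict(int)
    def predicate(value):
        # Accept a value only until its duplicate budget is exhausted
        seen[value] += 1
        return seen[value] <= max_num_duplicates
    return predicate
```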

datasetops.dataset.custom(elem_transform_fn: datasetops.types.Callable[[datasetops.types.Any], datasetops.types.Any], elem_check_fn: datasetops.types.Callable[[datasetops.types.Any], None] = None) datasetops.types.DatasetTransformFn

Create a user defined transform.

Arguments:

elem_transform_fn {Callable[[Any], Any]} – A user defined function, which takes the element as its only argument

Keyword Arguments:

elem_check_fn {Callable[[Any], None]} – A function that raises an Exception if the element is incompatible (default: {None})

Returns:

DatasetTransformFn – A function to be passed to Dataset.transform()

datasetops.dataset.reshape(new_shape: datasetops.types.Shape) datasetops.types.DatasetTransformFn
datasetops.dataset.categorical(mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None) datasetops.types.DatasetTransformFn

Transform data into a categorical int label.

Arguments:

mapping_fn {Callable[[Any], int]} – A function transforming the input data to the integer label. If not specified, labels are automatically inferred from the data.

Returns:

DatasetTransformFn – A function to be passed to the Dataset.transform()

datasetops.dataset.categorical_template(ds: Dataset, key: datasetops.types.Key) datasetops.types.Callable[[datasetops.types.Any], int]

Creates a template mapping function to be used with one_hot.

Arguments:

ds {Dataset} – Dataset from which to create a template for the one-hot encoding

key {Key} – Dataset key (name or item index) on which the one-hot encoding is based

Returns:

{Callable[[Any],int]} – mapping_fn for one_hot

datasetops.dataset.one_hot(encoding_size: int, mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None, dtype='bool') datasetops.types.DatasetTransformFn

Transform data into a one-hot encoded label.

Arguments:

encoding_size {int} – The size of the encoding

mapping_fn {Callable[[Any], int]} – A function transforming the input data to an integer label. If not specified, labels are automatically inferred from the data.

Returns:

DatasetTransformFn – A function to be passed to the Dataset.transform()

datasetops.dataset.numpy() datasetops.types.DatasetTransformFn
datasetops.dataset.image() datasetops.types.DatasetTransformFn
datasetops.dataset.image_resize(new_size: datasetops.types.Shape, resample=Image.NEAREST) datasetops.types.DatasetTransformFn
datasetops.dataset.zipped(*datasets: datasetops.abstract.AbstractDataset)
datasetops.dataset.cartesian_product(*datasets: datasetops.abstract.AbstractDataset)
datasetops.dataset.concat(*datasets: datasetops.abstract.AbstractDataset)
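The combination semantics of these three functions can be sketched with itertools (a conceptual illustration using lists of item tuples; the assumed behaviour is that zipped and cartesian_product merge the items of each dataset into a single tuple, while concat appends the datasets):

```python
import itertools

a = [(1,), (2,)]
b = [("x",), ("y",)]

zipped_items = [ia + ib for ia, ib in zip(a, b)]
product_items = [ia + ib for ia, ib in itertools.product(a, b)]
concat_items = list(itertools.chain(a, b))
```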
datasetops.dataset._tf_compute_type(item: datasetops.types.Any)
datasetops.dataset._tf_compute_shape(item: datasetops.types.Any)
datasetops.dataset._tf_item_conversion(item: datasetops.types.Any)
datasetops.dataset.to_tensorflow(dataset: Dataset)
datasetops.dataset.to_pytorch(dataset: Dataset)