datasetops.dataset
Module Contents
Classes
- Dataset – Contains information on how to access the raw data, and performs sampling and splitting related operations.
Functions
- _dataset_element_transforming – Applies the function to dataset item elements.
- allow_unique – Predicate used for filtering/sampling a dataset classwise.
- custom – Create a user defined transform.
- categorical – Transform data into a categorical int label.
- categorical_template – Creates a template mapping function to be used with one_hot.
- one_hot – Transform data into a one-hot encoded label.
Attributes
- datasetops.dataset._DEFAULT_SHAPE
- datasetops.dataset._warn_no_args(skip=0)
- datasetops.dataset._raise_no_args(skip=0)
- datasetops.dataset._dummy_arg_receiving(fn)
- datasetops.dataset._key_index(item_names: datasetops.types.ItemNames, key: datasetops.types.Key) int
- datasetops.dataset._split_bulk_itemwise(l: datasetops.types.Union[datasetops.types.Optional[datasetops.types.Callable], datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.Callable]]]) datasetops.types.Tuple[datasetops.types.Optional[datasetops.types.Callable], datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.Callable]]]
- datasetops.dataset._combine_conditions(item_names: datasetops.types.ItemNames, shape: datasetops.types.Shape, predicates: datasetops.types.Optional[datasetops.types.Union[datasetops.types.DataPredicate, datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.DataPredicate]]]] = None, **kwpredicates: datasetops.types.DataPredicate) datasetops.types.DataPredicate
- datasetops.dataset._optional_argument_indexed_transform(shape: datasetops.types.Shape, ds_transform: datasetops.types.Callable, transform_fn: datasetops.types.DatasetTransformFnCreator, args: datasetops.types.Sequence[datasetops.types.Any])
- datasetops.dataset._keywise(item_names: datasetops.types.Dict[str, int], l: datasetops.types.Sequence, d: datasetops.types.Dict)
- datasetops.dataset._itemwise(item_names: datasetops.types.Dict[str, int], l: datasetops.types.Sequence, d: datasetops.types.Dict)
- class datasetops.dataset.Dataset(downstream_getter: datasetops.types.Union[datasetops.abstract.ItemGetter, Dataset], name: str = None, ids: datasetops.types.Ids = None, item_transform_fn: datasetops.types.ItemTransformFn = lambda x: ..., item_names: datasetops.types.Dict[str, int] = None)
Bases:
datasetops.abstract.AbstractDataset
Contains information on how to access the raw data, and performs sampling and splitting related operations.
- property shape: datasetops.types.Sequence[int]
Get the shape of a dataset item.
- Returns:
Sequence[int] – Item shapes
- property names: datasetops.types.List[str]
Get the names of the elements in an item.
- Returns:
List[str] – A list of element names
- __len__()
Return the total number of elements in the dataset.
- __getitem__(i: int) datasetops.types.Tuple
Returns the element at the specified index.
- Arguments:
i {int} – The index from which to read the sample.
- counts(*itemkeys: datasetops.types.Key) datasetops.types.List[datasetops.types.Tuple[datasetops.types.Any, int]]
Compute the counts of each unique item in the dataset.
Warning: this operation may be expensive for large datasets
- Arguments:
itemkeys {Union[str, int]} – The item keys (str) or indexes (int) to be checked for uniqueness. If no key is given, all item-parts must match for them to be considered equal
- Returns:
List[Tuple[Any,int]] – List of tuples, each containing the unique value and its number of occurrences
- unique(*itemkeys: datasetops.types.Key) datasetops.types.List[datasetops.types.Any]
Compute a list of unique values in the dataset.
Warning: this operation may be expensive for large datasets
- Arguments:
itemkeys {str} – The item keys to be checked for uniqueness
- Returns:
List[Any] – List of the unique items
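The semantics of counts and unique can be sketched in plain Python. This is an illustration of the behaviour on toy (data, label) items, not the library's implementation:

```python
from collections import Counter

# Toy items: (data, label) tuples stand in for dataset items.
items = [(0.1, "cat"), (0.2, "dog"), (0.3, "cat")]

# counts("label"): occurrences of each unique value of the selected element.
label_counts = list(Counter(x[1] for x in items).items())
# → [("cat", 2), ("dog", 1)]

# unique("label"): the distinct values of the selected element.
unique_labels = list(dict.fromkeys(x[1] for x in items))
# → ["cat", "dog"]
```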
- sample(num: int, seed: int = None)
Sample data randomly from the dataset.
- Arguments:
num {int} – Number of samples. If the number of samples is larger than the dataset size, some samples may be sampled multiple times
- Keyword Arguments:
seed {int} – Random seed (default: {None})
- Returns:
[Dataset] – Sampled dataset
- filter(predicates: datasetops.types.Optional[datasetops.types.Union[datasetops.types.DataPredicate, datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.DataPredicate]]]] = None, **kwpredicates: datasetops.types.DataPredicate)
Filter a dataset using a predicate function.
- Keyword Arguments:
predicates {Union[DataPredicate, Sequence[Optional[DataPredicate]]]} – Either a single function or a list of functions, each taking a single dataset item and returning a bool. If a single function is passed, it is applied to the whole item; if a list is passed, the functions are applied itemwise. Element-wise predicates can also be passed as keyword arguments, if the item elements have been named.
kwpredicates {DataPredicate} – Predicates passed by keyword, where each keyword must match an item name
- Returns:
[Dataset] – A filtered Dataset
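The itemwise form of the predicates can be illustrated in plain Python, where None marks an unconstrained element (a sketch of the described semantics, not the library's code):

```python
# Items are (data, label) tuples; predicates are applied element-wise,
# with None meaning "no constraint on this element".
items = [(1, "cat"), (-2, "dog"), (3, "dog")]
predicates = [lambda x: x > 0, None]  # constrain only the first element

def passes(item, preds):
    return all(p(e) for p, e in zip(preds, item) if p is not None)

filtered = [it for it in items if passes(it, predicates)]
# → [(1, "cat"), (3, "dog")]
```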
- split_filter(predicates: datasetops.types.Optional[datasetops.types.Union[datasetops.types.DataPredicate, datasetops.types.Sequence[datasetops.types.Optional[datasetops.types.DataPredicate]]]] = None, **kwpredicates: datasetops.types.DataPredicate)
Split a dataset using a predicate function.
- Keyword Arguments:
predicates {Union[DataPredicate, Sequence[Optional[DataPredicate]]]} – Either a single function or a list of functions, each taking a single dataset item and returning a bool. If a single function is passed, it is applied to the whole item; if a list is passed, the functions are applied itemwise. Element-wise predicates can also be passed as keyword arguments, if the item elements have been named.
- Returns:
[Dataset] – Two datasets, one that passed the predicate and one that didn’t
- shuffle(seed: int = None)
Shuffle the items in a dataset.
- Keyword Arguments:
seed {[int]} – Random seed (default: {None})
- Returns:
[Dataset] – Dataset with shuffled items
- split(fractions: datasetops.types.List[float], seed: int = None)
Split dataset into multiple datasets, determined by the fractions given.
A wildcard (-1) may be given at a single position, to fill in the rest. If the fractions don’t add up to one, the last fraction in the list receives the remaining data.
- Arguments:
fractions {List[float]} – A list or tuple of floats in the interval ]0,1[. One of the items may be a -1 wildcard.
- Keyword Arguments:
seed {int} – Random seed (default: {None})
- Returns:
List[Dataset] – Datasets with the number of samples corresponding to the fractions given
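The way fractions and the -1 wildcard determine the split sizes can be sketched as follows (an illustration of the described semantics; the library's exact rounding may differ):

```python
def split_counts(n, fractions):
    """Turn split fractions into item counts, resolving a single -1 wildcard."""
    counts = [int(n * f) if f != -1 else None for f in fractions]
    if None in counts:
        # The wildcard position receives whatever remains.
        counts[counts.index(None)] = n - sum(c for c in counts if c is not None)
    return counts

split_counts(10, [0.5, 0.2, -1])  # → [5, 2, 3]
```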
- take(num: int)
Take the first elements of a dataset.
- Arguments:
num {int} – number of elements to take
- Returns:
Dataset – A dataset with only the first num elements
- repeat(times=1, mode='itemwise')
Repeat the dataset elements.
- Keyword Arguments:
times {int} – Number of times an element is repeated (default: {1})
mode {str} – Repeat ‘itemwise’ (i.e. [1,1,2,2,3,3]) or as a ‘whole’ (i.e. [1,2,3,1,2,3]) (default: {‘itemwise’})
- Returns:
[Dataset] – Dataset with repeated items
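The two repeat modes can be illustrated with plain Python lists (a sketch of the documented semantics, not the library's code):

```python
items = [1, 2, 3]
times = 2

# 'itemwise': each element is repeated in place.
itemwise = [x for x in items for _ in range(times)]  # → [1, 1, 2, 2, 3, 3]

# 'whole': the entire sequence is repeated.
whole = items * times  # → [1, 2, 3, 1, 2, 3]
```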
- reorder(*keys: datasetops.types.Key)
Reorder items in the dataset (similar to numpy.transpose).
- Arguments:
keys {Union[int,str]} – Positional item index or key (if item names were previously set) of each item element, in the new order
- Returns:
[Dataset] – Dataset with items whose elements have been reordered
- named(first: datasetops.types.Union[str, datasetops.types.Sequence[str]], *rest: str)
Set the names associated with the elements of an item.
- Arguments:
first {Union[str, Sequence[str]]} – The new item name(s)
- Returns:
[Dataset] – A Dataset whose item elements can be accessed by name
- transform(fns: datasetops.types.Optional[datasetops.types.Union[datasetops.types.ItemTransformFn, datasetops.types.Sequence[datasetops.types.Union[datasetops.types.ItemTransformFn, datasetops.types.DatasetTransformFn]]]] = None, **kwfns: datasetops.types.DatasetTransformFn)
Transform the items of a dataset according to some function (passed as argument).
- Arguments:
If a single function taking one input is given, e.g. transform(lambda x: x), it will be applied to the whole item. If a list of functions is given, e.g. transform([image(), one_hot()]), they will be applied to the elements of the item at the corresponding positions. If a key is used, e.g. transform(data=lambda x: -x), the element associated with that key is transformed.
- Raises:
ValueError: If more functions are passed than there are elements in an item.
KeyError: If a key doesn’t match
- Returns:
[Dataset] – Dataset whose items are transformed
- categorical(key: datasetops.types.Key, mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None)
Transform elements into categorical labels (int).
- Arguments:
key {Key} – Index or name of the element to be transformed
- Keyword Arguments:
mapping_fn {Callable[[Any], int]} – User defined mapping function (default: {None})
- Returns:
[Dataset] – Dataset with items that have been transformed to categorical labels
- one_hot(key: datasetops.types.Key, encoding_size: int = None, mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None, dtype='bool')
Transform elements into a categorical one-hot encoding.
- Arguments:
key {Key} – Index or name of the element to be transformed
- Keyword Arguments:
encoding_size {int} – The number of positions in the one-hot vector. If the size is not provided, it will be automatically inferred (with an O(N) runtime cost) (default: {None})
mapping_fn {Callable[[Any], int]} – User defined mapping function (default: {None})
dtype {str} – Numpy datatype for the one-hot encoded data (default: {‘bool’})
- Returns:
[Dataset] – Dataset with items that have been transformed to categorical labels
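The resulting encoding can be illustrated with numpy (a sketch of one-hot encoding with the default bool dtype, not the library's internal code):

```python
import numpy as np

labels = [0, 2, 1]
encoding_size = 3  # e.g. inferred as max(labels) + 1 when not supplied

# Build a (num_items, encoding_size) matrix with one True per row.
one_hot = np.zeros((len(labels), encoding_size), dtype="bool")
one_hot[np.arange(len(labels)), labels] = True
# one_hot[0] → [True, False, False]
```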
- image(*positional_flags: datasetops.types.Any)
Transforms item elements that are either numpy arrays or path strings into a PIL.Image.Image.
- Arguments:
Positional flags, e.g. (True, False), denoting which elements should be converted. If no flags are supplied, all data that can be converted will be converted.
- Returns:
[Dataset] – Dataset with PIL.Image.Image elements
- numpy(*positional_flags: datasetops.types.Any)
Transforms elements into numpy.ndarray.
- Arguments:
Positional flags, e.g. (True, False), denoting which elements should be converted. If no flags are supplied, all data that can be converted will be converted.
- Returns:
[Dataset] – Dataset with np.ndarray elements
- zip(*datasets)
- cartesian_product(*datasets)
- concat(*datasets)
- reshape(*new_shapes: datasetops.types.Optional[datasetops.types.Shape], **kwshapes: datasetops.types.Optional[datasetops.types.Shape])
- image_resize(*new_sizes: datasetops.types.Optional[datasetops.types.Shape], **kwsizes: datasetops.types.Optional[datasetops.types.Shape])
- to_tensorflow()
- to_pytorch()
- datasetops.dataset._dataset_element_transforming(fn: datasetops.types.Callable, check: datasetops.types.Callable = None)
Applies the function to dataset item elements.
- datasetops.dataset._check_shape_compatibility(shape: datasetops.types.Shape)
- datasetops.dataset.convert2img(elem: datasetops.types.Union[PIL.Image.Image, str, pathlib.Path, numpy.ndarray]) PIL.Image.Image
- datasetops.dataset._check_image_compatibility(elem)
- datasetops.dataset._check_numpy_compatibility(elem)
- datasetops.dataset.allow_unique(max_num_duplicates=1) datasetops.types.Callable[[datasetops.types.Any], bool]
Predicate used for filtering/sampling a dataset classwise.
- Keyword Arguments:
max_num_duplicates {int} – max number of samples to take that share the same value (default: {1})
- Returns:
Callable[[Any], bool] – Predicate function
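A plausible sketch of such a stateful predicate (the library's actual implementation may differ): it returns True for a value until that value has been seen max_num_duplicates times.

```python
def allow_unique(max_num_duplicates=1):
    """Return a predicate that accepts each distinct value
    at most max_num_duplicates times."""
    seen = {}

    def predicate(value):
        seen[value] = seen.get(value, 0) + 1
        return seen[value] <= max_num_duplicates

    return predicate

p = allow_unique(1)
[p(v) for v in ["cat", "dog", "cat"]]  # → [True, True, False]
```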
- datasetops.dataset.custom(elem_transform_fn: datasetops.types.Callable[[datasetops.types.Any], datasetops.types.Any], elem_check_fn: datasetops.types.Callable[[datasetops.types.Any], None] = None) datasetops.types.DatasetTransformFn
Create a user defined transform.
- Arguments:
elem_transform_fn {Callable[[Any], Any]} – A user defined function, which takes the element as its only argument
- Keyword Arguments:
elem_check_fn {Callable[[Any], None]} – A function that raises an Exception if the element is incompatible (default: {None})
- Returns:
DatasetTransformFn – A function to be passed to the Dataset.transform()
- datasetops.dataset.reshape(new_shape: datasetops.types.Shape) datasetops.types.DatasetTransformFn
- datasetops.dataset.categorical(mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None) datasetops.types.DatasetTransformFn
Transform data into a categorical int label.
- Arguments:
mapping_fn {Callable[[Any], int]} – A function transforming the input data to the integer label. If not specified, labels are automatically inferred from the data.
- Returns:
DatasetTransformFn – A function to be passed to the Dataset.transform()
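When no mapping_fn is given, labels are inferred from the data. One way to picture this inference (an assumed first-seen ordering; the library may order inferred labels differently):

```python
# Sketch of auto-inferred label mapping: first-seen value -> 0, next -> 1, ...
mapping = {}

def to_categorical(value):
    if value not in mapping:
        mapping[value] = len(mapping)
    return mapping[value]

labels_out = [to_categorical(v) for v in ["dog", "cat", "dog"]]  # → [0, 1, 0]
```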
- datasetops.dataset.categorical_template(ds: Dataset, key: datasetops.types.Key) datasetops.types.Callable[[datasetops.types.Any], int]
Creates a template mapping function to be used with one_hot.
- Arguments:
ds {Dataset} – Dataset from which to create a template for the one_hot coding
key {Key} – Dataset key (name or item index) on which the one_hot coding is made
- Returns:
{Callable[[Any],int]} – mapping_fn for one_hot
- datasetops.dataset.one_hot(encoding_size: int, mapping_fn: datasetops.types.Callable[[datasetops.types.Any], int] = None, dtype='bool') datasetops.types.DatasetTransformFn
Transform data into a one-hot encoded label.
- Arguments:
encoding_size {int} – The size of the encoding
mapping_fn {Callable[[Any], int]} – A function transforming the input data to an integer label. If not specified, labels are automatically inferred from the data.
- Returns:
DatasetTransformFn – A function to be passed to the Dataset.transform()
- datasetops.dataset.numpy() datasetops.types.DatasetTransformFn
- datasetops.dataset.image() datasetops.types.DatasetTransformFn
- datasetops.dataset.image_resize(new_size: datasetops.types.Shape, resample=Image.NEAREST) datasetops.types.DatasetTransformFn
- datasetops.dataset.zipped(*datasets: datasetops.abstract.AbstractDataset)
- datasetops.dataset.cartesian_product(*datasets: datasetops.abstract.AbstractDataset)
- datasetops.dataset.concat(*datasets: datasetops.abstract.AbstractDataset)
- datasetops.dataset._tf_compute_type(item: datasetops.types.Any)
- datasetops.dataset._tf_compute_shape(item: datasetops.types.Any)
- datasetops.dataset._tf_item_conversion(item: datasetops.types.Any)