Dataset Ops documentation

Friendly dataset operations for your data science needs. Dataset Ops provides declarative loading, sampling, splitting and transformation operations for datasets, alongside export options for easy integration with TensorFlow and PyTorch.

Dataset Ops pipeline

Illustration of the Dataset Ops pipeline. Several built-in loaders make it possible to load datasets stored in various formats. A set of operators covers common pre-processing steps that can be applied to the data quickly. Finally, the processed data can be used as is or exported in a format suitable for ML frameworks.
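
The load, transform and split stages described above can be sketched in plain Python. This is a conceptual sketch only: it does not use the Dataset Ops API, and the names load_items, apply_transform and split are hypothetical helpers invented for illustration. The (value, label) item layout is likewise an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical item layout: a (value, label) pair.
Item = Tuple[float, str]


def load_items() -> List[Item]:
    """Stand-in for a loader: returns a small in-memory dataset."""
    return [(0.0, "a"), (1.0, "b"), (2.0, "a"), (3.0, "b")]


def apply_transform(items: Sequence[Item], fn: Callable[[Item], Item]) -> List[Item]:
    """Stand-in for a transform operator: applies fn to every item."""
    return [fn(item) for item in items]


def split(items: Sequence[Item], fraction: float) -> Tuple[List[Item], List[Item]]:
    """Stand-in for a split operator: cuts the dataset at the given fraction."""
    cut = int(len(items) * fraction)
    return list(items[:cut]), list(items[cut:])


# Load, scale the values into [0, 1], then split 75/25.
items = load_items()
scaled = apply_transform(items, lambda it: (it[0] / 3.0, it[1]))
train, test = split(scaled, 0.75)
```

In the library itself these stages are provided by the built-in loaders and operators; consult the sections listed below for the actual function names and signatures.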

First Steps

Are you looking for ways to install the framework, or are you looking for inspiration to get started?

Loaders and Transforms

Get an overview of the available loaders and transforms that can be used with your dataset.

It is also possible to implement your own loaders and transforms.

Custom Loaders and Transforms

Is your dataset structured in a way that's not compatible with any of the standard loaders? Or does your application require very specific and complex transformations to be applied to the data? The framework makes integration with custom loaders and transforms easy and clean. For how-to guides, see:

Performance And Optimizations

Are you looking for ways to reduce the time required to load and process big datasets? The library provides several mechanisms that can drastically reduce it.

API Reference

Examples

Looking for more concrete examples of how datasets may be loaded and transformed? See the example section:

Developer And Contributor Guide

Are you looking to contribute to the project, or are you already a developer? Contributions of any size and form are always welcome. Information on how the codebase is tested, how it is published, and how to add documentation can be found below: