Accelerate iteration over your machine learning dataset by up to 20 times!
mmap_ninja is a library for storing your datasets in memory-mapped files, which leads to a dramatic speedup in training time.
The only dependencies are numpy and tqdm.
You can use mmap_ninja with any training framework (such as TensorFlow, PyTorch, or MXNet), as it stores your dataset as a memory-mapped numpy array.
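For example, once your samples live in a memory-mapped numpy array, any framework's dataset abstraction can wrap it directly. Here is a minimal, hypothetical sketch of a PyTorch `Dataset` over memory-mapped `.npy` files (the file names and shapes are placeholders, not part of mmap_ninja):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class MemmapImageDataset(Dataset):
    """Wraps memory-mapped numpy arrays so PyTorch can iterate over them lazily."""

    def __init__(self, images_path: str, labels_path: str):
        # mmap_mode='r' maps the .npy files instead of loading them into RAM;
        # the file names used below are hypothetical placeholders.
        self.images = np.load(images_path, mmap_mode='r')
        self.labels = np.load(labels_path, mmap_mode='r')

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Only the pages backing the requested sample are read from disk.
        image = torch.from_numpy(np.array(self.images[idx], dtype=np.float32))
        label = int(self.labels[idx])
        return image, label

loader = DataLoader(MemmapImageDataset('images.npy', 'labels.npy'), batch_size=32)
```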
A memory-mapped file is a file that is physically present on disk but mapped into a process's address space, so that applications can treat the mapped portions as if they were primary memory, allowing very fast I/O!
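As a quick illustration of the concept (independent of mmap_ninja), numpy's own `np.memmap` creates a file-backed array that can be indexed like any ordinary array; the file path below is just a throwaway example:

```python
import numpy as np

# Create a file-backed array of 1000 float32 rows (the path is an example).
arr = np.memmap('example.dat', dtype=np.float32, mode='w+', shape=(1000, 128))
arr[0] = 1.0   # writes go through the OS page cache to disk
arr.flush()    # persist the changes

# Later (even from another process), map the same file back without
# loading all of it into RAM; only the pages you touch are read.
same_arr = np.memmap('example.dat', dtype=np.float32, mode='r', shape=(1000, 128))
print(same_arr[0, :5])
```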
When working on a machine learning project, one of the most time-consuming parts is training the model. However, a large portion of that time is actually spent just iterating over the dataset and doing filesystem I/O!
mmap_ninja provides a high-level, easy-to-use, well-tested API for using memory maps for your datasets, reducing the time needed for training.
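A typical workflow looks roughly like the sketch below: convert the dataset to a memory map once, then open it nearly instantly in every training run. The helper names and argument order (`from_ndarray` and `open_existing` in `mmap_ninja.numpy`) are my understanding of the project's API and should be double-checked against the current documentation:

```python
import numpy as np
from mmap_ninja import numpy as np_ninja

# One-off conversion: persist an in-memory array as a memory map on disk.
# 'images_mmap' is an output directory name chosen for this example.
images = np.random.rand(1000, 224, 224, 3).astype(np.float32)
np_ninja.from_ndarray('images_mmap', images)

# In every subsequent run, opening the memory map is nearly instant,
# and samples are read lazily from disk as they are accessed.
images_mmap = np_ninja.open_existing('images_mmap')
print(images_mmap.shape, images_mmap.dtype)
```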
Memory maps usually take a little more disk space, though, so if you are willing to trade some disk space for fast disk-to-memory I/O, this is the library for you!