- Added support for running multiple variants of the same network concurrently
  - Note: networks with different layer dimensions are NOT supported; that is currently too involved for the scope of this package
- Sped up dataset import at the cost of slightly slower kernels
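A minimal NumPy sketch of why identical layer dimensions matter. It assumes, and this changelog does not confirm, that variants are evaluated by stacking their parameters along a leading "variant" axis. One batched matrix product then evaluates every variant at once, while variants with different shapes cannot be stacked at all. The names forward_variants and stack_variants are hypothetical, and tanh is an arbitrary activation choice.

```python
import numpy as np

def stack_variants(weight_list):
    """Stack per-variant weight matrices into one (V, n_in, n_out) array.

    np.stack requires identical shapes, so variants with different layer
    dimensions raise ValueError here (the unsupported case).
    """
    return np.stack(weight_list)

def forward_variants(x, weights, biases):
    """Evaluate V variants of one dense layer on the same input batch.

    x:       (batch, n_in)     inputs shared by every variant
    weights: (V, n_in, n_out)  one weight matrix per variant
    biases:  (V, n_out)        one bias vector per variant
    returns: (V, batch, n_out)
    """
    # matmul broadcasts x against the leading variant axis of weights,
    # so a single call computes every variant's pre-activation.
    return np.tanh(x @ weights + biases[:, None, :])

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 3))
w = stack_variants([rng.standard_normal((3, 5)) for _ in range(4)])
b = rng.standard_normal((4, 5))
out = forward_variants(x, w, b)
print(out.shape)  # (4, 8, 5)
```

Because every variant shares one array layout, a single kernel launch can cover all of them; a ragged set of shapes would need per-variant launches or padding instead.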
0.0.8
- Instead of one GPU array per dataset input, datasets are now stored as 1 array for testing and 1 for training
  - Slower for inference operations, but far quicker for dataset import
- Removed the progress bar for dataset import
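A sketch of the single-array layout, assuming fixed-size samples stored back to back. The whole split can then be copied to the GPU in one transfer (for example, one pycuda.gpuarray.to_gpu() call) instead of one transfer per input, but every lookup needs extra offset arithmetic, which is one plausible source of the slower inference noted above. pack_dataset and get_sample are hypothetical helpers; a real kernel would compute the same i * sample_size offset from its thread indices.

```python
import numpy as np

def pack_dataset(samples, dtype=np.float32):
    """Concatenate equally sized samples into one flat, contiguous array."""
    sample_size = np.size(samples[0])
    if any(np.size(s) != sample_size for s in samples):
        raise ValueError("all samples must have the same number of elements")
    # Sample i occupies flat[i * sample_size : (i + 1) * sample_size].
    flat = np.concatenate([np.ravel(s) for s in samples]).astype(dtype)
    return flat, sample_size

def get_sample(flat, sample_size, i):
    """Recover sample i from the packed array via offset arithmetic."""
    start = i * sample_size
    return flat[start:start + sample_size]

# One packed array per split, e.g. training and testing.
train_flat, size = pack_dataset([np.full(3, i) for i in range(5)])
print(get_sample(train_flat, size, 2))  # [2. 2. 2.]
```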
0.0.7
- Switched from calling kernels returned by SourceModule.get_function() directly to launching them with prepared_call()
  - Significantly reduces kernel launch time
- Switched from cuda.mem_alloc to pycuda.gpuarray
  - Includes a host-to-device function, to_gpu(), and a device-to-host method, get()
  - Reduces the required lines of code