Changed
- allow non-tuple problem_size for 1D grids
- changed default for grid_div_y from None to block_size_y
- converted the tutorial to a Jupyter Notebook
- CUDA backend prints device in use, similar to OpenCL backend
- migrating from nosetests to pytest
- rewrote many of the examples to save results to json files
Added
- full support for 3D grids, including option for grid_div_z
- separable convolution example