Changed
- (Needs simple test) Set the ``CUDA_VISIBLE_DEVICES`` environment variable before CUDA is initialized, so that ``--debug`` mode runs on the GPU with the least used memory.
- Commented out a couple of lines that forced ``OMP_NUM_THREADS`` to 1 and the number of PyTorch threads to 1. They no longer seem to be needed.
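
The GPU selection described in the first item could look roughly like the sketch below. It is an illustration, not the project's actual code: the function names ``pick_least_used_gpu`` and ``restrict_to_least_used_gpu`` are hypothetical, and querying ``nvidia-smi`` for per-GPU memory usage is an assumed approach. The essential point is that ``CUDA_VISIBLE_DEVICES`` has to be set before anything initializes CUDA.

```python
import os
import subprocess


def pick_least_used_gpu(smi_output):
    """Return the index of the GPU with the least used memory, or None.

    Expects one "index, memory_used" pair per line, as printed by
    `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits`.
    (Hypothetical helper, not part of the project.)
    """
    best_index, best_used = None, None
    for line in smi_output.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2:
            continue  # skip lines that are not "index, used"
        try:
            index, used = int(parts[0]), int(parts[1])
        except ValueError:
            continue  # skip values such as "[N/A]"
        if best_used is None or used < best_used:
            best_index, best_used = index, used
    return best_index


def restrict_to_least_used_gpu():
    """Set CUDA_VISIBLE_DEVICES to the least-used GPU; return its index.

    Must be called before CUDA is initialized. Returns None and leaves the
    environment untouched if nvidia-smi is unavailable or reports no GPUs.
    (Hypothetical helper, not part of the project.)
    """
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=index,memory.used",
             "--format=csv,noheader,nounits"],
            text=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    index = pick_least_used_gpu(out)
    if index is not None:
        os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return index
```

In this sketch, ``restrict_to_least_used_gpu()`` would be called at the start of a ``--debug`` run, before the first CUDA call. Keeping the parsing in a separate pure function is one way to make the "simple test" noted above easy to write, since it can be exercised without a GPU.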