(1) Fix an issue where concentration augmentation could draw noise outside the intended bounds.
(2) Add an optional least-squares solver (`torch.linalg.lstsq`) for concentration computation (`concentration_method='ls'`, alongside the existing `'ista'` and `'cd'` options), intended for processing a large number of small images on the fly (e.g., batches of `Nx3x256x256` inputs). Note that while `'ls'` is faster in this scenario, it does not impose a sparsity constraint, and it may fail on GPU when the image height/width is too large, regardless of batch size, due to a limitation of `torch.linalg.lstsq`.
(3) Update the README and demo.
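
To illustrate the idea behind item (2), here is a minimal sketch of a batched, unconstrained least-squares concentration solve with `torch.linalg.lstsq`. It is not the library's actual implementation: the helper name `concentrations_lstsq`, the `(B, 3, H*W)` optical-density layout, and the stain vectors are all assumptions made for the example.

```python
import torch


def concentrations_lstsq(od, stain_matrix):
    """Solve stain_matrix.T @ C = od for C, one least-squares problem per image.

    od:           (B, 3, H*W) optical density, channels first (assumed layout)
    stain_matrix: (num_stains, 3), one row per stain (assumed layout)
    returns:      (B, num_stains, H*W) concentrations, with no sparsity constraint
    """
    # Build the left-hand side explicitly per batch item: (B, 3, num_stains).
    a = stain_matrix.T.unsqueeze(0).expand(od.shape[0], -1, -1).contiguous()
    return torch.linalg.lstsq(a, od).solution


# Illustrative values only: two hypothetical stain vectors and random concentrations.
torch.manual_seed(0)
stains = torch.tensor([[0.65, 0.70, 0.29],
                       [0.07, 0.99, 0.11]], dtype=torch.float64)
c_true = torch.rand(4, 2, 16 * 16, dtype=torch.float64)  # 4 images of 16x16 pixels
od = stains.T @ c_true                                   # broadcasts to (4, 3, 256)
c_hat = concentrations_lstsq(od, stains)
```

Because the solve is unconstrained, the result can contain negative or dense concentrations on real data, which is the trade-off against `'ista'` and `'cd'` noted in item (2).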