Added:
- Support for default CPU tinyBLAS (llamafile, sgemm) builds.
- Support for CPU OpenBLAS (GGML_OPENBLAS) builds.
Changed:
- Build scripts now have a separate step/function, `cuda_12_5_1_setup`, which sets up the CUDA 12.5.1 environment at build time.
Fixed:
- Stop thread in `llama_generate` on `GeneratorExit`.
Removed:
- `callback` parameter in `llama_generate` and dependent functions.
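
For context on the `GeneratorExit` fix listed under Fixed, here is a minimal sketch of the general pattern, not the project's actual code. It assumes the generator is fed by a background worker thread through a queue. `generate_tokens`, the `token-worker` thread name, and the `_DONE` sentinel are hypothetical names used only for illustration.

```python
import queue
import threading
import time

_DONE = object()  # hypothetical sentinel marking the end of the stream


def generate_tokens(tokens, delay=0.01):
    """Hypothetical stand-in for a thread-backed generator."""
    out = queue.Queue()
    stop = threading.Event()

    def worker():
        for tok in tokens:
            if stop.is_set():  # consumer went away, stop producing
                break
            time.sleep(delay)  # simulate inference work
            out.put(tok)
        out.put(_DONE)

    thread = threading.Thread(target=worker, name="token-worker", daemon=True)
    thread.start()

    try:
        while True:
            item = out.get()
            if item is _DONE:
                return
            yield item
    except GeneratorExit:
        # Raised at the `yield` when the consumer closes the generator early.
        stop.set()
        raise
    finally:
        # Wait for the worker so no thread outlives the generator.
        thread.join()
```

`GeneratorExit` is raised inside the generator when the caller invokes `gen.close()` or when an unfinished generator is garbage-collected, for example after breaking out of a `for` loop early. Signalling the worker at that point keeps it from running on after the consumer is gone.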