Major changes
* Set up GitHub Pages for the docs: https://docs.vectorch.com/
* Set up a wheel repository to host published wheels: https://whl.vectorch.com/
* Support pip installs against specific CUDA and PyTorch versions, for example: `pip install scalellm -i https://whl.vectorch.com/cu121/torch2.3/`
* Added latency and system metrics (see the measurement sketch below)
* Added an initial monitoring dashboard (Prometheus + Grafana)
* Bug fixes for the decoder, the rejection sampler, and the default `rope_theta` value for llama2
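To illustrate what the new latency metrics capture, here is a minimal sketch that measures time-to-first-token and inter-token latency around a token stream and records them with `prometheus_client` histograms. The metric names and the `timed_stream` wrapper are illustrative assumptions, not ScaleLLM's actual instrumentation.

```python
# Sketch: record time_to_first_token and inter_token latencies.
# Metric names and the token stream are hypothetical placeholders.
import time
from prometheus_client import Histogram, start_http_server

TIME_TO_FIRST_TOKEN = Histogram(
    "time_to_first_token_seconds", "Latency until the first token is emitted"
)
INTER_TOKEN_LATENCY = Histogram(
    "inter_token_latency_seconds", "Latency between consecutive tokens"
)

def timed_stream(token_stream):
    """Wrap a token stream, observing TTFT and inter-token gaps."""
    start = time.monotonic()
    last = None
    for token in token_stream:
        now = time.monotonic()
        if last is None:
            TIME_TO_FIRST_TOKEN.observe(now - start)  # first token
        else:
            INTER_TOKEN_LATENCY.observe(now - last)   # subsequent gaps
        last = now
        yield token

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for Prometheus to scrape
    for tok in timed_stream(iter(["Hello", ",", " world"])):
        print(tok, end="", flush=True)
```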
What's Changed
* ci: added workflow to publish docs to GitHub Pages by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/206
* docs: added docs skeleton by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/207
* docs: fixed source directory and added announcement by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/208
* feat: added monitoring docker compose for prometheus and grafana by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/209
* feat: Added prometheus metrics by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/210
* feat: added token related latency metrics by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/211
* fix: fixed weight-loading issue for fused qkv and added more unit tests for weight loading by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/213
* fix: use a consistent version for whl by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/214
* refactor: move setup.py to top level by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/217
* feat: carry over prompt to output for feature parity by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/218
* added missing changes for carrying over prompt by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/219
* fix: set correct default value of rope_theta for llama2 by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/223
* feat: convert pickle checkpoints to safetensors for fast loading (see the conversion sketch below) by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/224
* docs: add livehtml for docs development by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/225
* fix: use error instead of CHECK when prompt input is empty by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/226
* fix: avoid tensor conversion for already-converted tensors by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/228
* feat: added time_to_first_token and inter_token metrics for both streaming and non-streaming requests by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/227
* fix: decode ending tokens one by one to handle unfinished tokens (see the decoding sketch below) by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/229
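For #224, converting a pickle-based PyTorch checkpoint to safetensors looks roughly like the following. This is a general sketch of the technique with placeholder file names, not ScaleLLM's loader code; real checkpoints may be sharded across multiple files.

```python
# Sketch: convert a pickle-based PyTorch checkpoint to safetensors.
# File names are placeholders; real checkpoints may be sharded.
import torch
from safetensors.torch import save_file

state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# safetensors rejects tensors that share storage, so give each
# tensor its own contiguous copy before saving.
tensors = {name: t.contiguous().clone() for name, t in state_dict.items()}

save_file(tensors, "model.safetensors")
```

For #229, decoding ending tokens one at a time avoids emitting garbage when a multi-byte UTF-8 character is split across tokens. Below is a minimal illustration of the underlying technique using Python's incremental UTF-8 decoder; ScaleLLM's tokenizer-level handling is analogous but not shown here.

```python
# Sketch: decode token byte sequences one by one, holding back
# incomplete multi-byte UTF-8 sequences until they finish.
import codecs

def decode_incrementally(token_bytes_list):
    decoder = codecs.getincrementaldecoder("utf-8")()
    for token_bytes in token_bytes_list:
        text = decoder.decode(token_bytes)  # buffers unfinished sequences
        if text:
            yield text

# "é" is two bytes (0xC3 0xA9) split across two tokens; a naive
# per-token decode would emit a replacement character for the first.
tokens = [b"caf", b"\xc3", b"\xa9", b"!"]
print("".join(decode_incrementally(tokens)))  # -> "café!"
```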
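Holding back the undecodable tail (rather than flushing it eagerly) is the key design choice: the stream stays character-accurate at the cost of delaying output by at most a few bytes.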
**Full Changelog**: https://github.com/vectorch-ai/ScaleLLM/compare/v0.1.1...v0.1.2