* Support accurate chat formatting when using local LLMs via `oobabooga` (and possibly also `llama-cpp-python`). Once you spin up a local LLM at an endpoint like `localhost:5000/v1`, you can either rely on the chat formatting done by the API server, or, for more control and to ensure accurate formatting, specify the formatter via `OpenAIGPTConfig.formatter`, e.g.
```python
from langroid.language_models.openai_gpt import OpenAIGPTConfig

llm_config = OpenAIGPTConfig(
    chat_model="local/localhost:5000/v1",
    formatter="mistral-instruct-v0.2",
)
```
Langroid uses the formatter name to find the nearest matching model name on the HuggingFace hub, gets its corresponding `tokenizer`, and uses `tokenizer.apply_chat_template(chat)` to convert a chat into a single string.
HuggingFace tends to have the most reliable chat templates, so this ensures chat formatting that complies with how the local LLM was trained, which typically produces better results than deviating from that format.
(For example, when using `litellm` via `ollama`, we found that Mistral chats were being formatted differently from how MistralAI specified them.)
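Under the hood this is roughly equivalent to the following sketch using the HuggingFace `transformers` library (the hub model id below is only an illustration of what the formatter name might resolve to):

```python
from transformers import AutoTokenizer

# Illustrative model id: Langroid resolves the actual repo from the formatter name,
# e.g. "mistral-instruct-v0.2" -> a matching Mistral-Instruct model on the hub.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

chat = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "And of Belgium?"},
]

# Render the chat into the single prompt string the model was trained on,
# ending with the tokens that cue the assistant's next turn.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
print(prompt)
```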
As a convenience, the `formatter` can be appended as a suffix to `chat_model`, separated by `//`, e.g.

```python
llm_config = OpenAIGPTConfig(
    chat_model="local/localhost:5000/v1//mistral-instruct-v0.2",
)
```
This makes it easy to run any of the example scripts that take a model-switch param `-m`, by simply adding

```
-m local/localhost:5000/v1//mistral-instruct-v0.2
```
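For example: `python3 examples/basic/chat.py -m local/localhost:5000/v1//mistral-instruct-v0.2` (the script path here is just illustrative; use any example script that accepts `-m`).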
Similarly, any of the tests can be run against a local model using the `--m <local_model>//<formatter>` option, e.g.

```
pytest -s -x tests/main/test_llm.py --m local/localhost:5000/v1//mistral-instruct-v0.2
```
* Add all OpenAI API params (such as `logprobs`, etc.) as `OpenAIGPTConfig.params`, which is a Pydantic object of class `OpenAICallParams`. (Caution: these params are not yet used anywhere in the code, other than `temperature`.)
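A minimal sketch of what this looks like (the import path and the exact set of accepted fields are assumptions; of the params shown, only `temperature` currently has any effect):

```python
from langroid.language_models.openai_gpt import OpenAICallParams, OpenAIGPTConfig

llm_config = OpenAIGPTConfig(
    chat_model="gpt-4",
    params=OpenAICallParams(
        temperature=0.2,  # currently the only param acted on by the code
        logprobs=True,    # accepted in the config, but not yet passed through
    ),
)
```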