## Breaking Changes
- llama-cpp-python has been bumped to 0.3.2. This allows for serving Granite 3.0 GGUF models. With this change, some previous handling of the context window size has been modified to work with the 0.3.z releases of llama-cpp-python.
- `ilab train --pipeline=simple` no longer supports Intel Gaudi (`hpu`) devices. Simple training on Gaudi was experimental and limited to a single device.
- The results of MT-Bench and MT-Bench-Branch are now stored in `$XDG_DATA_HOME/eval/{mt_bench,mt_bench_branch}` respectively. Previously the results for both benchmarks were stored in `$XDG_DATA_HOME/eval`. `ilab config init` must be run to initialize the new evaluation results directories.
- A `system_prompt` variable and a `dk_bench` section have been added to the `evaluate` section of the configuration file. `ilab config init` should be run to initialize these new fields in the config file (see the example after this list).
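
Both of the upgrade steps above amount to re-running config initialization. A minimal sketch follows, assuming `$XDG_DATA_HOME` falls back to `~/.local/share` when unset and that `ilab config show` is available for inspecting the resulting config (you can also open the config file directly):

```bash
# Re-initialize the configuration so the new evaluation results directories
# and the new `evaluate` fields (`system_prompt`, `dk_bench`) are created.
ilab config init

# Sanity check: the per-benchmark results directories mentioned above.
ls "${XDG_DATA_HOME:-$HOME/.local/share}/eval/mt_bench" \
   "${XDG_DATA_HOME:-$HOME/.local/share}/eval/mt_bench_branch"

# Confirm the new fields appear under the `evaluate` section of the config.
ilab config show | grep -A 20 "evaluate:"
```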
## Features
- An experimental/preview implementation of Retrieval-Augmented Generation (RAG) has been added. It is enabled only when the `ILAB_FEATURE_SCOPE` environment variable is set to `DevPreviewNoUpgrade`. For details, see the [instructions in README.md](https://github.com/instructlab/instructlab/?tab=readme-ov-file#-configure-retrieval-augmented-generation-developer-preview).
- A new command group, `ilab rag`, has been introduced. It includes two new commands: `ilab rag convert` and `ilab rag ingest`. The former converts documents (e.g., PDFs) into a structured form; the latter ingests them into a vector index file.
- A new `--rag` argument has been added to the `ilab model chat` command. When the flag is passed, each chat input is answered by first retrieving text from the vector index and then providing that text to the model as context for its response (see the RAG example after this list).
- A new command, `ilab model upload`, has been introduced so that users can upload their trained models to [Hugging Face](https://huggingface.co/), OCI registry endpoints, and [AWS S3](https://aws.amazon.com/s3/) buckets via the `ilab` CLI (see the upload example after this list).
- `ilab model serve` now has separate `--host` and `--port` options, replacing the `host_port` configuration. The default values are `127.0.0.1` for `--host` and `8000` for `--port`, allowing users to configure the server's binding address and port independently through the configuration file or command-line flags (see the serving example after this list).
- vLLM has been updated to version 0.6.4.post1. As a requirement of this new vLLM version, PyTorch has been updated to 2.5.1.
- A `--disable-accelerate-full-state-at-epoch` option has been added for accelerated training. With this option, only Hugging Face checkpoints are saved; these are the checkpoints required for multi-phase training. However, setting this switch also disables resumable training, because full resumability requires full-state checkpoints. Use this option if storage is limited and/or resumability isn't required (see the training example after this list).
- `ilab data generate` now stores generated data from each run in individually dated directories.
- `ilab data list` now organizes tables per dated run and outputs a more detailed table describing which model generated which dataset.
- `ilab model train` and `ilab model test` now search for training data in per-run directories in addition to the top-level directory, maintaining backwards compatibility with old datasets.
- `ilab model evaluate` now supports DK-Bench (Domain Knowledge Bench). DK-Bench takes a set of questions and reference answers provided by the user, collects a model's responses to those questions, and then uses a judge model to grade each response on a 1-5 scale against the reference answer. The highest possible score for each question is 5 (fully accurate and completely aligned with the reference) and the lowest is 1 (entirely incorrect and irrelevant). To run the benchmark, the `OPENAI_API_KEY` environment variable must be set. The default judge model for DK-Bench is `gpt-4o`, and any judge model provided for DK-Bench must be the name of an OpenAI model (see the evaluation example after this list).
- `ilab model evaluate` now has a `--skip-server` option to skip launching a local server and evaluate the Hugging Face model directly. This option supports the `mmlu` and `mmlu_branch` benchmarks.
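
The sketches below illustrate the new commands and flags. They are illustrative only: any path, value, or flag not mentioned in the notes above is an assumption rather than a confirmed interface.

First, the developer-preview RAG workflow. The environment variable, the `ilab rag` commands, and the `--rag` chat flag come from the notes; document and index locations are governed by the RAG configuration described in the README:

```bash
# Enable the developer-preview RAG feature for this shell session.
export ILAB_FEATURE_SCOPE=DevPreviewNoUpgrade

# Convert source documents (e.g., PDFs) into a structured form.
# Run `ilab rag convert --help` to see the input/output options in your version.
ilab rag convert

# Ingest the converted documents into a vector index file.
ilab rag ingest

# Chat with retrieval augmentation: each prompt is answered after first
# retrieving text from the vector index and handing it to the model.
ilab model chat --rag
```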
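
Uploading a trained model follows the same pattern. Only the command name `ilab model upload` is taken from the notes; the destination flags in the commented line are hypothetical placeholders, so check `--help` for the real options:

```bash
# List the options supported by your version.
ilab model upload --help

# Hypothetical invocation (flag names are placeholders, not a confirmed API):
# push a locally trained checkpoint to a Hugging Face repository.
# ilab model upload --dest-type hf --model <local-model-path> --dest <org/repo>
```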
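
For serving, the new `--host` and `--port` flags replace `host_port`. The flags and their defaults (`127.0.0.1:8000`) are from the notes; the address and port values below are only examples:

```bash
# Bind the server to all interfaces on a non-default port.
ilab model serve --host 0.0.0.0 --port 8080

# With no flags, the server keeps the default binding of 127.0.0.1:8000.
ilab model serve
```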
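
For accelerated training with limited storage, the new checkpointing flag can be added to an existing train invocation. Only `--disable-accelerate-full-state-at-epoch` is from the notes; the `--pipeline accelerated` value is assumed here for illustration:

```bash
# Save only Hugging Face-format checkpoints (the ones multi-phase training
# needs) and skip full-state checkpoints to reduce storage use.
# Note: the resulting run cannot be resumed.
ilab model train --pipeline accelerated --disable-accelerate-full-state-at-epoch
```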
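
Finally, the evaluation changes. `--skip-server`, the `mmlu`/`mmlu_branch` benchmark names, and the `OPENAI_API_KEY` requirement are from the notes; the DK-Bench invocation is commented out because its benchmark name and questions flag are assumptions, not a confirmed interface:

```bash
# DK-Bench uses an OpenAI judge model, so the API key must be exported first.
export OPENAI_API_KEY=<your-openai-api-key>

# Hypothetical DK-Bench run (benchmark name and questions flag are placeholders;
# see `ilab model evaluate --help` for the exact options):
# ilab model evaluate --benchmark dk_bench --model <model-path> --questions <questions-file>

# Evaluate MMLU directly against a Hugging Face model without launching a
# local inference server.
ilab model evaluate --benchmark mmlu --model <model-path> --skip-server
```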