instructlab-training

Latest version: v0.15.0


0.15.0

What's New

Features

- **Vision-Language Model (VLM) Support for Text-Only Training** (693)
  - Added automatic detection and loading of vision-language models for text-only training
  - New `vlm_utils.py` module with utilities for identifying and extracting CausalLM text backbones from VLM wrappers
  - Support for two VLM loading strategies: extracting the text backbone when a CausalLM sub-model exists, or direct VLM loading when no CausalLM variant is available
  - Improved tokenizer/text-config reconciliation for VLMs where `vocab_size` lives under `text_config`

- **Mixed Attention Handling for VLMs** (693)
  - Models with `timm` vision towers now use per-component attention: `eager` for vision, `flash_attention_2` or `sdpa` for text
  - Automatic SDPA fallback for M-RoPE models (e.g. Qwen3.5 VL) which are incompatible with Flash Attention 2
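The text-tower selection rule above can be sketched as a small decision function (illustrative only; the real implementation also wires the `eager` choice into the vision tower):

```python
def pick_text_attention(has_flash_attn2, uses_mrope):
    """Choose the text-tower attention backend: M-RoPE models fall back to
    SDPA because they are incompatible with Flash Attention 2, as do
    environments without FA2 installed. Vision towers always use `eager`."""
    if uses_mrope or not has_flash_attn2:
        return "sdpa"
    return "flash_attention_2"

print(pick_text_attention(has_flash_attn2=True, uses_mrope=False))  # flash_attention_2
print(pick_text_attention(has_flash_attn2=True, uses_mrope=True))   # sdpa
```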

Bug Fixes

- **FSDP Wrap Policy Robustness** (693)
  - Fixed `_no_split_modules` resolution to handle models that declare module names for architectures not loaded (e.g. vision blocks when loading only the CausalLM)
  - FSDP wrap policy now resolves all declared module names against both the wrapper and underlying HF model, filtering out unresolvable entries

- **GPT-OSS Attention Capability Detection** (693)
  - `vllm-flash-attn3` is now gated behind a Hopper (SM 9.0+) GPU capability check, falling back to `eager` on older hardware
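The capability gate amounts to a compute-capability comparison; in real code the tuple would come from `torch.cuda.get_device_capability()`. A sketch:

```python
def pick_attn_backend(capability):
    """Gate vllm-flash-attn3 behind Hopper (SM 9.0+), falling back to
    `eager` on older GPUs. `capability` is a (major, minor) tuple as
    returned by torch.cuda.get_device_capability()."""
    return "vllm-flash-attn3" if capability >= (9, 0) else "eager"

print(pick_attn_backend((9, 0)))  # vllm-flash-attn3 (Hopper)
print(pick_attn_backend((8, 0)))  # eager (e.g. A100, SM 8.0)
```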

Improvements

- **Local Mamba Kernel Preference** (693)
  - GraniteMoeHybrid models now pre-populate the Hub kernel cache with locally installed `mamba_ssm` and `causal_conv1d` to avoid PyTorch/CUDA ABI mismatches with Hub-provided kernel builds

What's Changed
* add support for qwen3.5 vl model by RobotSail in https://github.com/instructlab/training/pull/693

**Full Changelog**: https://github.com/instructlab/training/compare/v0.14.2...v0.15.0

0.14.2

What's Changed
* Add backwards compatibility for transformers v4.57 by Maxusmusti in https://github.com/instructlab/training/pull/684
* Adds validation loss + exposes it in the API by RobotSail in https://github.com/instructlab/training/pull/685


**Full Changelog**: https://github.com/instructlab/training/compare/v0.14.1...v0.14.2

0.14.1

What's Changed
* fix _no_split_modules subscript error for transformers v5 by Maxusmusti in https://github.com/instructlab/training/pull/683


**Full Changelog**: https://github.com/instructlab/training/compare/v0.14.0...v0.14.1

0.14.0

What's New

Features

- **MLflow Logging Backend** (680)
  - Added `MLflowHandler` class for logging training metrics to MLflow
  - New `TrainingArgs` fields: `mlflow_tracking_uri`, `mlflow_experiment_name`, `mlflow_run_name`
  - Added `wandb_project`, `wandb_entity`, `wandb_run_name` fields for W&B configuration
  - Added `tensorboard_log_dir` field for configurable TensorBoard log directory
  - New optional install targets: `requirements-mlflow.txt`, `requirements-wandb.txt`, `requirements-tensorboard.txt`
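The new logging fields can be pictured as a dataclass; this is a hypothetical stand-in for the relevant slice of the real `TrainingArgs`, and the `None` defaults are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LoggingArgs:
    """Illustrative subset of TrainingArgs covering the new logging fields.
    Field names match the release notes; defaults are assumed."""
    mlflow_tracking_uri: Optional[str] = None
    mlflow_experiment_name: Optional[str] = None
    mlflow_run_name: Optional[str] = None
    wandb_project: Optional[str] = None
    wandb_entity: Optional[str] = None
    wandb_run_name: Optional[str] = None
    tensorboard_log_dir: Optional[str] = None

args = LoggingArgs(
    mlflow_tracking_uri="http://localhost:5000",  # hypothetical local server
    mlflow_experiment_name="instructlab",
)
print(args.mlflow_experiment_name)  # instructlab
```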

- **Transformers v5 Compatibility** (681)
  - Updated tokenizer API calls to use `extra_special_tokens` instead of `additional_special_tokens`
  - Suppressed verbose httpx HTTP request logs from huggingface_hub

Bug Fixes

- **HYBRID_SHARD Failure Fix** (682)
  - Added detection for when `world_size < num_devices_per_node` in FSDP configuration
  - Automatically falls back to `FULL_SHARD` with a warning when `HYBRID_SHARD` would fail
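The fallback check can be sketched as a small guard (illustrative only; the real code operates on FSDP configuration objects rather than strings):

```python
import warnings

def choose_sharding_strategy(requested, world_size, num_devices_per_node):
    """Fall back from HYBRID_SHARD to FULL_SHARD when the job spans fewer
    ranks than one node has devices, a configuration in which the hybrid
    intra-node/inter-node sharding mesh cannot be formed."""
    if requested == "HYBRID_SHARD" and world_size < num_devices_per_node:
        warnings.warn(
            "world_size < num_devices_per_node; falling back to FULL_SHARD"
        )
        return "FULL_SHARD"
    return requested

print(choose_sharding_strategy("HYBRID_SHARD", world_size=4, num_devices_per_node=8))
# FULL_SHARD
```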

Development

- **Tox-UV Integration** (676)
  - Added `tox-uv` as a tox requirement with `uv-venv-runner`
  - Updated GitHub workflows to use `uv` for package installation
  - Replaced `pip install` with `uv pip install` in CI workflows

What's Changed
* adds integration for tox-uv and updates workflows to use tox-uv by RobotSail in https://github.com/instructlab/training/pull/676
* Add transformers v5 compatibility by Maxusmusti in https://github.com/instructlab/training/pull/681
* Fix HYBRID_SHARD failure when world_size < available GPUs by rtj1 in https://github.com/instructlab/training/pull/682
* Add MLflow support and expose logging configuration in TrainingArgs by RobotSail in https://github.com/instructlab/training/pull/680

New Contributors
* rtj1 made their first contribution in https://github.com/instructlab/training/pull/682 🎉

Files Changed

18 files changed with 482 insertions and 83 deletions:
- Core training modules: `logger.py`, `config.py`, `accelerator.py`, `data_process.py`, `tokenizer_utils.py`, `main_ds.py`
- New requirements files for optional logging backends
- Updated CI workflows and tox configuration

**Full Changelog:** https://github.com/instructlab/training/compare/v0.13.0...v0.14.0

0.13.0

What's New

Features

- **Pretraining Data Processing API** (672)
  - Added new API for processing pretraining-style datasets
  - Documents are now chunked by configurable `block_size`
  - Chunks are treated as independent, fully-unmasked samples
  - Updated training loop to ingest pretraining-style datasets
  - Includes comprehensive test coverage (`test_pretraining_data_process.py`, `test_pretraining_mode.py`, `test_pretraining_sampler.py`)
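The chunking behavior described above can be sketched as follows. This is a simplified illustration, not the library's implementation; in particular, how the real API handles the trailing partial chunk (drop vs. pad) is not specified in these notes, so it is kept here:

```python
def chunk_pretraining_tokens(token_ids, block_size):
    """Split one document's token ids into independent block_size chunks.
    Each chunk is fully unmasked: labels mirror input_ids, so every token
    contributes to the loss (pretraining-style, unlike masked SFT turns)."""
    samples = []
    for start in range(0, len(token_ids), block_size):
        chunk = token_ids[start:start + block_size]
        samples.append({"input_ids": chunk, "labels": list(chunk)})
    return samples

samples = chunk_pretraining_tokens(list(range(10)), block_size=4)
print([len(s["input_ids"]) for s in samples])  # [4, 4, 2]
```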

- **AdamW Optimizer Configuration** (674)
  - Exposed `weight_decay`, `betas`, and `eps` parameters in TrainingArgs
  - Users can now tune AdamW hyperparameters through `run_training()` API
  - Provides more control over optimizer behavior
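Concretely, these fields map onto the keyword arguments of `torch.optim.AdamW`. A minimal sketch of that mapping (the defaults shown mirror PyTorch's, not necessarily this library's):

```python
def adamw_kwargs(weight_decay=0.0, betas=(0.9, 0.999), eps=1e-8):
    """Collect the newly exposed AdamW hyperparameters into the kwargs dict
    that would be passed through to torch.optim.AdamW."""
    return {"weight_decay": weight_decay, "betas": betas, "eps": eps}

# e.g. a common LLM-training setup with decoupled weight decay:
print(adamw_kwargs(weight_decay=0.1, betas=(0.9, 0.95)))
```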

- **Granite 4 Model Support** (669)
  - Added support for Granite 4 models as Mixture of Experts (MoE) models in training

Bug Fixes

- **Process Timing Fix** (675)
  - Fixed a race condition where a process's output was read before the process had completed

- **Variable Access Fix** (668)
  - Fixed a stray invalid variable access bug

Dependencies

- **Build Dependency Update** (670)
  - Updated the hynek build dependency

Files Changed

17 files changed with 1,642 insertions and 52 deletions:
- Core training modules: `data_process.py`, `main_ds.py`, `sampler.py`, `model.py`, `config.py`
- New test suites for pretraining functionality
- Updated README with new capabilities

Full Changelog

**All Changes:**
- 574f946 Exposes API for processing pretraining data (672)
- 638a753 fixes bug where process isn't completed by the time the process gets read (675)
- c495035 Expose AdamW optimizer parameters in training API (674)
- 3d05302 Handle granite 4 as MoE models in training (669)
- 781c36f fixes stray invalid variable access bug (668)
- 529c2f7 bumps hynek build dep (670)

**Full Diff:** `v0.12.1...v0.13.0`

0.12.1

What's Changed
* Update requirements-cuda.txt to increase liger-kernel minimum by Maxusmusti in https://github.com/instructlab/training/pull/659
* Adds mamba-ssm[causal-conv1d] to CUDA requirements by RobotSail in https://github.com/instructlab/training/pull/663
* Removes Numpy version cap by RobotSail in https://github.com/instructlab/training/pull/664
* fix(torchrun): Omit empty arguments and correct nproc_per_node type by szaher in https://github.com/instructlab/training/pull/661

New Contributors
* szaher made their first contribution in https://github.com/instructlab/training/pull/661

**Full Changelog**: https://github.com/instructlab/training/compare/v0.12.0...v0.12.1
