🎨 NeMo Data Designer v0.4.0 Release Notes
✨ What's New
- **Message Traces**: Capture the full conversation history during LLM generation, giving you access to system prompts, rendered user prompts, and model reasoning for downstream use cases. Enable per-column with `with_trace=True` or globally via `RunConfig`.
- **Multi-Image Support**: Pass multiple images per column in multi-modal contexts for richer vision-based generation.
- **Expanded Code Languages**: Added support for Bash, C, C++, C#, and COBOL in `LLMCodeColumnConfig`.
- **Progress Logging**: Progress updates during LLM-column generation for better visibility into long-running jobs.
---
💥 Breaking Change: Import structure
The `essentials` module has been removed in favor of a cleaner import pattern. Configuration classes are now accessed via `data_designer.config` and the main interface via `data_designer.interface`.
Before (v0.3.x):
```python
from data_designer.essentials import (
    CategorySamplerParams,
    DataDesigner,
    DataDesignerConfigBuilder,
    LLMTextColumnConfig,
    SamplerColumnConfig,
    SamplerType,
)

data_designer = DataDesigner()
config_builder = DataDesignerConfigBuilder()
```
After (v0.4.x):
```python
import data_designer.config as dd
from data_designer.interface import DataDesigner

data_designer = DataDesigner()
config_builder = dd.DataDesignerConfigBuilder()

# Configuration classes are accessed via the `dd` namespace
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B"]),
    )
)
```
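When migrating a larger codebase, it can help to locate every remaining import of the removed module first. The following is a minimal sketch using only the standard library; the helper name `find_legacy_imports` and the sample source string are illustrative assumptions, not part of Data Designer.

```python
import re

# Matches `from data_designer.essentials import ...` and `import data_designer.essentials`.
LEGACY_IMPORT = re.compile(r"^\s*(?:from|import)\s+data_designer\.essentials\b")


def find_legacy_imports(source: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that still import the removed module.

    Hypothetical helper for illustration; it is not shipped with the library.
    """
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if LEGACY_IMPORT.match(line)
    ]


# Sample source text containing one v0.3.x-style import.
example = """\
import data_designer.config as dd
from data_designer.essentials import DataDesigner
from data_designer.interface import DataDesigner
"""

hits = find_legacy_imports(example)
for number, line in hits:
    print(f"line {number}: {line}")
```

Each reported line would then be rewritten by hand to use `data_designer.config` or `data_designer.interface`, following the pattern shown above.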
💥 Breaking Change: Reasoning traces → Message traces
The automatic `__reasoning_trace` columns have been replaced with opt-in message traces that capture the full conversation history.
**Key changes:**
- Column postfix renamed from `__reasoning_trace` to `__trace`
- Traces are now **opt-in** rather than automatic
- Traces capture the full message history (system/user/assistant), including retry conversations
Before (v0.3.x):
Reasoning traces were automatically generated as side-effect columns for extended thinking models:
```python
# Traces were automatic - no configuration needed
# Column "answer" would automatically produce "answer__reasoning_trace"
```
After (v0.4.x):
Enable traces explicitly per-column or globally:
**Per-column (recommended):**
```python
import data_designer.config as dd

config_builder = dd.DataDesignerConfigBuilder()
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="answer",
        prompt="Answer: {{ question }}",
        model_alias="nvidia-text",
        with_trace=True,  # Opt-in to trace capture
    )
)
# Produces "answer" and "answer__trace" columns
```
**Global debug override:**
```python
import data_designer.config as dd
from data_designer.interface import DataDesigner

data_designer = DataDesigner()
data_designer.set_run_config(
    dd.RunConfig(debug_override_save_all_column_traces=True)
)
```
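With traces enabled for many columns, downstream code may want to tell data columns apart from their trace companions. The sketch below relies only on the `__trace` postfix described in these notes; the helper name `split_trace_columns` and the sample column names are hypothetical, and it assumes the global override uses the same `<column>__trace` naming as the per-column option.

```python
TRACE_POSTFIX = "__trace"  # postfix named in these release notes


def split_trace_columns(columns: list[str]) -> tuple[list[str], dict[str, str]]:
    """Separate data columns from trace columns.

    Returns the data column names and a mapping from each trace column
    to the column it belongs to. Hypothetical helper for illustration.
    """
    data_columns = [name for name in columns if not name.endswith(TRACE_POSTFIX)]
    trace_to_source = {
        name: name[: -len(TRACE_POSTFIX)]
        for name in columns
        if name.endswith(TRACE_POSTFIX)
    }
    return data_columns, trace_to_source


# Sample column names, as they might appear in a generated dataset.
columns = ["question", "answer", "answer__trace", "summary", "summary__trace"]
data_columns, trace_to_source = split_trace_columns(columns)
print(data_columns)
print(trace_to_source)
```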
The trace data structure is now a `list[dict]` capturing the ordered message history:
```python
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4", "reasoning_content": None},
]
```
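Code that previously read the `__reasoning_trace` column can recover similar information from the new structure. The following is a minimal sketch that operates on a plain `list[dict]` shaped like the example above; the helper names are hypothetical, and treating the last assistant message as the final answer is an assumption made here because traces may include retry conversations.

```python
from typing import Any, Optional

Trace = list[dict[str, Any]]


def last_assistant_message(trace: Trace) -> Optional[dict[str, Any]]:
    """Return the most recent assistant message, or None if there is none.

    Assumption: when retries are present, the last assistant turn is the
    one of interest.
    """
    for message in reversed(trace):
        if message.get("role") == "assistant":
            return message
    return None


def reasoning_text(trace: Trace) -> Optional[str]:
    """Return the reasoning content of the last assistant message, if any."""
    message = last_assistant_message(trace)
    return message.get("reasoning_content") if message else None


# The example trace from these release notes.
trace = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4", "reasoning_content": None},
]

final = last_assistant_message(trace)
print(final["content"] if final else None)
print(reasoning_text(trace))
```

Because `reasoning_content` can be `None`, as in the example trace, callers should handle the missing case rather than assume a string.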
---
What's Changed
* feat: Add /create-pr skill for well-formatted GitHub PRs by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/247
* docs: Fix mkdocs syntax and update person sampling documentation by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/249
* refactor: slim package refactor into three subpackages by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/240
* chore: add publish script and update license headers by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/253
* chore: add CODEOWNERS for automatic PR review assignment by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/251
* feat: allow skipping health checks by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/244
* chore: copy README to data-designer package during install by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/256
* feat: support multiple images per column in image context by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/257
* fix: escape special characters in SchemaTransformProcessor JSON templates by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/250
* chore: update telemetry by johntmyers in https://github.com/NVIDIA-NeMo/DataDesigner/pull/261
* feat: add /update-pr skill and improve /create-pr file linking by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/258
* feat: Add /commit skill for conventional commit messages by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/252
* fix: automate README sync for data-designer package builds by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/266
* chore: simplify publish script by removing redundant rebuild step by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/268
* feat: add job progress logging for cell-by-cell generation by eric-tramel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/259
* feat: add message trace support for LLM generation by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/272
* chore: add animated emoji progress indicators to progress tracker by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/273
* feat: Add Phase 1 languages (Bash, C, C++, C#, COBOL) to CodeLang by kirit93 in https://github.com/NVIDIA-NeMo/DataDesigner/pull/271
* fix: ensure 100% progress is logged exactly once by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/276
**Full Changelog**: https://github.com/NVIDIA-NeMo/DataDesigner/compare/v0.3.8...v0.4.0