Data-designer

Latest version: v0.5.2


0.5.2

What's Changed
* fix: repair notebook CI (dead model, missing API key, pyarrow type bug) by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/348
* docs: Update top models usage chart for 1/24-2/24/2026 by kirit93 in https://github.com/NVIDIA-NeMo/DataDesigner/pull/353
* docs: add structured outputs SDG dev notes by dhruvnathawani in https://github.com/NVIDIA-NeMo/DataDesigner/pull/338
* feat: add processor plugin support by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/299
* chore: plans for async generators and task-queue dataset builder by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/347
* chore: plans for model facade overhaul by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/344
* fix: include seed dataset in builder repr for seed-only configs by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/361
* chore: bump cryptography and pillow for security fixes by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/364
* feat: add Streamable HTTP transport support for remote MCP providers by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/358
* docs: update README token badge to 150+ billion by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/367
* docs: fix structure outputs blog format by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/368
* chore: fix inaccuracies and improve AGENTS.md by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/369
* fix: include plugin column types in display_sample_record() by 3mei in https://github.com/NVIDIA-NeMo/DataDesigner/pull/365

New Contributors
* 3mei made their first contribution in https://github.com/NVIDIA-NeMo/DataDesigner/pull/365

**Full Changelog**: https://github.com/NVIDIA-NeMo/DataDesigner/compare/v0.5.1...v0.5.2

0.5.1

Data Designer now supports image generation!

What's Changed
* docs: Updated url by kirit93 in https://github.com/NVIDIA-NeMo/DataDesigner/pull/325
* docs: deep research trajectories with NDD and MCP tool use by eric-tramel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/326
* refactor: callback-based processor design by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/294
* feat: add image generation support with multi-modal context by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/317
* docs: add image generation documentation and image-to-image editing tutorial by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/319
* chore: move ArtifactStorage to engine/storage/ module by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/321
* chore: gitignore Cerebro knowledge base files by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/328
* feat(engine): env-var switch for async-first models experiment by eric-tramel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/280
* docs: Moved nav to left hand side by kirit93 in https://github.com/NVIDIA-NeMo/DataDesigner/pull/331
* feat: add --save-results option to preview command by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/333
* chore: Improve CLI startup with lazy heavy import cleanup by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/330
* feat: add allow_resize for 1:N and N:1 generation patterns by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/286
* chore: address Andre's feedback on --save-results and CLI preview by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/335
* chore: remove example_allow_resize.py from repo root by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/337
* fix: make DropColumnsProcessorConfig idempotent and support reasoning columns by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/334
* feat: add push_to_hub_from_folder classmethod for uploading saved datasets by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/340
* fix: handle bool, int, float in convert_to_row_element by dhruvnathawani in https://github.com/NVIDIA-NeMo/DataDesigner/pull/336
* feat: auto-detect ImageContext format for image-to-image generation by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/342

New Contributors
* dhruvnathawani made their first contribution in https://github.com/NVIDIA-NeMo/DataDesigner/pull/336

**Full Changelog**: https://github.com/NVIDIA-NeMo/DataDesigner/compare/v0.5.0...v0.5.1

0.5.0

🎨 NeMo Data Designer – v0.5.0 Release Notes

⚡ Highlights

- 🛠️ MCP Tool Calling: LLM columns can now call external tools during generation via MCP (Model Context Protocol)!

- ⚛️ Functions as custom column generators: The `custom_column_generator` decorator lets users write their own column generation logic and plug it directly into a pipeline.

- 🤗 Hugging Face Hub integration: You can now publish generated datasets directly to the Hugging Face Hub with auto-generated dataset cards: `results.push_to_hub()`.

- Huge thank you to davidberenstein1957 for starting the design and work on this feature, as well as davanstrien and wauplin for their help pushing it over the finish line!

- 💻 CLI generation commands: You can generate data from the CLI using the new `preview`, `create`, and `validate` commands.

- 🔍 LLM Observability: Set the new `with_trace` option on LLM column configs to `TraceType.ALL_MESSAGES` or `TraceType.LAST_MESSAGE` to return the full message trace or only the last message. You can also selectively extract reasoning content using `extract_reasoning_content=True`.
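To illustrate the idea behind functions as column generators, here is a plain-Python sketch of a decorator that registers row-wise functions and a loop that applies them. `column_generator`, `REGISTRY`, and `run_pipeline` are hypothetical names chosen for illustration; this is not the `custom_column_generator` signature, which these notes do not show.

```python
from typing import Any, Callable

Row = dict[str, Any]
ColumnFn = Callable[[Row], Any]

# Hypothetical registry: output column name -> function computing it from a row.
REGISTRY: dict[str, ColumnFn] = {}


def column_generator(name: str) -> Callable[[ColumnFn], ColumnFn]:
    """Decorator that registers `fn` as the generator for column `name`."""

    def decorator(fn: ColumnFn) -> ColumnFn:
        REGISTRY[name] = fn
        return fn

    return decorator


@column_generator("greeting")
def make_greeting(row: Row) -> str:
    # User-defined logic: any Python can go here.
    return f"Hello, {row['name']}!"


def run_pipeline(rows: list[Row]) -> list[Row]:
    """Apply each registered generator, in registration order, to every row."""
    results = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        for column, fn in REGISTRY.items():
            new_row[column] = fn(new_row)
        results.append(new_row)
    return results
```

For example, `run_pipeline([{"name": "Ada"}])` returns `[{"name": "Ada", "greeting": "Hello, Ada!"}]`. In this sketch each registered function sees the columns produced before it, so registration order matters.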

⚠️ Breaking Changes

- `with_trace` is now a `TraceType` enum (`NONE` (default), `LAST_MESSAGE`, `ALL_MESSAGES`) instead of a boolean.

- `SingleColumnConfig` now lives in its own base module, `data_designer.config.base`, to prevent circular imports during plugin discovery.
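For code that still passes a boolean, the mapping to the enum can be made explicit. The sketch below defines a local `TraceType` stand-in with the three member names listed above instead of importing the real enum, and assumes `True` corresponds to `ALL_MESSAGES` because v0.4.x traces captured the full message history; `migrate_with_trace` is a hypothetical helper.

```python
from enum import Enum, auto
from typing import Union


class TraceType(Enum):
    """Local stand-in for the library's enum; member names from the notes above."""

    NONE = auto()
    LAST_MESSAGE = auto()
    ALL_MESSAGES = auto()


def migrate_with_trace(value: Union[bool, TraceType]) -> TraceType:
    """Translate a pre-0.5.0 boolean `with_trace` into a TraceType member.

    Assumption: True meant "capture the full message history" (v0.4.x
    behavior), so it maps to ALL_MESSAGES; False maps to the default NONE.
    """
    if isinstance(value, TraceType):
        return value  # already migrated
    if isinstance(value, bool):
        return TraceType.ALL_MESSAGES if value else TraceType.NONE
    raise TypeError(f"unsupported with_trace value: {value!r}")
```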

What's Changed
* feat: MCP (Model Context Protocol) tool calling integration for LLM columns by eric-tramel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/248
* fix: normalize license header year format in mcp module by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/279
* chore: configure independent pytest settings per subpackage by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/278
* fix: normalize trace content blocks to prevent parquet write crashes by eric-tramel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/283
* feat: Add TraceType enum for granular trace control by eric-tramel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/284
* docs: add deployment, performance tuning guides and streamline gettin… by kirit93 in https://github.com/NVIDIA-NeMo/DataDesigner/pull/277
* chore: update tutorial notebooks to use dd. notation consistently by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/288
* feat: add extract_reasoning_content option to LLM columns by eric-tramel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/285
* chore: add greptile.json to reduce review verbosity by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/289
* feat: switch from hatch-vcs to uv-dynamic-versioning by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/282
* revert: Remove RunConfig debug_trace_override by eric-tramel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/290
* perf: implement lazy loading for config module exports by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/291
* refactor: move SingleColumnConfig to config.base module by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/287
* feat: Add CustomColumnGenerator for user-defined column generation by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/254
* chore: standardize recipe script metadata and docstrings by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/292
* chore: enable status check in greptile.json by dakshgup in https://github.com/NVIDIA-NeMo/DataDesigner/pull/295
* feat: add HuggingFace Hub integration for dataset publishing by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/275
* docs: Added images for deployment options by kirit93 in https://github.com/NVIDIA-NeMo/DataDesigner/pull/297
* docs: Add RQA dataset blog post and improve blog navigation by kirit93 in https://github.com/NVIDIA-NeMo/DataDesigner/pull/296
* chore: quiet tool call logs and add tool usage statistics by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/293
* docs: Added documentation for seed datasets by kirit93 in https://github.com/NVIDIA-NeMo/DataDesigner/pull/300
* docs: updated usage chart by kirit93 in https://github.com/NVIDIA-NeMo/DataDesigner/pull/304
* docs: Update README.md by kirit93 in https://github.com/NVIDIA-NeMo/DataDesigner/pull/305
* chore: update HF card citation copy and add library version to builder config by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/303
* chore: add tokens generated badge to README by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/306
* test: add provider health checks script and CI workflow by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/301
* chore: bump pytest, nbconvert, and pyjwt for vulnerability fixes by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/312
* fix: allow BuilderConfig round-trip serialization by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/311
* chore: export ConstraintType and InequalityOperator from config init by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/308
* docs: restructure plugin docs with multi-file layout and seed reader type by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/302
* docs: Added cat emoji sequence by kirit93 in https://github.com/NVIDIA-NeMo/DataDesigner/pull/316
* fix: use reasoning_effort for gpt-5 inference params by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/315
* docs: New post on SDG design principles by kirit93 in https://github.com/NVIDIA-NeMo/DataDesigner/pull/318
* feat: add preview, create, and validate CLI commands by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/313
* feat: support loading config files from HTTP(S) URLs by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/323
* fix: include CUSTOM type in execution DAG and warn on generator errors by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/324
* fix: trim LLM response content before parsing by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/322

New Contributors
* dakshgup made their first contribution in https://github.com/NVIDIA-NeMo/DataDesigner/pull/295
* davanstrien
* davidberenstein1957
* wauplin

**Full Changelog**: https://github.com/NVIDIA-NeMo/DataDesigner/compare/v0.4.0...v0.5.0

0.4.0

🎨 NeMo Data Designer v0.4.0 Release Notes

✨ What's New

- **Message Traces**: Capture the full conversation history during LLM generation, giving you access to system prompts, rendered user prompts, and model reasoning for downstream use cases. Enable per-column with `with_trace=True` or globally via `RunConfig`.

- **Multi-Image Support**: Pass multiple images per column in multi-modal contexts for richer vision-based generation.

- **Expanded Code Languages**: Added support for Bash, C, C++, C#, and COBOL in `LLMCodeColumnConfig`.

- **Progress Logging**: Progress updates during LLM-column generation for better visibility into long-running jobs.

---

💥 Breaking Change: Import structure

The `essentials` module has been removed in favor of a cleaner import pattern. Configuration classes are now accessed via `data_designer.config` and the main interface via `data_designer.interface`.

Before (v0.3.x):

```python
from data_designer.essentials import (
    CategorySamplerParams,
    DataDesigner,
    DataDesignerConfigBuilder,
    LLMTextColumnConfig,
    SamplerColumnConfig,
    SamplerType,
)

data_designer = DataDesigner()
config_builder = DataDesignerConfigBuilder()
```


After (v0.4.x):

```python
import data_designer.config as dd
from data_designer.interface import DataDesigner

data_designer = DataDesigner()
config_builder = dd.DataDesignerConfigBuilder()

# Configuration classes are accessed via the `dd` namespace
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B"]),
    )
)
```



💥 Breaking Change: Reasoning traces → Message traces

The automatic `__reasoning_trace` columns have been replaced with opt-in message traces that capture the full conversation history.

**Key changes:**
- Column postfix renamed from `__reasoning_trace` to `__trace`
- Traces are now **opt-in** rather than automatic
- Traces capture the full message history (system/user/assistant), including retry conversations
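Downstream code that looked up `__reasoning_trace` columns needs to switch to the new postfix. A sketch of a lookup that accepts both, assuming traces are stored as sibling columns named `<column>__trace` (v0.4.x) or `<column>__reasoning_trace` (v0.3.x); `find_trace_columns` is a hypothetical helper, and since the two formats hold different content, only the name lookup is shared.

```python
TRACE_SUFFIX = "__trace"  # v0.4.x postfix
LEGACY_SUFFIX = "__reasoning_trace"  # v0.3.x postfix


def find_trace_columns(columns: list[str]) -> dict[str, str]:
    """Map each base column name to its trace column, accepting both postfixes."""
    found: dict[str, str] = {}
    for col in columns:
        # Check the longer legacy suffix first so each column matches one suffix.
        for suffix in (LEGACY_SUFFIX, TRACE_SUFFIX):
            if col.endswith(suffix):
                found[col[: -len(suffix)]] = col
                break
    return found
```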

Before (v0.3.x):

Reasoning traces were automatically generated as side-effect columns for extended thinking models:

```python
# Traces were automatic; no configuration needed.
# Column "answer" would automatically produce "answer__reasoning_trace".
```


After (v0.4.x):

Enable traces explicitly per-column or globally:

**Per-column (recommended):**
```python
import data_designer.config as dd

config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="answer",
        prompt="Answer: {{ question }}",
        model_alias="nvidia-text",
        with_trace=True,  # Opt-in to trace capture
    )
)
# Produces "answer" and "answer__trace" columns
```


**Global debug override:**
```python
import data_designer.config as dd
from data_designer.interface import DataDesigner

data_designer = DataDesigner()
data_designer.set_run_config(
    dd.RunConfig(debug_override_save_all_column_traces=True)
)
```


The trace data structure is now a `list[dict]` capturing the ordered message history:

```python
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4", "reasoning_content": None},
]
```
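Because the trace is an ordered list of role-tagged messages that also includes retry conversations, the most recent `assistant` entry is the one to read for the final answer (an assumption about how retries are appended). A minimal sketch for pulling out the pieces most downstream uses need; `split_trace` and `last_assistant_message` are hypothetical helpers that rely only on the `role`, `content`, and `reasoning_content` keys shown above.

```python
from typing import Any, Optional

Message = dict[str, Any]


def last_assistant_message(trace: list[Message]) -> Optional[Message]:
    """Return the final assistant turn, or None if the trace has none."""
    for message in reversed(trace):
        if message.get("role") == "assistant":
            return message
    return None


def split_trace(trace: list[Message]) -> dict[str, Any]:
    """Extract the system prompt, last user prompt, answer, and reasoning."""
    system = [m["content"] for m in trace if m.get("role") == "system"]
    users = [m["content"] for m in trace if m.get("role") == "user"]
    final = last_assistant_message(trace)
    return {
        "system_prompt": system[0] if system else None,
        "last_user_prompt": users[-1] if users else None,
        "answer": final["content"] if final else None,
        "reasoning": final.get("reasoning_content") if final else None,
    }
```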

---

What's Changed
* feat: Add /create-pr skill for well-formatted GitHub PRs by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/247
* docs: Fix mkdocs syntax and update person sampling documentation by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/249
* refactor: slim package refactor into three subpackages by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/240
* chore: add publish script and update license headers by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/253
* chore: add CODEOWNERS for automatic PR review assignment by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/251
* feat: allow skipping health checks by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/244
* chore: copy README to data-designer package during install by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/256
* feat: support multiple images per column in image context by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/257
* fix: escape special characters in SchemaTransformProcessor JSON templates by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/250
* chore: update telemetry by johntmyers in https://github.com/NVIDIA-NeMo/DataDesigner/pull/261
* feat: add /update-pr skill and improve /create-pr file linking by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/258
* feat: Add /commit skill for conventional commit messages by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/252
* fix: automate README sync for data-designer package builds by andreatgretel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/266
* chore: simplify publish script by removing redundant rebuild step by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/268
* feat: add job progress logging for cell-by-cell generation by eric-tramel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/259
* feat: add message trace support for LLM generation by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/272
* chore: add animated emoji progress indicators to progress tracker by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/273
* feat: Add Phase 1 languages (Bash, C, C++, C#, COBOL) to CodeLang by kirit93 in https://github.com/NVIDIA-NeMo/DataDesigner/pull/271
* fix: ensure 100% progress is logged exactly once by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/276


**Full Changelog**: https://github.com/NVIDIA-NeMo/DataDesigner/compare/v0.3.8...v0.4.0

0.3.8

👀 New Nemotron-Personas Datasets

`PersonSampler` supports two new locales:

- [Nemotron-Personas-Singapore](https://huggingface.co/datasets/nvidia/Nemotron-Personas-Singapore) (`locale = en_SG`)
- [Nemotron-Personas-Brazil](https://huggingface.co/datasets/nvidia/Nemotron-Personas-Brazil) (`locale = pt_BR`)

What's Changed
* fix: unblock generation when no from-scratch-generator is configured by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/231
* fix: do not attempt to deserialize llm text response by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/233
* docs: Updated recipe card by kirit93 in https://github.com/NVIDIA-NeMo/DataDesigner/pull/153
* fix: no api key warning on default model providers by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/238
* feat: Support for Claude Skills (DevX and Generation) by eric-tramel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/239
* feat: Elevate non-LLM concurrency limits to `RunConfig` by eric-tramel in https://github.com/NVIDIA-NeMo/DataDesigner/pull/242
* feat: wire up pt_GB and en_SG personas by johnnygreco in https://github.com/NVIDIA-NeMo/DataDesigner/pull/245


**Full Changelog**: https://github.com/NVIDIA-NeMo/DataDesigner/compare/v0.3.7...v0.3.8

0.3.7

🎨 NeMo Data Designer v0.3.7 Release Notes
- Restores lazy load changes introduced in [PR-222](https://github.com/NVIDIA-NeMo/DataDesigner/pull/222) to `litellm_overrides.py` that led to intermittent import issues.

What's Changed
* fix: restore lazy load litellm overrides changes by nabinchha in https://github.com/NVIDIA-NeMo/DataDesigner/pull/229


**Full Changelog**: https://github.com/NVIDIA-NeMo/DataDesigner/compare/v0.3.6...v0.3.7
