Distilabel

Latest version: v1.1.1

1.1.1

What's Changed
* Fix crash when using vLLM without structured generation by cg123 in https://github.com/argilla-io/distilabel/pull/658
* Fix error on `Pipeline.dry_run` without `parameters` by plaguss in https://github.com/argilla-io/distilabel/pull/655

New Contributors
* cg123 made their first contribution in https://github.com/argilla-io/distilabel/pull/658

**Full Changelog**: https://github.com/argilla-io/distilabel/compare/1.1.0...1.1.1

1.1.0

Two new tasks implemented!

`Genstruct` task (https://github.com/argilla-io/distilabel/pull/600)

You can now use the `Genstruct` task, as described in https://huggingface.co/NousResearch/Genstruct-7B, to generate synthetic instruction fine-tuning datasets from raw documents:

```python
from distilabel.llms import TransformersLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import KeepColumns, LoadDataFromDicts
from distilabel.steps.tasks import Genstruct

with Pipeline(name="harry-potter-genstruct") as pipeline:
    # Each input row provides the `title` and `content` of a raw document.
    load_hub_dataset = LoadDataFromDicts(
        name="load_dataset",
        data=[
            {
                "title": "Harry Potter and the Sorcerer's Stone",
                "content": "An orphaned boy enrolls in a school of wizardry, where he learns the truth about himself, his family and the terrible evil that haunts the magical world.",
            },
            {
                "title": "Harry Potter and the Chamber of Secrets",
                "content": "Harry Potter lives his second year at Hogwarts with Ron and Hermione when a message on the wall announces that the legendary Chamber of Secrets has been opened. The trio soon realize that, to save the school, it will take a lot of courage.",
            },
        ],
    )

    task = Genstruct(
        name="task",
        llm=TransformersLLM(
            model="NousResearch/Genstruct-7B",
            torch_dtype="float16",
            chat_template="{{ messages[0]['content'] }}",
            device="cuda:0",
        ),
        num_generations=2,
        group_generations=False,
        output_mappings={"model_name": "model"},
    )

    # Connect the loader to the task (not shown in the original snippet).
    load_hub_dataset >> task
```


`PrometheusEval` task (https://github.com/argilla-io/distilabel/pull/610)

A new `PrometheusEval` task, based on the recently published paper ["Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models"](https://arxiv.org/abs/2405.01535):

```python
from distilabel.llms import vLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadHubDataset
from distilabel.steps.tasks import PrometheusEval

with Pipeline(name="prometheus") as pipeline:
    # Rename the dataset columns to the inputs the task expects.
    load_dataset = LoadHubDataset(
        name="load_dataset",
        repo_id="HuggingFaceH4/instruction-dataset",
        split="test",
        output_mappings={"prompt": "instruction", "completion": "generation"},
    )

    task = PrometheusEval(
        name="task",
        llm=vLLM(
            model="prometheus-eval/prometheus-7b-v2.0",
            chat_template="[INST] {{ messages[0]['content'] }}\n{{ messages[1]['content'] }}[/INST]",
        ),
        mode="absolute",
        rubric="factual-validity",
        reference=False,
        num_generations=1,
        group_generations=False,
    )

    load_dataset >> task
```


Connect the steps in the pipeline with `>>` (https://github.com/argilla-io/distilabel/pull/490)

You can now connect your steps using Python's right-shift operator (`>>`):

```python
from distilabel.llms import InferenceEndpointsLLM, OpenAILLM
from distilabel.pipeline import Pipeline
from distilabel.steps.combine import CombineColumns
from distilabel.steps.generators.huggingface import LoadHubDataset
from distilabel.steps.tasks.evol_instruct.base import EvolInstruct

with Pipeline(name="Pipe name") as pipeline:
    load_hub_dataset = LoadHubDataset(name="load_dataset", batch_size=8)
    evol_instruction_complexity_1 = EvolInstruct(
        llm=OpenAILLM(model="gpt-3.5-turbo"),
    )
    evol_instruction_complexity_2 = EvolInstruct(
        llm=InferenceEndpointsLLM(model_id="mistralai/Mixtral-8x7B-Instruct-v0.1"),
    )

    combine_columns = CombineColumns(
        columns=["response"],
        output_columns=["candidates"],
    )

    # Fan out to both tasks, then merge their outputs into one step.
    (
        load_hub_dataset
        >> [evol_instruction_complexity_1, evol_instruction_complexity_2]
        >> combine_columns
    )
```
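
To illustrate how a `>>` chain with a list in the middle can work at the language level, here is a minimal, library-free sketch. The `Node` class and its `connect` method are hypothetical stand-ins invented for this example; this is not distilabel's actual implementation, only one way the operator overloads could be arranged.

```python
from typing import List, Union


class Node:
    """Toy stand-in for a pipeline step (hypothetical, not distilabel's `Step`)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.downstream: List["Node"] = []

    def connect(self, other: "Node") -> None:
        self.downstream.append(other)

    def __rshift__(
        self, other: Union["Node", List["Node"]]
    ) -> Union["Node", List["Node"]]:
        # Handles `node >> other` and `node >> [a, b]` by fanning out.
        targets = other if isinstance(other, list) else [other]
        for target in targets:
            self.connect(target)
        # Returning the right operand lets the chain continue from it.
        return other

    def __rrshift__(self, other: List["Node"]) -> "Node":
        # Handles `[a, b] >> node`: lists define no `__rshift__`,
        # so Python falls back to the right operand's `__rrshift__`.
        for source in other:
            source.connect(self)
        return self


load = Node("load_dataset")
evol_1, evol_2 = Node("evol_1"), Node("evol_2")
combine = Node("combine_columns")

load >> [evol_1, evol_2] >> combine

edges = sorted(
    (src.name, dst.name)
    for src in (load, evol_1, evol_2)
    for dst in src.downstream
)
print(edges)
```

The key design point is that `__rshift__` returns its right operand, so `a >> [b, c] >> d` evaluates as `[b, c] >> d` in its second step.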


Routing batch function (https://github.com/argilla-io/distilabel/pull/595)

Thanks to the new `routing_batch_function`, each batch of an upstream step can be routed conditionally to a list of specific downstream steps. In addition, we have included a `sample_n_steps` routing batch function, which makes it easier to replicate the setup of the original UltraFeedback paper:

```python
import random
from distilabel.llms import MistralLLM, OpenAILLM, VertexAILLM
from distilabel.pipeline import Pipeline, routing_batch_function
from distilabel.steps import CombineColumns, LoadHubDataset
from distilabel.steps.tasks import TextGeneration

# Receives the names of the downstream steps and returns the ones
# that should get the current batch.
@routing_batch_function()
def sample_two_steps(steps: list[str]) -> list[str]:
    return random.sample(steps, 2)

with Pipeline("pipe-name", description="My first pipe") as pipeline:
    load_dataset = LoadHubDataset(
        name="load_dataset",
        output_mappings={"prompt": "instruction"},
    )

    tasks = []
    for llm in (
        OpenAILLM(model="gpt-4-0125-preview"),
        MistralLLM(model="mistral-large-2402"),
        VertexAILLM(model="gemini-1.0-pro"),
    ):
        tasks.append(
            TextGeneration(name=f"text_generation_with_{llm.model_name}", llm=llm)
        )

    combine_generations = CombineColumns(
        name="combine_generations",
        columns=["generation", "model_name"],
        output_columns=["generations", "model_names"],
    )

    load_dataset >> sample_two_steps >> tasks >> combine_generations
```
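
The routing logic itself is just a function from the list of downstream step names to the subset that should receive a batch. The sketch below shows that idea in plain Python, generalised to any `n`. The helper `make_sample_n` and the step names are hypothetical and invented for this example; it does not use or reproduce distilabel's own `sample_n_steps`.

```python
import random
from typing import Callable, List


def make_sample_n(n: int, seed: int = 0) -> Callable[[List[str]], List[str]]:
    """Build a function that picks `n` distinct downstream step names per batch."""
    # Seeded only so that this sketch is reproducible.
    rng = random.Random(seed)

    def route(steps: List[str]) -> List[str]:
        if n > len(steps):
            raise ValueError(f"cannot sample {n} steps out of {len(steps)}")
        return rng.sample(steps, n)

    return route


downstream = ["text_generation_a", "text_generation_b", "text_generation_c"]
route = make_sample_n(2)

# Each batch gets its own independent choice of two downstream steps.
for batch_id in range(3):
    print(batch_id, route(downstream))
```

Sampling per batch rather than once per pipeline is what spreads the rows across different model combinations.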


Generate structured outputs using `outlines` (https://github.com/argilla-io/distilabel/pull/601)

You can generate `JSON` or `regex`-constrained outputs using `TransformersLLM`, `LlamaCppLLM` or `vLLM` thanks to the integration with [outlines](https://github.com/outlines-dev/outlines):

```python
from enum import Enum

from distilabel.llms import LlamaCppLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration
from pydantic import BaseModel, StringConstraints, conint
from typing_extensions import Annotated

class Weapon(str, Enum):
    sword = "sword"
    axe = "axe"
    mace = "mace"
    spear = "spear"
    bow = "bow"
    crossbow = "crossbow"

class Armor(str, Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"
    mithril = "mithril"

class Character(BaseModel):
    name: Annotated[str, StringConstraints(max_length=30)]
    age: conint(gt=1, lt=3000)
    armor: Armor
    weapon: Weapon

with Pipeline("RPG-characters") as pipeline:
    system_prompt = (
        "You are a leading role play gamer. You have seen thousands of different characters and their attributes."
        " Please return a JSON object with common attributes of an RPG character."
    )

    load_dataset = LoadDataFromDicts(
        name="load_instructions",
        data=[
            {
                "system_prompt": system_prompt,
                "instruction": f"Give me a character description for a {char}",
            }
            for char in ["dwarf", "elf", "human", "ork"]
        ],
    )

    text_generation = TextGeneration(
        name="text_generation_rpg",
        llm=LlamaCppLLM(
            model_path="model/path",  # type: ignore
            structured_output={"format": "json", "schema": Character},
        ),
    )
    load_dataset >> text_generation
```
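
To make the constraints encoded by the `Character` schema concrete, here is a standard-library-only sketch of the checks a generated JSON string has to satisfy. The `parse_character` helper and the sample strings are hypothetical. Note that this validates after the fact, whereas `outlines` constrains the generation itself, so the sketch shows the target format rather than the integration's mechanism.

```python
import json
from enum import Enum
from typing import Any, Dict


class Weapon(str, Enum):
    sword = "sword"
    axe = "axe"
    mace = "mace"
    spear = "spear"
    bow = "bow"
    crossbow = "crossbow"


class Armor(str, Enum):
    leather = "leather"
    chainmail = "chainmail"
    plate = "plate"
    mithril = "mithril"


def parse_character(raw: str) -> Dict[str, Any]:
    """Check a JSON string against the same limits as the `Character` model."""
    data = json.loads(raw)
    name, age = data["name"], data["age"]
    if not isinstance(name, str) or len(name) > 30:
        raise ValueError("name must be a string of at most 30 characters")
    if not isinstance(age, int) or not 1 < age < 3000:
        raise ValueError("age must be an integer with 1 < age < 3000")
    # Enum lookup raises ValueError for values outside the allowed set.
    return {
        "name": name,
        "age": age,
        "armor": Armor(data["armor"]),
        "weapon": Weapon(data["weapon"]),
    }


valid = '{"name": "Gimli", "age": 139, "armor": "mithril", "weapon": "axe"}'
character = parse_character(valid)
print(character["armor"].value, character["weapon"].value)
```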


New `GroqLLM` (https://github.com/argilla-io/distilabel/pull/583)

New integration with [groq](https://console.groq.com/docs/quickstart). Special mention to kcentric, who did the initial work prior to the refactor for 1.0.0.

```python
from distilabel.llms.groq import GroqLLM
from distilabel.pipeline import Pipeline
from distilabel.steps.tasks import TextGeneration

with Pipeline(name="text-generation-groq") as pipeline:
    ...
    text_generation_with_groq = TextGeneration(
        llm=GroqLLM(model="llama3-70b-8192"),
    )
    ...
```


Easily test your pipeline doing a `dry_run` (https://github.com/argilla-io/distilabel/pull/635)

```python
with Pipeline(...) as pipeline:
    ...
    distiset = pipeline.dry_run(
        parameters=...,  # The same argument as `Pipeline.run`
        batch_size=1,  # Optional, will be set to 1 by default.
    )
```


```
[05/13/24 16:22:30] INFO     ['distilabel.pipeline.local'] 🌵 Dry run mode                local.py:103
                    INFO     ['distilabel.pipeline.local'] 📝 Pipeline data will be ...   local.py:125
```


`pipeline.log` file is dumped to the Hugging Face repository (https://github.com/argilla-io/distilabel/pull/568)

From now on, when you call `distiset.push_to_hub`, the `pipeline.log` file is automatically uploaded to your dataset repository alongside the `pipeline.yaml`, to keep track of the execution.

New `distilabel_metadata` column to store internal data (https://github.com/argilla-io/distilabel/pull/586)

You can now optionally add a metadata column. It may store other things in the future, but for the moment it is handy for keeping the raw output from an LLM: if the task post-processes that output via `format_output`, the original text is preserved, so nothing is lost.

You can include the metadata at the task level as:

```python
TextGeneration(..., add_raw_output=True|False)
```


And directly determine whether you want this column in your final `Distiset`:

```python
with Pipeline(..., enable_metadata=True|False):
    ...
```


This way you can choose to drop the column altogether.
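
The sketch below illustrates why keeping the raw output matters when post-processing can fail. It is plain Python and does not call distilabel. The `format_output` and `build_row` helpers, the `score` field, and the `raw_output_<task name>` key layout are all assumptions made for this example.

```python
import json
from typing import Any, Dict


def format_output(raw: str) -> Dict[str, Any]:
    """Hypothetical post-processing: extract a `score` field from JSON text."""
    try:
        return {"score": json.loads(raw)["score"]}
    except (ValueError, KeyError, TypeError):
        # Parsing failed; without metadata the raw text would be lost here.
        return {"score": None}


def build_row(raw: str, task_name: str, add_raw_output: bool) -> Dict[str, Any]:
    row = format_output(raw)
    if add_raw_output:
        # Key layout is an assumption made for this sketch.
        row["distilabel_metadata"] = {f"raw_output_{task_name}": raw}
    return row


# The model answered in free text instead of JSON, so parsing yields None,
# but the original answer is still available in the metadata column.
row = build_row("Score: 4/5", task_name="judge", add_raw_output=True)
print(row)
```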

All the changes in this release

* Allow nested connect calls and overload rshift method to connect steps by plaguss in https://github.com/argilla-io/distilabel/pull/490
* Fix `llm_blender` installation by alvarobartt in https://github.com/argilla-io/distilabel/pull/557
* Warn user about unknown runtime parameters by plaguss in https://github.com/argilla-io/distilabel/pull/555
* Add missing `model_name`, update docstrings, and add `*.jinja2` templates to `Task` subclasses by alvarobartt in https://github.com/argilla-io/distilabel/pull/560
* Split `ChatGeneration` from `TextGeneration` by alvarobartt in https://github.com/argilla-io/distilabel/pull/558
* Set `extra="forbid"` in `{_Step,LLM}.model_config` by alvarobartt in https://github.com/argilla-io/distilabel/pull/577
* Infer step name by plaguss in https://github.com/argilla-io/distilabel/pull/575
* Change the context of subprocesses depending on the platform by plaguss in https://github.com/argilla-io/distilabel/pull/578
* Dump logs within a file in .cache/distilabel/pipelines dir by plaguss in https://github.com/argilla-io/distilabel/pull/568
* Fix empty batches causing missaligment when branching by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/590
* Add `GroqLLM` by alvarobartt in https://github.com/argilla-io/distilabel/pull/583
* Add `Format{Chat,Text}Generation{DPO,SFT}` by alvarobartt in https://github.com/argilla-io/distilabel/pull/584
* Fix `title` in `RatingQuestion` of `PreferenceToArgilla` by alvarobartt in https://github.com/argilla-io/distilabel/pull/597
* Set `streaming=False` and add `num_examples` to `LoadHubDataset` by plaguss in https://github.com/argilla-io/distilabel/pull/565
* Make `pipeline` argument of `Step` optional by plaguss in https://github.com/argilla-io/distilabel/pull/566
* Extend `LLM` kwargs to align with counterparts by alvarobartt in https://github.com/argilla-io/distilabel/pull/594
* Add `Genstruct` task by alvarobartt in https://github.com/argilla-io/distilabel/pull/600
* Fix `num_examples` to be optional in `LoadHubDataset` by plaguss in https://github.com/argilla-io/distilabel/pull/603
* Fix `list_files_in_dir` returning unsorted files by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/609
* Add `PrometheusEval` task by alvarobartt in https://github.com/argilla-io/distilabel/pull/610
* Update `ValueError` on missing inputs message by alvarobartt in https://github.com/argilla-io/distilabel/pull/617
* Add `routing_batch_function` by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/595
* Fix `pipeline.log` inconsistency & include LLM info in signature by plaguss in https://github.com/argilla-io/distilabel/pull/598
* Add custom `rubrics` attribute to `PrometheusEval` by alvarobartt in https://github.com/argilla-io/distilabel/pull/621
* Update `UltraFeedback` paper replication to use `routing_batch_function` by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/620
* Add `distilabel_metadata` column to the datasets to include general data by plaguss in https://github.com/argilla-io/distilabel/pull/586
* Add the option of passing the multiprocessing context via env var by plaguss in https://github.com/argilla-io/distilabel/pull/604
* Add name of the pipeline to group the hashed folders by it by plaguss in https://github.com/argilla-io/distilabel/pull/626
* Add `routing_batch_function` serialization by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/628
* Excluding model path in serialization of llamacpp by ignacioct in https://github.com/argilla-io/distilabel/pull/633
* Fix problem with sorting method in `list_files_in_dir` function by plaguss in https://github.com/argilla-io/distilabel/pull/622
* Add `dry_run` method to the pipelines to run with a single example. by plaguss in https://github.com/argilla-io/distilabel/pull/635
* [FEATURE] Add structured outputs using `outlines` by plaguss in https://github.com/argilla-io/distilabel/pull/601
* Force pipeline stop after 2 SIGINT signals caught by plaguss in https://github.com/argilla-io/distilabel/pull/630
* Refactor and update `docs` by alvarobartt in https://github.com/argilla-io/distilabel/pull/634
* Export components info & components gallery in docs by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/640
* Documentation updates by plaguss in https://github.com/argilla-io/distilabel/pull/646
* Refactor docs 1.1.0 by plaguss in https://github.com/argilla-io/distilabel/pull/650
* Fix routing batch function deadlocks and unordered batches by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/649


**Full Changelog**: https://github.com/argilla-io/distilabel/compare/1.0.3...1.1.0

1.0.3

What's Changed
* Add `stop` and `stop_sequences` in `LLM.generate` subclasses by alvarobartt in https://github.com/argilla-io/distilabel/pull/585


**Full Changelog**: https://github.com/argilla-io/distilabel/compare/1.0.2...1.0.3

1.0.2

What's Changed

* Fix `RuntimeParamater` validation when provided as `_Step` attr by alvarobartt in https://github.com/argilla-io/distilabel/pull/564
* Add `seed` with `random.randint` to ensure cache is not used by alvarobartt in https://github.com/argilla-io/distilabel/pull/571

**Full Changelog**: https://github.com/argilla-io/distilabel/compare/1.0.1...1.0.2

1.0.1

What's Changed
* Fix typo in readme and remove the ToArgilla step by dvsrepo in https://github.com/argilla-io/distilabel/pull/548
* Fix `model_validator` in `InferenceEndpoints` due to `Pipeline` pickling by alvarobartt in https://github.com/argilla-io/distilabel/pull/552


**Full Changelog**: https://github.com/argilla-io/distilabel/compare/1.0.0...1.0.1

1.0.0

What's Changed
* Add `Step` abstract class and new `Pipeline` by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/338
* Add runtime parameters validation by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/345
* Pipeline local execution by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/346
* Add `Task` (minimal implementation) by alvarobartt in https://github.com/argilla-io/distilabel/pull/347
* Refactor `_BatchManager` to have list of batches per step by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/353
* Refactor getting parameters from `Step.process` method by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/355
* Add `LLM`, `OpenAILLM`, `TransformersLLM`, and `LlamaCppLLM` by alvarobartt in https://github.com/argilla-io/distilabel/pull/354
* Fix `Task` and `TextGeneration` by alvarobartt in https://github.com/argilla-io/distilabel/pull/356
* Add `combine_dicts` function and `CombineColumns` class by alvarobartt in https://github.com/argilla-io/distilabel/pull/358
* Add `PushToHub` step and fix `typing` by alvarobartt in https://github.com/argilla-io/distilabel/pull/357
* Add serialization for the new components by plaguss in https://github.com/argilla-io/distilabel/pull/349
* Fix `OpenAILLM.api_key` due to `SecretStr` and `StepInput` wrong imports by alvarobartt in https://github.com/argilla-io/distilabel/pull/359
* Add `GlobalStep`, fix `_BatchManager`, and add `logging` by alvarobartt in https://github.com/argilla-io/distilabel/pull/362
* Migrate vllm to the new API by plaguss in https://github.com/argilla-io/distilabel/pull/361
* Update `_BatchManager` to work with `GlobalStep`s and `input_batch_size` per step by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/366
* Clean up outdated / unused files by alvarobartt in https://github.com/argilla-io/distilabel/pull/369
* Add `input_mappings` and `output_mappings` attributes by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/367
* Move batching from `Task` to `LLM`, fix `vLLM.generate` and add `DISTILABEL_LOG_LEVEL` by alvarobartt in https://github.com/argilla-io/distilabel/pull/371
* Improve runtime parameter definition by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/372
* Add `AsyncOpenAI` and update `OpenAILLM` accordingly by alvarobartt in https://github.com/argilla-io/distilabel/pull/381
* Update serde by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/382
* Add `MistralLLM` and add `generation_kwargs` as `RuntimeParameters` by alvarobartt in https://github.com/argilla-io/distilabel/pull/383
* Move `steps` out of `pipeline` by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/384
* Add tests and docstring for `Task` and subclasses by alvarobartt in https://github.com/argilla-io/distilabel/pull/385
* Add `step` decorator by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/387
* Add `input` propagation through `Task.process` by alvarobartt in https://github.com/argilla-io/distilabel/pull/399
* Improve `Pipeline` error handling by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/400
* Fix `combine_dicts` and `StepInput` import in `PushToHub` by alvarobartt in https://github.com/argilla-io/distilabel/pull/401
* Improve `GlobalStep` error handling by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/402
* Changed " by italics in EvolInstruct tutorial where one "" was missing by ignacioct in https://github.com/argilla-io/distilabel/pull/398
* Add `get_last_hidden_states` method and update `TransformersLLM` by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/414
* docs: correct small typos in tutorial by sdiazlor in https://github.com/argilla-io/distilabel/pull/419
* docs: readme positioning by davidberenstein1957 in https://github.com/argilla-io/distilabel/pull/386
* Add `num_generations` and `group_generations` parameters to `Task` by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/416
* Add `Argilla` and `PromptCompletionToArgilla` by alvarobartt in https://github.com/argilla-io/distilabel/pull/420
* Add `EvolInstruct` and `EvolInstructGenerator` tasks by alvarobartt in https://github.com/argilla-io/distilabel/pull/407
* Wrap optional `LLM` dependencies under `load` by alvarobartt in https://github.com/argilla-io/distilabel/pull/428
* Add `ComplexityScorer` task by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/421
* Implement caching mechanism for the pipelines by plaguss in https://github.com/argilla-io/distilabel/pull/370
* Add method to Pipeline to handle keyboard interruptions via ctrl+c by plaguss in https://github.com/argilla-io/distilabel/pull/406
* Add `GenerateEmbeddings` task by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/427
* Add `api_key` within `LLM.load` and add `llm_kwargs` as `RuntimeParameter` by alvarobartt in https://github.com/argilla-io/distilabel/pull/432
* Add `GeneratorStep.process` validation in `DAG` and smaller fixes by alvarobartt in https://github.com/argilla-io/distilabel/pull/435
* Add `EvolComplexity` task by davidberenstein1957 in https://github.com/argilla-io/distilabel/pull/415
* Add `QualityScorer` Task by ignacioct in https://github.com/argilla-io/distilabel/pull/425
* Add `CudaDevicePlacementMixin` class by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/436
* Return `distiset` from `Pipeline.run` by plaguss in https://github.com/argilla-io/distilabel/pull/417
* Update README.md by strickvl in https://github.com/argilla-io/distilabel/pull/451
* Add `InferenceEndpointsLLM` by alvarobartt in https://github.com/argilla-io/distilabel/pull/439
* Fix `Distiset` after `PushToHub` and smaller fixes by alvarobartt in https://github.com/argilla-io/distilabel/pull/452
* Fix `Step.process_applying_mappings` by alvarobartt in https://github.com/argilla-io/distilabel/pull/453
* Add `AnyscaleLLM` by davidberenstein1957 in https://github.com/argilla-io/distilabel/pull/447
* Add general function to obtain schema for parquet writer by plaguss in https://github.com/argilla-io/distilabel/pull/454
* Add `TogetherLLM` by davidberenstein1957 in https://github.com/argilla-io/distilabel/pull/449
* Fix `LLM` subclasses based on `OpenAILLM` by alvarobartt in https://github.com/argilla-io/distilabel/pull/455
* Improve batching and caching by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/457
* Add `EvolQuality` task by davidberenstein1957 in https://github.com/argilla-io/distilabel/pull/429
* Add `VertexAILLM` by davidberenstein1957 in https://github.com/argilla-io/distilabel/pull/445
* Add `use_cache` to `BasePipeline` by plaguss in https://github.com/argilla-io/distilabel/pull/463
* Add `AnthropicLLM` by sdiazlor in https://github.com/argilla-io/distilabel/pull/444
* Add `multiprocess` dependency by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/467
* Add `UltraFeedback` by alvarobartt in https://github.com/argilla-io/distilabel/pull/464
* Add `OllamaLLM` by davidberenstein1957 in https://github.com/argilla-io/distilabel/pull/405
* Add `RuntimeParametersMixin` and `LLM` runtime parameters by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/466
* Add `LiteLLM` by davidberenstein1957 in https://github.com/argilla-io/distilabel/pull/441
* Add CLI by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/471
* Set `_batch_manager` to `None` after run by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/473
* Add create_distiset function by plaguss in https://github.com/argilla-io/distilabel/pull/480
* Add `overload` to `step` decorator by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/474
* Move Enum to Dict[str, str] to avoid serialization errors during caching by plaguss in https://github.com/argilla-io/distilabel/pull/482
* Include a dataset card and the `pipeline.yaml` on `Distiset.push_to_hub` by plaguss in https://github.com/argilla-io/distilabel/pull/479
* Add `PairRM` task for ranking responses by plaguss in https://github.com/argilla-io/distilabel/pull/450
* Update `_WriteBuffer` to write several parquet files by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/483
* Extend `Argilla` integration `TextGeneration`, `Preference`, and more by alvarobartt in https://github.com/argilla-io/distilabel/pull/472
* Add `DeitaFiltering` step by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/481
* Add `InstructionBacktranslation` by alvarobartt in https://github.com/argilla-io/distilabel/pull/486
* Fix huggingface_hub TextGenerationError import by Wauplin in https://github.com/argilla-io/distilabel/pull/485
* Improve azure openai support by BramVanroy in https://github.com/argilla-io/distilabel/pull/461
* Add `SelfInstruct` task by ignacioct in https://github.com/argilla-io/distilabel/pull/456
* Use `QueueHandler` for `Pipeline` logging by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/489
* Improve `_stop` and `logging` by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/491
* Fix creating empty `Dataset` in `create_distiset` function by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/492
* Add imports from `__init__` modules by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/493
* `batch_size` and `input_batch_size` runtime parameters by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/495
* Update serialization method of _BatchManager to write each step on its own file by plaguss in https://github.com/argilla-io/distilabel/pull/496
* Fix `asyncio` in `AsyncLLM` to use the running event loop if any by alvarobartt in https://github.com/argilla-io/distilabel/pull/501
* Added authentication header to allow private/gated dataset use by bjoernpl in https://github.com/argilla-io/distilabel/pull/498
* Fix generator yielding batches all at once if `batch_size` == `input_batch_size` by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/510
* Run output queue loop in thread and improve stop by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/511
* Update `docs` for `distilabel` v1.0 with `mkdocs-material` by plaguss in https://github.com/argilla-io/distilabel/pull/476
* Add `CohereLLM` by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/508
* `distilabel` v1.0 by alvarobartt in https://github.com/argilla-io/distilabel/pull/352
* Remove draft comment by plaguss in https://github.com/argilla-io/distilabel/pull/515
* Fix `docs/sections/papers/*.md` and add example in `docs/index.md` by alvarobartt in https://github.com/argilla-io/distilabel/pull/516
* Small fixes for the docs (images and nav bar) by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/519
* Fix CTRL + C when still loading steps by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/521
* Empty input queues when `CTRL + C` by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/528
* Add `filelock` and `flash-attn` to `vllm` extra by alvarobartt in https://github.com/argilla-io/distilabel/pull/529
* Fix error in README.md when pushing the custom dataset card by plaguss in https://github.com/argilla-io/distilabel/pull/530
* Fix pipeline stuck when empty batches by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/531
* Add `EvolQuality` to `tasks.__init__.py` by davidberenstein1957 in https://github.com/argilla-io/distilabel/pull/525
* Show information about subprocess exception by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/532
* Update `TextGeneration.format_input` method to allow OpenAI format by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/533
* Improve create_distiset by plaguss in https://github.com/argilla-io/distilabel/pull/534
* Fixes regarding `RuntimeParameter`s and `pydantic` model attributes by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/535
* Fix parsing `LLM` generation kwargs by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/537
* pass on Distiset's kwargs to Dataset.push_to_hub() by rasdani in https://github.com/argilla-io/distilabel/pull/522
* Set `config="default"` in `Distiset` when only one leaf `Step` by alvarobartt in https://github.com/argilla-io/distilabel/pull/540
* docs: update documentation for huggingface inference endpoints. by burtenshaw in https://github.com/argilla-io/distilabel/pull/539
* Remove `flash-attn` from `vllm` extra by alvarobartt in https://github.com/argilla-io/distilabel/pull/542
* Docs fix argilla imports by burtenshaw in https://github.com/argilla-io/distilabel/pull/541
* Fix not all exceptions being able to be pickled by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/543
* Update CLI example by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/544
* Check that `Step.name` doesn't contain dots or spaces by gabrielmbmb in https://github.com/argilla-io/distilabel/pull/545

New Contributors
* strickvl made their first contribution in https://github.com/argilla-io/distilabel/pull/451
* Wauplin made their first contribution in https://github.com/argilla-io/distilabel/pull/485
* BramVanroy made their first contribution in https://github.com/argilla-io/distilabel/pull/461
* bjoernpl made their first contribution in https://github.com/argilla-io/distilabel/pull/498
* rasdani made their first contribution in https://github.com/argilla-io/distilabel/pull/522

**Full Changelog**: https://github.com/argilla-io/distilabel/compare/0.6.0...1.0.0
