What's Changed

* Remove `.lock` download skipping, skip locks on force download by JackUrb in https://github.com/Lightning-AI/litdata/pull/519
* pre-release bump 0.2.44 by tchaton in https://github.com/Lightning-AI/litdata/pull/530

What's Changed

* Fix: resume issues with resuming in combined streaming dataset in dataloader by bhimrazy in https://github.com/Lightning-AI/litdata/pull/507
* fix: s3 error by deependujha in https://github.com/Lightning-AI/litdata/pull/510
* Fix: unsigned s5cmd requests and also add option to disable s5cmd by bhimrazy in https://github.com/Lightning-AI/litdata/pull/513
* Turn on DEBUG logging based on DEBUG_LITDATA environment variable by ouj in https://github.com/Lightning-AI/litdata/pull/518
* Feat: Update indexing of parquet dataset and also add streaming support to huggingface datasets by bhimrazy in https://github.com/Lightning-AI/litdata/pull/505
* feat: correctly propagate storage_options by deependujha in https://github.com/Lightning-AI/litdata/pull/514
* fix: remove warnings for Streaming Dataset with hf dataset and shuffle enabled by bhimrazy in https://github.com/Lightning-AI/litdata/pull/520
* Revert '506 Add s5cmd' – as boto3 Outperforms s5cmd in Latest Benchmarks by bhimrazy in https://github.com/Lightning-AI/litdata/pull/521
* Upd/hf-dataset-get-format by bhimrazy in https://github.com/Lightning-AI/litdata/pull/522
* Update documentation on Streaming Parquet Datasets from Huggingface and other cloud providers by bhimrazy in https://github.com/Lightning-AI/litdata/pull/523
* Bump version to 0.2.43 by bhimrazy in https://github.com/Lightning-AI/litdata/pull/525
* fix package config by Borda in https://github.com/Lightning-AI/litdata/pull/526
* example: sine function model prediction with litdata & pytorch-lightning by deependujha in https://github.com/Lightning-AI/litdata/pull/517
* fixing package & releasing by Borda in https://github.com/Lightning-AI/litdata/pull/529

What's Changed

* Add register function for downloader by ouj in https://github.com/Lightning-AI/litdata/pull/496
* Allow for more lenient state resume. by JackUrb in https://github.com/Lightning-AI/litdata/pull/497
* Slightly faster speed by tchaton in https://github.com/Lightning-AI/litdata/pull/503
* Add s5cmd by tchaton in https://github.com/Lightning-AI/litdata/pull/506
* Feat: add support for gcp by deependujha in https://github.com/Lightning-AI/litdata/pull/504
* Bump version 0.2.42 by tchaton in https://github.com/Lightning-AI/litdata/pull/508

New Contributors

* ouj made their first contribution in https://github.com/Lightning-AI/litdata/pull/496

What's Changed

* doc: improve dev doc by deependujha in https://github.com/Lightning-AI/litdata/pull/488
* Expose optimize dns by tchaton in https://github.com/Lightning-AI/litdata/pull/498
* Update `_get_folder_size`: Reduce Logs noise and switch to `os.scandir` by bhimrazy in https://github.com/Lightning-AI/litdata/pull/499
* Bump: version 0.2.41 by deependujha in https://github.com/Lightning-AI/litdata/pull/500

What's Changed

* fix: `clean parquet dir cache` fixture by deependujha in https://github.com/Lightning-AI/litdata/pull/474
* Fix: Allow using `Machine` types in `map` by ethanwharris in https://github.com/Lightning-AI/litdata/pull/473
* 🛠️ Fix: Ensure `chunk_bytes` in `index.json` matches actual chunk file size by bhimrazy in https://github.com/Lightning-AI/litdata/pull/478
* fix: `_get_folder_size` fn by deependujha in https://github.com/Lightning-AI/litdata/pull/471
* Added boolean serialiser called by `litdata.optimize()` by DominiquePaul in https://github.com/Lightning-AI/litdata/pull/481
* Doc: improve dev doc & add ToDos by deependujha in https://github.com/Lightning-AI/litdata/pull/479
* upd: Add hf file download progress and update local file path by bhimrazy in https://github.com/Lightning-AI/litdata/pull/484
* fix: segmentation fault error in streaming tokens by bhimrazy in https://github.com/Lightning-AI/litdata/pull/485
* Warn user if `max_cache_size` is less than 25GB in StreamingDataset by bhimrazy in https://github.com/Lightning-AI/litdata/pull/489
* fix: Properly assign the chunks to the right worker by tchaton in https://github.com/Lightning-AI/litdata/pull/449
* Bump version to 0.2.40 by bhimrazy in https://github.com/Lightning-AI/litdata/pull/491

New Contributors

* ethanwharris made their first contribution in https://github.com/Lightning-AI/litdata/pull/473
* DominiquePaul made their first contribution in https://github.com/Lightning-AI/litdata/pull/481

What's Changed

* Feat: add support for HuggingFace datasets by deependujha in https://github.com/Lightning-AI/litdata/pull/462
* Using count-locks for multi-node-single-cache support by JackUrb in https://github.com/Lightning-AI/litdata/pull/468
* Bump version to 0.2.39 by tchaton in https://github.com/Lightning-AI/litdata/pull/470