New Upload API
We supercharged our upload API, so it's much easier to do more with your data at upload time.
Specify remote paths, tags, and other metadata for each file you upload
```python
from darwin.dataset.upload_manager import LocalFile

# Define each file to upload, with its own remote path and tags.
# DATA_DIRECTORY is assumed to be a pathlib.Path pointing at your local images.
animal_files = [
    LocalFile(DATA_DIRECTORY / "cat_1.jpg", path="cats", tags=["shorthair"]),
    LocalFile(DATA_DIRECTORY / "cat_2.jpg", path="cats", tags=["ragdoll"]),
    LocalFile(DATA_DIRECTORY / "cat_3.jpg", path="cats", tags=["persian"]),
    LocalFile(DATA_DIRECTORY / "cat_4.jpg", path="cats", tags=["sphynx"]),
    LocalFile(DATA_DIRECTORY / "cat_5.jpg", path="cats", tags=["unknown"]),
    LocalFile(DATA_DIRECTORY / "dog_1.jpg", path="dogs", tags=["labrador"]),
    LocalFile(DATA_DIRECTORY / "dog_2.jpg", path="dogs", tags=["unknown"]),
    LocalFile(DATA_DIRECTORY / "dog_3.jpg", path="dogs", tags=["german-shepherd"]),
    LocalFile(DATA_DIRECTORY / "dog_4.jpg", path="dogs", tags=["beagle"]),
    LocalFile(DATA_DIRECTORY / "dog_5.jpg", path="dogs", tags=["bulldog"]),
]

# Use your RemoteDataset object to push those files
dataset.push(animal_files)
```
Upload entire folders while keeping their tree structure intact on Darwin
```python
dataset.push("/path/to/folder", preserve_folders=True)
```
The same can be done via the CLI!
```bash
darwin dataset push team/dataset /path/to/folder --preserve-folders
```
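To make "keeping the tree structure intact" concrete, here is a minimal sketch of how a local file's position under the uploaded folder could translate into a remote path. `remote_path_for` is a hypothetical helper written for this illustration, not a darwin-py function, and the exact remote path format used by Darwin is an assumption.

```python
from pathlib import PurePosixPath


def remote_path_for(local_file: str, upload_root: str) -> str:
    """Map a local file to a remote folder, relative to the uploaded root.

    Hypothetical helper: it only mirrors the idea behind preserve_folders.
    """
    relative_parent = PurePosixPath(local_file).relative_to(upload_root).parent
    # Files directly inside the root land in the dataset's top-level folder.
    return "/" if str(relative_parent) == "." else f"/{relative_parent}"


local_files = [
    "/path/to/folder/readme.jpg",
    "/path/to/folder/cats/cat_1.jpg",
    "/path/to/folder/dogs/puppies/dog_1.jpg",
]
remote_paths = {f: remote_path_for(f, "/path/to/folder") for f in local_files}
```

With this mapping, `dogs/puppies/dog_1.jpg` would end up under `/dogs/puppies` rather than being flattened into the dataset root.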
Only upload files if they're not already represented in the Dataset
At upload time, you will be warned if some of your files have already been uploaded to the specified path.
```bash
$ ~/Downloads darwin dataset push team/dataset /path/to/images
Total progress ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23 of 23
Skipped 2 files already in the dataset.
```
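The skip behaviour can be pictured as a comparison between local filenames and the filenames already present in the target remote path. The sketch below is only an illustration under that assumption; `split_new_and_existing` is a hypothetical helper, and darwin-py's own check may work differently.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set, Tuple


def split_new_and_existing(
    local_files: Iterable[str], remote_names: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (to_upload, skipped), comparing by filename only."""
    to_upload: List[str] = []
    skipped: List[str] = []
    for local in local_files:
        name = PurePosixPath(local).name
        if name in remote_names:
            skipped.append(local)
        else:
            to_upload.append(local)
    return to_upload, skipped


to_upload, skipped = split_new_and_existing(
    ["images/cat_1.jpg", "images/cat_2.jpg", "images/dog_1.jpg"],
    remote_names={"cat_2.jpg"},
)
print(f"Skipped {len(skipped)} files already in the dataset.")
```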
Sort files when listing them
We made it easy to sort files by `inserted_at`, `updated_at`, `file_size`, `filename` and `priority`. You can choose ascending or descending order in the same argument, written as `field:direction`.
```python
dataset.fetch_remote_files(sort="priority:desc")
```
The same can be done via the CLI!
```bash
darwin dataset files team/dataset --sort-by priority:desc
```
Note that the default sorting argument is set to `updated_at:desc`.
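To illustrate the `field:direction` format, here is a minimal sketch that parses such a string and applies it to a list of local records. `parse_sort`, `sort_records` and `SORTABLE_FIELDS` are hypothetical names written for this example, not part of darwin-py, where the sorting is handled when fetching remote files. Treating a bare field name as ascending is also an assumption.

```python
from typing import Dict, List, Tuple

# Fields listed above as sortable; the helper names are illustrative only.
SORTABLE_FIELDS = {"inserted_at", "updated_at", "file_size", "filename", "priority"}


def parse_sort(sort: str = "updated_at:desc") -> Tuple[str, bool]:
    """Split "field:direction" into (field, reverse)."""
    field, _, direction = sort.partition(":")
    direction = direction or "asc"  # assumption: no direction means ascending
    if field not in SORTABLE_FIELDS:
        raise ValueError(f"Cannot sort by {field!r}")
    if direction not in ("asc", "desc"):
        raise ValueError(f"Unknown direction {direction!r}")
    return field, direction == "desc"


def sort_records(records: List[Dict], sort: str = "updated_at:desc") -> List[Dict]:
    """Sort dictionaries by the field and direction named in `sort`."""
    field, reverse = parse_sort(sort)
    return sorted(records, key=lambda record: record[field], reverse=reverse)


files = [
    {"filename": "cat_1.jpg", "priority": 1},
    {"filename": "dog_1.jpg", "priority": 5},
    {"filename": "cat_2.jpg", "priority": 3},
]
by_priority = [f["filename"] for f in sort_records(files, "priority:desc")]
```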
Miscellaneous
- We now use [Rich](https://github.com/willmcgugan/rich) to render every message, progress bar, error or table in the CLI
- Specify callbacks that are invoked during upload to track progress
- Get clear error messages if files fail at any stage of the upload process
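As a rough sketch of how upload callbacks might be used, the example below defines one callback for overall progress and one for per-file progress. The callback signatures shown here, `(total_file_count, file_advancement)` and `(file_name, file_total_bytes, file_bytes_sent)`, are assumptions and should be checked against the `RemoteDataset.push` documentation for your installed version. `simulate_upload` is a hypothetical stand-in so that the example runs without a Darwin account.

```python
from typing import Callable, Dict, List, Optional

# Assumed callback shapes; verify against your darwin-py version.
ProgressCallback = Callable[[int, float], None]
FileUploadCallback = Callable[[str, int, int], None]

progress_log: List[float] = []
bytes_sent: Dict[str, int] = {}


def on_progress(total_file_count: int, file_advancement: float) -> None:
    """Record how far the overall upload has advanced."""
    progress_log.append(file_advancement)


def on_file_upload(file_name: str, file_total_bytes: int, file_bytes_sent: int) -> None:
    """Record the latest byte count sent for each file."""
    bytes_sent[file_name] = file_bytes_sent


def simulate_upload(
    files: Dict[str, int],
    chunk_size: int = 1024,
    progress_callback: Optional[ProgressCallback] = None,
    file_upload_callback: Optional[FileUploadCallback] = None,
) -> None:
    """Hypothetical stand-in for an uploader: sends each file in chunks."""
    for name, size in files.items():
        sent = 0
        while sent < size:
            sent = min(sent + chunk_size, size)
            if file_upload_callback:
                file_upload_callback(name, size, sent)
        if progress_callback:
            progress_callback(len(files), 1)


simulate_upload(
    {"cat_1.jpg": 2500, "dog_1.jpg": 1000},
    progress_callback=on_progress,
    file_upload_callback=on_file_upload,
)
```

Keeping the callbacks as plain functions makes it straightforward to swap in a progress bar or logger later without changing the upload call.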