Datalad-next

Latest version: v1.5.0

1.5.0

💫 New features

- new support subpackage for Git's pathspecs [[8d05f1cd]](https://github.com/datalad/datalad-next/commit/8d05f1cd)
- add pathspec support to `iter_(gitworktree|submodules)()` [[7622267c]](https://github.com/datalad/datalad-next/commit/7622267c)
- `GitWorktreeFileSystemItem`:
  - method to convert `GitWorktreeItem` [[30858780]](https://github.com/datalad/datalad-next/commit/30858780)
- `iter_gitworktree`:
  - added submodule recursion with pathspec support [[2851ec4c]](https://github.com/datalad/datalad-next/commit/2851ec4c)
  - add ability to report untracked content only [[2763b10f]](https://github.com/datalad/datalad-next/commit/2763b10f)
- `iter_submodules`:
  - add `match_containing` mode [[8025b7bd]](https://github.com/datalad/datalad-next/commit/8025b7bd)

🐛 Bug Fixes

- `iter_gitstatus`:
  - support newly added submodules [[fd197ad6]](https://github.com/datalad/datalad-next/commit/fd197ad6)
  - rectify usage of unsupported `untracked=no` mode in tests [[673e6c22]](https://github.com/datalad/datalad-next/commit/673e6c22)
- Patches for DataLad (core):
  - fix `create_sibling_ria` patching [[78e29b5d]](https://github.com/datalad/datalad-next/commit/78e29b5d)

📝 Documentation

- Contributing guide:
  - expand description of expected commit messages [[04f68b49]](https://github.com/datalad/datalad-next/commit/04f68b49)

1.4.1

🐛 Bug Fixes

- dependencies: limit test patch import to test runs [[905b99bd]](https://github.com/datalad/datalad-next/commit/905b99bd)

📝 Documentation

- add note of Git >= v2.31 requirement for next-status [[093575d8]](https://github.com/datalad/datalad-next/commit/093575d8)
- state conventional-commits requirement [[a9180fc0]](https://github.com/datalad/datalad-next/commit/a9180fc0)

🛡 Tests

- fixture: add missing import (for non-WebDAV fallback) [[ddd66799]](https://github.com/datalad/datalad-next/commit/ddd66799)
- markers: extend list of implicit datalad-core markers [[59abbcfb]](https://github.com/datalad/datalad-next/commit/59abbcfb)
- markers: add missing datalad-core marker "integration" [[e4e60b99]](https://github.com/datalad/datalad-next/commit/e4e60b99)

1.4.0

🐛 Bug Fixes

- RIA over SSH access from Mac clients to a Linux server was broken
due to an inappropriate platform check that assumed that the local and
remote platforms are identical.
Fixes https://github.com/datalad/datalad/issues/7536 via
https://github.com/datalad/datalad-next/pull/653 (by mih)

- `next-status` has received a number of fixes:

  - It no longer issues undesirable modification reports
    that are based on `mtime` changes alone (i.e., no content change).
    Fixes https://github.com/datalad/datalad-next/issues/639 via
    https://github.com/datalad/datalad-next/pull/650 (by mih)
  - It now detects staged changes in repositories with no
    commit.
    Fixes https://github.com/datalad/datalad-next/issues/680 via
    https://github.com/datalad/datalad-next/pull/681 (by mih)
  - `next-status -r mono` now reports on new commits in submodules.
    Previously this was ignored, leading to the impression of
    clean datasets despite unsaved changes.
    Fixes https://github.com/datalad/datalad-next/issues/645 via
    https://github.com/datalad/datalad-next/pull/679 (by mih)

- `iter_annexworktree()` can now also be used on plain Git repos,
and behaves exactly as if reporting on non-annexed files
in a git-annex repo. Previously, a cryptic `iterable did not yield
matching item for route-in item, cardinality mismatch?` error was
issued in this case.
Fixes https://github.com/datalad/datalad-next/issues/670 via
https://github.com/datalad/datalad-next/pull/673 (by mih)

💫 Enhancements and new features

- `datalad_next.shell` provides a context manager for (long-running)
shell or interpreter subprocesses. Within the context any number of
commands can be executed in such a shell, and each command can
process input (iterables), and yield output (iterables). This feature
is suitable for running and controlling "remote shells" like a login
shell on a server via SSH. A range of utilities is provided to
employ this functionality for special purpose implementations
(e.g., accept fixed-length or variable-length process output).
A suite of operations, such as downloading a file from or uploading a file
to a remote shell, is provided for POSIX-compliant shells in
`datalad_next.shell.operations.posix`.
https://github.com/datalad/datalad-next/pull/596 (by christian-monch)

- A rewrite of `SSHRemoteIO`, the RIA SSH-operations implementation from
datalad-core, is provided as a patch. It is based on the new `shell`
feature, and provides more robust operations. Its IO performance is
at the same level as `scp`-based down/uploads. In contrast to the
original implementation, it supports fine-grained progress reporting
for uploads and downloads.
Via https://github.com/datalad/datalad-next/pull/655 (by mih)

- The `SpecialRemote` base class in datalad-core is patched to support
a standard `close()` method for implementing resource release and cleanup
operations. The main special remote entry point has been altered to
run implementations within a `closing()` context manager to guarantee
execution of such handlers.
Via https://github.com/datalad/datalad-next/pull/655 (by mih)

- A new `has_initialized_annex()` helper function is provided to
test for a locally initialized annex in a repo.
Via https://github.com/datalad/datalad-next/pull/673 (by mih)

- `iter_annexworktree()` can now also be used on plain Git repositories,
and it yields the same output and behavior as running on a git-annex
repository with no annex'ed content (just tracked with Git).
Fixes https://github.com/datalad/datalad-next/issues/670 via
https://github.com/datalad/datalad-next/pull/673 (by mih)

- `next-status` and `iter_gitstatus()` have been improved to
report on further modifications after a file addition has been
originally staged.
Fixes https://github.com/datalad/datalad-next/issues/637 via
https://github.com/datalad/datalad-next/pull/679 (by mih)

- `next-status` result rendering has been updated to differ more clearly
from that of git-status. Coloring is now exclusively
determined by the nature of a change, rather than being partially
similar to git-status's index-updated annotation. This reduces
the chance for misinterpretations, and does not create an undesirable
focus on the Git index (which is largely ignored by DataLad).
Fixes https://github.com/datalad/datalad-next/issues/640 via
https://github.com/datalad/datalad-next/pull/679 (by mih)

- A large 3k-line patch set replaces almost the entire RIA implementation,
including the ORA special remote, and the `create-sibling-ria` command.
The new implementation brings uniform support for Windows clients, progress
reporting for uploads and downloads via SSH, and a faster and more
robust behavior for SSH-based operations (based on the new remote
shell feature).
Fixes https://github.com/datalad/datalad-next/issues/654 via
https://github.com/datalad/datalad-next/pull/669 (by christian-monch)
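
The `close()` and `closing()` pattern described above for the patched `SpecialRemote` base class can be illustrated with the standard library alone. This is a minimal sketch and not the datalad-next code: `DemoRemote` and `run_remote` are hypothetical names introduced for illustration; only `contextlib.closing` is a real API.

```python
from contextlib import closing


class DemoRemote:
    """Hypothetical stand-in for a special remote that holds a resource."""

    def __init__(self):
        self.closed = False

    def transfer(self, fail=False):
        if fail:
            raise RuntimeError("transfer failed")
        return "ok"

    def close(self):
        # Release connections, temporary files, and similar resources
        self.closed = True


def run_remote(remote, fail=False):
    # closing() calls remote.close() on exit, even when an exception is raised
    with closing(remote) as r:
        return r.transfer(fail=fail)
```

Running the entry point inside `closing()` means a remote implementation only has to provide `close()`; the cleanup call itself no longer depends on every code path remembering to make it.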

📝 Documentation

- Git-related subprocess execution helpers are now accessible in the
rendered documentation, and all supported file collections are now
mentioned in the `ls-file-collection` command help.
Fixes https://github.com/datalad/datalad-next/issues/668 via
https://github.com/datalad/datalad-next/pull/671 (by mih)

🛡 Tests

- Test setup has been improved to support a uniform, datalad-next
enabled environment for subprocesses too. This extends the scope
of testing to special remote implementations and other code that
is executed in subprocesses, and relies on runtime patches.
See https://github.com/datalad/datalad-next/pull/i665 (by mih)

1.3.0

💫 Enhancements and new features

- Code organization is adjusted to clearly indicate what is part of the
package's public Python API. Anything that can be imported directly from
the top-level of any sub-package is part of the public API.
As an example: `from datalad_next.runners import iter_git_subproc`
imports a part of the public API, but
`from datalad_next.runners.git import iter_git_subproc` does not.
See `README.md` for more information.
Fixes https://github.com/datalad/datalad-next/issues/613 via
https://github.com/datalad/datalad-next/pull/615 (by mih)
https://github.com/datalad/datalad-next/pull/617 (by mih)
https://github.com/datalad/datalad-next/pull/618 (by mih)
https://github.com/datalad/datalad-next/pull/619 (by mih)
https://github.com/datalad/datalad-next/pull/620 (by mih)
https://github.com/datalad/datalad-next/pull/621 (by mih)
https://github.com/datalad/datalad-next/pull/622 (by mih)
https://github.com/datalad/datalad-next/pull/623 (by mih)

- New `patched_env` context manager for patching a process'
environment. This avoids the need for importing `unittest` outside
test implementations.
Via https://github.com/datalad/datalad-next/pull/633 (by mih)

- `call_git...()` functions received a new `force_c_locale`
parameter. This can be set whenever Git output needs to be parsed
to force running the command with `LC_ALL=C`. Such an environment
manipulation is off by default and not done unconditionally to
let localized messaging through in a user's normal locale.
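
The two environment-related items above (a context manager for patching the process environment, and running a command with `LC_ALL=C` only on request) can be sketched together with the standard library. The names `patched_env_sketch` and `child_locale` are hypothetical; this does not reproduce the signature of datalad-next's `patched_env` or of the `call_git...()` functions.

```python
import os
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def patched_env_sketch(**updates):
    """Temporarily set environment variables (a value of None unsets one)."""
    saved = {key: os.environ.get(key) for key in updates}
    try:
        for key, value in updates.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value
        yield
    finally:
        # Restore the previous state, including variables that did not exist
        for key, value in saved.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value


def child_locale(force_c_locale=False):
    """Return the LC_ALL value that a child process sees."""
    code = "import os; print(os.environ.get('LC_ALL', ''))"
    # Only touch the environment when explicitly requested (off by default)
    updates = {"LC_ALL": "C"} if force_c_locale else {}
    with patched_env_sketch(**updates):
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, check=True,
        )
    return result.stdout.strip()
```

Keeping the override opt-in mirrors the rationale given above: output that is parsed needs a stable locale, while output shown to a user should stay localized.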

🐛 Bug Fixes

- The `datalad-annex::` Git remote helper now tests for a repository
deposit, and distinguishes an absent remote repository deposit
from an empty one when cloning. This rectifies
confusing behavior (successful clones of empty repositories
from broken URLs), and also fixes subdataset clone
candidate handling in `get` (which failed to skip inaccessible
`datalad-annex::` URLs for the same reason).
Fixes https://github.com/datalad/datalad-next/issues/636 via
https://github.com/datalad/datalad-next/pull/638 (by mih)

📝 Documentation

- API docs have been updated to include all top-level symbols
of any sub-package, or in other words: the public API.
See https://github.com/datalad/datalad-next/pull/627 (by mih)

🏠 Internal

- The `tree` command no longer uses the `subdatasets` command
for queries, but employs the recently introduced `iter_submodules()`
for leaner operations.
See https://github.com/datalad/datalad-next/pull/628 (by mih)

- `call_git...()` functions are established as the only used abstraction
to interface with Git and git-annex commands outside the use in
DataLad's `Repo` classes. Any usage of DataLad's traditional
`Runner` functionality is discontinued.
Fixes https://github.com/datalad/datalad-next/issues/541 via
https://github.com/datalad/datalad-next/pull/632 (by mih)

- Type annotations have been added to the implementation of the
`uncurl` git-annex remote. A number of unhandled conditions have
been discovered and were rectified.

1.2.0

🐛 Bug Fixes

- Fix an invalid escape sequence in a regex that caused a syntax warning.
Fixes https://github.com/datalad/datalad-next/issues/602 via
https://github.com/datalad/datalad-next/pull/603 (by mih)
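
The class of problem behind this fix can be reproduced in isolation. The pattern below is a hypothetical example, not the regex that was changed in datalad-next: a backslash sequence such as `\d` inside a regular (non-raw) string literal makes Python emit a warning at compile time, while a raw string literal does not.

```python
import re
import warnings

# Compile source code that contains "\d" in a regular string literal and
# record the warning about the invalid escape sequence.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile(r'pattern = "\d+"', "<demo>", "exec")

# A raw string literal hands the backslash to the regex engine unchanged
VERSION = re.compile(r"(\d+)\.(\d+)")
```

Depending on the Python version, the recorded warning is a `DeprecationWarning` or a `SyntaxWarning`.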

💫 Enhancements and new features

- Speedup of status reports for repositories with many submodules.
An early presence check for submodules skips unnecessary evaluation
steps. Fixes https://github.com/datalad/datalad-next/issues/606 via
https://github.com/datalad/datalad-next/pull/607 (by mih)

🏠 Internal

- Fix implementation error in `ParamDictator` class that caused a test
failure. The class itself is unused and has been scheduled for removal.
See https://github.com/datalad/datalad-next/issues/611 and
https://github.com/datalad/datalad-next/pull/610 (by christian-monch)

🛡 Tests

- Promote a previously internal fixture to provide a standard
`modified_dataset` fixture. This fixture is session-scoped, and
yields a dataset with many facets of modification, suitable for
testing change reporting. The fixture verifies that no
modifications have been applied to the testbed. (by mih)

- `iterable_subprocess` tests have been made more robust to better handle the
observed diversity of execution environments. This addresses, for example,
https://bugs.debian.org/1061739.
https://github.com/datalad/datalad-next/pull/614 (by christian-monch)

1.1.0

💫 Enhancements and new features

- A new paradigm for subprocess execution is introduced. The main
workhorse is `datalad_next.runners.iter_subproc`. This is a
context manager that feeds input to subprocesses via iterables,
and also exposes their output as an iterable. The implementation
is based on https://github.com/uktrade/iterable-subprocess, and
a copy of it is now included in the sources. It has been modified
to work homogeneously on the Windows platform too.
This new implementation is leaner and more performant. Benchmarks
suggest that the execution of multi-step pipe connections of Git
and git-annex commands is within 5% of the runtime of their direct
shell-execution equivalent (outside Python).
See https://github.com/datalad/datalad-next/pull/538 (by mih),
https://github.com/datalad/datalad-next/pull/547 (by mih).

With this change a number of additional features have been added,
and internal improvements have been made. For example, any
use of `ThreadedRunner` has been discontinued. See
https://github.com/datalad/datalad-next/pull/539 (by christian-monch),
https://github.com/datalad/datalad-next/pull/545 (by christian-monch),
https://github.com/datalad/datalad-next/pull/550 (by christian-monch),
https://github.com/datalad/datalad-next/pull/573 (by christian-monch)
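
The idea of exposing subprocess output as an iterable inside a context manager can be sketched with the standard library. This is an illustration under stated assumptions, not the `iter_subproc` implementation: `iter_output` is a hypothetical helper, it covers only the output side (no input feeding), and it makes no claim about the performance figures quoted above.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def iter_output(cmd, chunk_size=65536):
    """Run `cmd` and expose its stdout as an iterable of byte chunks."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        # iter() with a sentinel stops at the empty read that signals EOF
        yield iter(lambda: proc.stdout.read(chunk_size), b"")
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    # Only reached when the with-block itself did not raise
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)


cmd = [sys.executable, "-c", "print('a'); print('b')"]
with iter_output(cmd) as chunks:
    output = b"".join(chunks)
```

Because the process is started on entry and waited for on exit, a non-zero exit status surfaces as an exception when the `with` block ends rather than being silently dropped.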

- A new `itertools` module was added. It provides implementations
of iterators that can be used in conjunction with `iter_subproc`
for standard tasks. This includes the itemization of output
(e.g., line-by-line) across chunks of bytes read from a process
(`itemize`), output decoding (`decode_bytes`), JSON-loading
(`json_load`), and helpers to construct more complex data flows
(`route_out`, `route_in`).
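
Itemization across chunk boundaries, as described above, is the step that turns arbitrary byte chunks into complete lines. The following sketch shows the technique with a hypothetical `itemize_lines` function; it does not reproduce the signature or behavior details of `itemize` in datalad-next.

```python
def itemize_lines(chunks, sep=b"\n"):
    """Yield separator-delimited items from an iterable of byte chunks."""
    remainder = b""
    for chunk in chunks:
        parts = (remainder + chunk).split(sep)
        # The last part may be an incomplete item; hold it back
        remainder = parts.pop()
        yield from parts
    if remainder:
        # Emit trailing content that was not terminated by a separator
        yield remainder


# Chunk boundaries deliberately fall in the middle of lines
chunks = [b"first li", b"ne\nsecond\nth", b"ird"]
lines = [item.decode() for item in itemize_lines(chunks)]
```

Decoding happens only after itemization here, so a multi-byte character split across two chunks is reassembled before it is decoded.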

- The `more_itertools` package has been added as a new dependency.
It is used for `datalad-next` iterator implementations, but is also
ideal for client code that employs this new functionality.

- A new `iter_annexworktree()` provides the analog of `iter_gitworktree()`
for git-annex repositories.

- `iter_gitworktree()` has been reimplemented around `iter_subproc`. The
performance is substantially improved.

- `iter_gitworktree()` now also provides file pointers to
symlinked content. Fixes https://github.com/datalad/datalad-next/issues/553
via https://github.com/datalad/datalad-next/pull/555 (by mih)

- `iter_gitworktree()` and `iter_annexworktree()` now support single
directory (i.e., non-recursive) reporting too.
See https://github.com/datalad/datalad-next/pull/552

- A new `iter_gittree()` that wraps `git ls-tree` for iterating over
the content of a Git tree-ish.
https://github.com/datalad/datalad-next/pull/580 (by mih).

- A new `iter_gitdiff()` wraps `git diff-tree|files` and provides a flexible
basis for iteration over changesets.

- `PathBasedItem`, a dataclass that is the basis for many item types yielded
by iterators, now more strictly separates the `name` property from path semantics.
The name is a plain string, and an additional, explicit `path` property
provides it in the form of a `Path`. This simplifies code (the
`_ZipFileDirPath` utility class became obsolete and was removed), and
improves performance.
Fixes https://github.com/datalad/datalad-next/issues/554 and
https://github.com/datalad/datalad-next/issues/581 via
https://github.com/datalad/datalad-next/pull/583 (by mih)

- A collection of helpers for running Git commands has been added at
`datalad_next.runners.git`. Direct uses of datalad-core runners,
or `subprocess.run()` for this purpose have been replaced with calls
to these utilities.
https://github.com/datalad/datalad-next/pull/585 (by mih)

- The performance of `iter_gitworktree()` has been improved by about
10%. Fixes https://github.com/datalad/datalad-next/issues/540
via https://github.com/datalad/datalad-next/pull/544 (by mih).

- New `EnsureHashAlgorithm` constraint to automatically expose
and verify algorithm labels from `hashlib.algorithms_guaranteed`
Fixes https://github.com/datalad/datalad-next/issues/346 via
https://github.com/datalad/datalad-next/pull/492 (by mslw adswa)
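
Validating a label against `hashlib.algorithms_guaranteed` can be sketched as a plain function. `ensure_hash_algorithm` is a hypothetical name; the actual `EnsureHashAlgorithm` constraint is a class within datalad-next's constraint system, and its interface is not reproduced here.

```python
import hashlib


def ensure_hash_algorithm(label):
    """Return `label` if hashlib guarantees the algorithm, else raise ValueError."""
    if label not in hashlib.algorithms_guaranteed:
        choices = ", ".join(sorted(hashlib.algorithms_guaranteed))
        raise ValueError(
            f"unsupported hash algorithm {label!r}; choose one of: {choices}"
        )
    return label


algo = ensure_hash_algorithm("sha256")
digest = hashlib.new(algo, b"datalad").hexdigest()
```

Deriving the accepted values from `hashlib` keeps the list of choices in sync with the running Python version instead of hard-coding it.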

- The `archivist` remote now supports archive type detection
from `*E`-type annex keys for `.tgz` archives too.
Fixes https://github.com/datalad/datalad-next/issues/517 via
https://github.com/datalad/datalad-next/pull/518 (by mih)

- `iter_zip()` uses a dedicated, internal `PurePath` variant to report on
directories (`_ZipFileDirPath`). This enables more straightforward
`item.name in zip_archive` tests, which require a trailing `/` for
directory-type archive members.
https://github.com/datalad/datalad-next/pull/430 (by christian-monch)
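
The trailing `/` requirement for directory-type members comes from the ZIP format as exposed by Python's `zipfile` module, and can be observed without datalad-next. The archive content below is made up for illustration.

```python
import io
import zipfile

# Build a small archive in memory with one directory and one file member
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data/", "")            # directory member
    archive.writestr("data/file.txt", "hi")  # file member

with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    is_dir = archive.getinfo("data/").is_dir()
    # Without the trailing slash the directory member is not found
    found_without_slash = "data" in names
```

A path type that keeps the trailing slash for directories therefore allows member names to be used for lookups as reported, without special-casing directories in client code.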

- A new `ZipArchiveOperations` class added support for ZIP files, and enables
their use together with the `archivist` git-annex special remote.
https://github.com/datalad/datalad-next/pull/578 (by christian-monch)

- `datalad ls-file-collection` has learned additional collection types:

  - The new `zipfile` collection type that enables uniform reporting on
    the additional archive type.

  - The new `annexworktree` collection that enhances the `gitworktree`
    collection by also reporting on annexed content, using the new
    `iter_annexworktree()` implementation. It is about 15% faster than a
    `datalad status --annex basic --untracked no -e no -t eval`.

  - The new `gittree` collection for listing any Git tree-ish.

- A new `iter_gitstatus()` can replace the functionality of
`GitRepo.diffstatus()` with a substantially faster implementation.
It also provides a novel `mono` recursion mode that completely
hides the notion of submodules and presents deeply nested
hierarchies of datasets as a single "monorepo".
https://github.com/datalad/datalad-next/pull/592 (by mih)

- A new `next-status` command provides a substantially faster
alternative to the datalad-core `status` command. It is closely
aligned to `git status` semantics, only reports changes (not repository
listings), and supports type change detection. Moreover, it exposes
the "monorepo" recursion mode, and single-directory reporting options
of `iter_gitstatus()`. It is the first command to use `dataclass`
instances as result types, rather than the traditional dictionaries.
Git v2.31 or later is required.

- `SshUrlOperations` now supports non-standard SSH ports, non-default
user names, and custom identity file specifications.
Fixed https://github.com/datalad/datalad-next/issues/571 via
https://github.com/datalad/datalad-next/pull/570 (by mih)

- A new `EnsureRemoteName` constraint improves the parameter validation
of `create-sibling-webdav`. Moreover, the command has been uplifted
to support uniform parameter validation also for the Python API.
Missing required remotes, or naming conflicts are now detected and
reported immediately before the actual command implementation runs.
Fixes https://github.com/datalad/datalad-next/issues/193 via
https://github.com/datalad/datalad-next/pull/577 (by mih)

- `datalad_next.repo_utils` provides a collection of implementations
for common operations on Git repositories. Unlike the datalad-core
`Repo` classes, these implementations do not require a specific
data structure or object type beyond a `Path`.

🐛 Bug Fixes

- Add patch to fix `update`'s target detection for adjusted mode datasets
that can crash under some circumstances.
See https://github.com/datalad/datalad/issues/7507, fixed via
https://github.com/datalad/datalad-next/pull/509 (by mih)

- Comparison with `is` and a literal was replaced with a proper construct.
While having no functional impact, it removes an ugly `SyntaxWarning`.
Fixed https://github.com/datalad/datalad-next/issues/526 via
https://github.com/datalad/datalad-next/pull/527 (by mih)
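
The warning mentioned in the last item can be reproduced with a made-up snippet (not the datalad-next code that was changed). Comparing against a literal with `is` tests object identity, which Python flags at compile time; `==` tests equality and is the proper construct.

```python
import warnings

# Compiling an identity comparison against a literal triggers a SyntaxWarning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile("x = 1\nflag = x is 1", "<demo>", "exec")


def is_one(value):
    # Equality comparison; does not rely on interpreter-specific object caching
    return value == 1
```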

📝 Documentation

- The API documentation has been substantially extended. More already
documented API components are now actually rendered, and more documentation
has been written.

🏠 Internal

- Type annotations have been extended. The development workflows now inform
about type annotation issues for each proposed change.

- Constants have been migrated to `datalad_next.consts`.
https://github.com/datalad/datalad-next/pull/575 (by mih)

🛡 Tests

- A new test verifies compatibility with HTTP servers that do not report
download progress.
https://github.com/datalad/datalad-next/pull/369 (by christian-monch)

- The overall noise-level in the test battery output has been reduced
substantially. INFO log messages are no longer shown, and command result
rendering is largely suppressed. New test fixtures make it easier
to maintain tidier output: `reduce_logging`, `no_result_rendering`.
The contribution guide has been adjusted to encourage their use.

- Tests that require an unprivileged system account to run are now skipped
when executed as root. This fixes an issue of the Debian package.
https://github.com/datalad/datalad-next/pull/593 (by adswa)
