ServiceX

Latest version: v2.8.0

2.4

Many issues were fixed after running against the 70 TB of CMS Run 1 data.

* Supports Python 3.9
* Will now report supported return types (parquet, root)
* Long filenames are hashed in the local cache to avoid OS limitations
* A request title can be passed to services to "name" the transform
* Supports a list of URLs or a single URL as a file source, as well as the more traditional dataset identifiers
* Support deleting a single datafile or a query status file
* `api_endpoints` now have names not just types
* A local file can be written that maps query hashes to request IDs. It can safely be checked into a repo so that others can quickly re-use existing queries.
* Better status updates while running and downloading, and support for py widgets in VS Code.

2.4b4

* Make sure `title` is ignored when calculating hash - it makes no difference in the way the data is calculated
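
To illustrate why the title can be left out of the hash, here is a minimal sketch of a request hash that skips a `title` field. The function name `query_hash`, the dictionary layout, and the choice of SHA-256 over sorted JSON are all assumptions made for illustration; this is not ServiceX's actual implementation.

```python
import hashlib
import json


def query_hash(request: dict) -> str:
    """Hash a request dict, ignoring the purely cosmetic 'title' key.

    Hypothetical helper: the real library may hash different fields.
    """
    # Drop the title: it labels the transform but does not change the data.
    relevant = {k: v for k, v in request.items() if k != "title"}
    # Sort keys so that logically equal requests serialize identically.
    payload = json.dumps(relevant, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Two requests that differ only in their title (field names are made up).
a = {"did": "dataset-1", "selection": "query-text", "title": "First try"}
b = {"did": "dataset-1", "selection": "query-text", "title": "Renamed"}
c = {"did": "dataset-2", "selection": "query-text", "title": "First try"}

same_hash = query_hash(a) == query_hash(b)
different_hash = query_hash(a) != query_hash(c)
```

With this scheme, renaming a transform still hits the same cache entry, while changing the dataset or the selection produces a new hash.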

2.4b3

* Add an "English" readable property to the ServiceX dataset that contains the name. This can be quite long, depending on the dataset.

2.4b2

Streamline and fix the logic we introduced to parse users' config files and integrate default values

2.4b1

This is the first beta of 2.4. While we believe it is feature complete, there is still some wider testing that needs to happen. The goal of this release is to support the full re-analysis of the CMS Run 1 Higgs.

New Features:

* You can specify a single `http://` or `root://` file as input for a single file dataset.
* You can specify a list of `http://` and/or `root://` files. They will be processed by ServiceX as long as it has permission to access the data.
* A title can be given to each transform
* Add the ability to query a dataset for the data types it will return. This enables automatic data type discovery (required to keep the interface sensible in `coffea` and other upstream libraries).
* Python 3.9 now supported
* Add support for the CMS Run 1 AOD backend `type`.

* Caching:
  * Analysis Cache - one can create/check in a `json` file that maps queries to backend `request-id`s. This means that others can re-run and just download the data, rather than having to re-transform the data for the same queries.
  * A user can delete a data file from the local cache and it will automatically be re-downloaded
  * If a query status cache file is removed, it will be automatically re-fetched
* Configuration:
  * Endpoints can now have names rather than just types, supporting more than one backend of a single type (e.g. two `uproot` backends)
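
The Analysis Cache item above describes a checked-in `json` file that maps queries to request IDs. A minimal sketch of that idea follows, assuming a flat JSON object keyed by query hash. The file name `analysis_cache.json`, the helper names, and the file layout are hypothetical and are not taken from the ServiceX source.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def remember_request(cache_file: Path, query_hash: str, request_id: str) -> None:
    """Record query_hash -> request_id in a small JSON file (hypothetical layout)."""
    mapping = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    mapping[query_hash] = request_id
    # Sorted keys and indentation keep diffs small when the file is committed.
    cache_file.write_text(json.dumps(mapping, indent=2, sort_keys=True))


def lookup_request(cache_file: Path, query_hash: str) -> Optional[str]:
    """Return the stored request id, or None if the query is unknown."""
    if not cache_file.exists():
        return None
    return json.loads(cache_file.read_text()).get(query_hash)


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "analysis_cache.json"
    remember_request(cache, "abc123", "request-0001")
    found = lookup_request(cache, "abc123")
    missing = lookup_request(cache, "not-there")
```

A collaborator who finds a request ID for their query hash can skip the transform and go straight to downloading the results; a miss falls back to submitting the query as usual.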

Bug Fixes:

* If the backend has _lost_ the data, automatically resubmit the query. This was broken when streaming URLs or files.
* Transforms that are marked `Fatal` are now correctly cleared from the local cache, so they can be re-run
* When a transform with lots of files fails, the error report will be truncated to the result from 20 different files, rather than... all 3000.
* When a notebook is run under Visual Studio Code, the progress bars are correctly shown (for processing and download).
* `StreamInfoUrl` is now exported
* Protect against filenames that are so long that the OS can't handle them. In particular, fix the current implementation so it has a more robust hashing mechanism for the modified filename.
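
The long-filename fix in the last item can be sketched as follows: keep short names as they are, and replace the tail of an over-long name with a digest of the full name. The 200-character limit, the helper name `safe_filename`, and the use of SHA-256 are assumptions for illustration only; the release notes do not say which limit or hash the library uses.

```python
import hashlib

# Assumed limit. Many filesystems cap a single name near 255 bytes; this
# sketch counts characters, which matches bytes only for ASCII names.
MAX_NAME_LENGTH = 200


def safe_filename(name: str, max_length: int = MAX_NAME_LENGTH) -> str:
    """Return name unchanged if short enough, else a shortened, hashed form."""
    if len(name) <= max_length:
        return name
    # Hash the full name so that distinct long names stay distinct.
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    # Keep a readable prefix, leaving room for '-' and the 64-char digest.
    prefix = name[: max_length - len(digest) - 1]
    return f"{prefix}-{digest}"


short = safe_filename("events.root")
long_a = safe_filename("a" * 300 + "_1.root")
long_b = safe_filename("a" * 300 + "_2.root")
```

Because the digest covers the whole original name, two long names that share their first characters still map to different cache files.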

In Progress:

* Added logging to help debug downloads to the local machine. Good connections are not being saturated, and it is not yet clear why.

2.4a4

Trying to track down import errors, cleaning up how we include other items
