Cellarr

Latest version: v0.5.2

0.5.1

- Support CSC matrices in layers. This is uncommon, but it came up when an
AnnData object had been saved from R.
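
To illustrate what handling a CSC layer can involve, here is a minimal sketch using SciPy. It is not cellarr's own code: `to_csr` is a hypothetical helper, and the assumption is simply that a column-compressed layer is converted to row-compressed form before row-wise processing.

```python
import numpy as np
from scipy import sparse


def to_csr(matrix):
    """Return `matrix` as a CSR sparse matrix (hypothetical helper).

    Sparse inputs in any format (CSC, COO, ...) are converted with `tocsr()`;
    dense inputs are wrapped in a CSR matrix.
    """
    if sparse.issparse(matrix):
        return matrix.tocsr()
    return sparse.csr_matrix(np.asarray(matrix))


# A small layer stored column-wise, as it might arrive from an R export.
dense = np.array([[0, 2, 0], [1, 0, 3]], dtype=np.int16)
csc_layer = sparse.csc_matrix(dense)

csr_layer = to_csr(csc_layer)
```

The values are unchanged by the conversion; only the storage layout differs.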

0.5.0

- Construct cellarr TileDB files on Slurm-based HPC environments
(reference: [61](https://github.com/BiocPy/cellarr/pull/61))

0.4.0

- chore: Remove Python 3.8 (EOL).
- precommit: Replace docformatter with ruff's formatter.

0.3.2

- Functionality to iterate over samples and cells.
- Explicitly mention that slicing defaults to TileDB's behavior, inclusive of upper bounds.
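
The difference between inclusive and exclusive upper bounds can be shown with plain NumPy. This sketch does not call TileDB or cellarr; `inclusive_slice` is a hypothetical helper that only emulates the inclusive convention described above.

```python
import numpy as np


def inclusive_slice(values, start, stop):
    """Select values[start..stop], including the element at `stop`."""
    return values[start : stop + 1]


cells = np.arange(10)

# Standard Python/NumPy slicing excludes the upper bound.
python_style = cells[0:3]

# An inclusive slice over the same bounds returns one more element.
inclusive_style = inclusive_slice(cells, 0, 3)
```

With the same bounds, the inclusive convention yields four elements where Python slicing yields three.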

0.3.0

This version introduces major improvements to matrix handling, storage, and performance, including support for multiple matrices in H5AD/AnnData workflows and optimizations for ingestion and querying.

**Support for multiple matrices**:
- Both `build_cellarrdataset` and `CellArrDataset` now support multiple matrices. During ingestion, a TileDB group called `"assays"` is created to store all matrices, along with group-level metadata.

This may introduce breaking changes to the default parameters, depending on how these classes are used. Previously, the TileDB files were built as follows:

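```python
import numpy as np
from cellarr import MatrixOptions, build_cellarrdataset

# `tempdir`, `adata1`, and `adata2` are assumed to be defined beforehand.
dataset = build_cellarrdataset(
    output_path=tempdir,
    files=[adata1, adata2],
    matrix_options=MatrixOptions(matrix_name="counts", dtype=np.int16),
    num_threads=2,
)
```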

Now you may provide a list of matrix options, one for each layer in the files.

```python
dataset = build_cellarrdataset(
    output_path=tempdir,
    files=[adata1, adata2],
    matrix_options=[
        MatrixOptions(matrix_name="counts", dtype=np.int16),
        MatrixOptions(matrix_name="log-norm", dtype=np.float32),
    ],
    num_threads=2,
)
```

Querying follows a similar structure:
```python
from cellarr import CellArrDataset

cd = CellArrDataset(
    dataset_path=tempdir,
    assay_tiledb_group="assays",
    assay_uri=["counts", "log-norm"],
)
```

`assay_uri` is relative to `assay_tiledb_group`. For backwards compatibility, `assay_tiledb_group` can be an empty string.

**Parallelized ingestion**:
The build process now uses `num_threads` to ingest matrices concurrently. Two new columns in the sample metadata, `cellarr_sample_start_index` and `cellarr_sample_end_index`, track sample offsets, improving matrix processing.
- Note: The process pool uses the `spawn` method on UNIX systems, which may affect usage on Windows machines.
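
As a rough illustration of how per-sample offsets can be derived, the sketch below computes start and end indices from the number of cells in each sample. It is not cellarr's implementation: `sample_offsets` is a hypothetical helper, and treating the end index as inclusive is an assumption, since the changelog does not say which convention the columns use.

```python
from itertools import accumulate


def sample_offsets(cells_per_sample):
    """Return (starts, ends) row offsets for samples stored back to back.

    Assumption: `ends` holds the index of each sample's last cell (inclusive).
    """
    ends_exclusive = list(accumulate(cells_per_sample))
    starts = [end - n for end, n in zip(ends_exclusive, cells_per_sample)]
    ends_inclusive = [end - 1 for end in ends_exclusive]
    return starts, ends_inclusive


# Three samples with 3, 5, and 2 cells respectively.
starts, ends = sample_offsets([3, 5, 2])
```

With offsets known up front, each worker can presumably write its sample's rows to a fixed range without coordinating with the others.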

**TileDB query condition fixes**:
Fixed a few issues with fill values represented as bytes (this seems to be common when ASCII is used as the column type) and, more generally, with filtering operations on TileDB dataframes.

**Index remapping**:
Improved remapping of indices from sliced TileDB arrays for both dense and sparse matrices. This is not a user-facing function but an internal slicing operation.
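
The changelog does not describe the internal routine, so the following is only a generic sketch of what index remapping means: coordinates returned by a query refer to positions in the full array, and they need to be translated into positions within the requested subset. `remap_indices` is a hypothetical helper, and it assumes the requested indices are unique and that every returned coordinate is among them.

```python
import numpy as np


def remap_indices(global_coords, requested):
    """Map global coordinates to 0-based positions within `requested`.

    Assumes `requested` has no duplicates and contains every coordinate.
    """
    requested = np.asarray(requested)
    order = np.argsort(requested, kind="stable")
    sorted_requested = requested[order]
    # Locate each coordinate in the sorted request, then undo the sort.
    positions = np.searchsorted(sorted_requested, global_coords)
    return order[positions]


# Rows 10, 4, and 7 were requested in that order; a sparse query returned
# non-zero entries whose row coordinates are still global.
requested_rows = [10, 4, 7]
returned_rows = np.array([4, 4, 7, 10])

local_rows = remap_indices(returned_rows, requested_rows)
```

Here global row 4 maps to local row 1, row 7 to local row 2, and row 10 to local row 0, matching the order of the request.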

**Get a sample**:
Added a method to access all cells for a particular sample. You can provide either an index or a sample ID.

```python
sample_1_slice = cd.get_cells_for_sample(0)
```

Other updates to documentation, tutorials, the README, and additional tests.

0.2.4

- Provide options to extract an expected set of cell metadata columns across datasets.
- Update documentation and tests.
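
The changelog does not name the option involved, so the following pandas sketch only illustrates the idea of extracting a fixed set of metadata columns from datasets whose columns differ. `extract_columns` is a hypothetical helper, not part of cellarr.

```python
import pandas as pd


def extract_columns(frames, expected_columns):
    """Return each frame restricted to `expected_columns`.

    Columns missing from a frame are added and filled with NaN; columns
    not in the expected set are dropped.
    """
    return [frame.reindex(columns=expected_columns) for frame in frames]


# Two datasets with overlapping but different cell metadata.
obs1 = pd.DataFrame({"cell_type": ["T", "B"], "donor": ["d1", "d1"]})
obs2 = pd.DataFrame({"cell_type": ["NK"], "batch": [1]})

aligned = extract_columns([obs1, obs2], ["cell_type", "donor"])
combined = pd.concat(aligned, ignore_index=True)
```

After alignment every frame has the same columns in the same order, so they can be concatenated directly.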
