Genvarloader

Latest version: v0.10.5

Page 9 of 9

0.1.12

0.1.11

0.1.10

Feat

- improve speed for short seq lengths, large batch sizes. fix: bug in reverse complementing.
- update README.
- bump version.
- build 0.1.1
- draft support for PGEN split by contig.
- bump version to 0.1.0
- tests for indel support.
- option to disable jittering of haplotypes that are longer than query regions.
- implement stranded regions.
- add option to jitter bed regions.
- passing strand info to read(), 1 for forward and -1 for reverse.
- initial (buggy) implementation drafting spliced, multiregion `read` functions.
- split non-overlapping ROIs into separate partitions.
- organize kwargs docstring.
- make SyncBuffer work. Several errors in construct haplotypes with indels.
- enable concurrent reads again, but without single buffer allocation per process. This remains WIP. Comm overhead might make this a bad idea anyway.
- allocate a buffer once and only once for each call to iter(). Pass a sliced view of this buffer to readers to fill up. Not implemented for multi-process work.
- bump version.
- infer contig prefix for TileDB-VCF. Also deprecate TileDB-VCF for now given issues creating TileDB-VCF datasets (segfaults) and no implementation for indels from TileDB-VCF (yet).
- bump version for bugfix.
- add license info to pyproject.toml
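The strand convention noted in this list (1 for forward, -1 for reverse) can be illustrated with a minimal sketch. `apply_strand` is a hypothetical helper written for this note, not GenVarLoader's API, and it assumes sequences are plain ASCII bytes.

```python
# Hypothetical helper for illustration only; not part of GenVarLoader.
# Complement table for upper- and lowercase nucleotides; any other byte
# (e.g. N) passes through unchanged.
_COMPLEMENT = bytes.maketrans(b"ACGTacgt", b"TGCAtgca")


def apply_strand(seq: bytes, strand: int) -> bytes:
    """Return seq as-is for strand 1, or its reverse complement for strand -1."""
    if strand == 1:
        return seq
    if strand == -1:
        # Complement every base, then reverse the order.
        return seq.translate(_COMPLEMENT)[::-1]
    raise ValueError(f"strand must be 1 or -1, got {strand}")


print(apply_strand(b"AACG", 1))   # b'AACG'
print(apply_strand(b"AACG", -1))  # b'CGTT'
```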

Fix

- not all batch dims need be in every loader, handle this case.
- work-in-progress on wrong output lengths from fasta_variants.
- buffer length axis slicer was wrong length, should be computed by total length of *merged* regions.
- buffer length axis slicer was wrong length, should be computed by total length of regions.
- splice utils.
- attribute error.
- comment why versions are 0.0.0.
- dynamic versioning config.
- relax virtual data alignment constraints from exact join to inner join. feat: use dynamic versioning.
- make splicing util funcs generalize to n-dim arrays with length as the final axis.
- improve perf by not having batches as xr.Datasets.
- partitioned the wrong bed, jittered bed at wrong point in loop.
- forgot to cache jit function.
- batch_idx generation.
- computing relative starts for slicing buffers.
- computing max_ends.
- dim idx iteration.
- make strand column optional.
- forgot to pass contig to end_to_var_idx in Pgen read_for_hap.
- init Fasta.rev_strand_fn when alphabet is str. fix: using sample subsets with Pgen.
- uppercase alphabet when passed as a string.
- pass all tests! allow ploid kwarg in pgen reader, fix bugs with variant searching and max_end and end_idx calculation.
- consts for tracking buffer_idx column meanings. TODO reverse complement (or just reverse) data when slicing it from the buffer. Reverse complementing while constructing buffers requires partitions to be broken up more since only regions on the same strand can be merged.
- add pre-commit to dev dependencies.
- no variants in any query regions.
- ignore false positive dask.empty typing error.
- forgot to jit construct_haplotypes_with_indels().
- forgot comma.
- handle overlapping variants (i.e. genotype == ALT at same position in same sample) by only applying the first encountered.
- move ref_idx for deletions. feat: expose seed arg for fastavariants for determinism.
- checking that sample subset is all in pgen file.
- VLenAlleles slicing with None start. feat: cache pvar as arrow file.
- slicing VLenAlleles with start=None
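Two of the fixes above (applying only the first of several overlapping variants, and moving `ref_idx` for deletions) can be sketched together. This is an illustrative reconstruction under stated assumptions, not the library's `construct_haplotypes_with_indels`: `build_haplotype` is a hypothetical name, variants are 0-based `(pos, ref, alt)` tuples sorted by position, and every variant listed is assumed to be carried by the haplotype.

```python
# Hypothetical sketch; names and data layout are assumptions.
def build_haplotype(reference: str, variants: list[tuple[int, str, str]]) -> str:
    """Apply sorted, 0-based (pos, ref, alt) variants to a reference string."""
    pieces = []
    ref_idx = 0  # next reference position that has not been consumed yet
    for pos, ref, alt in variants:
        if pos < ref_idx:
            # Overlaps a variant that was already applied: keep the first
            # one encountered and skip this one.
            continue
        pieces.append(reference[ref_idx:pos])  # unchanged reference bases
        pieces.append(alt)                     # the alternate allele
        # Skip the whole REF allele, so deletions move ref_idx past the
        # deleted bases.
        ref_idx = pos + len(ref)
    pieces.append(reference[ref_idx:])
    return "".join(pieces)


variants = [(2, "G", "T"), (2, "G", "A"), (4, "AC", "A")]
print(build_haplotype("ACGTACGT", variants))  # ACTTAGT
```

The second variant is dropped because it starts at a position the first variant already consumed; the deletion at position 4 removes one base from the output.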

0.1.9

Feat

- do not pre-compute diffs since it is only used once and this reduces memory usage.
- initial implementation of generalized haplotype construction for re-aligning tracks.
- switch buffer slicing alg depending on batch size.
- change type annotation to Mapping for covariance.
- better docstring.
- add method to get PyTorch dataset from GVL class.

Fix

- better docstring.
- return region index with return_index. feat: specify order of arrays in return_tuples.
- increment buffer idx_slice by amount actually copied from buffer instead of batch_size, which is sometimes too large.
- readers with no non-length dimensions. feat: allow Fasta to be in-memory.
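The buffer `idx_slice` fix above concerns a common pattern: when a batch is only partly filled from the tail of one buffer, the read offset must advance by the number of items actually copied. A minimal sketch, using plain lists and a hypothetical `batches_from_buffers` function rather than the library's own buffers:

```python
# Hypothetical sketch; plain lists stand in for the real array buffers.
def batches_from_buffers(buffers, batch_size):
    """Yield fixed-size batches drawn from consecutive buffers."""
    batch = []
    for buffer in buffers:
        idx = 0  # read offset into the current buffer
        while idx < len(buffer):
            space = batch_size - len(batch)
            chunk = buffer[idx : idx + space]
            batch.extend(chunk)
            # Advance by what was actually copied. Adding batch_size here
            # would skip items whenever the batch was already partly full.
            idx += len(chunk)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final partial batch


print(list(batches_from_buffers([[1, 2, 3], [4, 5, 6, 7]], batch_size=2)))
# [[1, 2], [3, 4], [5, 6], [7]]
```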

0.0.2

Feat

- update README
- prepare to publish on pypi.
- reorganize loader code and add `set` method to update parameters that don't require re-initializing Ray Actors. fix: clean up BigWig docstring, plan for deprecation in favor of a method to convert to RLE table for big performance boost.
- reorganize code, minor changes.
- tested that ray-based loader runs.
- initial concurrent implementation with Ray.
- concurrent.futures based async buffering. Unfortunately, benchmarking shows this is slower than a single-threaded implementation.
- optimize buffer slicing. fix: setting uniform length.
- minor updates, increase fudge factor for memory usage.
- make libraries for different variant formats optional.
- reorganize, move loader into separate file.
- add pgen reader.
- optional lazy loading for RLE table.
- RLE table reader, corresponding to the BED5+ format.
- better docstring on Zarr reader.
- initial Zarr reader.
- more docstrings. fix: dtype conversion in bigwig.
- view_virtual_data to preview dimensions from combined readers and test that they are compatible. feat: weighted upsampling for entries that appear in batch dimensions.
- include GVL in __all__ imports. fix: Reader docstring.
- comments on how to implement async reads.
- return batch dim indices.
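For readers unfamiliar with the `concurrent.futures` based buffering mentioned in this list, the general idea is to read the next buffer in the background while the current one is consumed. The sketch below is a generic illustration of that pattern, not the code that was benchmarked; `prefetch` and `read_fn` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical sketch of background prefetching; read_fn(i) is assumed to
# return the i-th buffer.
def prefetch(read_fn, n_buffers):
    """Yield buffers in order, reading buffer i + 1 while i is in use."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(read_fn, 0) if n_buffers > 0 else None
        for i in range(n_buffers):
            buffer = future.result()  # wait for the pending read
            if i + 1 < n_buffers:
                # Start the next read before handing this buffer out.
                future = pool.submit(read_fn, i + 1)
            yield buffer


print(list(prefetch(lambda i: [i] * 2, 3)))  # [[0, 0], [1, 1], [2, 2]]
```

Whether this helps depends on how much of the read releases the GIL and on the scheduling overhead, which is consistent with the entry's note that it benchmarked slower than a single-threaded loop.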

Fix

- change shuffle of GVL.partitioned_bed to respect deprecation of random.shuffle's second argument. feat: make GVL.readers a dict.
- add license, description, repo link.
- prep for poetry to pypi.
- poetry build issues.
- wrong dtype for Fasta without padding. feat: optimize construct haplotypes with indels, parallel helps. fix: pgen position from 1-based to 0-based. fix: indexing bugs from converting code for 1 contig to multiple contigs in pgen.read().
- dtype of variant sizes.
- make construct_haplotypes_with_indels jittable.
- relax ray version constraint by not using subscript ObjectRef type.
- passing sample subsets correctly and tracking buffer use.
- update bnfo-environment.yml.
- wrong col names.
- export view_virtual_data in __all__. fix: switch bigwig to shared memmap and joblib, ray docs on shared memory were less clear.
- partial batches.
- make GVL a proper iterator.
- accessing and padding for out of bounds regions.
- batch_dim issues, return_index issues, drop_last issues.
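The first fix in this list refers to the optional second argument of `random.shuffle`, which was deprecated in Python 3.9 and removed in 3.11. One way to keep a seeded shuffle without it is a dedicated `random.Random` instance. The function below is a hypothetical stand-in, not the actual handling of `GVL.partitioned_bed`.

```python
import random


# Hypothetical stand-in for shuffling partitions reproducibly.
def shuffled_partitions(partitions, seed=None):
    """Return a shuffled copy of partitions using a private RNG."""
    rng = random.Random(seed)  # seeded generator, independent of global state
    out = list(partitions)
    rng.shuffle(out)  # single-argument form, valid on all supported versions
    return out


first = shuffled_partitions(range(10), seed=0)
second = shuffled_partitions(range(10), seed=0)
print(first == second)  # True: the same seed gives the same order
```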

© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.