GenVarLoader

Latest version: v0.10.5

0.5.0

Feat

- bump version
- multiprocess reading of genotypes, both VCF and PGEN. fix: bug in reading genotypes from PGEN
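
The multiprocess reading above fans genomic regions out across worker processes. A minimal illustrative sketch follows, assuming a hypothetical `read_genotypes` worker; the real VCF/PGEN decoding is format-specific and not shown.

```python
from multiprocessing import Pool

import numpy as np


def read_genotypes(task):
    """Hypothetical worker: decode genotypes for one region of a VCF/PGEN file.

    Returns a (variants, samples, ploidy) int8 array; format-specific decoding elided.
    """
    path, contig, start, end = task
    ...  # open `path`, seek to contig:[start, end), decode genotypes
    return np.empty((0, 0, 2), dtype=np.int8)


def read_all_regions(path, regions, processes=4):
    # Partition regions across a process pool; results come back in input order.
    tasks = [(path, contig, start, end) for contig, start, end in regions]
    with Pool(processes) as pool:
        return pool.map(read_genotypes, tasks)
```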

0.4.1

Fix

- bump version
- got number of regions from wrong array in get_reference

0.4.0

Feat

- deprecate the old loader due to worse performance; reorganize code.

Fix

- better documentation in README. feat!: rename write_transformed_tracks to write_transformed_track. feat: more ergonomic indexing.

0.3.3

Fix

- bump version
- wrong max_ends from SparseGenotypes.from_dense_with_length due to data races/incorrect parallel semantics for numba
- diffs need to be clipped and negated when computing shifts
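
One way to read the clip-and-negate fix, as an illustrative sketch (names assumed, not the package's internals): per-variant length differences (ALT length minus REF length) only require pulling in extra reference sequence when they are deletions, so the diffs are negated, clipped at zero, and accumulated into shifts.

```python
import numpy as np


def shifts_from_diffs(ilens: np.ndarray) -> np.ndarray:
    """ilens: per-variant ALT-minus-REF lengths (negative for deletions)."""
    # Deletions shorten the haplotype, so extra reference must be read from
    # the right; insertions do not. Negate, clip at zero, then accumulate.
    needed = np.clip(-ilens, 0, None)
    return np.cumsum(needed)


# e.g. ilens = [-2, +3, -1] -> needed = [2, 0, 1] -> shifts = [2, 2, 3]
```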

Perf

- pad haplotypes on-the-fly to avoid extra copying of reference subsequences
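
An illustrative sketch of on-the-fly padding, assuming byte-encoded sequences: rather than copying a longer reference subsequence and trimming it, the output buffer is pre-filled with 'N' and only the available haplotype bytes are written.

```python
import numpy as np


def pad_to_length(hap: np.ndarray, length: int) -> np.ndarray:
    """Pad (or truncate) a uint8-encoded haplotype to `length` with 'N'."""
    out = np.full(length, ord("N"), dtype=np.uint8)
    n = min(len(hap), length)
    out[:n] = hap[:n]
    return out
```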

0.3.2

Feat

- can convert Records back to a polars DataFrame with minimal copying via conversion of VLenAlleles to pyarrow buffers (see the sketch below)
- make open_with_settings the standard open function. fix: recognize .bgz extension for fasta files
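
A rough sketch of the minimal-copy conversion mentioned above, assuming a VLenAlleles-like layout of one contiguous byte buffer plus int64 offsets (the names and layout are assumptions): the numpy arrays are wrapped as Arrow buffers and handed to polars without copying the allele bytes.

```python
import numpy as np
import polars as pl
import pyarrow as pa

# Assumed layout: concatenated allele bytes plus int64 offsets ("ACGT", "A", "T").
data = np.frombuffer(b"ACGTAT", dtype=np.uint8)
offsets = np.array([0, 4, 5, 6], dtype=np.int64)

# Wrap the numpy arrays as Arrow buffers; the allele bytes are not copied.
alleles = pa.Array.from_buffers(
    pa.large_utf8(),
    len(offsets) - 1,
    [None, pa.py_buffer(offsets), pa.py_buffer(data)],
)
series = pl.from_arrow(alleles)  # polars Series backed by the same memory
print(series.to_list())          # ['ACGT', 'A', 'T']
```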

Fix

- remove dynamic versioning table
- move cli to __main__. feat: generalize Variants to automatically identify whether a VCF or PGEN is passed
- move cli to script in python source directory, maturin limitation?
- wrong implementation of heuristic for extending genotypes.

Perf

- faster sparsifying genotypes. feat: log level for cli. fix: clip missing lengths for appropriate end extension.

0.3.1

Feat

- benchmark interval decompression on cpu with numba vs. cpu with taichi vs. gpu with taichi
- optionally decompress intervals to tracks on gpu (a CPU sketch of interval expansion follows this list)
- initial support for stranded regions
- option to cache fasta files as numpy arrays.
- implement BigWig intervals as Rust extension.
- finishing touches on multi-track implementation; blocked by a cryptic issue where writing genotypes somehow prevents joblib from launching new processes.
- stop overwriting by default, add option.
- transforms directly on tracks. feat: intervals as array of structs for better data locality.
- let extra tracks get added via paths
- initial support for indels in tracks and WIP on also returning auxiliary genome wide tracks.
- initial sparse genos -> haplotypes and sparse hap diffs.
- wip sparse genotypes.
- properties for getting haplotypes, references, or tracks only.
- encourage num_workers <= 1 with GVL dataloader.
- freeze gvl.Dataset to prevent user from accidentally introducing invalid states. feat: warn if any query contigs have no variants or intervals associated with them.
- warn instead of error when no reference is passed and genotypes are present.
- disable overwriting by default; show help when no args are given.
- also report number of samples.
- add .from_table constructor for BigWigs.
- move CLI to script, include in package.
- use a table to specify bigwigs instead. fix: jittering.
- add script to write datasets to disk.
- more quality of life improvements. relax dependency version constraints.
- with_seed method
- quality of life methods for subsetting and converting to dataloaders.
- torch convenience functions. fix: ensure genotypes and intervals are written in sorted order w.r.t. the BED file.
- pre-computed implementation.
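
As referenced above for interval decompression, here is a minimal CPU sketch of expanding non-overlapping (start, end, value) intervals into a dense per-base track with numba; it illustrates the technique only and is not the package's kernel.

```python
import numpy as np
from numba import njit, prange


@njit(parallel=True, cache=True)
def intervals_to_track(starts, ends, values, region_start, region_length):
    """Expand run-length intervals into a dense per-base track.

    Intervals are assumed non-overlapping, so parallel writes never collide.
    """
    out = np.zeros(region_length, dtype=np.float32)
    for i in prange(len(starts)):
        s = max(starts[i] - region_start, 0)
        e = min(ends[i] - region_start, region_length)
        for j in range(s, e):
            out[j] = values[i]
    return out
```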

Fix

- dependency typo
- remove taichi interval to track implementation since it did not improve performance, even on GPU
- need to subset arrays to be reverse complemented
- change argument order of subset_to to match the rest of the API. fix: simplify subset implementation.
- remove python 3.10 type hints
- dimension order on subsets.
- make variant indices absolute on write.
- sparse genotypes layout
- wrong layout of genotypes and wrong max ends computation.
- ragged array layouts for correct concatenation when writing datasets one contig at a time.
- bug where init_intervals would not initialize all available tracks.
- track_to_intervals had wrong n_intervals and thus, wrong offsets.
- bug in computing max ends.
- match serde for genome tracks.
- bug in open state management.
- bug when writing genotypes where the chromosome of the requested regions is not present in the VCF.
- bug getting intersection of samples available.
- summed the wrong axis in adjust multi index.
- make GVLDataset __getitem__ API match torch Dataset API (i.e. use a raveled index; see the sketch after this list)
- QOL improvements.
- incorrect genotypes returned from VCF when queries have overlapping ranges.
- wrong shape.
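
For the raveled-index change above, a small sketch of the convention, assuming a dataset shaped (n_regions, n_samples): a single flat integer index, as torch's map-style Dataset expects, is unraveled into a (region, sample) pair.

```python
import numpy as np

n_regions, n_samples = 100, 8


def unravel(idx):
    """Map a flat torch-style index to a (region, sample) pair."""
    region, sample = np.unravel_index(idx, (n_regions, n_samples))
    return int(region), int(sample)


assert unravel(0) == (0, 0)
assert unravel(9) == (1, 1)  # 9 == 1 * n_samples + 1
```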

Refactor

- move construct virtual data to loader so utils import faster.
- rename util to utils.
- move write under dataset directory. perf?: move indexing operations into numba.
- move cli to script outside package, faster help message.
- break up dataset implementation into smaller files. refactor!: condense with_ methods into a single with_settings() method. feat: sel() and isel() methods for eager retrieval by sample and region.
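
To illustrate the sel()/isel() distinction on a toy container (not GenVarLoader's actual classes or signatures): sel() selects by labels such as sample names, while isel() selects by integer position; both return an eager subset.

```python
import numpy as np


class ToyDataset:
    """Toy (region x sample) container showing label vs. positional selection."""

    def __init__(self, values, regions, samples):
        self.values, self.regions, self.samples = values, regions, samples

    def isel(self, regions, samples):
        # Integer-position selection.
        return self.values[np.ix_(regions, samples)]

    def sel(self, regions, samples):
        # Label-based selection: resolve labels to positions, then delegate.
        r = [self.regions.index(x) for x in regions]
        s = [self.samples.index(x) for x in samples]
        return self.isel(r, s)


ds = ToyDataset(
    np.arange(6).reshape(3, 2),
    regions=["chr1:0-10", "chr1:10-20", "chr2:0-10"],
    samples=["sample1", "sample2"],
)
assert (ds.sel(["chr2:0-10"], ["sample2"]) == ds.isel([2], [1])).all()
```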

Perf

- when opening with settings and providing a reference but return_sequences is False, don't load the reference into memory.
