Feat
- update README
- prepare to publish on pypi.
- reorganize loader code and add `set` method to update parameters that don't require re-initializing Ray Actors. fix: clean up BigWig docstring, plan for deprecation in favor of a method to convert to RLE table for big performance boost.
- reorganize code, minor changes.
- tested that ray-based loader runs.
- initial concurrent implementation with Ray.
- concurrent.futures based async buffering. Unfortunately, benchmarking shows this is slower than a single-threaded implementation.
- optimize buffer slicing. fix: setting uniform length.
- minor updates, increase fudge factor for memory usage.
- make libraries for different variant formats optional.
- reorganize, move loader into separate file.
- add pgen reader.
- optional lazy loading for RLE table.
- RLE table reader, corresponding to the BED5+ format.
- better docstring on Zarr reader.
- initial Zarr reader.
- more docstrings. fix: dtype conversion in bigwig.
- view_virtual_data to preview dimensions from combined readers and test that they are compatible. feat: weighted upsampling for entries that appear in batch dimensions.
- include GVL in __all__ imports. fix: Reader docstring.
- comments on how to implement async reads.
- return batch dim indices.
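
The RLE table entries above rely on run-length encoding: consecutive positions with the same value collapse into a single interval row. A minimal sketch of that idea, assuming a dense per-base signal and 0-based half-open intervals; `dense_to_rle` and its `offset` parameter are hypothetical names for illustration, not part of GVL's API:

```python
from itertools import groupby


def dense_to_rle(values, offset=0):
    """Collapse runs of equal values into (start, end, value) rows.

    Intervals are 0-based and half-open; `offset` is the genomic position
    of values[0]. Hypothetical helper, not GVL's actual implementation.
    """
    rows = []
    start = offset
    for value, run in groupby(values):
        length = sum(1 for _ in run)  # number of positions in this run
        rows.append((start, start + length, value))
        start += length
    return rows


signal = [0, 0, 3, 3, 3, 1]
rows = dense_to_rle(signal, offset=100)
```

Tracks with long constant stretches shrink to a handful of rows this way, which is the usual reason an interval table is cheaper to store and query than a dense array.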
Fix
- change shuffle of GVL.partitioned_bed to respect deprecation of random.shuffle's second argument. feat: make GVL.readers a dict.
- add license, description, repo link.
- prep for poetry to pypi.
- poetry build issues.
- wrong dtype for Fasta without padding. feat: optimize constructing haplotypes with indels (parallelization helps). fix: pgen position from 1-based to 0-based. fix: indexing bugs from converting code for 1 contig to multiple contigs in pgen.read().
- dtype of variant sizes.
- make construct_haplotypes_with_indels jittable.
- relax ray version constraint by not using subscript ObjectRef type.
- passing sample subsets correctly and tracking buffer use.
- update bnfo-environment.yml.
- wrong col names.
- export view_virtual_data in __all__. fix: switch bigwig to shared memmap and joblib, ray docs on shared memory were less clear.
- partial batches.
- make GVL a proper iterator.
- accessing and padding for out of bounds regions.
- batch_dim issues, return_index issues, drop_last issues.
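
The `random.shuffle` fix above concerns the function's optional second argument (`random`), which was deprecated in Python 3.9 and removed in 3.11. A minimal sketch of a reproducible shuffle that avoids it by using a dedicated `random.Random` instance; `shuffle_regions` is a hypothetical helper for illustration and is not how `GVL.partitioned_bed` is necessarily implemented:

```python
import random


def shuffle_regions(regions, seed=None):
    """Return a shuffled copy of `regions` without touching the input.

    Uses a dedicated random.Random instance instead of the removed
    second argument of random.shuffle. Hypothetical helper.
    """
    rng = random.Random(seed)
    shuffled = list(regions)
    rng.shuffle(shuffled)  # in-place shuffle of the copy only
    return shuffled


regions = [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 0, 100), ("chr2", 100, 200)]
first = shuffle_regions(regions, seed=0)
second = shuffle_regions(regions, seed=0)
```

A per-call generator also keeps the shuffle order independent of any other code that draws from the module-level random state.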