Glow.py

Latest version: v2.0.0

2.0.0

What's Changed

* Update supported Spark versions to 3.4 and 3.5
* Remove Hail integration
* Remove features that frequently cause incompatibilities between versions (aggregate_by_index, CSV pipe transformer). Workarounds will be documented in a follow-up PR.
* Update Scala and Python dependencies to match recent DBR releases
* Fix unit tests that were failing in master
* Clean up test logs

In addition, various changes were made to prepare for the upcoming release of Spark 4.0:

* Add build support for Scala 2.13 and make necessary changes to compile Glow
* Fix known incompatibilities with 4.0 using the current Spark master branch and 4.0.0-SNAPSHOT maven artifacts
* Set up scaffolding for running tests against Spark 4.0. The tests are currently disabled since there's no release artifact, but I've verified that they pass against current master.
* Provide necessary configuration to run tests on JDK 17 (and simplify the test configuration along the way)
* Add a separate conda development environment file for Python dependencies that will be updated.

This release also adds Scala and Python functions for left-joining two DataFrames on an interval overlap condition. The two APIs provide the same functionality, though the Python API offers a few extra conveniences.

Since [Databricks' range join optimization](https://docs.databricks.com/en/optimizations/range-join.html) doesn't support left joins on interval overlap conditions, we split the left side into SNPs (intervals with length 1) and longer intervals and join each group separately. This approach can yield major performance gains when there are many SNPs, as is common in genetics workloads.
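
The split can be illustrated with a small, self-contained Python sketch. It is not Glow's implementation or API: the interval representation (`(contig, start, end)` tuples with 0-based, half-open coordinates) and all function names (`split_left_overlap_join`, `left_join`, `snp_in`, `overlaps`) are hypothetical, and plain nested loops stand in for Spark joins. For a length-1 interval at position `p`, the general overlap test `l.start < r.end and r.start < l.end` reduces to the point-in-interval test `r.start <= p < r.end`, so the SNP rows can be joined with that simpler condition and the result unioned with the left join of the longer intervals.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: (contig, start, end), 0-based, half-open.
Interval = Tuple[str, int, int]
Row = Tuple[Interval, Optional[Interval]]


def overlaps(a: Interval, b: Interval) -> bool:
    """General overlap test for half-open intervals on the same contig."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def snp_in(snp: Interval, r: Interval) -> bool:
    """Point-in-interval test; equivalent to overlaps() when snp has length 1."""
    return snp[0] == r[0] and r[1] <= snp[1] < r[2]


def left_join(rows: List[Interval], right: List[Interval],
              match: Callable[[Interval, Interval], bool]) -> List[Row]:
    """Naive nested-loop left join: unmatched left rows are paired with None."""
    out: List[Row] = []
    for l in rows:
        hits = [r for r in right if match(l, r)]
        if hits:
            out.extend((l, r) for r in hits)
        else:
            out.append((l, None))
    return out


def split_left_overlap_join(left: List[Interval],
                            right: List[Interval]) -> List[Row]:
    """Left-join SNPs and longer intervals separately, then union the results."""
    snps = [l for l in left if l[2] - l[1] == 1]
    longer = [l for l in left if l[2] - l[1] != 1]
    # SNPs use the cheaper point-in-interval condition; the rest use full overlap.
    return left_join(snps, right, snp_in) + left_join(longer, right, overlaps)


if __name__ == "__main__":
    left = [("1", 10, 11), ("1", 100, 200), ("2", 5, 6)]
    right = [("1", 0, 50), ("1", 150, 160)]
    for row in split_left_overlap_join(left, right):
        print(row)
```

Run as a script, this prints three rows: the SNP `("1", 10, 11)` matched to `("1", 0, 50)`, the unmatched SNP `("2", 5, 6)` paired with `None`, and `("1", 100, 200)` matched to `("1", 150, 160)`. That is the same set of rows a single left join on `overlaps` would produce; only the join condition used for each group differs.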

On a dataset with 1B left rows, 1M right rows, and varying percentages of SNPs in the left table (tested on a single 4-core executor due to quota limits):

* Inner range join + left join, all SNP percentages: 4h
* Glow join, 0% SNPs: 4h
* Glow join, 50% SNPs: 2h9m
* Glow join, 90% SNPs: 0h42m
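
For reference, the speedups relative to the 4h baseline can be worked out directly from the reported times, converted to minutes. The dictionary below only restates the benchmark figures; nothing new is measured.

```python
# Reported wall-clock times from the benchmark above, in minutes.
times_min = {
    "baseline (inner range join + left join)": 4 * 60,
    "Glow join, 0% SNPs": 4 * 60,
    "Glow join, 50% SNPs": 2 * 60 + 9,
    "Glow join, 90% SNPs": 42,
}

baseline = times_min["baseline (inner range join + left join)"]
# Speedup = baseline time / time for each configuration.
speedups = {name: baseline / t for name, t in times_min.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.2f}x")
```

This prints 1.00x for the baseline and the 0% SNP case, 1.86x at 50% SNPs and 5.71x at 90% SNPs.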

Full commit list

* fix broken liftOver notebook link in docs by williambrandler in https://github.com/projectglow/glow/pull/504
* Optimize CircleCI Config by dvcastillo in https://github.com/projectglow/glow/pull/505
* Release of Glow v1.2.1 by williambrandler in https://github.com/projectglow/glow/pull/509
* move databricks docs to glow docs by williambrandler in https://github.com/projectglow/glow/pull/510
* using spark to orchestrate parallel processing of samples of regions of the genome by williambrandler in https://github.com/projectglow/glow/pull/511
* Update GloWGR documentation to fully reflect GloWGR API changes by kianfar77 in https://github.com/projectglow/glow/pull/365
* Fix Infinity/NaN parsing to allow full set of values from VCF specification by dtzeng in https://github.com/projectglow/glow/pull/519
* Minimal attempt to support Spark 3.3 by adding shims for CSVOptions methods by srowen in https://github.com/projectglow/glow/pull/524
* Documentation fixes for DBFS API by a0x8o in https://github.com/projectglow/glow/pull/516
* Halt Hail tests by a0x8o in https://github.com/projectglow/glow/pull/518
* Support whitespaces for variant datasources by williambrandler in https://github.com/projectglow/glow/pull/475
* Update documentation for GFF schema for Alias field by a-li in https://github.com/projectglow/glow/pull/522
* Update to Spark 3.4/3.5 by henrydavidge in https://github.com/projectglow/glow/pull/546
* Spark 4.0 preparation by henrydavidge in https://github.com/projectglow/glow/pull/547
* Update docs to reflect Glow 2.0 changes by henrydavidge in https://github.com/projectglow/glow/pull/548
* Update vulnerable dependencies; enable scala steward app; set up Python dependabot by henrydavidge in https://github.com/projectglow/glow/pull/549
* Update scala-logging to 3.9.5 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/555
* Update hadoop-client to 3.3.6 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/559
* Update scala-library to 2.12.18 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/562
* Update sbt-header to 5.10.0 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/557
* Update snakeyaml to 2.2 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/571
* Update sbt-sonatype to 3.10.0 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/570
* Report scala dependency graph by henrydavidge in https://github.com/projectglow/glow/pull/574
* Add explicit linkcheck timeout; ignore flaky 1kg link; fix setup.py syntax error by henrydavidge in https://github.com/projectglow/glow/pull/573
* Update hadoop-bam to 7.10.0 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/567
* Update sbt-pgp to 2.2.1 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/554
* Ignore .vscode directory by henrydavidge in https://github.com/projectglow/glow/pull/575
* Don't report staged release dependencies by henrydavidge in https://github.com/projectglow/glow/pull/576
* Make ignored modules lower case by henrydavidge in https://github.com/projectglow/glow/pull/577
* Update scalatest to 3.2.18 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/580
* Rename things from master to main by henrydavidge in https://github.com/projectglow/glow/pull/581
* Left overlap join function by henrydavidge in https://github.com/projectglow/glow/pull/578
* Update univocity-parsers to 2.9.1 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/586
* Update sbt-scalafmt to 2.5.2 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/591
* Update sqlite-jdbc to 3.42.0.1 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/595
* Update scalafmt-core to 2.7.5 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/592
* Update jdbi to 2.78 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/588
* Update sbt-assembly to 0.15.0 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/582
* Update sqlite-jdbc to 3.45.1.0 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/601
* Update sbt-assembly to 2.1.5 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/598
* Update slf4j-api to 2.0.12 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/594
* Update sbt to 1.9.9 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/602
* Add script for simple building and installation by henrydavidge in https://github.com/projectglow/glow/pull/597
* Put function registration in a service provider by henrydavidge in https://github.com/projectglow/glow/pull/596
* Add example Dockerfile and documentation by henrydavidge in https://github.com/projectglow/glow/pull/604
* [WIP] Migrate tests to github actions by henrydavidge in https://github.com/projectglow/glow/pull/605
* [SPARK4] Finish migration to github actions by henrydavidge in https://github.com/projectglow/glow/pull/606
* Add status badge for github tests by henrydavidge in https://github.com/projectglow/glow/pull/607
* Update spark-catalyst, spark-core, ... to 3.5.1 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/599
* Upload test coverage for python and scala by henrydavidge in https://github.com/projectglow/glow/pull/608
* [SPARK4] Update readme by henrydavidge in https://github.com/projectglow/glow/pull/610
* Update mockito-all to 1.10.19 by scala-steward-projectglow in https://github.com/projectglow/glow/pull/589
* Github action to create release tag and update version by henrydavidge in https://github.com/projectglow/glow/pull/611
* Update cut release action by henrydavidge in https://github.com/projectglow/glow/pull/612
* Fix release workflow by henrydavidge in https://github.com/projectglow/glow/pull/613
* make a pr to update version during release by henrydavidge in https://github.com/projectglow/glow/pull/614
* Update Python version file in cut release job by henrydavidge in https://github.com/projectglow/glow/pull/616
* Don't use sbt for cutting a release by henrydavidge in https://github.com/projectglow/glow/pull/618
* [SPARK4] Upload runnable artifacts as part of CI by henrydavidge in https://github.com/projectglow/glow/pull/620
* Workflow to push a tag to staging repositories by henrydavidge in https://github.com/projectglow/glow/pull/621
* Update readthedocs config by henrydavidge in https://github.com/projectglow/glow/pull/622
* Tell readthedocs to use conda by henrydavidge in https://github.com/projectglow/glow/pull/623
* Try avoiding perfect separation in logistic regression tests by henrydavidge in https://github.com/projectglow/glow/pull/624
* Only enable scoverage during tests by henrydavidge in https://github.com/projectglow/glow/pull/626
* Add documentation for maintaining private patches and rebasing on OSS Glow by henrydavidge in https://github.com/projectglow/glow/pull/625
* Suppress exception when closing stdin writer in pipe transformer by henrydavidge in https://github.com/projectglow/glow/pull/627
* Import GPG and sonatype credentials in staging release job by henrydavidge in https://github.com/projectglow/glow/pull/629

New Contributors
* dvcastillo made their first contribution in https://github.com/projectglow/glow/pull/505
* dtzeng made their first contribution in https://github.com/projectglow/glow/pull/519
* srowen made their first contribution in https://github.com/projectglow/glow/pull/524
* a-li made their first contribution in https://github.com/projectglow/glow/pull/522
* scala-steward-projectglow made their first contribution in https://github.com/projectglow/glow/pull/555

**Full Changelog**: https://github.com/projectglow/glow/compare/v1.2.1...v2.0.0

1.2.1

This release includes Java/Scala artifacts in [Maven Central](https://search.maven.org/search?q=g:io.projectglow) and Python artifacts in [PyPI](https://pypi.org/project/glow.py/). Docker containers `projectglow/open-source-glow:1.2.1`, `projectglow/databricks-glow:1.2.1`, `projectglow/databricks-glow:10.4` and `projectglow/databricks-hail:0.2.93` can be found on projectglow's [Docker Hub](https://hub.docker.com/u/projectglow). The Glow notebook continuous integration test now uses Databricks Runtime 10.4, which is on Spark 3.2.1 ([workflow definition json](https://github.com/projectglow/glow/blob/master/docs/dev/multitask-integration-test-config.json)).

Glow leverages private Catalyst APIs that changed between Spark 3.1 and Spark 3.2, so we wrote a shim to maintain backwards compatibility. However, Spark 2 has reached end of life (EoL), and Databricks, AWS EMR and Google Dataproc now depend on Hadoop 3.x, which is incompatible with Spark 2. We are therefore removing support for Spark 2, including the Spark 2 continuous integration (CI/CD) tests run on CircleCI. _Glow version 1.1.2 is the last release that supports Spark 2._

The Spark 3 CI/CD tests depend on Hail, and they are failing because Hail does not yet support Spark 3.2; Hail is waiting for Google Dataproc and AWS EMR to upgrade from Spark 3.1. For now, we expect the Spark 3 CircleCI tests to keep failing until the Hail tests are resolved. We moved forward with the new release anyway, since it is unclear when Dataproc or EMR will support Spark 3.2.

Thanks to Alex Barreto, Jasser Abidi, Cameron Smith, Marcus Henry, Karen Feng, Joseph Bradley, and William Brandler for their contributions to this release.

New Contributors
* cameronraysmith made their first contribution in https://github.com/projectglow/glow/pull/483
* JassAbidi and jkbradley made their first contributions in https://github.com/projectglow/glow/pull/501

**Full Changelog**: https://github.com/projectglow/glow/compare/v1.1.2...v1.2.1

1.1.2

Glow v1.1.2 adds functionality for quarantining records with the Glow pipe transformer.

This release includes Java/Scala artifacts in Maven Central, and Python artifacts in PyPI and Conda Forge.

New Contributors
* dmoore247 made their first contribution in https://github.com/projectglow/glow/pull/408
* mah-databricks made their first contribution in https://github.com/projectglow/glow/pull/418

**Full Changelog**: https://github.com/projectglow/glow/compare/v1.1.1...v1.1.2

1.1.1

Glow v1.1.1 adds sample masking to GWAS linear regression, which has been documented as a quickstart guide. Nightly notebook tests are now dockerized, making it easier to integrate Glow with other bioinformatics libraries. Changes to the VEP schema fix a bug with indel parsing.

This release includes Java/Scala artifacts in Maven Central, and Python artifacts in PyPI and Conda Forge.

What's Changed
* Dockerize ci tests by williambrandler in https://github.com/projectglow/glow/pull/414
* Releasev110 by williambrandler in https://github.com/projectglow/glow/pull/411
* adding codecov.yml by williambrandler in https://github.com/projectglow/glow/pull/413
* remove init script from nb test by williambrandler in https://github.com/projectglow/glow/pull/415
* Fix VEP parsing failures stemming from indels by bboutkov in https://github.com/projectglow/glow/pull/402
* Extending sample masking functionality in gwas linear regression by bcajes in https://github.com/projectglow/glow/pull/416
* fix bedtools path by williambrandler in https://github.com/projectglow/glow/pull/417
* add vep example by williambrandler in https://github.com/projectglow/glow/pull/382
* Docker containers for Glow runtime environment on Databricks by a0x8o in https://github.com/projectglow/glow/pull/420
* remove extraneous detail from quickstart docs by williambrandler in https://github.com/projectglow/glow/pull/428
* add data simulation doc page by williambrandler in https://github.com/projectglow/glow/pull/427
* fix pandas lmm notebook link by williambrandler in https://github.com/projectglow/glow/pull/430

New Contributors
* a0x8o made their first contribution in https://github.com/projectglow/glow/pull/420

Credits

Alex Barreto, Boris Boutkov, Brian Cajes, Karen Feng, William Brandler, dim de grave


**Full Changelog**: https://github.com/projectglow/glow/compare/v1.1.0...v1.1.1

1.1.0

Glow now also runs automated nightly tests of the notebooks in the docs, making it easier for users to contribute code or algorithms that help others make use of Glow.

This release includes Java/Scala artifacts in Maven Central, and Python artifacts in PyPI and Conda Forge.

Notable changes:
- Upgrade Spark dependency from 3.0.0 to 3.1.2 (#396)
- Create integration test script (#373)
- Hail related enhancements (#377)
- Remove typecheck for numpy arrays (#366)

Minor changes include:
- Migrate from Bintray to Sonatype (#367)
- Test changed notebooks in branches (#380)

Credits: Brian Cajes, Karen Feng, William Brandler, dim de grave

1.0.1

This release includes Java/Scala artifacts in Maven Central, and Python artifacts in PyPI and Conda Forge.