The first release of our fork, which is integrated with [pgsc_calc](https://github.com/PGScatalog/pgsc_calc).
Our main aim for the fork is to improve functionality when processing data at scale (e.g. on 500,000 genomes at UK Biobank) and to perform QC that ensures the variants are identical, and oriented correctly, between the reference dataset and the study population being projected.
Improvements:
* Variant QC: added checks and minor fixes for variant matching, orientation, and sort order of ref/study variants to ensure results are consistent between the reference and study datasets
* refactored the original scripts into a Python package
* added an end-to-end test with pytest
* added support for batch-processing study samples without splitting the original dataset into multiple files (useful for parallelising large datasets)
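
To illustrate the kind of check the variant QC item describes, here is a minimal sketch. It is not the package's actual implementation: the `Variant` record, the `check_variants` function, and the wording of the messages are all hypothetical, and real inputs would come from the reference and study genotype files rather than in-memory lists.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not the package's own type)."""
    variant_id: str
    ref_allele: str
    alt_allele: str


def check_variants(reference: list[Variant], study: list[Variant]) -> list[str]:
    """Return human-readable problems; an empty list means the inputs agree."""
    problems = []
    ref_ids = [v.variant_id for v in reference]
    study_ids = [v.variant_id for v in study]

    # 1. Matching: both datasets must contain exactly the same variants.
    only_one = sorted(set(ref_ids) ^ set(study_ids))
    if only_one:
        problems.append(f"variants present in only one dataset: {only_one}")
    # 2. Sort order: the shared variants must appear in the same order.
    elif ref_ids != study_ids:
        problems.append("variants are in a different order")

    # 3. Orientation: alleles must agree for every shared variant.
    study_by_id = {v.variant_id: v for v in study}
    for r in reference:
        s = study_by_id.get(r.variant_id)
        if s is None:
            continue  # already reported by the matching check
        if (r.ref_allele, r.alt_allele) == (s.ref_allele, s.alt_allele):
            continue
        if (r.ref_allele, r.alt_allele) == (s.alt_allele, s.ref_allele):
            problems.append(f"{r.variant_id}: alleles are swapped")
        else:
            problems.append(f"{r.variant_id}: alleles do not match")
    return problems
```

In this sketch a projection would only go ahead when `check_variants(reference, study)` returns an empty list.
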
Fixes:
* output PCs are now written with consistent precision
* outputs are now deduplicated when projecting study samples after the PCA space has been derived
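
As a rough picture of what the two fixes mean for an output file, the sketch below writes every PC with a fixed number of decimal places and skips repeated samples. The `write_pcs` function, the six-decimal default, the TSV layout, and the choice to deduplicate on sample ID are all assumptions made for illustration; the release does not specify them.

```python
import csv
import io


def write_pcs(rows, out, decimals=6):
    """Write (sample_id, [pc1, pc2, ...]) rows as TSV.

    Assumptions for this sketch: a sample counts as a duplicate when its
    ID has already been written, and the first occurrence is kept.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    seen = set()
    for sample_id, pcs in rows:
        if sample_id in seen:
            continue  # drop repeated samples
        seen.add(sample_id)
        # Format every PC with the same number of decimal places.
        writer.writerow([sample_id, *(f"{pc:.{decimals}f}" for pc in pcs)])


# Usage: "s1" appears twice but is written once.
buffer = io.StringIO()
write_pcs([("s1", [0.1, -2.0]), ("s1", [0.1, -2.0]), ("s2", [1.23456789, 3])], buffer)
output_text = buffer.getvalue()
```
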