- Fixed an issue in `.by_genotype()` in the Cohort module that could generate an incorrect genotype-based cohort for
multi-allelic sites in _All of Us_ variant data or a custom Hail matrix table.
For example, for a site that has 3 alleles \["A", "G", "C"] (reference allele, alt allele 1, alt allele 2, as displayed in Hail),
if the user specifies "A" as `ref_allele`, "C" as `alt_allele`, and "0/1" as `case_gt`:
  - Before this fix, participants with the A-C genotype were not assigned as cases,
    because the allele index of "C" remained 2 even after the multi-allelic split, so the A-C genotype was still encoded as "0/2" and did not match `case_gt`.
  - After this fix, participants with the A-C genotype are correctly assigned as cases,
    because the allele index of "C" is updated to 1 after the multi-allelic split, so the A-C genotype is encoded as "0/1".
- This issue affects users who used `.by_genotype()` to generate a cohort:
  - from _All of Us_ data, when ALL of the criteria below were met:
    - the genomic position was a multi-allelic site,
    - the alternative allele of interest was NOT the first alternative allele ("G" is the first alternative allele in the above example).
  - from a custom matrix table that was unsplit or improperly split, when the same criteria were met.
- Going forward, nothing changes in how users call `.by_genotype()`,
i.e., "0" represents the reference allele, and "1" represents the alternative allele of interest.
- Users should uninstall any previous version, reinstall PheTK, and make sure the installed version is v0.1.43.
___Affected___ users are advised to rerun the `.by_genotype()` step and, potentially, any subsequent steps that depend on its output.
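
The allele index update described in the example above can be illustrated with a small, library-free sketch. This is not PheTK's or Hail's actual code: the helper names `split_genotype` and `is_case` are hypothetical, and the recoding rule (the alternative allele of interest becomes 1, every other allele becomes 0) is an assumption about how a correct multi-allelic split behaves.

```python
def split_genotype(alleles, gt):
    """Split one multi-allelic genotype into one bi-allelic genotype per alt allele.

    alleles: allele strings with the reference first, e.g. ["A", "G", "C"]
    gt: tuple of allele indices into `alleles`, e.g. (0, 2) for an A-C genotype
    Returns a dict mapping (ref, alt) to a recoded genotype string such as "0/1".
    """
    ref = alleles[0]
    records = {}
    for alt_index in range(1, len(alleles)):
        # Assumed recoding rule: the alt allele of interest becomes 1,
        # every other allele (including other alts) becomes 0.
        recoded = sorted(1 if a == alt_index else 0 for a in gt)
        records[(ref, alleles[alt_index])] = "/".join(str(a) for a in recoded)
    return records


def is_case(alleles, gt, ref_allele, alt_allele, case_gt):
    """Return True if the recoded genotype for (ref_allele, alt_allele) is in case_gt."""
    return split_genotype(alleles, gt).get((ref_allele, alt_allele)) in case_gt


alleles = ["A", "G", "C"]
a_c_genotype = (0, 2)  # "0/2" in the unsplit, multi-allelic encoding

print(split_genotype(alleles, a_c_genotype))
# {('A', 'G'): '0/0', ('A', 'C'): '0/1'}
print(is_case(alleles, a_c_genotype, "A", "C", ["0/1"]))
# True
```

Under these assumptions, the A-C genotype is recoded to "0/1" in the A/C record, which matches a `case_gt` of "0/1". If the index of "C" were left at 2, the genotype would remain "0/2" and the same participant would be missed.
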
***