Summary of changes:
1. Updates to better handle targeted data
- Filter reads on the `rq` (read quality) tag, keeping reads with `rq` >= 0.99, if `rq` is present in the input BAM
- Add a `--targeted` option for targeted data to drop the assumption of uniform coverage across the genome
- Add two optional parameters for targeted data
  - `--min-read-variant`: Partially controls the number of supporting reads required for a variant to be used for phasing. The cutoff for variant-supporting reads is `min(this number, max(5, depth*0.11))`. Default is 20. At standard WGS depth, the default value is overridden by `max(5, depth*0.11)`.
    - Use cases: 1) Set this number low for low-coverage data or to increase sensitivity. 2) For targeted data with high coverage, set this number relatively high to avoid picking up sequencing errors and to reduce run time.
  - `--min-read-haplotype`: Minimum number of unique supporting reads for a haplotype. Default is 4. For targeted data with high coverage, this cutoff can be increased to reduce errors and run time.
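
As an illustration of the two read-level rules above, here is a minimal Python sketch. The function names are hypothetical and are not taken from the tool's code. Two assumptions are made: a read without an `rq` value is kept, and the depth-based term is used without rounding.

```python
from typing import Optional

RQ_MIN = 0.99  # read-quality threshold stated in the notes above


def passes_rq_filter(rq: Optional[float]) -> bool:
    """Keep a read when rq is absent (assumed handling) or rq >= 0.99."""
    return rq is None or rq >= RQ_MIN


def variant_read_cutoff(depth: float, min_read_variant: int = 20) -> float:
    """Cutoff = min(min_read_variant, max(5, depth * 0.11)).

    No rounding is applied here; that is an assumption of this sketch.
    """
    return min(min_read_variant, max(5, depth * 0.11))


print(variant_read_cutoff(30))       # 5: the depth-based term wins at 30x
print(variant_read_cutoff(500))      # 20: capped by the default
print(variant_read_cutoff(500, 50))  # 50: capped by a raised --min-read-variant
```

At 30x the depth-based term (floored at 5) sets the cutoff, which is why the default of 20 only matters for high-coverage targeted data.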
2. Updates to target regions
- Update coordinates of some target regions to include full genes whenever possible: `pms2,ikbkg,hba,DDT,MBD3L2,DEFA1,PRY,CHRNA7,DHX40,GOLGA8A,IQCK,NXF2,OTOA,PDPK1,POTEI,RGPD1,RGPD3,RSPH10B,SIK1,TMLHE,CBS,KCNE1,CASTOR2,NBPF4,RGPD5,GOLGA8N,POTEB,ANKRD20A1,NSF`
- Add TNXB as a region on its own so that the full gene can be genotyped (the RCCX region only includes part of TNXB)
3. Algorithmic changes
- Improve fusion calling in cases of homozygous deletion
- Add some homozygous sites to cover target regions evenly during phasing to improve read assignment to haplotypes and variant calling
- Update a few gene-specific callers
  - `hba`: Add calling of 4.2 deletion/duplication
  - `smn1`: If the sample is homozygous throughout the region, default to CN = 2 instead of 1. Drop the carrier call if only one SMN1 haplotype is found but the total CN of SERF1A/B (neighboring locus) is larger than the total CN of SMN1/2
  - `ikbkg`: Improve calling of the 11.7kb deletion. Update the config to genotype the entire gene
  - `ncf1`: Drop the carrier call if only one NCF1 haplotype is found but the total CN of GTF2I (neighboring locus) is larger than the total CN of the NCF1 family
  - `rccx`: Better handle homozygous deletion cases
  - `pms2`: Update the config to genotype the entire gene
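
The `smn1` and `ncf1` carrier-call rule above can be read as a simple predicate. The sketch below is only an interpretation of that sentence. The function and parameter names are hypothetical, and the real callers may apply additional conditions.

```python
def drop_carrier_call(n_gene_haplotypes: int,
                      gene_family_total_cn: int,
                      neighbor_total_cn: int) -> bool:
    """Return True when a single-haplotype carrier call should be dropped.

    Interpretation of the notes: only one haplotype of the gene (SMN1 or NCF1)
    was found, but the neighboring locus (SERF1A/B or GTF2I) has a larger total
    copy number than the gene family (SMN1/2 or the NCF1 family).
    """
    return n_gene_haplotypes == 1 and neighbor_total_cn > gene_family_total_cn


# Hypothetical copy numbers, for illustration only.
print(drop_carrier_call(1, gene_family_total_cn=3, neighbor_total_cn=4))  # True
print(drop_carrier_call(1, gene_family_total_cn=3, neighbor_total_cn=3))  # False
```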
4. Other changes
- Support CRAM as input
- Standardize haplotype naming across regions: `{gene name}_{haplotype name}`
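
For downstream scripts, the naming pattern above can be sketched in Python as follows. The helper name and the example values are hypothetical; only the `{gene name}_{haplotype name}` pattern comes from these notes.

```python
def haplotype_label(gene_name: str, haplotype_name: str) -> str:
    """Build a label following the {gene name}_{haplotype name} pattern."""
    return f"{gene_name}_{haplotype_name}"


# Hypothetical gene and haplotype names, for illustration only.
print(haplotype_label("smn1", "hap1"))  # smn1_hap1
```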