- Make it possible to activate indel calling without recompiling. The discover-sequence-variants mode now accepts
the boolean argument --call-indels true/false.
- Preliminary support for calling indels with discover-sequence-variants. Candidate indels are now written
in the formats that use GenotypeOutputFormat (e.g., genotypes, compare_groups, allele_frequency).
The method of Krawitz et al. is used to determine the equivalent indel region (EIR) of each candidate indel.
After realignment (where applicable) and filtering to remove possible errors, EIRs are reported with their frequencies.
Please be advised that the VCF specifications are rather vague and, as a result, are often interpreted differently by
different programmers. This is especially true of the parts of the specifications that describe how to report indels.
As a result, you may run into problems when trying to load indel-containing VCF files generated with Goby into other
tools.
- vcf-subset: Add ability to exclude positions at which all samples match the reference.
- Add a replacement for the VCFtools vcf-subset program. The Goby tool is orders of magnitude faster.
- Improve the vcf-compare mode. It can now provide a random sample of the positions that differ between the
files being compared. Random samples are drawn for each kind of difference (missing from one file, missing
one or two alleles, different genotypes).
- vcf-compare now outputs Ti/Tv ratios for each sample in the input files (these are written to the output file only).
- Fix a scalability problem in the local realignment code. Local realignment around indels slowed down as more entries
were processed. Processing speed now remains constant across large alignments.
- Fixed index file writing. Under some conditions, parts of an alignment past the 2GB mark were not accessible
with skipTo when reading files larger than 2GB. Use the upgrade mode to fix old alignments at a time of your choosing,
or use Goby as usual to have alignments upgraded on the fly.
- Add a mechanism to upgrade/fix large alignment indices with Goby 1.9.8.2. The upgrade mechanism uses concatenate
alignment to rewrite an alignment's index file if the size of its entries file exceeds 2GB. This is rather slow because
the process reads and writes large alignments to produce the new index file. While slow, upgrading is still faster
than aligning the reads again. The process also requires approximately twice the alignment's disk space while the new
alignment files are written. Alignments smaller than 2GB are silently skipped since they were not affected by the bug.
- Codecs: Add support for decoding alignments with a codec in AlignmentReader.
- Improve ReadsReader so that it finds a suitable decoder when several codecs are available.
- Prevent local realignment from running out of memory at positions where clonal reads create huge coverage peaks.
- Make filterIndels remove filtered indels from the sample count info object, not just from the list of bases.
- Fix VCF genotypes that could look like 0/0/1/1 to be 0/1 (seen with indels only).
- Only write the allele base count in the VCF BC field when the count is not zero (useful with indels).
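The indel-calling entry mentions the equivalent indel region (EIR) determined with the method of Krawitz et al. As a rough illustration of the underlying idea only (this is not Goby's code, and it is a simplification that handles deletions only), a deletion inside a repeat can be slid left or right as long as the resulting haplotype does not change; the span covered by all equivalent placements is the EIR. The class `EirSketch`, the record `Region`, and the method `equivalentDeletionRegion` are hypothetical names chosen for this sketch.

```java
/**
 * Hypothetical sketch (not Goby's implementation): compute the equivalent
 * indel region (EIR) of a deletion by sliding it along the reference while
 * the resulting haplotype stays identical. Deletions only.
 */
public final class EirSketch {

    /** Half-open interval [start, end) on the reference, 0-based. */
    public record Region(int start, int end) { }

    /**
     * Returns the widest reference span covered by any placement of the deletion
     * that yields the same haplotype as deleting reference[start, end).
     */
    public static Region equivalentDeletionRegion(String reference, int start, int end) {
        if (start < 0 || end > reference.length() || start >= end) {
            throw new IllegalArgumentException("invalid deletion interval");
        }
        // Slide left: deleting [s-1, e-1) equals deleting [s, e) when ref[s-1] == ref[e-1].
        int leftStart = start;
        int leftEnd = end;
        while (leftStart > 0 && reference.charAt(leftStart - 1) == reference.charAt(leftEnd - 1)) {
            leftStart--;
            leftEnd--;
        }
        // Slide right: deleting [s+1, e+1) equals deleting [s, e) when ref[s] == ref[e].
        int rightStart = start;
        int rightEnd = end;
        while (rightEnd < reference.length() && reference.charAt(rightStart) == reference.charAt(rightEnd)) {
            rightStart++;
            rightEnd++;
        }
        // The EIR spans from the leftmost placement's start to the rightmost placement's end.
        return new Region(leftStart, rightEnd);
    }

    public static void main(String[] args) {
        // Deleting one "CA" unit from the CACACA repeat: every placement inside the repeat is equivalent.
        Region eir = equivalentDeletionRegion("GTCACACAT", 4, 6);
        System.out.println(eir.start() + "-" + eir.end()); // prints "2-8"
    }
}
```

In the example, deleting any "CA" unit inside CACACA produces the same sequence, so the region grows to cover the whole repeat; a deletion outside any repeat keeps its original interval.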
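vcf-compare reports a Ti/Tv ratio per sample. For readers unfamiliar with the metric, it is the number of transitions (A<->G or C<->T, i.e. purine to purine or pyrimidine to pyrimidine) divided by the number of transversions (all other single-base changes). The hypothetical `TiTvSketch` class below shows only that arithmetic on single-base substitutions; it is not Goby's implementation and makes no claim about how vcf-compare extracts substitutions from genotypes.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Goby's vcf-compare code): compute a
 * transition/transversion (Ti/Tv) ratio from single-base substitutions.
 */
public final class TiTvSketch {

    private static boolean isNucleotide(char base) {
        return base == 'A' || base == 'C' || base == 'G' || base == 'T';
    }

    private static boolean isPurine(char base) {
        return base == 'A' || base == 'G';
    }

    /**
     * Each substitution is a two-character string: reference base, then
     * alternate base (e.g. "AG"). Non-ACGT bases and identical pairs are ignored.
     * Returns NaN when no transversions were counted.
     */
    public static double tiTvRatio(List<String> substitutions) {
        int transitions = 0;
        int transversions = 0;
        for (String substitution : substitutions) {
            char ref = Character.toUpperCase(substitution.charAt(0));
            char alt = Character.toUpperCase(substitution.charAt(1));
            if (!isNucleotide(ref) || !isNucleotide(alt) || ref == alt) {
                continue;
            }
            // Transition: both bases are purines (A/G) or both are pyrimidines (C/T).
            if (isPurine(ref) == isPurine(alt)) {
                transitions++;
            } else {
                transversions++;
            }
        }
        return transversions == 0 ? Double.NaN : (double) transitions / transversions;
    }

    public static void main(String[] args) {
        // Two transitions (A>G, C>T) and two transversions (A>C, G>T).
        System.out.println(tiTvRatio(List.of("AG", "CT", "AC", "GT"))); // prints 1.0
    }
}
```

Returning NaN when no transversions are present avoids reporting a misleading infinite or zero ratio for samples with very few variants.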