Major feature upgrade: SomaticSeq now accepts arbitrary VCF files as input, in addition to the callers we have explicitly incorporated, e.g., via the `--arbitrary-snvs callerX_snv.vcf callerY_snv.vcf` and `--arbitrary-indels callerA_indel.vcf callerB_indel.vcf` options of the `somaticseq_parallel.py` command.
You must split SNVs and indels into separate VCF files before using them as input to SomaticSeq. If a VCF file combines SNVs and indels, you may use the script included in our repo, i.e., `splitVcf.py -infile combined_variants.vcf -snv snvs.vcf -indel indels.vcf`. Input can be either `.vcf` or `.vcf.gz`. Output will be `.vcf`.
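To illustrate what separating SNVs from indels means at the record level, here is a minimal sketch. It is not the code of `splitVcf.py`; the function name `classify_allele` and the rule it applies (comparing REF and ALT lengths, and setting aside symbolic or same-length multi-base alleles as "other") are assumptions made for illustration only.

```python
def classify_allele(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair as 'snv', 'indel', or 'other'.

    Hypothetical helper: the length-based rule below is an assumption,
    not necessarily what splitVcf.py does.
    """
    # Symbolic alleles (e.g. <DEL>) and the '*' placeholder are neither.
    if alt.startswith("<") or alt in ("*", "."):
        return "other"
    # A single base replaced by a single base is an SNV.
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    # Differing lengths mean bases were inserted or deleted.
    if len(ref) != len(alt):
        return "indel"
    # Same length but longer than one base (e.g. MNVs) is left undecided here.
    return "other"


def classify_record(vcf_line: str) -> list:
    """Classify every ALT allele of a tab-delimited VCF data line."""
    columns = vcf_line.rstrip("\n").split("\t")
    ref, alts = columns[3], columns[4].split(",")
    return [classify_allele(ref, alt) for alt in alts]
```

For real input, prefer the `splitVcf.py` script mentioned above; this sketch only shows the kind of decision such a split has to make.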
For the "arbitrary input VCF files," calls labeled as REJECT in the FILTER field will not be counted and will be assigned a value of _0_ in the `if_Caller_X` fields. Calls labeled as LowQual will be assigned a value of _0.5_. Calls without any filter label will be counted as bona fide calls for that particular VCF file and assigned a value of _1_, i.e., as though they were PASS calls. Modify your VCF files accordingly if needed.
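The FILTER-to-value rule above can be sketched as follows. This is an illustration, not SomaticSeq's actual implementation: the function name `arbitrary_caller_value` is hypothetical, and the handling of FILTER labels other than REJECT, LowQual, and PASS is not stated in these notes, so treating them as _1_ here is an assumption.

```python
def arbitrary_caller_value(filter_field: str) -> float:
    """Map a VCF FILTER string to the value described in the notes above.

    Hypothetical helper. REJECT -> 0, LowQual -> 0.5, PASS or no label -> 1.
    Assumption: any other label is also treated as 1.
    """
    # FILTER may hold several labels separated by ';', or '.' when empty.
    labels = {label for label in filter_field.split(";") if label and label != "."}
    if "REJECT" in labels:
        return 0.0
    if "LowQual" in labels:
        return 0.5
    return 1.0


def value_from_vcf_line(vcf_line: str) -> float:
    """Read the FILTER column (7th field) of a tab-delimited VCF data line."""
    columns = vcf_line.rstrip("\n").split("\t")
    return arbitrary_caller_value(columns[6])
```

If your caller uses different labels for rejected or low-confidence calls, rename them to REJECT or LowQual in the VCF before passing the file to SomaticSeq.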
seqc2_v1.2
* This is a special release for the Somatic Mutation Working Group of the SEQC2 Consortium to establish v1.2 of the somatic reference call set, i.e., [Fang, L.T., Zhu, B., Zhao, Y. _et al_. Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing. _Nat Biotechnol_ **39**, 1151-1160 (2021)](https://doi.org/10.1038/s41587-021-00993-6 "Fang LT, et al. Nat Biotechnol (2021)") / [PMID:34504347](http://identifiers.org/pubmed:34504347) / [SharedIt Link](https://rdcu.be/cxs3D) / [YouTube presentation](https://youtu.be/nn0BOAONRe8).
* This release is based on the older SomaticSeq v2.8.1. It contains many custom scripts specifically designed to complete SEQC2's somatic reference samples project.
* This release is **not** intended for general use.
* No code change, but the README now lists NCBI's new FTP address, updated from the original [commit](https://github.com/bioinform/somaticseq/tree/2c642fc2849d7a9360bd4b1d166b90b3e63ec429). The FTP address for the SEQC2 Somatic Mutation Working Group can be found [here](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG), so navigate there if a file has moved from its original location.