Vcfvalidator

Latest version: v1.0.3

0.7

The validator can now check fields specific to the gVCF extension. This includes <*> alternate alleles and how they relate to the END INFO field and sample genotypes.

Following some user reports (101, 102) of incorrect counts being expected for FORMAT fields with Number=G, we confirmed with the specification that their cardinality depends on the ploidy of each sample genotype and not on the ALT column. The issue should be solved now, but if you find any problems please open a new ticket!
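To illustrate the rule, here is a minimal Python sketch of how the expected count could be computed. It is not the validator's own code, and the function name `expected_g_count` is hypothetical. It relies on the specification's definition of Number=G as one value per possible genotype, so the count is the number of unordered combinations of `ploidy` alleles drawn from REF plus the ALT alleles.

```python
from math import comb


def expected_g_count(num_alt_alleles: int, ploidy: int) -> int:
    """Number of possible unordered genotypes for one sample.

    Hypothetical helper: counts multisets of size `ploidy` drawn from
    the REF allele plus `num_alt_alleles` ALT alleles.
    """
    num_alleles = num_alt_alleles + 1  # REF plus every ALT allele
    return comb(num_alleles + ploidy - 1, ploidy)
```

With one ALT allele, a diploid genotype gives 3 expected values, while a haploid genotype on the same line gives 2.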

This version also introduces some usability improvements. The biggest is a summary report in addition to the existing text and database outputs. This is human-readable and lists each type of error detected, the number of times it occurred, and the first line where it was observed.
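The aggregation behind such a summary can be sketched in a few lines of Python. This is only an illustration of the idea: the function name, the `(line_number, message)` input shape and the example messages are assumptions, not the validator's actual report format.

```python
def summarize_errors(errors):
    """Group (line_number, message) pairs by message.

    Returns a dict mapping each message to (count, first_line).
    The input shape is an assumption made for this sketch.
    """
    summary = {}
    for line_number, message in errors:
        if message in summary:
            count, first_line = summary[message]
            # Keep the smallest line number seen for this message
            summary[message] = (count + 1, min(first_line, line_number))
        else:
            summary[message] = (1, line_number)
    return summary
```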

The `--version` option now reports which version of the validator you are running. Please note that in vcf-validator 0.4 and earlier this option was used to indicate which version of the *specification* the input file should match.

And finally, the validator now warns the user if the input is compressed, instead of reporting a confusing list of errors.
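One common way to detect compressed input is to inspect the first bytes of the file for well-known signatures. The Python sketch below shows that approach under the assumption that a handful of formats are of interest; it is not necessarily how the validator performs the check, and `detect_compression` is a hypothetical name.

```python
# Well-known file signatures ("magic numbers") of common compression formats.
COMPRESSION_SIGNATURES = {
    b"\x1f\x8b": "gzip/bgzip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
    b"PK\x03\x04": "zip",
}


def detect_compression(first_bytes: bytes):
    """Return the name of the compression format, or None for plain input."""
    for magic, name in COMPRESSION_SIGNATURES.items():
        if first_bytes.startswith(magic):
            return name
    return None
```

A caller would read the first few bytes of the input in binary mode and emit a single warning when the result is not `None`.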

You can download the Linux binaries using the links, and also visit [this page](https://github.com/EBIvariation/vcf-validator/milestone/6?closed=1) if you are interested in the full list of changes.

0.6

It has been a really productive summer thanks to Anishka0107, the Google Summer of Code student who has improved the support for structural variants in the validator and the debugulator 😃

She has added new metadata validations to ensure that INFO and FORMAT fields match the header definition, **and** that said header matches the VCF specification itself. These validations apply not only to short variants but also to structural variation tags, which hadn't been fully supported until now!

She also expanded the checks (added in the previous version) that guarantee no duplicate values in the ID and FORMAT columns in a single line, so that they also cover the FILTER and INFO columns. The debugulator can now automatically fix these duplicates, as well as the values assigned to some INFO tags (see https://github.com/EBIvariation/vcf-validator/pull/78 for more details).

The last phase of GSoC was more focused on the purely technical aspects of the project: cleaning up the code, improving the documentation and slightly simplifying the grammar that detects syntax errors.

Please download the Linux binaries using the links below, and visit [this page](https://github.com/EBIvariation/vcf-validator/milestone/3?closed=1) if you are interested in the full list of changes.

0.5

This version simplifies integrating the validation tool into automated pipelines by detecting the version of the VCF file before running the validation. This also prevents errors caused by accidental mismatches between the command-line argument and the file.
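Since the specification requires the first line of a VCF file to be a `##fileformat=VCFvX.Y` declaration, version detection can be sketched as below. This Python snippet only illustrates the idea; `detect_vcf_version` is a hypothetical name and the validator's own implementation may differ.

```python
import re

# The first line of a VCF file declares the specification version.
FILEFORMAT_PATTERN = re.compile(r"^##fileformat=VCFv(\d+\.\d+)$")


def detect_vcf_version(first_line: str):
    """Return the declared version (e.g. "4.3"), or None if it is absent."""
    match = FILEFORMAT_PATTERN.match(first_line.rstrip("\r\n"))
    return match.group(1) if match else None
```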

New checks have also been included to guarantee that no duplicate values are present in the ID and FORMAT columns in a single line. These checks are only applicable to version 4.3 of the specification!
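A duplicate check of this kind could look like the following Python sketch. It assumes a tab-separated record line in which ID is the third column (values separated by `;`) and FORMAT is the ninth (keys separated by `:`); the function names are hypothetical and this is not the validator's code.

```python
def find_duplicates(values):
    """Return the values that occur more than once, in order of detection."""
    seen = set()
    duplicates = []
    for value in values:
        if value in seen and value not in duplicates:
            duplicates.append(value)
        seen.add(value)
    return duplicates


def duplicated_ids_and_formats(record_line: str):
    """Return (duplicate IDs, duplicate FORMAT keys) for one record line."""
    columns = record_line.rstrip("\n").split("\t")
    ids = columns[2].split(";")
    # FORMAT is only present when the file contains sample columns
    formats = columns[8].split(":") if len(columns) > 8 else []
    return find_duplicates(ids), find_duplicates(formats)
```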

The binaries can be downloaded using the links below.

0.4.3

The VCF specification allows the GT field to be omitted from the FORMAT column, but if present it must be the first field. This release solves an issue that was making the validator raise a misleading error if GT was not present.
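The rule can be expressed compactly, as in this Python sketch (the function name `gt_position_is_valid` is hypothetical): a FORMAT column is acceptable when GT is either absent or listed first.

```python
def gt_position_is_valid(format_column: str) -> bool:
    """True if GT is absent from FORMAT or is its first key."""
    keys = format_column.split(":")
    return "GT" not in keys or keys[0] == "GT"
```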

0.4.2

This maintenance release solves a couple of issues reported for version 0.4.1:
- Only a single value was considered valid as CIGAR field in the INFO column, when it should be a list as long as the number of alternate alleles. Thanks sambrightman for your pull request!
- Errors due to a missing newline character at the end of the file were not properly reported.
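The corrected CIGAR rule from the first item can be illustrated with a short Python sketch. It assumes the raw ALT and INFO column strings as input, and `cigar_count_matches_alt` is a hypothetical name rather than part of the validator.

```python
def cigar_count_matches_alt(alt_column: str, info_column: str) -> bool:
    """True if INFO/CIGAR has one value per ALT allele, or is absent."""
    num_alt = len(alt_column.split(","))
    for entry in info_column.split(";"):
        key, _, value = entry.partition("=")
        if key == "CIGAR":
            return len(value.split(",")) == num_alt
    return True  # no CIGAR tag on this line, so nothing to check
```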

0.4.1

This maintenance release solves memory issues reported for version 0.4.

New dependencies were added to make it possible to detect more complex errors, but the amount of memory consumed grew without bound. This has been solved and memory usage now remains constant at less than 10 MB of RAM.

The new executables, compatible with any Linux version, can be downloaded using the links below.
