This release features significant changes to the data processing pipeline; in testing, these provide a substantial increase in processing speed for large datasets. The previous version of the pipeline effectively had a soft limit on the number of viewpoints that could be processed, because each viewpoint of each partition of each sample was written to its own TSV file. This could easily reach the maximum number of files that can be stored in a single directory. To overcome this issue, slices (in silico digested reads) from all viewpoints in a partition are now stored together, by default in Apache Parquet format. This also reduces the memory footprint of PCR duplicate removal and allows interaction counts to be determined in parallel. Furthermore, to increase pipeline speed and reduce disk usage, intermediate count files are no longer generated; counts are instead stored directly in cooler format. The generation of the statistics report has also been overhauled.
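The "count in parallel, then collate" idea can be illustrated with a minimal sketch. This is not CapCruncher's implementation: the in-memory `PARTITIONS` data, the `(read_id, viewpoint, fragment)` slice layout, and the function names `count_partition` and `count_all` are all hypothetical, and the sketch assumes that every slice from a given read sits in the same partition. Threads are used only to keep the example self-contained; a real pipeline would more likely use separate processes or jobs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical stand-in for partitions of slices read from disk.
# Each slice is (read_id, viewpoint, restriction_fragment_id).
PARTITIONS = [
    [("read1", "vpA", 10), ("read1", "vpA", 42),
     ("read2", "vpA", 10), ("read2", "vpA", 42)],
    [("read3", "vpA", 10), ("read3", "vpA", 42),
     ("read4", "vpB", 7), ("read4", "vpB", 9)],
]


def count_partition(slices):
    """Count fragment-pair interactions per viewpoint within one partition."""
    # Group the fragments seen for each (read, viewpoint) combination.
    fragments_by_read = {}
    for read_id, viewpoint, fragment in slices:
        fragments_by_read.setdefault((read_id, viewpoint), set()).add(fragment)

    # Every unordered pair of fragments on the same read is one interaction.
    counts = Counter()
    for (_, viewpoint), fragments in fragments_by_read.items():
        for pair in combinations(sorted(fragments), 2):
            counts[(viewpoint, *pair)] += 1
    return counts


def count_all(partitions, max_workers=2):
    """Count each partition independently, then collate the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(count_partition, partitions):
            total.update(partial)  # Counter.update adds counts together
    return total


if __name__ == "__main__":
    for key, n in sorted(count_all(PARTITIONS).items()):
        print(key, n)
```

Because each partition is counted without reference to any other, the collation step reduces to summing counters, which is what makes the work straightforward to distribute.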
Bug Fixes
CLI: Fixed help option not being displayed when running capcruncher pipeline ([129](https://github.com/sims-lab/CapCruncher/issues/129)) ([9f09093](https://github.com/sims-lab/CapCruncher/commit/9f090935f3c20c5d78e01ab4f5b0248b325ee341))
Deduplication: Prevents excessive memory usage during duplicate removal ([136](https://github.com/sims-lab/CapCruncher/issues/136)) ([b175978](https://github.com/sims-lab/CapCruncher/commit/b17597884164ed074782370637a81732390ac48c))
Packaging: Fixed bug after updating pyyaml to latest version ([122](https://github.com/sims-lab/CapCruncher/issues/122)) ([7d76b5f](https://github.com/sims-lab/CapCruncher/commit/7d76b5f4976fe3c6f1bc09989df3db28c12ecce3))
Packaging: Added missing dependencies (seaborn and trackhub) to setup.cfg ([550a882](https://github.com/sims-lab/CapCruncher/commit/550a882af5e131c04b5d45bf0430ecc50ce15310))
Packaging: Fixed packaging long description. ([115](https://github.com/sims-lab/CapCruncher/issues/115)) ([6f716d1](https://github.com/sims-lab/CapCruncher/commit/6f716d182de705146333206a38d9c791de1a9227))
Pipeline: Fixes issue with tasks going over their allotted number of cores. ([133](https://github.com/sims-lab/CapCruncher/issues/133)) ([27cd193](https://github.com/sims-lab/CapCruncher/commit/27cd193c207409b96a0b28c079b9d689daaa61ee))
Pipeline: Fixes error during deduplication when using gzip compression ([134](https://github.com/sims-lab/CapCruncher/issues/134)) ([01ff56b](https://github.com/sims-lab/CapCruncher/commit/01ff56b88558af486d11b9f7544c8c5c6ca9f002))
Pipeline: Re-partition reporter slices after filtering ([124](https://github.com/sims-lab/CapCruncher/issues/124)) ([db72c56](https://github.com/sims-lab/CapCruncher/commit/db72c56875c13ed2762d44e916a8ed66f73324cc))
Reporter comparisons: Fixed an issue when no data exists for a viewpoint for a given sample ([139](https://github.com/sims-lab/CapCruncher/issues/139)) ([e720029](https://github.com/sims-lab/CapCruncher/commit/e7200299bf2453e719f28f95ed3658e9570b7ad5))
Storage: Fixed linking of common cooler tables ([137](https://github.com/sims-lab/CapCruncher/issues/137)) ([4836fbe](https://github.com/sims-lab/CapCruncher/commit/4836fbe8e46ad268dda6d05f27104789f0c46e0d))
Features
CLI: Enables pileup normalisation using a set of regions supplied as a bed file ([121](https://github.com/sims-lab/CapCruncher/issues/121)) ([9c587ff](https://github.com/sims-lab/CapCruncher/commit/9c587ff1a60f009c0b990952361810d61376a1c7))
Packaging: Moved all configuration from setup.py to setup.cfg. ([114](https://github.com/sims-lab/CapCruncher/issues/114)) ([4835da4](https://github.com/sims-lab/CapCruncher/commit/4835da44157132feda38e299bf9c67ca297c3d2d))
Pipeline: Expanded the number of viewpoints that can be processed ([128](https://github.com/sims-lab/CapCruncher/issues/128)) ([8fcb576](https://github.com/sims-lab/CapCruncher/commit/8fcb57657f108d78cdbb1e255a5eb85b7cb3e860))
Pipeline: Capability to normalise pileups (bedgraphs/bigwigs) by a set of supplied regions. ([125](https://github.com/sims-lab/CapCruncher/issues/125)) ([bab07ea](https://github.com/sims-lab/CapCruncher/commit/bab07eac1e524020d24c745dd88b749173d9d440))
Pipeline: Enable optional compression during fastq split and deduplicate ([131](https://github.com/sims-lab/CapCruncher/issues/131)) ([0c32b73](https://github.com/sims-lab/CapCruncher/commit/0c32b7320fcff5d95145a406996e9baf9f7aeebd))
Pipeline: Enabled the use of custom filtering orders ([119](https://github.com/sims-lab/CapCruncher/issues/119)) ([b57ebe8](https://github.com/sims-lab/CapCruncher/commit/b57ebe886fc767b8dcb12c7dfc45dd2e9a1ea1b3))
Pipeline: Reduced disk space required by pipeline by removing intermediate files ([135](https://github.com/sims-lab/CapCruncher/issues/135)) ([d6c4302](https://github.com/sims-lab/CapCruncher/commit/d6c4302a27c14b965c531b11242ef6dd152fc1a1))
Pipeline: Reporter counting is now performed in parallel on separate partitions before collating. ([117](https://github.com/sims-lab/CapCruncher/issues/117)) ([aae5356](https://github.com/sims-lab/CapCruncher/commit/aae5356d6268e71ae777ffb31fcbd98e76ccd8c2))
Pipeline: Reverted without_cluster for reporter comparisons ([140](https://github.com/sims-lab/CapCruncher/issues/140)) ([f847d28](https://github.com/sims-lab/CapCruncher/commit/f847d282f556d336be2a66023aced8c8dd082551))
Storage: Reduce disk space taken up by reporters (slices and counts) ([138](https://github.com/sims-lab/CapCruncher/issues/138)) ([7659a8c](https://github.com/sims-lab/CapCruncher/commit/7659a8c3fee15ec94c107313d16ce9c831f4ffbf))
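Normalising a pileup by a set of supplied regions, as listed under Features, can be sketched as follows. These notes do not describe the exact formula CapCruncher uses, so this is one plausible reading: scale the whole track so that the signal falling inside the supplied regions sums to a fixed value. The function names, the tuple layouts, and the `scale` parameter are all hypothetical and are not part of the CapCruncher API.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of base pairs shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def signal_in_regions(bedgraph, regions):
    """Sum bedgraph signal (value x overlapping bp) inside the regions.

    Assumes the regions do not overlap one another; overlapping regions
    would be counted more than once.
    """
    total = 0.0
    for chrom, start, end, value in bedgraph:
        for r_chrom, r_start, r_end in regions:
            if chrom == r_chrom:
                total += value * overlap(start, end, r_start, r_end)
    return total


def normalise_by_regions(bedgraph, regions, scale=1_000_000):
    """Rescale every bedgraph value so the in-region signal equals `scale`."""
    total = signal_in_regions(bedgraph, regions)
    if total == 0:
        raise ValueError("no signal found in the normalisation regions")
    factor = scale / total
    return [(c, s, e, v * factor) for c, s, e, v in bedgraph]


if __name__ == "__main__":
    # Toy bedgraph entries: (chrom, start, end, value).
    pileup = [("chr1", 0, 100, 2.0), ("chr1", 100, 200, 4.0), ("chr2", 0, 100, 10.0)]
    # Toy regions, as would be read from a BED file: (chrom, start, end).
    regions = [("chr1", 50, 150)]
    for row in normalise_by_regions(pileup, regions, scale=600):
        print(*row, sep="\t")
```

Normalising against fixed regions rather than total signal makes tracks comparable between samples even when the overall number of reporters differs.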