Summary:
BuysDB (331):
Updated download URL
Added get_samples_from_bam function
Added get_sample_to_read_group_dict functions
Added sample extraction script
Moved extract_samples method to function which can be unit tested
Added tests for extract_samples
Clean up all files created during testing (some .bai files remained)
Removed tf requirement
bugfix: no newline after map index rows
Added scartrace module to Molecule
Added scartrace module to Fragment
Import submodules
Added scartrace to bamtagmultiome
scartrace: Check if read is mapped before looking into the alignment
Added allele cache flag
Bamfilter: fixed header formatting
Use cigar in deduplication, fixed 84
Added test for 84
Use --no_umi_cigar_processing to disable the new behaviour
Fixed test case
Fixed test case file
Simplified sorted_bam handle
Removed use of getPairGenomicLocations and allowed fragments to decide their span.
Added get_safe_span() method to fragment which reports the span excluding primers
Added region parameters to AlleleResolver to reduce memory footprint
Added documentation
Also use region parameters when fetching from cache
Added pileup module
Added check_eject_every=None option to MoleculeIterator
Updated example
Fixed module reference
Version increment
Extract base-calls for fragments mapping to multiple contigs
Use the contig of the random primer in IVT deduplication
Pass kwargs to pysam pileup and set higher max_depth
Variant masking tool now runs for multiple contigs in parallel and will not crash when the VCF does not match the fasta file completely
Reading the vcf using 4 threads per process
Added support for non-properly paired reads
Added resolve_unproperly_paired_reads to bamtagmultiome
Set more decompression threads and fixed description
Set program ID tag in PG header line
The BI tag was used to identify the cell index, but it clashes with GATK. It is now changed to lowercase bi.
Added forwards compat
Added --slurm flag to submission.py.
Set job name
Fixed BI tag compat
Automatically convert BI to bi tag
Fixed bug accessing tag dict
Started work on slurm/sge/local wrapper
submission.py is now slurm compatible, added API to submit and hold jobs
Added scheduler selection argument to bamtagmultiome
fixed import
Removed references to args
Return job_id
Added slurm wrapper for snakemake
Added description to iterator class inputs
Fixed typo
Added legacy scripts
Made legacy scripts PEP8 compliant
Added job_name argument to submit_job
job_alias is now optional
Changed passed arg
Parse scientific notated locations during bed parsing
Use chromosome index in job script name
Perform explicit cd to working dir
Set job name of final job
Fixed job_name
Display id of last job
Show job ids of intermediate jobs
Use after: in slurm dependency submission
Addition to previous
Check for hold being None
Strip hold input
Use afterok instead of after
Strip job ids
Use one UUID for a single bamtagmultiome run
Show holding command
Concatenate all job ids in one dependency command
Use : as job separator
Changed argument order
Pass None to API when hold is empty
Set job name when using CLI
Swapped prefix and hard job name
job_alias
Added utf8 header
Generate random job name if not specified
Prefix job for sge compat
Typo fixes
Job name is now properly set when supplied. File names are timestamped if not specified.
Demux.py: create unique glue job name
Added script to match bam file with bqsr report
More descriptive error message when autodetection fails
Added memory management parameters for molecule iteration
Added parameters to Molecule to cap the amount of associated fragments
Fixed? the SLURM wrapper for snakemake workflows
Added slurm wrapper to setup
Parse job runtime from resources
Added SLURM command example to scmo_workflow.py
Added MUTECT2 workflow
Tweaked resources
Added first pass variant calling
sge and slurm wrapper now use the same API calls
Report job id
Set correct index name
Fixes 101
Extraction
Added germline variant filters
Some syntax fixes
Added germline filter message and header
SNV filter
Added extra uuid4
Write intermediate results
Added -filterMP flag to bamToCountTable
Double dash
Fixed tests
Added threads to bamcnv
Updated test cases with blacklist argument
Added CS2 demux without hexamer
Set class name
Added CELSeq2_c8_u6_NH to strat loader
Added test case for CELSEQ demux. Fixed hexamer setting of NH.
Fall back on using qsub when sbatch is not available
Added workflow for SCMO (not featurecounts) celseq2 analysis
Fixed exon gtf script name in description
Added capture_locations argument
Added hash function to SingleEndTranscript (speed benefit)
Re-ordered demux methods
Added genomic plot class
Added bamFeatures module
Indentation fix
Fixed broken indent
Updates for chic
Added script to split bamfile by tag
Added skip_contig option to bamtagmultiome
Added demux tests
Added compat for already demultiplexed index
Fixed cell-readcount plot
Added get_contig_size to bamprocessing utils
Fast multi-processing count table generation
FeatureCountsFullLengthFragment fragment class added
Fixed variable declaration
Added linting script
Added full length featurecounts dedup option to bamtagmultiome (fl_feature_counts)
Removed unused imports
Allow pysam.FastaFile as argument
Added method to reset axis of a contig
Swapped dictionary indexing
Added key_tags argument
Allow pysam handle
Scale axis and despine
Fixed ax reference
Added dedup option
Added genome coverage plot to library stats
Added bam_is_processed_by_program function
Autodetect which bam file should be used if not supplied
Added more arguments to configure memory limits
Don't use multiprocessing when one thread is requested
Added variant extraction to workflow
Added cn clustermap
Added more comments and only check sample when read is used
Make sure the contigs are in the correct order
Added live counting function
Added lowess count correction
Added script for extracting and plotting cn
Added progress indication
Fix print statement
Removed incorrect argument
Added missing carriage return
Bugfix: Check if gc matrix needs to be computed
Added max_fragment_size threshold
Added option to set a single read group sample id per library
Added option to allow shift in cycle
Write rejection reason tag
Added parameter to expose setting to allow cycle shift
Added read group format setting to bamtagmultiome
Added allow_cycle_shift to bamtagmultiome
allow_cycle_shift=False by default
Added test case and updated other test cases
Added overflow support to MoleculeIterator
Raise overflow error when too many fragments are being associated with a molecule
Added association limit parameters
Added callback function to MoleculeIterator to monitor progress and state
Added performance logging methods to bamtagmultiome
Correctly handle yield_invalid flag for overflow reads
Added yield_overflow parameter to MoleculeIterator
Added --no_overflow parameter to bamtagmultiome
Optimized ordering of progress indication and shows percentage deleted reads
Added verbosity settings
Added integrity status files and testing. Fixes 65
Added input_is_sorted argument
Refactored read group code
Added script to convert read group format of bam file
Made bamtagmultiome use the new read group protocol
Added get_read_group_from_read function to bamprocessing
Prevent duplicate program IDs
Added get_read_group_format function
Refactoring
Demux.py is now twice as fast.
Bugfix: pass keyword arguments in all Fragment classes
Fix kwargs
Set variant key to include ref and alt base
Added variants module
Added variant wrapper class which can be pickled
Start of postprocessing module
Added fast_compression flags to multiple functions
Added test case for writing with faster compression
Formatting
Added prototype bamtagmultiome script which uses multiple CPUs and automatically blacklists regions (scCHiC only for now)
Added more command line accessible arguments
Added functions to combine overlapping ranges
Added function to clip a list of regions between set boundaries
Added function to generate overlapping ranges excluding blacklisted regions
Added test case for blacklisted binning
Added blacklist option to bamtagmultiome_multi
Added statsmodels dependency
Minor tweaks
Bugfix: assume average GC for a region with only Ns in the reference sequence
Bugfixes
Added min_mapping_qual and debug_job_bin_bed arguments
Added min_mapping_qual to molecule iterator
Bugfix: always define total_commands
Filter fragments with large homopolymers in chic
Added allele freq filter script
Optimisations for slow disks
Bugfix: don't drop bins due to last bin with no molecules
Optimisations: smaller bins, bigger clusters of jobs, better tracking of blacklisted bins
Bugfix: use the paths specified by the user
Added get_read_group_to_sample_dict method
Added additional demultiplexing module for chic
Bugfix: --norejects was not always working
Added "auto" option to submission.py
Added tu tag for second UMI
Added chic with gene annotations to bamtagmultiome
Move calculate_consensus method
Create tag_multiome_single_thread method, such that it can be replaced by the multi version later
Started on read group support
Fall back to heatmap when clustermap fails
Added normalisation method argument
Refactored use of consensus model
Refactoring
Reset feature tags when re-applying
Added offset argument, which is by default -1
Refactoring
Added bp_chunked method
Allow lower version of numpy
Move merge_bams to bamProcessing
Created separate file for tagging methods
Handle tagging import and refactor
DeCamaLise
Added prefetching capability, for pre-loading genomic information for defined/current genomic region
Refactoring prefetching
Verify arguments
AlleleResolver can now be pickled
Added more tests
Added region arguments
Added verification method
Added read_all argument
Added prefetching to mapability reader
Added prefetch switch flag
Added Uninitialised class, which can be used to pass classes with cython code to Pool workers
Removed reference attribute from TAPS class. The reference handle is now accessed using the Molecule object
Added missing statsmodels dependency
Parse input arguments for uninitialised classes and initialise if necessary
Added TF as total assoc fragment tag. Resolves 132
Type hints
Added tasks generation and use of blacklist 131
First working prototype! 131
Ignore molecules without a defined cut-site. This needs some improvement later.
Use pickleable handles
Add some testcases for CHiC and multiprocessed CHiC
Added fasta handle which is pickleable
Write program header when using multiprocessing 131
Added replace_bam_header method
Added white and blacklisting for contigs 131
Added missing files
Updated setup.py
Added get_contigs_with_reads method and test-cases
Optimisation: only generate jobs for contigs which have reads mapped to them
Solves 131
Do not resolve mate pairs in qflag mode, and don't perform deduplication
Added method to calculate consensus based on majority vote
Bugfix: use reference handle which can be pickled
Bugfix (contig might not be set)
Write temp files to current directory by default
Added mj consensus method
Sortedbam: allow name of origin bam file to be passed as argument
Reset JN tag
Bugfix: prefetcher should copy instead of write to args
Additions
Added default settings for ct
Allow uncertain bases
Bugfix: CG. Fixed CHiC+TAPS
Usability fix: automatically append byValue tag if it is not specified
Solves 141. Set quality of read with no aligned bases to 0.
Added taps strand as setting to Taps
Changed default param value
Color reads based on CpG methylation
Extract single CpG calls using multiple processes
Update available tagging options
Use kwargs for all vars
Fixed nonsensical defaults
Bugfix: comparison statement
Bugfix: prefetch in single thread mode
Added kwargs passing and known variant masking
More informative debug message
Optimisation, reduce iterations
Added option to write debug bam file
Fix: used non-existent argument
Started on wrapper script to get methylation calls
Added readgroup tests
Formatting
Bugfix: header was not updated properly
Added WIG writer
Added test case for readgrouping in multiprocess mode
Added methylation track csv and wig export
Removed incorrect -c flag
Handle cases where kwargs arguments are defined but None
Auto import
Fixes 143
Added methylation module
Refactoring
Added tests
Suppress numpy warnings and remove rows with nans
Wig and distance matrix writing
Add bamProcessing/bamToMethylationCalls.py to setup
Set names of all columns
Set better defaults
Solves 145
Marloes (27):
change bowtie2 (and bwa) reference file
Add reference file for mapper
scartrace workflow
cs_feature_counts was named 'from_featurecounts_tagged'
config.json for celseq workflow
Snakemake file for celseq workflow
Delete Snakefile
Snakefile for celseq workflow
added librarystatistics plots
Correct reference parameter in mapper (bwa/bowtie2)
For editing heterozygous SNPs to homozygous SNPs observed in data
Update heterozygousSNPedit.py
Update setup.py
Improved BWA mapping for 2x150bp paired end reads
Improved BWA mapping for 2x150bp paired end reads
improved insert size filtering
Update config.json
Update config.json
Update Chic snakemake workflow
Jake Yeung (4):
Add script to split bam by cluster
Add blacklist option for binned mode
Allow blacklist to create count table. Last commit before I copy over a test file.
Copy over test file. This should work for bins and beds
Maria Florescu (2):
adding BAM splitting script
working splitting of BAM file with double signal and optional linear interpolation