===========
New scripts
-----------
* ``observation_metadata_correlation.py``: Allows the calculation of correlations between feature abundances and continuous-valued metadata. This script replaces the continuous-valued correlation functionality that was in ``otu_category_significance.py`` in QIIME 1.7.0 and earlier.
* ``compare_trajectories.py``: Allows analysis of volatility using different algorithms.
* ``compute_taxonomy_ratios.py``: Implements the microbial dysbiosis index (MD-index) from [Gevers et al 2014](http://www.ncbi.nlm.nih.gov/pubmed/24629344).
* ``collapse_samples.py``: Allows collapsing groups of samples in BIOM tables and mapping files based on their metadata (see [1678](https://github.com/biocore/qiime/issues/1678)). This can be used, for example, to collapse samples belonging to a replicate group. This script also replaces ``summarize_otu_by_cat.py`` (see discussion on [1798](https://github.com/biocore/qiime/issues/1798)).
* ``multiple_split_libraries_fastq.py``, ``multiple_join_paired_ends.py``, and ``multiple_extract_barcodes.py``: Facilitate initial QIIME processing of already-demultiplexed fastq files, as these are commonly being provided by sequencing centers.
* ``differential_abundance.py``: Supplements ``group_significance.py`` to support metagenomeSeq's fitZIG algorithm and DESeq2's negative binomial algorithm. The input for this is an unnormalized, raw BIOM table.
* ``normalize_table.py``: Adds support for BIOM table normalization algorithms in addition to rarefaction. Supported methods are metagenomeSeq's CSS and DESeq's variance stabilizing transformation.
* ``start_parallel_jobs_slurm.py``: Allows for parallel job submission using [slurm](https://computing.llnl.gov/linux/slurm/).
* ``split_libraries_lea_seq.py``: Allows for demultiplexing of sequences using the LEA-Seq protocol, described in [Faith et al. (2013)](http://www.sciencemag.org/content/341/6141/1237439). This script should be considered to be in **beta testing status**.
* ``extract_reads_from_interleaved_file.py``: Splits an interleaved FASTQ file (like the ones produced by JGI) into forward and reverse reads. See [this section](http://qiime.org/tutorials/processing_illumina_data.html#processing-joint-genome-institute-fastq-files) of the Illumina data preparation tutorial for more details.
* ``parallel_pick_otus_sortmerna.py``: Performs parallel OTU picking with SortMeRNA ([Kopylova et al. (2012)](http://www.ncbi.nlm.nih.gov/pubmed/23071270)).
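
As a rough illustration of what splitting an interleaved FASTQ file involves, the sketch below separates records in plain Python. It is not the code used by ``extract_reads_from_interleaved_file.py``: the function name ``split_interleaved`` is hypothetical, and the sketch assumes four-line records that strictly alternate between forward and reverse reads.

```python
def split_interleaved(lines):
    """Split interleaved FASTQ lines into (forward, reverse) record lists.

    Assumption: every record is exactly four lines, and records strictly
    alternate forward, reverse, forward, reverse, ...
    """
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 8 != 0:
        raise ValueError("expected complete forward/reverse record pairs")
    # Group the lines into four-line FASTQ records.
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    forward = records[0::2]  # 1st, 3rd, 5th, ... records
    reverse = records[1::2]  # 2nd, 4th, 6th, ... records
    return forward, reverse


# Made-up example data: one forward/reverse pair.
example = [
    "@read1/1", "ACGT", "+", "IIII",
    "@read1/2", "TGCA", "+", "IIII",
]
forward, reverse = split_interleaved(example)
```

Real interleaved files may mark read direction in the header line instead of relying on record order, so a production tool would need to check the headers as well.
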
Features
--------
* ``split_otu_table.py`` now allows multiple fields to be passed to split a biom table, and optionally a mapping file. Check out the new documentation for the naming conventions (which have changed slightly) and an example.
* Added new options to ``make_otu_heatmap.py``:
  * ``--color_scheme``, which allows users to choose from the color schemes listed [here](http://matplotlib.org/examples/color/colormaps_reference.html).
  * ``--observation_metadata_category``, which allows users to select a column other than taxonomy to use when labeling the rows.
  * ``--observation_metadata_level``, which allows the user to specify which level in the hierarchical metadata category to use in creating the row labels.
  * ``-g``/``--imagetype``, ``--dpi``, ``--width``, and ``--height``, which offer more control over the generation of heatmap figures.
* ``-m``/``--mapping_fps`` is no longer required for ``split_libraries_fastq.py``. The mapping file is not needed when running with ``--barcode_type 'not-barcoded'``, but previously a mapping file without barcodes would fail to validate when multiple sequence files and sample ids were passed (see [1400](https://github.com/biocore/qiime/issues/1400)).
* Added alphabetical sorting option (based on boxplot labels) to ``make_distance_boxplots.py``. Sorting by boxplot median can now be performed by passing ``--sort median`` (this was previously invoked by passing ``--sort``). Sorting alphabetically can be performed by passing ``--sort alphabetical``.
* Scripts that write an OTU table will now write BIOM files in HDF5 format if HDF5 is installed. This improves performance for very large OTU tables.
* ``merge_mapping_files.py`` can now take an argument to convert the header names to upper case, so that, for example, a category named `treatment` in one mapping file is merged with a category named `TREATMENT` in another.
* The script ``make_distance_histograms.py`` has been removed. This functionality should be accessed through ``make_distance_boxplots.py``.
* Beta support has been added for performing OTU picking with open source software:
  * subsampled open reference OTU picking using SortMeRNA ([Kopylova et al. (2012)](http://www.ncbi.nlm.nih.gov/pubmed/23071270)) (for the closed-reference steps) and [SumaClust](http://metabarcoding.org/sumatra) (for the open reference steps). This can be accessed with ``pick_open_reference_otus.py -m sortmerna_sumaclust``.
  * closed-reference OTU picking using SortMeRNA ([Kopylova et al. (2012)](http://www.ncbi.nlm.nih.gov/pubmed/23071270)). This can be accessed with ``pick_closed_reference_otus.py -p params.txt`` where params.txt includes the line ``pick_otus:otu_picking_method sortmerna``.
  * de novo OTU picking using [SumaClust](http://metabarcoding.org/sumatra) or swarm ([Mahe et al. (2014)](https://peerj.com/articles/593/)). This can be accessed with ``pick_de_novo_otus.py -p params.txt`` where params.txt includes the line ``pick_otus:otu_picking_method sumaclust`` or ``pick_otus:otu_picking_method swarm``.
* sumaclust v1.0.00, swarm 1.2.19, and sortmerna 2.0 are now optional dependencies (see the [QIIME install docs](http://qiime.org/install/install.html) for details).
* Renamed ``split_fasta_on_sample_ids_to_files.py`` to ``split_sequence_file_on_sample_ids.py``, which now supports splitting FASTQ files, as well. Added a parameter, ``--file_type``, which is used to specify the type of the input file.
* Added ``--assign_taxonomy`` option to ``pick_closed_reference_otus.py`` to allow taxonomy assignment using a classifier, rather than the default of using the taxonomic assignment of the cluster centroid.
* Added ``--suppress_taxonomy_assignment`` option to ``pick_closed_reference_otus.py``.
* Updated output of ``identify_paired_differences.py`` to include more information in the pseudo-mapping file that it generates. This includes the "pre" and "post" values for all of the analysis categories on a per-subject basis. This is useful for plotting with other tools, or for generating legends for the plots that are currently generated by the script (see [issue 1707](https://github.com/biocore/qiime/issues/1707)).
* Added ``pick_otus_reference_seqs_fp`` to the QIIME config file. This is a filepath to reference sequences to use with QIIME's OTU picking scripts/workflows. See the [QIIME config docs](http://qiime.org/install/qiime_config.html) and [1696](https://github.com/biocore/qiime/issues/1696) for more details.
* The QIIME config settings ``assign_taxonomy_id_to_taxonomy_fp``, ``assign_taxonomy_reference_seqs_fp``, ``pick_otus_reference_seqs_fp``, and ``pynast_template_alignment_fp`` now default to reference data files in the [qiime-default-reference project](http://github.com/biocore/qiime-default-reference).
* Installing QIIME via ``pip install qiime`` now works out-of-the-box by providing a functioning QIIME minimal (base) install (see [1696](https://github.com/biocore/qiime/issues/1696)).
* ``cluster_jobs_fp`` in the QIIME config file now defaults to ``start_parallel_jobs.py``. ``seconds_to_sleep`` now defaults to 1.
* Added ``--negate_sample_id_fp`` option to ``filter_samples_from_otu_table.py`` (see [1117](https://github.com/biocore/qiime/issues/1117)).
* Added ``--percent_variation_below_one`` flag to ``make_2d_plots.py`` for when the percent variation is actually below 1 and not a relative measure.
* The default confidence threshold for the Naive Bayes taxonomy assigners (RDP Classifier and mothur) is now ``0.50``, as [recommended by the RDP Classifier developers](https://rdp.cme.msu.edu/classifier/class_help.jsp#conf) for partial sequences.
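
The beta OTU picking methods above are selected through a parameters file passed with ``-p``. As a minimal sketch, the following Python writes such a file using only the parameter lines quoted in these notes. The helper name ``write_params_file`` and the temporary output path are hypothetical and not part of QIIME.

```python
import os
import tempfile

# Parameter lines quoted in the notes above, keyed by OTU picking method.
PARAMS_LINES = {
    "sortmerna": "pick_otus:otu_picking_method sortmerna",
    "sumaclust": "pick_otus:otu_picking_method sumaclust",
    "swarm": "pick_otus:otu_picking_method swarm",
}


def write_params_file(method, path):
    """Write a one-line parameters file selecting the given method."""
    if method not in PARAMS_LINES:
        raise ValueError("unknown OTU picking method: %s" % method)
    with open(path, "w") as handle:
        handle.write(PARAMS_LINES[method] + "\n")
    return path


# Write the file to a temporary directory and read it back.
params_path = os.path.join(tempfile.mkdtemp(), "params.txt")
write_params_file("swarm", params_path)
with open(params_path) as handle:
    params_text = handle.read()
```

The resulting file can then be passed to the relevant workflow script via ``-p``, as described in the items above.
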
Usability enhancements
----------------------
* Simplified and improved QIIME install documentation.
* Errors raised by scripts are easier to read and include a supplementary message on how to get help (see [1794](https://github.com/biocore/qiime/issues/1794)).
* QIIME is now easier to install! Removed ``qiime_scripts_dir``, ``python_exe_fp``, ``working_dir``, ``cloud_environment``, and ``template_alignment_lanemask_fp`` from the QIIME config file. If these values are present in your QIIME config file, they will be flagged as unrecognized by ``print_qiime_config.py -t`` and will be ignored by QIIME. QIIME will now use the ``python`` executable and QIIME scripts that are found in your ``PATH`` environment variable, and ``temp_dir`` will be used in place of ``working_dir`` (this value was used by some parts of parallel QIIME previously). ``filter_alignment.py`` will now use the 16S alignment Lane mask (Lane, D.J. 1991) by default if one is not provided via ``--lane_mask_fp``.
* ``--tail_type`` option in ``compare_distance_matrices.py`` now accepts "two-sided" instead of "two sided" for specifying a two-sided alternative hypothesis. The new name is easier to specify via the command-line (quotes aren't needed because it is a single word).
* ``print_qiime_config.py -t`` now tests a QIIME minimal (base) install instead of a QIIME full install. ``print_qiime_config.py -tf`` tests a QIIME full install.
* Standardized use of underscores in option longnames. Affected scripts and options:
  * ``scripts/demultiplex_fasta.py``
    * `start-numbering-at` is now `start_numbering_at`
  * ``scripts/denoiser.py``
    * `low_cut-off` is now `low_cut_off`
    * `high_cut-off` is now `high_cut_off`
  * ``scripts/multiple_rarefactions.py``
    * `num-reps` is now `num_reps`
  * ``scripts/multiple_rarefactions_even_depth.py``
    * `num-reps` is now `num_reps`
  * ``scripts/parallel_multiple_rarefactions.py``
    * `num-reps` is now `num_reps`
  * ``scripts/plot_rank_abundance_graph.py``
    * `no-legend` is now `no_legend`
  * ``scripts/split_libraries.py``
    * `min-seq-length` is now `min_seq_length`
    * `max-seq-length` is now `max_seq_length`
    * `trim-seq-length` is now `trim_seq_length`
    * `min-qual-score` is now `min_qual_score`
    * `keep-primer` is now `keep_primer`
    * `keep-barcode` is now `keep_barcode`
    * `max-ambig` is now `max_ambig`
    * `max-homopolymer` is now `max_homopolymer`
    * `max-primer-mismatch` is now `max_primer_mismatch`
    * `barcode-type` is now `barcode_type`
    * `dir-prefix` is now `dir_prefix`
    * `max-barcode-errors` is now `max_barcode_errors`
    * `start-numbering-at` is now `start_numbering_at`
* Removed the optional ``--output_dir`` option from ``make_otu_heatmap.py`` and replaced it with the required option ``--output_fp``.
* The parameters ``--uclust_min_consensus_fraction`` and ``--uclust_similarity`` in ``*_assign_taxonomy_*`` scripts have been changed to ``--min_consensus_fraction`` and ``--similarity`` since both of these parameters apply to the SortMeRNA taxon assigner as well.
* Several changes were made to ``alpha_diversity.py`` metric names:
  * ``ACE`` is now ``ace``
  * ``chao1_confidence`` is now ``chao1_ci``
  * Added ``observed_otus``, which is equivalent to ``observed_species`` but is generally a more accurate name. ``observed_species`` is retained for backward-compatibility.
* SortMeRNA 2.0, SUMACLUST 1.0.00, and swarm 1.2.19 are now installed automatically when QIIME is installed (e.g., via `pip install qiime`).
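
Existing shell scripts or saved commands that use the hyphenated long option names listed above will need updating. The following is a small, hypothetical helper (not part of QIIME) that rewrites only the renamed options in an argument list; the old names are taken from the list above, and everything else in the sketch is an assumption made for illustration.

```python
# Old long option names from the list above (without the leading "--").
OLD_LONG_NAMES = {
    "start-numbering-at", "low_cut-off", "high_cut-off", "num-reps",
    "no-legend", "min-seq-length", "max-seq-length", "trim-seq-length",
    "min-qual-score", "keep-primer", "keep-barcode", "max-ambig",
    "max-homopolymer", "max-primer-mismatch", "barcode-type",
    "dir-prefix", "max-barcode-errors",
}


def update_option(arg):
    """Return arg with a renamed long option converted to underscores."""
    if not arg.startswith("--"):
        return arg  # positional argument, value, or short option
    # Handle both "--name value" and "--name=value" forms.
    name, sep, value = arg[2:].partition("=")
    if name in OLD_LONG_NAMES:
        name = name.replace("-", "_")
    return "--" + name + sep + value


def update_command(argv):
    """Apply update_option to every argument of a command."""
    return [update_option(arg) for arg in argv]


old = ["split_libraries.py", "--min-seq-length", "200",
       "--barcode-type=golay_12"]
new = update_command(old)
```

Only options in the known set are touched, so option values that happen to contain hyphens are left alone.
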
Bug fixes
---------
* Relaxed sanity tests for ``compare_categories.py --method adonis`` so that unique values are only checked for categories that are non-numeric (see [issue 1360](https://github.com/biocore/qiime/issues/1360)).
* ``core_diversity_analyses.py`` now requires ``--tree_fp`` unless ``--nonphylogenetic_diversity`` is passed (see [1671](https://github.com/biocore/qiime/issues/1671)).
* Fixed bug in ``assign_taxonomy.py -m blast`` and ``parallel_assign_taxonomy_blast.py`` that prevented multiple instances of either from running at the same time (see [1768](https://github.com/biocore/qiime/issues/1768)).
* Fixed bug where ``--phred_offset`` in ``split_libraries_fastq.py`` was ignored (see [1656](https://github.com/biocore/qiime/issues/1656)).
* Spaces in taxa no longer cause an error when using ``--assignment_method=mothur`` in ``assign_taxonomy.py``.
* Fixed bug where long axis labels were cut off in heatmaps generated by ``make_otu_heatmap.py`` (see [1571](https://github.com/biocore/qiime/issues/1571)).
* Fixed bug where ``-S``/``--suppress_submit_jobs`` was being ignored by several of the parallel scripts (e.g. ``parallel_pick_otus_uclust_ref.py``) (see [1665](https://github.com/biocore/qiime/issues/1665)).
* Fixed bug where ``make_distance_comparison_plots.py`` would create empty groups (see [1627](https://github.com/biocore/qiime/issues/1627)).
* ``qiime/workflow/pick_open_reference_otus.py`` no longer copies the permission bits of the reference file, which caused a file permission failure in some cases.
* Fixed bug in ``make_rarefaction_plots.py`` where ``--generate_per_sample_plots`` wasn't working (see [1475](https://github.com/biocore/qiime/issues/1475)).
* Fixed bug that resulted in samples being mislabeled in ``make_otu_heatmap.py`` when one of the following options was passed: ``--category``, ``--map_fname``, ``--sample_tree``, or ``--suppress_column_clustering``. This is discussed in [1790](https://github.com/biocore/qiime/issues/1790).
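
To show why an ignored ``--phred_offset`` matters, here is a minimal sketch of how FASTQ quality characters map to Phred scores under the two offsets in common use (33 and 64). This is an illustration only, not QIIME's code, and the function name ``decode_quality`` is hypothetical.

```python
def decode_quality(quality_string, phred_offset):
    """Convert a FASTQ quality string to a list of integer Phred scores."""
    if phred_offset not in (33, 64):
        raise ValueError("phred_offset must be 33 or 64")
    # Each score is the character's ASCII code minus the offset.
    return [ord(char) - phred_offset for char in quality_string]


# The same scores encoded with each offset.
scores_33 = decode_quality("II5", 33)
scores_64 = decode_quality("hhT", 64)

# Decoding offset-64 data with offset 33 silently inflates every score.
wrong = decode_quality("hhT", 33)
```

Here ``scores_33`` and ``scores_64`` both decode to the same scores, while ``wrong`` is 31 higher at every position, which would let low-quality bases pass a quality filter.
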
Removal of outdated and unsupported functionality
-------------------------------------------------
* Removed ``-Y``/``--python_exe_fp`` and ``-N`` options from ``parallel_merge_otu_tables.py`` script as these are not available in any of the other parallel QIIME scripts and we do not have good reason to support them (see QIIME 1.6.0 release notes below for more details).
* Removed ``insert_seqs_into_tree.py``. This code needs additional testing and documentation, and was not widely used. We plan to add this support back in the future, and progress on that can be followed on [1499](https://github.com/biocore/qiime/issues/1499).
* ``summarize_otu_by_cat.py`` has been replaced with ``collapse_samples.py``.
* Removed options ``-c``/``--ci_type``, ``-a``/``--alpha``, and ``-f``/``--f_ratio`` from ``conditional_uncovered_probability.py`` as these weren't being used by the script (i.e., supplying different values didn't change the computed CIs because the defaults were always used).
* Removed ``tax2tree`` as a method in ``assign_taxonomy.py``.
* Fasttree v1.x is no longer supported by ``make_phylogeny.py`` (see [issue 1516](https://github.com/biocore/qiime/issues/1516)).
* Removed ``submit_to_mgrast.py`` script (see [1780](https://github.com/biocore/qiime/issues/1780)).
* Removed ``make_otu_heatmap_html.py`` in favor of ``make_otu_heatmap.py`` (see discussion on [1724](https://github.com/biocore/qiime/issues/1724)).
* Removed ``-m``/``--include_html_counts`` option from the ``plot_taxa_summary.py`` script as the behavior was no longer useful or accurate.
Performance enhancements
------------------------
* Changed default parameters for uclust-based OTU picking: ``max_accepts`` is now 1 (was 20), ``max_rejects`` is now 8 (was 500), ``stepwords`` is now 8 (was 20), and ``word_length`` is now 8 (was 12). These changes greatly reduce runtime, with minimal effect on the results. See Rideout et al., 2014 ([PeerJ pre-print](https://peerj.com/preprints/411/)) for more details.
* Disabled the prefilter by default in ``pick_open_reference_otus.py``. This change greatly reduces runtime, with minimal effect on the results. See Rideout et al., 2014 ([PeerJ pre-print](https://peerj.com/preprints/411/)) for more details.
* The alpha diversity measures available in QIIME (e.g., ``alpha_diversity.py``) are now powered by [scikit-bio](http://scikit-bio.org/), and several of these methods are now considerably faster! See the scikit-bio docs on [alpha diversity](http://scikit-bio.org/docs/latest/generated/skbio.diversity.alpha.html) for more details on the methods.
* ANOSIM and PERMANOVA (available in ``compare_categories.py``) are now powered by [scikit-bio](http://scikit-bio.org/) and are approximately 1000 times faster than previous implementations. These additionally now provide more useful information in the output file. See the scikit-bio docs for [ANOSIM](http://scikit-bio.org/docs/latest/generated/generated/skbio.stats.distance.anosim.html) and [PERMANOVA](http://scikit-bio.org/docs/latest/generated/generated/skbio.stats.distance.permanova.html) for more detail.
* Renamed ``compare_categories.py``'s BEST method to BIO-ENV to match the name used in R's vegan package (``vegan::bioenv``) and the name of the program in the original paper. Use ``compare_categories.py --method bioenv`` instead of ``compare_categories.py --method best``. The underlying implementation has also been rewritten and is considerably faster than before, and the output more closely matches the vegan package, as environmental variables are now scaled before computing Euclidean distances. See the scikit-bio docs for [BIO-ENV](http://scikit-bio.org/docs/latest/generated/generated/skbio.stats.distance.bioenv.html) for more detail.
* The Mantel test (``--method mantel``) and Mantel correlogram (``--method mantel_corr``) in ``compare_distance_matrices.py`` are considerably faster than previous implementations. See the scikit-bio docs for [Mantel](http://scikit-bio.org/docs/latest/generated/generated/skbio.stats.distance.mantel.html) for more detail.
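
For readers unfamiliar with the Mantel test mentioned above, the sketch below shows the basic idea in plain Python: correlate the off-diagonal entries of two distance matrices, then assess significance by permuting the row and column order of one matrix. This is a simplified illustration with a two-sided alternative, not the scikit-bio implementation; the function names, defaults, and example matrix are all assumptions.

```python
import math
import random


def _upper_triangle(matrix):
    """Return the entries above the diagonal as a flat list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel(dm1, dm2, permutations=999, seed=0):
    """Return (r, two-sided p-value) for two symmetric distance matrices."""
    rng = random.Random(seed)
    flat1 = _upper_triangle(dm1)
    observed = _pearson(flat1, _upper_triangle(dm2))
    n = len(dm2)
    order = list(range(n))
    at_least_as_extreme = 0
    for _ in range(permutations):
        # Permute rows and columns of dm2 together.
        rng.shuffle(order)
        permuted = [[dm2[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        r_perm = _pearson(flat1, _upper_triangle(permuted))
        if abs(r_perm) >= abs(observed):
            at_least_as_extreme += 1
    p_value = (at_least_as_extreme + 1) / (permutations + 1)
    return observed, p_value


# Made-up example: a matrix compared with itself correlates perfectly.
dm = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
r, p = mantel(dm, dm, permutations=99)
```

Each permutation rebuilds the full matrix here, which is easy to read but slow for large inputs.
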