Updates to MDS gap-filling
The community-standard MDS gap-filling method for eddy covariance ecosystem fluxes (e.g., CO2 flux) is now integrated
into the `FluxProcessingChain`. MDS is used during gap-filling in flux Level-4.1.
- **Example notebook** using MDS within the flux processing chain, together with random forest
gap-filling: [Flux Processing Chain](/notebooks/FluxProcessingChain/FluxProcessingChain.ipynb)
- **Example notebook** using MDS via the standalone class
`FluxMDS`: [MDS gap-filling of ecosystem fluxes](/notebooks/GapFilling/FluxMDSGapFilling.ipynb)
The `diive` implementation of the MDS gap-filling method follows the descriptions in Reichstein et al. (2005) and
Vekuri et al. (2023) and is similar to the standard gap-filling procedures used by FLUXNET, ICOS, REddyProc, and
comparable platforms. The method fills gaps by replacing missing flux values with the average of flux values observed
under comparable meteorological conditions.
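
The core idea of filling a gap with the average of fluxes measured under similar conditions can be illustrated with a minimal sketch. This is not the `diive` implementation: the function `fill_gap`, the record layout, the tolerance values, and the window expressed in number of records are all assumptions made for illustration, and the full method described in the literature additionally uses expanding time windows and fallback steps that are omitted here.

```python
from statistics import mean

# Illustrative tolerances (assumptions for this sketch, not diive defaults)
SWIN_TOL = 50.0  # shortwave incoming radiation, W m-2
TA_TOL = 2.5     # air temperature, deg C
VPD_TOL = 0.5    # vapor pressure deficit, kPa


def fill_gap(records, gap_idx, window, min_n_vals=2):
    """Return the mean flux of nearby records with similar meteorology, or None.

    `window` is the number of records searched before and after the gap.
    """
    target = records[gap_idx]
    lo = max(0, gap_idx - window)
    hi = min(len(records), gap_idx + window + 1)
    similar = [
        r["flux"]
        for r in records[lo:hi]
        if r["flux"] is not None
        and abs(r["swin"] - target["swin"]) <= SWIN_TOL
        and abs(r["ta"] - target["ta"]) <= TA_TOL
        and abs(r["vpd"] - target["vpd"]) <= VPD_TOL
    ]
    # Require a minimum number of similar values before averaging
    return mean(similar) if len(similar) >= min_n_vals else None


records = [
    {"swin": 400.0, "ta": 20.0, "vpd": 1.0, "flux": -10.0},
    {"swin": 420.0, "ta": 21.0, "vpd": 1.1, "flux": -12.0},
    {"swin": 410.0, "ta": 20.5, "vpd": 1.0, "flux": None},  # gap to fill
    {"swin": 0.0, "ta": 10.0, "vpd": 0.2, "flux": 3.0},     # nighttime, not similar
]
filled = fill_gap(records, gap_idx=2, window=3)
print(filled)
```

In this toy data set only the first two records fall within all three tolerances, so the gap is filled with their mean; the nighttime record is ignored.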

Background: different flux levels
- The class `FluxProcessingChain` in `diive` follows the processing steps of
the [Flux Processing Chain](https://www.swissfluxnet.ethz.ch/index.php/data/ecosystem-fluxes/flux-processing-chain/)
outlined by [Swiss FluxNet](https://www.swissfluxnet.ethz.ch/).
- The flux processing chain uses different levels for different steps in the chain:
- Level-0: preliminary flux calculations, e.g. during the year,
using [EddyPro](https://www.licor.com/products/eddy-covariance/eddypro)
- Level-1: final flux calculations, e.g. for a complete year,
using [EddyPro](https://www.licor.com/products/eddy-covariance/eddypro)
- Level-2: quality flag expansion (flagging)
- Level-3.1: storage correction (using a one-point measurement only; storage correction from profile measurements is
not included by default)
- Level-3.2: outlier removal (flagging)
- Level-3.3: USTAR filtering (flagging; uses a constant threshold that must be known beforehand, threshold detection
is not included by default)
- After Level-3.3, an overall quality flag (`QCF`) is generated by combining the individual quality flags.
Before subsequent processing steps, low-quality data (flag=2) are removed. Medium-quality data (flag=1) can be
retained if necessary, while the highest-quality data (flag=0) are always kept.
- Level-4.1: gap-filling (MDS, long-term random forest)
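
The quality-flag step after Level-3.3 can be sketched as follows. This is only an illustration under a stated assumption: the individual flags are combined here with a simple "worst flag wins" rule, which may differ from the rule actually used in `diive`. The function names `overall_qcf` and `apply_qcf` are hypothetical.

```python
def overall_qcf(flags):
    """Combine individual flags (0, 1 or 2) into one flag.

    Assumption for this sketch: the worst individual flag determines the result.
    """
    return max(flags)


def apply_qcf(values, qcf, keep_medium=True):
    """Replace values with None where the overall flag indicates insufficient quality."""
    max_allowed = 1 if keep_medium else 0
    return [v if q <= max_allowed else None for v, q in zip(values, qcf)]


flux = [-5.1, -4.8, 12.3, -6.0]
individual_flags = [
    [0, 0, 0],  # all tests passed
    [0, 1, 0],  # one medium-quality flag
    [2, 0, 1],  # one low-quality flag
    [1, 1, 0],  # two medium-quality flags
]
qcf = [overall_qcf(f) for f in individual_flags]

print(qcf)
print(apply_qcf(flux, qcf, keep_medium=True))
print(apply_qcf(flux, qcf, keep_medium=False))
```

With `keep_medium=True` only the record flagged 2 is removed; with `keep_medium=False` only the record flagged 0 remains.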
Changes
- Changes in `FluxMDS` (`diive.pkgs.gapfilling.mds.FluxMDS`):
- Added parameter `avg_min_n_vals` in MDS gap-filling
- Renamed tolerance parameters for MDS gap-filling to `*_tol`
- When reading a parquet file, sanitizing the timestamp is now optional (`diive.core.io.files.load_parquet`)
- The function for creating lagged variants is now found in `diive.pkgs.createvar.laggedvariants.lagged_variants`
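
For readers unfamiliar with the term, lagged variants are time-shifted copies of a variable, for example for use as additional features in machine-learning gap-filling. The following minimal sketch shows the concept only; it does not use or reproduce the `diive` function, and the helper `lagged_variant`, the naming scheme, and the sign convention (positive lag = value from an earlier record) are assumptions for illustration.

```python
def lagged_variant(values, lag):
    """Return a shifted copy of `values`.

    Sign convention assumed here: with a positive lag, position i holds the
    value from `lag` records earlier; with a negative lag, from later records.
    Positions without a source value are None.
    """
    n = len(values)
    return [values[i - lag] if 0 <= i - lag < n else None for i in range(n)]


ta = [10.0, 11.0, 12.0, 13.0]

# Build one lagged variant per requested lag (names are hypothetical)
variants = {f"TA_lag{lag:+d}": lagged_variant(ta, lag) for lag in (-1, 1)}

for name, series in variants.items():
    print(name, series)
```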
Additions
- Added more text output for fill quality during gap-filling with MDS (`diive.pkgs.gapfilling.mds.FluxMDS`)
- Added MDS gap-filling to flux processing chain
(`diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain`)
- Allow fitting to unbinned data (`diive.pkgs.fits.fitter.BinFitterCP`)
- Added parameter to edit y-label (`diive.core.plotting.dielcycle.DielCycle`)
- Added preliminary USTAR filtering for NEE to quick flux processing chain
(`diive.pkgs.fluxprocessingchain.fluxprocessingchain.QuickFluxProcessingChain`)
- `FileSplitter`:
- Added parameter to write splits directly as `parquet` files in `FileSplitter` and `FileSplitterMulti`. These two
classes split longer time series files (e.g., 6 hours) into several shorter files (e.g., 12 half-hourly files).
Using parquet speeds up both the splitting itself and the later re-reading of the files in other processing
steps.
- After splitting, missing values in the split files are NumPy `NaN` (`diive.core.io.filesplitter.FileSplitter`)
- Added parameter to hide the default plot when it is called. The method `defaultplot` is used e.g. by outlier
detection methods to plot the data after outlier removal, showing flagged vs. unflagged values.
(`diive.core.base.flagbase.FlagBase.defaultplot`)
- Added new filetype `ETH-SONICREAD-BICO-MOD-CSV-20HZ`
- Added `fig` property that contains the default plot for outlier removal methods. This is useful when the default plot
is needed elsewhere, e.g. saved to a file. At the moment, the parameter `showplot` must be `True` for the property to
be accessible. (`diive.core.base.flagbase.FlagBase`)
- Example for class `zScoreRolling`:
zsr = zScoreRolling(..., showplot=True, ...)
zsr.calc(repeat=True)
fig = zsr.fig  # Contains the figure instance
fig.savefig(...)  # Figure can then be saved to a file etc.
Notebooks
- Added notebook example for creating lagged variants of variables
(`notebooks/CalculateVariable/Create_lagged_variants.ipynb`)
- Updated flux processing chain notebook to `v9.0`: added option for MDS gap-filling, more descriptions
- Bugfix: import for loading from `Path` was missing in flux processing chain notebook
- Updated MDS gap-filling notebook to `v1.1`, added more descriptions and example for `min_n_vals_nt` parameter
- Updated quick flux processing chain notebook
Unittests
- Added test case `tests.test_createvar.TestCreateVar.test_lagged_variants`
- Updated test case `tests.test_gapfilling.TestGapFilling.test_fluxmds`
- Updated test case `tests.test_fluxprocessingchain.TestFluxProcessingChain.test_fluxprocessingchain`
- 53/53 unittests ran successfully
Bugfixes
- The setting for features that should not be lagged was not properly implemented
(`diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain._get_ml_feature_settings`)
- Fixed bug when plotting (`diive.pkgs.outlierdetection.localsd.LocalSD`)