Diive

Latest version: v0.86.0


0.84.0

New features

- New class `BinFitterCP` for fitting a function to binned data, including confidence and prediction intervals (
`diive.pkgs.fits.fitter.BinFitterCP`)

![DIIVE](images/BinFitterCP_diive_v0.84.0.png)

Additions

- Added small function to detect duplicate entries in lists (`diive.core.funcs.funcs.find_duplicates_in_list`)
- Added new filetype (`diive/configs/filetypes/ETH-MERCURY-CSV-20HZ.yml`)
- Added new filetype (`diive/configs/filetypes/GENERIC-CSV-HEADER-1ROW-TS-END-FULL-NS-20HZ.yml`)
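
Detecting duplicate entries in a list can be sketched in a few lines. The helper name `find_duplicates` and its return format are assumptions for illustration; the actual signature of `find_duplicates_in_list` may differ.

```python
from collections import Counter


def find_duplicates(items):
    """Return the elements that occur more than once, in first-seen order."""
    counts = Counter(items)
    # Counter keeps insertion order, so the result is the same on every run.
    return [item for item, n in counts.items() if n > 1]


columns = ["TA", "RH", "CUSTOM_CH4_MEAN", "TA", "CUSTOM_CH4_MEAN"]
duplicates = find_duplicates(columns)
print(duplicates)
```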

Bugfixes

- Not directly a bug fix, but when reading EddyPro fluxnet files with `LoadEddyProOutputFiles` (e.g., in the flux
processing chain) duplicate columns are now automatically renamed by adding a numbered suffix. For example, if two
variables are named `CUSTOM_CH4_MEAN` in the output file, they are automatically renamed to `CUSTOM_CH4_MEAN_1` and
`CUSTOM_CH4_MEAN_2` (`diive.core.dfun.frames.compare_len_header_vs_data`)
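
The renaming described above can be sketched as follows. This is a minimal illustration that assumes only names occurring more than once receive a suffix; `rename_duplicates` is a hypothetical helper, not the function used in `diive`.

```python
from collections import Counter


def rename_duplicates(names):
    """Append _1, _2, ... to every name that occurs more than once."""
    totals = Counter(names)  # how often each name occurs overall
    seen = Counter()         # how often each name has been seen so far
    renamed = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            renamed.append(f"{name}_{seen[name]}")
        else:
            renamed.append(name)
    return renamed


header = ["TIMESTAMP_END", "CUSTOM_CH4_MEAN", "FC", "CUSTOM_CH4_MEAN"]
new_header = rename_duplicates(header)
print(new_header)
```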

Notebooks

- Added notebook example for `BinFitterCP` (`notebooks/Fits/BinFitterCP.ipynb`)
- Updated flux processing chain notebook to `v8.6`; the import for loading EddyPro fluxnet output files was missing

Tests

- Added test case for `BinFitterCP` (`tests.test_fits.TestFits.test_binfittercp`)
- 51/51 unittests ran successfully

0.83.2

From now on, Python version `3.11.10` is used for developing `diive` (up to now, version `3.9` was used). All unittests
were successfully executed with this new Python version. In addition, all notebooks were re-run and all looked good.

[JupyterLab](https://jupyterlab.readthedocs.io/en/4.2.x/index.html) is now included in the environment, which makes it
easier to quickly install `diive` (`pip install diive`) in an environment and directly use its notebooks, without the
need to install JupyterLab separately.

Environment

- `diive` will now be developed using Python version `3.11.10`
- Added [JupyterLab](https://jupyterlab.readthedocs.io/en/4.2.x/index.html)
- Added [jupyter bokeh](https://github.com/bokeh/jupyter_bokeh)

Notebooks

- All notebooks were re-run and updated using Python version `3.11.10`

Tests

- 50/50 unittests ran successfully with Python version `3.11.10`

Changes

- Adjusted the flags check in the QCF flag report: the progressive flag must be the same as the previously calculated
overall flag (`diive.pkgs.qaqc.qcf.FlagQCF.report_qcf_evolution`)

0.83.1

Changes

- When detecting the frequency from the time delta of records, the inferred frequency is accepted if the most frequent
timedelta was found for more than 50% of records (`diive.core.times.times.timestamp_infer_freq_from_timedelta`)
- Storage terms are now gap-filled using the rolling median in an expanding time window (
`FluxStorageCorrectionSinglePointEddyPro._gapfill_storage_term`)
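
The 50% rule for frequency detection described above can be sketched in plain Python. The function name `infer_timedelta` and the `threshold` parameter are assumptions for this example, and the sketch works on a list of `datetime` objects instead of a pandas index.

```python
from collections import Counter
from datetime import datetime, timedelta


def infer_timedelta(timestamps, threshold=0.5):
    """Return the most frequent time step if it covers more than `threshold` of all steps."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not deltas:
        return None
    most_common, count = Counter(deltas).most_common(1)[0]
    # Accept the step only if it was found for more than 50% of the records.
    return most_common if count / len(deltas) > threshold else None


start = datetime(2023, 1, 1)
regular = [start + timedelta(minutes=30 * i) for i in range(10)]
del regular[5]  # one missing record creates a single 60-minute step
irregular = [start + timedelta(minutes=m) for m in (0, 10, 30, 60, 100)]

print(infer_timedelta(regular))    # dominant 30-minute step is accepted
print(infer_timedelta(irregular))  # no step dominates, nothing is inferred
```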

Notebooks

- Added notebook example for using the flux processing chain for CH4 flux from a subcanopy eddy covariance station (
`notebooks/Workbench/CH-DAS_2023_FluxProcessingChain/FluxProcessingChain_NEE_CH-DAS_2023.ipynb`)

Bugfixes

- Fixed info for storage term correction report to account for cases when more storage terms than flux records are
available (`FluxStorageCorrectionSinglePointEddyPro.report`)

Tests

- 50/50 unittests ran successfully

0.83.0

MDS gap-filling

Finally it is possible to use the `MDS` (`marginal distribution sampling`) gap-filling method in `diive`. This method is
the current default and widely used gap-filling method for eddy covariance ecosystem fluxes. For a detailed description
of the method see Reichstein et al. (2005) and Pastorello et al. (2020; full references given below).

The implementation of `MDS` in `diive` (`FluxMDS`) follows the description in Reichstein et al. (2005) and should
therefore yield results similar to other implementations of this algorithm. `FluxMDS` can also easily output model
scores, such as r2 and error values.

At the moment it is not yet possible to use `FluxMDS` in the flux processing chain, but during the preparation of this
update the flux processing chain code was already refactored and prepared to include `FluxMDS` in one of the next
updates.

At the moment, `FluxMDS` is specifically tailored to gap-fill ecosystem fluxes. A more general implementation (e.g., to
gap-fill meteorological data) will follow.

New features

- Added new gap-filling class `FluxMDS`:
    - `MDS` stands for `marginal distribution sampling`. The method uses a time window to first identify meteorological
      conditions (short-wave incoming radiation, air temperature and VPD) similar to those when the missing data
      occurred. Gaps are then filled with the mean flux in the time window.
    - `FluxMDS` cannot yet be used in the flux processing chain, but this will be implemented soon.
    - (`diive.pkgs.gapfilling.mds.FluxMDS`)
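
The core step of the method described above can be sketched as follows. This is a simplified illustration and not the `FluxMDS` code: it covers only a single window size, the record layout (`sw_in`, `ta`, `vpd`, `flux`) and the helper `fill_gap_mds` are made up for the example, and the tolerances are the values given in Reichstein et al. (2005); whether `FluxMDS` uses the same defaults is an assumption.

```python
from statistics import mean

# Similarity tolerances following Reichstein et al. (2005): W m-2, degC, hPa.
# That FluxMDS uses these exact defaults is an assumption.
TOLERANCES = {"sw_in": 50.0, "ta": 2.5, "vpd": 5.0}


def fill_gap_mds(records, gap_idx, window):
    """Mean flux of records with similar meteo within +/- `window` records, else None."""
    target = records[gap_idx]
    lo = max(0, gap_idx - window)
    hi = min(len(records), gap_idx + window + 1)
    similar = [
        r["flux"]
        for r in records[lo:hi]
        if r["flux"] is not None
        and all(abs(r[k] - target[k]) <= tol for k, tol in TOLERANCES.items())
    ]
    return mean(similar) if similar else None


records = [
    {"sw_in": 400.0, "ta": 15.0, "vpd": 8.0, "flux": -10.0},
    {"sw_in": 420.0, "ta": 16.0, "vpd": 9.0, "flux": -12.0},
    {"sw_in": 410.0, "ta": 15.5, "vpd": 8.5, "flux": None},  # the gap
    {"sw_in": 0.0, "ta": 8.0, "vpd": 1.0, "flux": 3.0},      # dissimilar conditions
    {"sw_in": 430.0, "ta": 17.0, "vpd": 10.0, "flux": -14.0},
]
filled = fill_gap_mds(records, gap_idx=2, window=2)
print(filled)
```

The full algorithm repeats this search with progressively larger windows and fewer required drivers when no similar records are found; that fallback logic is left out here.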

Changes

- **Storage correction**: By default, values missing in the storage term are now filled with a rolling mean in an
expanding time window. Testing showed that the (single point) storage term is missing for 2-3% of the data,
which I think is reason enough to make filling these gaps the default option. Previously, it was optional to fill the
gaps using random forest; however, results were not great since only the timestamp info was used as model features.
Plots generated during Level-3.1 were also updated, now better showing the storage terms (gap-filled and
non-gap-filled) and the flag indicating filled values (
`diive.pkgs.fluxprocessingchain.level31_storagecorrection.FluxStorageCorrectionSinglePointEddyPro`)
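
One possible reading of "rolling mean in an expanding time window" is sketched below: for each missing value, the window around it is widened step by step until at least one measured value is found, and the gap is filled with the mean of the measured values in that window. The helper `fill_expanding`, its parameters and the flag convention (1 = filled) are assumptions for illustration and may differ from what `FluxStorageCorrectionSinglePointEddyPro` does.

```python
from statistics import mean


def fill_expanding(values, max_half_window=None):
    """Fill None entries with the mean of measured neighbours in a widening window."""
    n = len(values)
    limit = max_half_window if max_half_window is not None else n
    filled = list(values)
    flags = [0] * n  # 1 marks a gap-filled value
    for i, value in enumerate(values):
        if value is not None:
            continue
        for half in range(1, limit + 1):
            # Only measured values are used, never previously filled ones.
            window = values[max(0, i - half): i + half + 1]
            measured = [x for x in window if x is not None]
            if measured:
                filled[i] = mean(measured)
                flags[i] = 1
                break
    return filled, flags


storage = [1.0, None, 3.0, None, None, None, 9.0]
filled, flags = fill_expanding(storage)
print(filled)
print(flags)
```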

Notebooks

- Added notebook example for `FluxMDS` (`notebooks/GapFilling/FluxMDSGapFilling.ipynb`)

Tests

- Added test case for `FluxMDS` (`tests.test_gapfilling.TestGapFilling.test_fluxmds`)
- 50/50 unittests ran successfully

Bugfixes

- Fixed bug: overall quality flag `QCF` was not created correctly for the different USTAR scenarios (
`diive.core.base.identify.identify_flagcols`) (`diive.pkgs.qaqc.qcf.FlagQCF`)
- Fixed bug: calculation of `QCF` flag sums is now strictly done on flag columns. Before, sums were calculated across
all columns in the flags dataframe, which resulted in erroneous overall flags after USTAR filtering (
`diive.pkgs.qaqc.qcf.FlagQCF._calculate_flagsums`)
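
The second fix can be illustrated with a small sketch: if non-flag columns sit in the same table as the flags, summing across all columns inflates the flag sum. The `FLAG_` prefix used to identify flag columns, the example column names and the helper `flag_sums` are assumptions made for this example.

```python
def flag_sums(rows, flag_prefix="FLAG_"):
    """Sum only the flag columns of each row (hypothetical helper)."""
    return [
        sum(value for name, value in row.items() if name.startswith(flag_prefix))
        for row in rows
    ]


rows = [
    {"FLAG_SSITC_TEST": 0, "FLAG_USTAR_TEST": 2, "USTAR": 0.31},
    {"FLAG_SSITC_TEST": 1, "FLAG_USTAR_TEST": 0, "USTAR": 0.05},
]
sums_flags_only = flag_sums(rows)
sums_all_columns = [sum(row.values()) for row in rows]  # the erroneous variant
print(sums_flags_only)
print(sums_all_columns)
```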

Environment

- Added [polars](https://pola.rs/)

References

- Pastorello, G. et al. (2020). The FLUXNET2015 dataset and the ONEFlux processing pipeline
for eddy covariance data. Scientific Data, 7(1), 225. https://doi.org/10.1038/s41597-020-0534-3
- Reichstein, M., Falge, E., Baldocchi, D., Papale, D., Aubinet, M., Berbigier, P., Bernhofer, C., Buchmann, N.,
Gilmanov, T., Granier, A., Grunwald, T., Havrankova, K., Ilvesniemi, H., Janous, D., Knohl, A., Laurila, T., Lohila,
A., Loustau, D., Matteucci, G., … Valentini, R. (2005). On the separation of net ecosystem exchange into assimilation
and ecosystem respiration: Review and improved algorithm. Global Change Biology, 11(9),
1424–1439. https://doi.org/10.1111/j.1365-2486.2005.001002.x

0.82.1

Notebooks

- Added notebook showing an example for `LongTermGapFillingRandomForestTS` (
`notebooks/GapFilling/LongTermRandomForestGapFilling.ipynb`)
- Added notebook example for `MeasurementOffset` (`notebooks/Corrections/MeasurementOffset.ipynb`)

Tests

- Added unittest for `LongTermGapFillingRandomForestTS` (
`tests.test_gapfilling.TestGapFilling.test_gapfilling_longterm_randomforest`)
- Added unittest for `WindDirOffset` (`tests.test_corrections.TestCorrections.test_winddiroffset`)
- Added unittest for `DaytimeNighttimeFlag` (`tests.test_createvar.TestCreateVar.test_daytime_nighttime_flag`)
- Added unittest for `calc_vpd_from_ta_rh` (`tests.test_createvar.TestCreateVar.test_calc_vpd`)
- Added unittest for `percentiles101` (`tests.test_analyses.TestAnalyses.test_percentiles`)
- Added unittest for `GapFinder` (`tests.test_analyses.TestAnalyses.test_gapfinder`)
- Added unittest for `SortingBinsMethod` (`tests.test_analyses.TestAnalyses.test_sorting_bins_method`)
- Added unittest for `daily_correlation` (`tests.test_analyses.TestAnalyses.test_daily_correlation`)
- Added unittest for `QuantileXYAggZ` (`tests.test_analyses.TestCreateVar.test_quantilexyaggz`)
- 49/49 unittests ran successfully

Bugfixes

- Fixed bug that caused results from long-term gap-filling to be inconsistent *despite* using a fixed random state. I
found the following: when reducing features across years, the removal of duplicate features from a list of found
features created a list where the order of elements changed each run. This in turn produced slightly different
gap-filling results each time the long-term gap-filling was executed. The issue occurred with Python version
`3.9.19`.
    - Here is a simplified example, where `input_list` is a list of elements with some duplicate elements:
    - Running `output_list = list(set(input_list))` generates `output_list` where the elements would have a different
      output order each run. The elements were otherwise the same, only their order changed.
    - To keep the order of elements consistent it was necessary to call `output_list.sort()`.
    - (`diive.pkgs.gapfilling.longterm.LongTermGapFillingBase.reduce_features_across_years`)
- Corrected wind direction could be 360°, but will now be 0° (
`diive.pkgs.corrections.winddiroffset.WindDirOffset._correct_degrees`)
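
Both fixes above can be sketched in a few lines. The order of a `set` of strings can differ between interpreter runs because string hashing is randomized per process by default, so sorting after deduplication makes the result reproducible. For the wind direction, taking the value modulo 360 maps 360° to 0°. The helper names `unique_sorted` and `wrap_degrees` are hypothetical and not part of `diive`.

```python
def unique_sorted(items):
    """Remove duplicates and return the elements in a reproducible (sorted) order."""
    return sorted(set(items))


def wrap_degrees(degrees):
    """Map a wind direction to the range [0, 360), so that 360 becomes 0."""
    return degrees % 360


features = ["TA", "SW_IN", "TA", "VPD", "SW_IN"]
print(unique_sorted(features))
print(wrap_degrees(360), wrap_degrees(370.5), wrap_degrees(-10))
```

If the original order of first appearance matters more than alphabetical order, `list(dict.fromkeys(features))` is an alternative that is also stable across runs.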

0.82.0

Long-term gap-filling

It is now possible to gap-fill multi-year datasets using the class `LongTermGapFillingRandomForestTS`. In this approach,
data from neighboring years are pooled together before training the random forest model for gap-filling a specific year.
This is especially useful for long-term, multi-year datasets where environmental conditions and drivers might change
over years and decades.

Why random forest? Because it performed well and to me it looks like the first choice for gap-filling ecosystem fluxes,
at least at the moment.

Long-term gap-filling using random forest is now also built into the flux processing chain (Level-4.1). This makes it
possible to quickly gap-fill the different USTAR scenarios and to create some useful plots (I
hope). [See the flux processing chain notebook for what this looks like](https://github.com/holukas/diive/blob/main/notebooks/FluxProcessingChain/FluxProcessingChain.ipynb).

In a future update it will be possible to either directly switch to `XGBoost` for gap-filling, or to use it (and other
machine-learning models) in combination with random forest in the flux processing chain.

Example

Here is an example for a dataset containing CO2 flux (`NEE`) measurements from 2005 to 2023:

- for gap-filling the year 2005, the model is trained on data from 2005, 2006 and 2007 (*2005 has no previous year*)
- for gap-filling the year 2006, the model is trained on data from 2005, 2006 and 2007 (same model as for 2005)
- for gap-filling the year 2007, the model is trained on data from 2006, 2007 and 2008
- ...
- for gap-filling the year 2012, the model is trained on data from 2011, 2012 and 2013
- for gap-filling the year 2013, the model is trained on data from 2012, 2013 and 2014
- for gap-filling the year 2014, the model is trained on data from 2013, 2014 and 2015
- ...
- for gap-filling the year 2021, the model is trained on data from 2020, 2021 and 2022
- for gap-filling the year 2022, the model is trained on data from 2021, 2022 and 2023 (same model as for 2023)
- for gap-filling the year 2023, the model is trained on data from 2021, 2022 and 2023 (*2023 has no next year*)
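
The pooling scheme in the list above can be reproduced with a small helper. This is a sketch of the year selection only, assuming a three-year window that is shifted at the edges of the dataset; `training_years` is a hypothetical function, not part of `LongTermGapFillingRandomForestTS`.

```python
def training_years(year, first_year, last_year):
    """Return the years pooled for training the model that gap-fills `year`."""
    start, end = year - 1, year + 1
    if start < first_year:   # no previous year: shift the window forward
        start, end = first_year, first_year + 2
    elif end > last_year:    # no next year: shift the window back
        start, end = last_year - 2, last_year
    # Clamp in case the dataset covers fewer than three years.
    start, end = max(start, first_year), min(end, last_year)
    return list(range(start, end + 1))


for year in (2005, 2006, 2012, 2022, 2023):
    print(year, training_years(year, first_year=2005, last_year=2023))
```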

New features

- Added new method for long-term (multiple years) gap-filling using random forest to flux processing chain (
`diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain.level41_gapfilling_longterm`)
- Added new class for long-term (multiple years) gap-filling using random forest (
`diive.pkgs.gapfilling.longterm.LongTermGapFillingRandomForestTS`)
- Added class for plotting cumulative sums across all data, for multiple columns (
`diive.core.plotting.cumulative.Cumulative`)
- Added class to detect a constant offset between two measurements (
`diive.pkgs.corrections.measurementoffset.MeasurementOffset`)

Changes

- Creating lagged variants creates gaps, which then lead to incomplete features in machine learning models. Now, gaps
are filled using simple forward and backward filling, limited to the number of values defined in *lag*. For example,
if variable TA is lagged by -2 values, this creates two missing values for this variant at the start of the time series,
which are then gap-filled using a simple backward fill with `limit=2`. (
`diive.core.dfun.frames.lagged_variants`)
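
The effect of this change can be sketched with pandas. The example uses `Series.shift` and `Series.bfill` directly and is not the `lagged_variants` code; the mapping of a lag of -2 to a shift of two records is an assumption about the sign convention.

```python
import pandas as pd

ta = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], name="TA")

lag = 2
# Shifting by `lag` records leaves `lag` missing values at the start of the series.
lagged = ta.shift(lag)
# A backward fill closes those gaps; `limit` caps how many consecutive values are filled.
lagged_filled = lagged.bfill(limit=lag)

print(lagged.tolist())
print(lagged_filled.tolist())
```

Limiting the fill to the number of lagged records ensures that only the gaps created by the shift are closed, while longer gaps already present in the data stay untouched.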

Notebooks

- Updated flux processing chain notebook to include long-term gap-filling using random forest (
`notebooks/FluxProcessingChain/FluxProcessingChain.ipynb`)
- Added new notebook for plotting cumulative sums across all data, for multiple columns (
`notebooks/Plotting/Cumulative.ipynb`)

Tests

- Unittest for flux processing chain now includes many more methods (
`tests.test_fluxprocessingchain.TestFluxProcessingChain.test_fluxprocessingchain`)
- 39/39 unittests ran successfully

Bugfixes

- Fixed deprecation warning in (`diive.core.ml.common.prediction_scores_regr`)
