Geodata-harvester

Latest version: v1.1.1

1.1.0

We are excited to announce the new release of Geodata-Harvester v1.1.0. This update brings substantial improvements in documentation and notebooks, and bug fixes, largely inspired by the constructive feedback and suggestions from our users and the Journal of Open Source Software (JOSS) reviewers.

---

Code Improvements

- **Slope and aspect computation:** removed the GDAL dependency for terrain calculations in getdata_dem.py; slope and aspect are now computed with new, efficient Python functions instead of GDAL wrappers (commit 5f9901e46f4b8e0d974ffa183d217641cf7f405f).
- **Updated test functions:** updated test functions and reduced download sizes to speed up automated tests (commits: 74e86796960ca5a6159bc8eac6a3b26ec730426a, f2092a5ccd81ffacfa323557806c30930d492814, 6fc40847429b7580c8d01be936979a1e9a1a8e36, 07bc8e5c81fb6d71799a6781d6a812399ae66bb7).
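Slope and aspect can be derived from a DEM grid with plain NumPy, without GDAL. The sketch below is an illustration of that idea using central differences (`numpy.gradient`); it is not the code in getdata_dem.py. It assumes a north-up grid (row 0 is the northern edge), pixel sizes `dx`/`dy` in the same unit as elevation, and reports aspect as the downhill direction in degrees clockwise from north (undefined on flat cells).

```python
import numpy as np


def slope_aspect(dem: np.ndarray, dx: float, dy: float) -> tuple[np.ndarray, np.ndarray]:
    """Return (slope_deg, aspect_deg) for a north-up elevation grid."""
    # np.gradient returns derivatives along axis 0 (rows) and axis 1 (columns)
    dz_drow, dz_dcol = np.gradient(dem.astype(float), dy, dx)
    dz_dx = dz_dcol   # change towards the east
    dz_dy = -dz_drow  # change towards the north (row index grows southwards)

    # Slope: angle of the steepest gradient relative to the horizontal
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

    # Aspect: azimuth of the downhill vector (-dz_dx, -dz_dy), clockwise from north
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect
```

For example, a surface rising one unit per pixel towards the east has a 45° slope and faces west (270°).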

Notebook Updates

Our notebooks have been updated with more examples demonstrating the capabilities of the package. The existing notebooks have also been refined for better clarity and user-friendliness. The following key changes have been made:

- **Notebook without Google Earth Engine authentication:** removed the Google Earth Engine (GEE) authorisation requirement from the basic notebook example (settings_harvest.yaml) (commit c2fc3e2f2bffe0db6ccbd4e86dafc8730d26582a).
- **Notebook with Google Earth Engine demonstration:** new notebook, settings_harvest_withGEE.ipynb, demonstrating GEE, including an authentication check (commit b36abafe451a947ceb576fbbe1bf0cd25f14a91d).
- **Step-by-step notebook:** new notebook with a step-by-step walk-through of the processes and all download modules (commit cacd041c04cb858eff464daae9c7c1751d1dee24). This notebook also showcases GEE initialisation and processing.
- **Include settings display:** added functionality to display the settings file within the notebook (commit 5c721a860f7d89dfdb35a4c738a7cd5e006ff776).
- **Improved auto run function documentation:** added documentation for the harvest.run function and process steps in notebooks (commit c2fc3e2f2bffe0db6ccbd4e86dafc8730d26582a).
- **Streamlined settings files:** updated settings YAML files to use less download-heavy datasets, simplify settings, and add more documentation.
- **Download-time warnings:** added disclaimers to data-downloading sections where longer download times are expected (commit d9feb0cd0f7318474f9f562369c92d4737bd8bb9).

Documentation Updates

The documentation has been reworked and expanded:

- **Updated README:** key features are now linked to feature code snippets; added instructions for running test scripts.
- **Updated API docs:** updated the API reference to reflect changes in the package's methods and classes (commit 537859ffcdcea0c9640f4850720eadd47022da46).
- **Updated paper:** expanded the Statement of Need and added a comparison to other packages (commit 6b122bd6883d8a6c0db5e564208f10a14053a787).

Updated Jupyter Widgets

The documentation and performance of the existing widgets have been improved with the following changes:

- **Improved documentation:** added descriptions to the settings widgets and updated the widget demonstration notebook (commit 206df453e6685a09e80785d11c5f45c690596268).
- **Bounding box calculation for empty widget setting:** the bounding box is now inferred automatically if left empty (commits 57021ceb2ba0aabbf757efaec284a8f07c47de8c and 8fdb48983553260d38c7d899bc90bd2b415bd015).
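The release notes do not say how the bounding box is inferred. A plausible approach, shown here only as a hypothetical sketch, is to take the extent of the input point locations and pad it by a small buffer. The function name `infer_bbox`, the `(min_lng, min_lat, max_lng, max_lat)` ordering and the default buffer are assumptions, not the package's API.

```python
from typing import Iterable, Optional, Sequence

# Hypothetical ordering: (min_lng, min_lat, max_lng, max_lat)
BBox = tuple[float, float, float, float]


def infer_bbox(
    points: Iterable[tuple[float, float]],
    bbox: Optional[Sequence[float]] = None,
    buffer: float = 0.05,
) -> BBox:
    """Return the user's bbox if given, else the buffered extent of (lng, lat) points."""
    if bbox:  # a non-empty bbox from the widget takes precedence
        if len(bbox) != 4:
            raise ValueError("bbox must have 4 values")
        return (float(bbox[0]), float(bbox[1]), float(bbox[2]), float(bbox[3]))

    pts = list(points)
    if not pts:
        raise ValueError("cannot infer a bounding box without points")
    lngs = [p[0] for p in pts]
    lats = [p[1] for p in pts]
    # Pad the tight extent so points on the edge still fall inside the downloaded area
    return (min(lngs) - buffer, min(lats) - buffer, max(lngs) + buffer, max(lats) + buffer)
```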

Bug Fixes

Several bugs related to the data collection process and the Jupyter widgets have been fixed. These fixes aim to ensure a smoother and more accurate data collection and visualization experience.

- Fixed settings.temp_intervals in widget-created settings (commit 57021ceb2ba0aabbf757efaec284a8f07c47de8c).
- Fixed utils.aggregate_rasters function (commit fa346bd36ff724087e007281d6b90a164308a99e).
- Fixed a bug when no GEE image is found for a time interval (commit a096c657b8d194461b043cc0115340a2e1d26da6).

Installation & Issue Reports
You can find the source code and installation instructions for Geodata-Harvester v1.1.0 on our [GitHub repository](https://github.com/Sydney-Informatics-Hub/geodata-harvester).

For any questions or issues, please refer to our [issue tracker](https://github.com/Sydney-Informatics-Hub/geodata-harvester/issues).

Happy data harvesting!

1.0.0

We are excited to announce the new release of Geodata-Harvester v1.0.0! This release brings several new features, performance improvements, and bug fixes that enhance the overall experience of using the Geodata-Harvester.

🌟 New Features

- **Time-Series Extraction**: The long-awaited extraction of time-series data is now available and integrated in the auto harvest.run function! This enhancement lets users process image collections for multiple time intervals and automatically extract temporally aggregated data, including for climate data (SILO), Digital Earth Australia (DEA) post-processed satellite data, and Google Earth Engine data sources.
- **Multi-band raster data queries**: Multi-band data can now be extracted from raster images into data tables, including automatic generation and labeling of data columns for each band or channel per image.
- **FAQ Chatbot**: The new FAQ chatbot on the [GitHub page](https://sydney-informatics-hub.github.io/geodata-harvester/#what-is-it) provides users with a quick and easy way to access the Geodata-Harvester documentation. The FAQbot leverages OpenAI's GPT and uses vector embeddings of the Geodata-Harvester reference material and code documentation.
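The idea behind multi-band queries, one labeled column per band, can be illustrated with a dependency-free sketch of nearest-pixel sampling on a north-up grid. This is not the package's implementation (which, per these notes, uses (rio)xarray); the function `sample_multiband` and the `<layer>_band<n>` column names are hypothetical.

```python
from typing import Optional, Sequence

Grid = Sequence[Sequence[float]]  # rows x cols; row 0 is the northern edge


def sample_multiband(
    bands: Sequence[Grid],
    origin_x: float,
    origin_y: float,
    pixel_size: float,
    points: Sequence[tuple[float, float]],
    layer_name: str,
) -> dict[str, list[Optional[float]]]:
    """Sample every band at each (x, y) point; one column per band.

    Column names '<layer_name>_band<n>' are a hypothetical convention.
    Points outside the grid yield None.
    """
    names = [f"{layer_name}_band{b + 1}" for b in range(len(bands))]
    columns: dict[str, list[Optional[float]]] = {name: [] for name in names}
    for x, y in points:
        # Pixel indices for a north-up grid with square pixels; origin is the top-left corner
        col = int((x - origin_x) // pixel_size)
        row = int((origin_y - y) // pixel_size)
        for name, grid in zip(names, bands):
            inside = 0 <= row < len(grid) and 0 <= col < len(grid[0])
            columns[name].append(grid[row][col] if inside else None)
    return columns
```

The returned dictionary maps directly onto table columns (for example, `pandas.DataFrame(columns)`), one row per query point.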

🚀 Performance Improvements

- **Raster query optimisation**: Performance gains were achieved by leveraging (rio)xarray for data extraction from raster images, resulting in faster execution times for all users.
- **Temporal processing optimisation**: The temporal aggregation process for stats extraction has been optimized to reduce the time required to extract temporal aggregated data from large image collections.
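Temporal aggregation over multiple intervals boils down to splitting a date range into consecutive windows and summarising the valid observations in each. The sketch below illustrates that with the standard library only; `split_intervals`, `aggregate` and the statistic names are hypothetical and do not reproduce harvest.run. NaN values are skipped so that missing data do not distort the result, and an empty window yields NaN.

```python
import math
import statistics
from datetime import date, timedelta
from typing import Sequence


def split_intervals(start: date, end: date, n: int) -> list[tuple[date, date]]:
    """Split [start, end) into n consecutive intervals of (near-)equal length."""
    total = (end - start).days
    bounds = [start + timedelta(days=round(i * total / n)) for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def aggregate(
    observations: Sequence[tuple[date, float]],
    intervals: Sequence[tuple[date, date]],
    stat: str = "mean",
) -> list[float]:
    """Aggregate (date, value) observations per half-open interval, ignoring NaN."""
    funcs = {"mean": statistics.fmean, "median": statistics.median, "min": min, "max": max}
    if stat not in funcs:
        raise ValueError(f"unsupported statistic: {stat}")
    out = []
    for lo, hi in intervals:
        vals = [v for d, v in observations if lo <= d < hi and not math.isnan(v)]
        out.append(funcs[stat](vals) if vals else math.nan)
    return out
```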

🐛 Main Bug Fixes

- **Fix missing data values for aggregation**: Resolved an issue where temporal processing aggregated over missing data values. Nodata values are now identified from the image header via a case-insensitive search for nodata value names and replaced with NaN, so missing data values in images no longer corrupt aggregated stats.
- **Fix object naming**: Fixed naming conventions in image labeling and data table generation to ensure that all objects are named correctly and consistently.
- **Fix data table generation**: Fixed an issue where data tables were not generated correctly for multi-band raster data queries.
- **Fix datetime labeling**: Fixed an issue where extracted image dates were not added to image metadata and labels.
- **Fix potential issue in geopackage writing**: Fixed potential duplicate column names when writing the geopackage if identically named data already exist in the results folder.
- **Cleanup results csv file**: Reordered the columns in the CSV and removed the geometry column, since Lat and Lng columns already exist.
- **Fix xarray2tif function due to rioxarray upgrade**: Fixed an issue where xarray2tif no longer wrote multi-channel xarray data as GeoTIFF after the rioxarray upgrade from version 0.13.1 to 0.13.3.
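The missing-data fix described above (case-insensitive lookup of the nodata value in the image header, then replacement with NaN) can be sketched as follows. The header is modelled as a plain dictionary, and the candidate key names in `NODATA_KEYS` as well as both function names are assumptions for illustration, not the package's actual list or API.

```python
import math
from typing import Mapping, Optional, Sequence

# Hypothetical header keys that may carry the nodata value (compared lower-case)
NODATA_KEYS = ("nodata", "_fillvalue", "missing_value", "nodatavalue")


def find_nodata(header: Mapping[str, object]) -> Optional[float]:
    """Return the first parseable nodata value whose key matches case-insensitively."""
    for key, value in header.items():
        if key.lower() in NODATA_KEYS:
            try:
                return float(value)
            except (TypeError, ValueError):
                continue  # key matched but the value is not numeric; keep looking
    return None


def mask_nodata(values: Sequence[float], header: Mapping[str, object]) -> list[float]:
    """Replace nodata values with NaN so that aggregations can skip them."""
    nodata = find_nodata(header)
    if nodata is None:
        return [float(v) for v in values]
    return [math.nan if v == nodata else float(v) for v in values]
```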


📚 Documentation and Notebooks

- Added new notebooks to demonstrate the new temporal processing features
- Updated documentation to reflect the new features
- Added [Geodata-Harvester summary paper](https://github.com/Sydney-Informatics-Hub/geodata-harvester/blob/main/paper/paper.md)
- Added contribution guidelines


📦 Download & Installation

You can find the source code and installation instructions for Geodata-Harvester v1.0.0 on our [GitHub repository](https://github.com/Sydney-Informatics-Hub/geodata-harvester).

For any questions or issues, please refer to our [issue tracker](https://github.com/Sydney-Informatics-Hub/geodata-harvester/issues).

Happy coding! 🎉

0.2.2

Description of work or change:

- add support for multiple collections and reduce options for GEE in settings
- add compatibility with latest eeharvest dependency updates
- update docs (README, settings, docstrings)
- add API code reference docs
- fix notebooks with updated settings
- clean up repo
- add JOSS paper

0.2.0

First Geodata-Harvester release
----------------------------------

This release of the Geodata-Harvester package includes pip/conda-ready installation packages and a range of example workflows for automatic data extraction from a wide range of geodata sources, including support for Google Earth Engine data layers.

The following data sources are currently integrated:

- Soil and Landscape Grid of Australia (SLGA)
- SILO Climate Database (Australia)
- National Digital Elevation Model (DEM)
- Digital Earth Australia (DEA) Geoscience Earth Observations
- Radiometric Data (Australia)
- Google Earth Engine Data (account needed)

Core features currently supported in the Geodata-Harvester:

- automatic data retrieval from geospatial APIs for given locations and dates
- data experimentation frontends via Jupyter and R notebooks
- reusable workflows via YAML files to save/load settings
- interactive Jupyter notebook widgets for selecting settings options
- automatic geospatial-temporal processing
- support for multiple temporal aggregation options
- automatic extraction of retrieved data into aligned maps and ready-made dataframes for ML
- preview of data map layers

Main contributors:

- sebhaan
- natbutter
- januarharianto
