Geodata-Harvester

Jumpstart your analysis with a ready-made set of spatial-temporal aligned raster maps and dataframes.

Geodata-Harvester logo

What is it?

The Geodata-Harvester project enables researchers with reusable workflows and provides open-source software for automatic data extraction from a wide range of data sources including spatial-temporal processing. User provided data is auto-completed with a suitable set of spatial- and temporal-aligned covariates as a ready-made dataset for machine learning models. All data layer maps are automatically extracted and aligned for a specific region and time period.

Data Sources

The following data sources are currently integrated:

  • Soil and Landscape Grid of Australia (SLGA)
  • SILO Climate Database (Australia)
  • National Digital Elevation Model (DEM)
  • Digital Earth Australia (DEA) Geoscience Earth Observations
  • Radiometric Data (Australia)
  • Google Earth Engine Data (account needed)

Functionality

The main goal of the Data Harvester is to enable researchers with reusable workflows for automatic data extraction and processing:

  1. Retrieve: automatically access geospatial and soil data sources, minimal handling of individual APIs
  2. Process: Spatial and temporal processing, filter, mask, reduce and convert data
  3. Output: download data as GeoTIFF and ready-made data frames for use in additional modelling and machine learning workflows

Data-Harvester is designed as a modular and maintainable project in the form of a multi-stage pipeline by providing explicit boundaries among tasks. To encourage interaction and experimentation with the pipeline, we provide multiple frontend notebooks and use case scenarios as Jupyter and R notebooks, as well as standalone Python and R packages. The core features are:

  • automatic data retrieval from geospatial APIs for given locations and dates
  • data experimentation frontends via Jupyter and R notebooks
  • enables reusable workflows via interactive widgets and YAML files to save/load settings.
  • automatic geospatial-temporal processing
  • support for multiple temporal aggregation options
  • automatic extraction of retrieved data into ready-made dataframes for ML training
  • automatic generation of ready-made aligned maps and dataframes for ML prediction models
  • preview of data map layers

If you would like to learn more about the Geodata-Harvester, please visit our Workshop webpage.

Programming Languages supported