Settings Overview

Overview and Description of Settings for the AgReFed Data-Harvester

The following documentation outlines the available settings for the Data Harvester

Table of Contents

YAML File Format

The settings are specified by the user in a .yaml settings file (see e.g., settings/settings_v0.3.yaml). A YAML file is a Unicode based language and is designed for human interaction and to work well with modern programming languages, and is typically used for configuration settings and reusable workflows. YAML uses the .yaml extension (alternatively .yml) for its files. Its syntax is independent of a specific programming language.

Templates for the .yaml settings file are provided in the folder settings. More information about YAML Syntax can be found here.

Jupyter Settings Widget

Alternatively, settings can be selected in the interactive widget of the Jupyter Notebook, which also automatically saves all settings for a run in a .yaml file as well. The interactive widgets are powered by ipywidgets and are currently supported for the Jupyter Notebooks. The widget also allows the user to load a saved .yaml file.

Note for developers: To make changes to the functionality of the widgets (e.g, extending with new settings or options), please see the script harvesterwidgets.py in the folder widgets.

Settings Validation

The settings file can be validated and checked for correct options (e.g. valid schema, data types, and data ranges) via the function validate in validate_settings.py, e.g.:

fname_settings = 'settings_harvest.yaml'
import validate_settings
validate_settings.validate(fname_settings)

Note for developers: Please update validate_settings.py and version if new data layers or options are added to the Data-Harvester.

Input and Output Settings

The input file name is specified in infile and is a .csv file that and must include at least point coordinates. The Data Harvester will download new data for these coordinates and align with any given data in the input file. Th column names for the latitude and longitude coordinates are selected by the settings colname_lat and colname_lng, respectively.

All data results and images will be saved in the output directory as specified in the settings outpath.

Example:

#Input File:
infile: ../testdata/Pointdata_Llara.csv

#Output Path:
outpath: ../../dataresults/

#Headername of Latitude in input file:
colname_lat: Lat

#Headername of Longitude in input file:
colname_lng: Long

Spatial and Temporal Settings

The spatial extent of the requested images can be given as bounding box list in the settings target_bbox, in the order: lng_min, lat_min, lng_max, lat_max (left, bottom, right, top corner of box). If no bounding box is provided, Data-Harvetser will automatically infer a padded bounding box based on the extent of the coordinates given in the input file.

The spatial resolution of the requested images is specified in target_res and given in arcsec (1 arcsec corresponds to roughly 30m on the Equator, please see arc2meter.pyfor calculating exact conversion of meter to arcsec and vice versa).

The years for the requested data is specified via target_dates and can be one specific year or a list of multiple years.

TBD: - The temporal resolution specifies the length of the time (in days) for which data is aggregated. The date range will then be subdivided in n bins = maximum year - minimum year divided by temporal resolution - Spatial buffer

Example:

#Bounding Box as (lng_min, lat_min, lng_max, lat_max):
target_bbox: ''

#Select start date:
date_min: : 2023-01-10

#Select end date:
date_min: : 2023-01-01

#Spatial Resolution [in arcsec]:
target_res: 6.0

#Temporal buffer window (in days)
temp_buffer: 2

# Number of time interval slices in given date range
temp_intervals: 4

Data Selection Settings

The requested layers are specified in the settings target_sources. The following data sources are currently supported:

Satellite data from Digital Earth Australia:

These are pre-processed and national calibrated satellite image layers provided Digital Earth Australia (DEA) Geoscience Earth Observations. Multiple layers can be given as list in the settings. For more details see Data Overview DEA.

Digital Elevation Model (DEM):

The DEM data is given by the National Digital Elevation Model 1 Second Hydrologically Enforced. Options are: ‘DEM’, ‘Slope’, and ‘Aspect’. For more info see Data Overview DEM.

Landscape from SLGA

Landscape data can be retrieved from SLGA. For an overview of all available layers see Data Overview Landscape.

Radiometric

For an overview of the radiometric layer options see Data Overview Radiometric.

SILO Climate Database

SILO is containing continuous daily climate data for Australia. An overview of the available data layers is provided in Data Overview SILO.

For each requested SILO data layer, at least one temporal aggregation method has to be provided, which will be applied to aggregate climate data over the specified temporal range. The following aptions are available: ‘mean’, ‘median’, ‘sum’, ‘std’, ‘perc95’, ‘perc5’, ‘max’, ‘min’

Soil data from SLGA

An overview of the soil attributes is given in in Data Overview SLGA.

Each soil attribute has six depth layers (plus their upper and lower confidence limits), with the following options:‘0-5cm’, ‘5-15cm’, ‘15-30cm’, ‘30-60cm’, ‘60-100cm’ and ‘100-200cm’.

Google Earth Engine Data

An overview of the available Google Earth Engine (GEE) data and options is provided in Data Overview GEE

Example:

target_sources:
  #Satellite data from Digital Earth Australia
  DEA:
  - landsat_barest_earth

  #National Digital Elevation Model (DEM) 1 Second
  DEM:
  - DEM
  
  #Landscape Data 
  Landscape:
  - Slope
  - Aspect
  - Relief_300m

  #Radiometric Data
  Radiometric:
  - radmap2019_grid_dose_terr_awags_rad_2019
  - radmap2019_grid_dose_terr_filtered_awags_rad_2019

  # SILO Climate Data
  # temporal aggregation options: 'mean', 'median', 'sum', 'std', 'perc95', 'perc5', 'max', 'min'
  SILO:
    max_temp:
    - Median
    min_temp:
    - Median
    monthly_rain:
    - Total

  #Soil data from SLGA
  SLGA:
   Bulk_Density:
    - 0-5cm
   Clay:
    - 0-5cm

  #Satellite data layers from Google Earth Engine
  GEE: 
    preprocess:

      ### collection as defined in the Earth Engine Catalog
      collection: LANDSAT/LC09/C02/T1_L2

      #### circular buffer in metres (optional)
      buffer: null

      #### convert buffer into square bounding box instead (optional)
      bound: null

      #### cloud masking option
      mask_clouds: True

      #### Set probability for mask cloud (between 0 to 1), optional
      mask_probability: null

      #### composite image based on summary stat provided
      reduce: median

      #### spectral indices to calculate via Awesome Spectral Indices site
      spectral:
        - NDVI
        - NDWI

    download:
      bands: 
        - NDVI
        - SR_B2
        - SR_B3
        - SR_B4
      scale: 100   # in metres
      format: tif  # available: tif, png