import geodata_harvester as gh
Session 2 - Modifying a settings.yaml file for a harvest
The settings file
The Data-Harvester uses a YAML configuration file to perform bulk downloads in a single command.
Basic usage
Below is the YAML file for basic_config.yaml
which can be found in the AgReFed-Workshop/data
folder on the Jupyter Hub Server:
outpath: results_basic/
target_bbox: [149.769345, -30.335861, 149.949173, -30.206271]
target_res: 6.0
date_min : 2022-10-01
date_max : 2022-10-30
target_sources:
DEA:
- landsat_barest_earth
To run the configuration file, we can use the harvest.run()
function. The function takes a single argument, path_to_config
, which is the path to the YAML file. The path can be either relative or absolute.
Making sure that your path is correct (and it should be, since you are on the Jupyter Lab server), run the code below:
"AgReFed-Workshop/data/basic_config.yaml") gh.harvest.run(
Starting the data harvester -----
βΉ Found the following 1 sources: ['DEA']
Downloading from API sources -----
β Downloading DEA data...
β landsat_barest_earth.tif already exists, skipping download
π π π Harvest complete π π π
If you run the same config again, the function recognises that a file has been downloaded and will not re-download it. This is useful if you want to re-run the configuration file to add more data sources β we will see this in action later.
Adding multiple data sources
Additional data sources can be added to the configuration file by adding a new key to the target_sources
list. Below we have added sources from DEM, Landscape, SILO and SLGA collections. Copy the new lines and add them to basic_config.yaml
:
outpath: results_basic/
target_bbox: [149.769345, -30.335861, 149.949173, -30.206271]
target_res: 6.0
date_min : 2022-10-01
date_max : 2022-11-30
target_sources:
DEA:
- landsat_barest_earth
# add the new lines below --------------------
DEM: [DEM]
Landscape: [Relief_300m]
SILO:
monthly_rain: [sum]
SLGA:
Bulk_Density: [0-5cm]
Clay: [0-5cm]
Letβs run the above configuration file and preview the output data. You should already have the DEA
data downloaded, so the function will only download the new data sources. This time, the argument preview = True
will be added to the harvest.run()
function, which will allow us to preview the data that will be downloaded.
gh.harvest.run("AgReFed-Workshop/data/basic_config.yaml",
= "multi_config",
log_name = True
preview )
Starting the data harvester -----
βΉ Found the following 5 sources: ['DEA', 'DEM', 'Landscape', 'SILO', 'SLGA']
Downloading from API sources -----
β Downloading DEA data...
β landsat_barest_earth.tif already exists, skipping download
β Downloading DEM data...
β Retrieving coverage from WCS server 3.5s
β DEM_SRTM_1_Second_Hydro_Enforced_2023_01_20.tif already exists, skipping download
β Downloading Landscape data...
β Landscape_Relief_300m.tif already exists, skipping download
β Downloading SILO data...
β monthly_rain for 2022 already exists, skipping download
β Downloading SLGA data...
β SLGA_Bulk_Density_0-5cm.tif already exists, skipping download
β SLGA_Bulk_Density_0-5cm_5percentile.tif already exists, skipping download
β SLGA_Bulk_Density_0-5cm_95percentile.tif already exists, skipping download
β SLGA_Clay_0-5cm.tif already exists, skipping download
β SLGA_Clay_0-5cm_5percentile.tif already exists, skipping download
β SLGA_Clay_0-5cm_95percentile.tif already exists, skipping download
π π π Harvest complete π π π
You should see a figure similar to the one above. Explore the results_basic
folder that has been generated. Can you see the log file in there too?
Google Earth Engine
Initialise GEE
The first step is to make sure you have a Google Earth Engine account. If you had not already set one up with your Google account, please follow these instructions. In the next step we must do is authorise the geodata_harvester to use the Google Earth Engine API. This is a labourious step, but if you follow the prompts on screen 1-by-1 the process is okay. See a preview of the process here. Then run the following code to do it for yourself:
from eeharvest import harvester
='notebook') harvester.initialise(auth_mode
β Initialising Earth Engine... 3.6s
β Earth Engine authenticated
NOTE: You only have to perform this authorisation ONCE. Or at least you only have to do it once per βconnectionβ or if you use an icognito window.
Now back to the contentβ¦
GEE YAML
We will continue to use the same configuration file, basic_config.yaml
, to download data from Google Earth Engine. Below, we add a new key to the target_sources
list, called GEE
. Here is a preview for how the YAML configuration file can be used download data from GEE.
outpath: results_basic/
target_bbox: [149.769345, -30.335861, 149.949173, -30.206271]
target_res: 6.0
date_min : 2022-10-01
date_max : 2022-10-30
target_sources:
DEA:
- landsat_barest_earth
DEM: [DEM]
Landscape: [Relief_300m]
SILO:
monthly_rain: [sum]
SLGA:
Bulk_Density: [0-5cm]
Clay: [0-5cm]
# add the new lines below --------------------
GEE:
preprocess:
collection: LANDSAT/LC09/C02/T1_L2
mask_clouds: True
reduce: median
spectral: NDVI
download:
bands: NDVI
The configuration will download Landsat 9 images for the year 2021, composite them all to a single image using the median
reducer, and calculate the NDVI
spectral index. The output image will be downloaded as a GeoTIFF file, with the NDVI
spectral index and the RGB bands (SR_B2
, SR_B3
, SR_B4
) includein the raster image.
Run the harvest as before:
gh.harvest.run("AgReFed-Workshop/data/basic_config.yaml",
= "multi_config",
log_name = True
preview )
Exercise 1: Adding even more sources
Exercise 2: GEE
Wrapping up
In this session we have covered setup and using the YAML configuration file to download from multiple API sources. In the next session, we will cover how to use individual download functions to create custom workflows for downloading data.
See you there!