Session 1 – Running a harvest with a settings.yaml file


The Geodata Harvester (formerly Data Harvester) provides researchers with reusable workflows for automatically extracting data from a range of sources. User-provided data is augmented with a suitable set of spatially and temporally aligned covariates, producing a ready-made dataset for machine learning and agricultural models. In addition, all requested data-layer maps are automatically extracted and aligned to a specific region and time period.
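To make "spatially aligned covariates" concrete, here is a minimal sketch of the core idea: looking up the value of a gridded layer at each user-supplied point. This is not the harvester's own extraction code. It assumes a north-up raster held as a NumPy array with a known top-left origin and pixel size. The function `sample_raster_at_points` and the toy grid are hypothetical illustrations.

```python
import numpy as np


def sample_raster_at_points(grid, origin_x, origin_y, pixel_w, pixel_h, xs, ys):
    """Return the value of the pixel containing each (x, y) point.

    Hypothetical helper, not part of geodata_harvester. Assumes a north-up
    raster: (origin_x, origin_y) is the top-left corner, pixel_w > 0 steps
    east and pixel_h > 0 steps south. Points outside the grid get NaN.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # Convert map coordinates to integer column/row indices.
    cols = np.floor((xs - origin_x) / pixel_w).astype(int)
    rows = np.floor((origin_y - ys) / pixel_h).astype(int)
    n_rows, n_cols = grid.shape
    inside = (rows >= 0) & (rows < n_rows) & (cols >= 0) & (cols < n_cols)
    out = np.full(xs.shape, np.nan)
    out[inside] = grid[rows[inside], cols[inside]]
    return out


# Toy 3 x 4 layer with 0.01-degree pixels and its top-left corner at (149.0, -30.0).
grid = np.arange(12, dtype=float).reshape(3, 4)
values = sample_raster_at_points(
    grid, 149.0, -30.0, 0.01, 0.01,
    xs=[149.005, 149.035, 150.0],
    ys=[-30.005, -30.025, -30.005],
)
# First point falls in pixel (0, 0), second in (2, 3), third is outside the grid.
print(values)
```

Repeating this lookup for every downloaded layer and appending each result as a new column is, in spirit, how point data gains covariates; real layers also need reprojection and resampling, which this sketch ignores.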

This session introduces the geodata_harvester package by showing how to download files with minimal code: a single call to the harvest.run() function.

Let’s test this out.

First, we need the data for today’s workshop. Let’s use git to clone the workshop repository onto our server and check out its data branch:

!git clone https://github.com/Sydney-Informatics-Hub/AgReFed-Workshop && cd AgReFed-Workshop && git checkout data
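Before running the harvest, it can help to confirm that the clone and checkout actually produced the settings file that the `harvest.run` call later in this session expects. A minimal sketch using only the standard library; `workshop_data_ready` is a hypothetical helper, and the default path is simply the one passed to `harvest.run` in this session.

```python
from pathlib import Path


def workshop_data_ready(settings_path="AgReFed-Workshop/data/settings_session1.yaml"):
    """Return True if the session's settings file exists as a regular file.

    Hypothetical helper, not part of geodata_harvester.
    """
    return Path(settings_path).is_file()


# Prints True once the clone and checkout above have succeeded.
print(workshop_data_ready())
```

If this prints False, rerun the git cell and check that the working directory is the one containing the cloned `AgReFed-Workshop` folder.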

Next, install the geodata-harvester package on the hosted platform by running the following code block:

!conda install -c conda-forge geodata-harvester --yes
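To check that the install worked without triggering a full import, the standard library can report whether a module is importable and which distribution version is installed. A minimal sketch; `package_status` is a hypothetical helper. It assumes the import name is `geodata_harvester` (as used in the import below) and the distribution name is `geodata-harvester` (as used in the conda command above).

```python
import importlib.util
from importlib import metadata


def package_status(import_name, dist_name):
    """Return (importable, version); version is None if no metadata is found.

    Hypothetical helper, not part of geodata_harvester.
    """
    # find_spec returns None for a missing top-level module instead of raising.
    importable = importlib.util.find_spec(import_name) is not None
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return importable, version


print(package_status("geodata_harvester", "geodata-harvester"))
```

A result of `(True, <version string>)` indicates that the package is ready to import. If the notebook kernel was already running before the install, a kernel restart may be needed first.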

Now we can import the package and call harvest.run, pointing it at the workshop’s settings file:

import geodata_harvester as gh

# Run every download and extraction step defined in the session's settings YAML file.
df = gh.harvest.run("AgReFed-Workshop/data/settings_session1.yaml", preview=True, return_df=True)

Starting the data harvester -----
ℹ Found the following 5 sources: ['DEM', 'Landscape', 'Radiometric', 'SILO', 'SLGA']

Downloading from API sources -----

⌛ Downloading DEM data...
⊙ Retrieving coverage from WCS server 0.7s                                                                     
⊙ Downloading DEM_SRTM_1_Second_Hydro_Enforced_2023_01_31.tif 0.7s                                             

⌛ Downloading Landscape data...
⊙ Downloading Landscape_Slope.tif 3.0s                                                                         
⊙ Downloading Landscape_Aspect.tif 8.8s                                                                        
⊙ Downloading Landscape_Relief_300m.tif 1.0s                                                                   

⌛ Downloading Radiometric data...
⊙ Downloading radmap2019_grid_dose_terr_awags_rad_2019 1.1s                                                    
⊙ Downloading radmap2019_grid_dose_terr_filtered_awags_rad_2019 0.9s                                           

⌛ Downloading SILO data...
⊙ Downloading daily_rain for 2022 39.0s                                                                        
⊙ Downloading max_temp for 2022 41.5s                                                                          
⊙ Downloading min_temp for 2022 32.9s                                                                          
⊙ Downloading monthly_rain for 2022 1.1s                                                                       
Layer name processing not valid. Choose from: dict_keys(['daily_rain', 'monthly_rain', 'max_temp', 'min_temp', 'vp', 'vp_deficit', 'evap_pan', 'evap_syn', 'evap_comb', 'evap_morton_lake', 'radiation', 'rh_tmax', 'rh_tmin', 'et_short_crop', 'et_tall_crop', 'et_morton_actual', 'et_morton_potential', 'et_morton_wet', 'mslp'])
Error: Number of filenames does not match number of layernames. Dataframe not updated.

⌛ Downloading SLGA data...
⊙ Downloading SLGA_Bulk_Density_0-5cm.tif 0.5s                                                                 
⊙ Downloading SLGA_Bulk_Density_0-5cm_5percentile.tif 5.9s                                                     
⊙ Downloading SLGA_Bulk_Density_0-5cm_95percentile.tif 0.6s                                                    
⊙ Downloading SLGA_Clay_0-5cm.tif 10.4s                                                                        
⊙ Downloading SLGA_Clay_0-5cm_5percentile.tif 5.9s                                                             
⊙ Downloading SLGA_Clay_0-5cm_95percentile.tif 0.3s                                                            

Extracting data points for example-site_llara.csv  -----
⊙ • DEM_SRTM_1_Second_Hydro_Enforced_2023_01_31 | pixel size: (78, 108) 0.0s                                   
⊙ • Landscape_Slope | pixel size: (78, 108) 0.0s                                                               
⊙ • Landscape_Aspect | pixel size: (78, 108) 0.0s                                                              
⊙ • Landscape_Relief_300m | pixel size: (78, 108) 0.0s                                                         
⊙ • radiometric_radmap2019_grid_dose_terr_awags_rad_2019 | pixel size: (466, 647) 0.0s                         
⊙ • radiometric_radmap2019_grid_dose_terr_filtered_awags_rad_2019 | pixel size: (466, 647) 0.0s                
⊙ • SLGA_Bulk_Density_0-5cm | pixel size: (156, 216) 0.0s                                                      
⊙ • SLGA_Clay_0-5cm | pixel size: (156, 216) 0.0s                                                              
✔ Data points extracted to output/results.gpkg


🎉 🎉 🎉 Harvest complete 🎉 🎉 🎉

Done!
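The log reports that the extracted point values were written to `output/results.gpkg`. A GeoPackage is an SQLite database whose `gpkg_contents` table lists the data tables it holds, so the standard-library `sqlite3` module is enough to see which layers the file contains. A minimal sketch; `list_gpkg_layers` is a hypothetical helper, and no particular layer names are assumed. For loading the data itself as a table, `geopandas.read_file` is the usual route.

```python
import sqlite3
from pathlib import Path


def list_gpkg_layers(gpkg_path="output/results.gpkg"):
    """Return (table_name, data_type) rows from a GeoPackage's gpkg_contents table.

    Hypothetical helper, not part of geodata_harvester. Opens the file
    read-only so a wrong path raises an error instead of creating an empty file.
    """
    uri = Path(gpkg_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        return con.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        con.close()


# Only inspect the file if the harvest has produced it.
if Path("output/results.gpkg").is_file():
    for name, kind in list_gpkg_layers():
        print(name, kind)
```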

Congratulations! You have just downloaded data from several sources and built a consistent dataset over an area of interest, ready for further analysis.

In the next session, we will look at what this run actually did.