Setting up RStudio Desktop

Important

Choosing this option means that you have decided to install R and RStudio on your computer. Note this is not recommended for the workshop as there may be issues to troubleshoot we cannot support. If you are unsure, please use RStudio Cloud.

You must know how to debug issues if you choose to use RStudio Desktop. Please set up RStudio before the workshop so that you can debug early and focus on the content (see below).

If you already have R and RStudio installed, make sure you have the latest versions. At a minimum, you will need R v4.1 and RStudio v2022.07.2.

Install the "dataharvester" package from GitHub. You will need either the "devtools" or "remotes" package (recommended) installed to do this.

Warning

If you are a Windows user and choose to use "devtools", you may need to install Rtools first.

install.packages("remotes")
remotes::install_github("sydney-informatics-hub/dataharvester")

Don’t forget to load the package:

library(dataharvester)

The dataharvester package has a number of dependencies that you will need to install, since the code is in Python. Luckily, there is a function to install these dependencies for you. Run the following code, which will install dependencies in a conda environment called “r-reticulate”:

initialise_harvester("r-reticulate")

This function must always be run prior to using dataharvester functions. It may take a few minutes to run the first time. For subsequent sessions, initialise_harvester() should take only a few seconds to run.

Important

During this step, you may be asked to install r-miniconda, which is basically miniconda. This is a package that allows you to install and use Python packages within R through the Conda binary package manager.

If you already have Python installed, R may still ask you to install miniconda. This is because miniconda is a separate package from Python, and is not the same as the Python package manager pip. Importantly, miniconda is almost necessary for Windows users as it allows you to install certain Python and non-Python packages without having to install a full version of Python and running installation commands in the command line. These include the GDAL binary which is a translator library for raster and vector geospatial data formats, and the PROJ library which is a cartographic projection and coordinate transformation library.

Note that the above code will try to install the GDAL binary for you when you install a Python package called reticulate. If you are a Windows user, you may need to install the GDAL binary manually. You can do this by following the instructions here.

If you have a Google Earth Engine account, you can initialise the package with your credentials. This will allow you to access Google Earth Engine data.

To activate Google Earth Engine, simply include earthengine = TRUE when initialising:

initialise_harvester("r-reticulate", earthengine = TRUE)

Follow the instructions carefully, as Google Earth Engine requires you to authenticate with a Google account. You will need to copy and paste a link into your browser, and then copy and paste the resulting code back into R.

If you are authenticating from a server environment or RStudio Cloud, you may need to use a different authentication method. See the Troubleshooting section and look for the section “Google Earth Engine authentication”.

Need help?

If you are stuck at any point, we have a dedicated Troubleshooting section that you can refer to. This section will be updated as we receive more questions.

What’s next?

Now that you have dataharvester installed, you are ready to start the workshop. Workshop links are available in the sidebar. You should also check out the landing page for the R Workshop for updates.