Overview - Transform

Machine learning models for Geoscience

In this tutorial, we’ll run a fairly basic random forest prospectivity analysis workflow applied to tin-tungsten (Sn-W) deposits in northeastern Tasmania.

We'll use open data sets provided by Mineral Resources Tasmania and Geoscience Australia, all of which are available to download from our public Google Drive. The roadmap for the tutorial is as follows:

Load and inspect data sets
- mineral occurrence point data sets with geopandas
- gravity, magnetic and radiometric data sets with rasterio
Combine data sets to build a labeled (N_pixel, N_layers) array for model training
- inspect differences between pixels proximal and distal to mineralisation
Train a random forest classifier and apply to all pixels, visualise results
- evaluate performance with a randomly selected testing subset
- repeat with stratified classes
Develop a checkerboard data selection procedure, train and evaluate models
- discuss effects of spatially separated testing data
Investigate occurrence holdout models with a spatially clustered approach
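As a rough preview of the array-building and checkerboard steps in the roadmap, here is a minimal numpy-only sketch. It is not the tutorial's actual code: the layer names, grid size, occurrence locations, buffer radius and block size are all made-up placeholders, and random numbers stand in for the real rasters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for co-registered rasters (hypothetical names and sizes).
n_rows, n_cols = 60, 80
layers = {
    "gravity": rng.normal(size=(n_rows, n_cols)),
    "magnetics": rng.normal(size=(n_rows, n_cols)),
    "radiometrics": rng.normal(size=(n_rows, n_cols)),
}

# Stack to (N_layers, rows, cols), then flatten to (N_pixel, N_layers).
stack = np.stack(list(layers.values()))
X = stack.reshape(stack.shape[0], -1).T

# Label pixels within a small radius of hypothetical occurrence pixels
# as proximal (1); everything else is distal (0).
occurrences = np.array([[10, 15], [42, 60]])  # made-up (row, col) pairs
rows, cols = np.indices((n_rows, n_cols))
buffer_px = 3
labels = np.zeros((n_rows, n_cols), dtype=int)
for r, c in occurrences:
    labels[(rows - r) ** 2 + (cols - c) ** 2 <= buffer_px ** 2] = 1
y = labels.ravel()

# Checkerboard split: alternate square blocks between training and testing
# so that test pixels are spatially separated from training pixels.
block = 20
checker = ((rows // block) + (cols // block)) % 2
test_mask = checker.ravel() == 1
X_train, y_train = X[~test_mask], y[~test_mask]
X_test, y_test = X[test_mask], y[test_mask]

print(X.shape, X_train.shape, X_test.shape)
```

Because the rasters, labels and checkerboard are all flattened in the same row-major order, row i of X, y and test_mask always refers to the same pixel.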

Instructors

Thomas Ostersen - Datarock Applied Science
Tom Carmichael - Datarock Applied Science

Prerequisites

Knowledge of Python is assumed and all coding will be done within a Jupyter notebook
We'll use numpy for data handling and matplotlib for data visualisation
Point data sets are handled with geopandas, a pandas-like library for vector GIS processing
Rasterio is used to read and write gridded raster data sets
The scikit-learn implementation of the random forest algorithm is used for all modelling
Class stratification in modelling procedures uses the imbalanced-learn library
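To give a feel for the modelling side, the following is a minimal sketch of a scikit-learn random forest trained on synthetic, imbalanced data with a stratified train/test split. It is an illustration under stated assumptions, not the tutorial's workflow: the data are random numbers, the parameter values are arbitrary, and stratification is done here with the stratify argument of train_test_split rather than with imbalanced-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic, imbalanced data: 2000 "pixels", 3 "layers", roughly 5% positives.
n_pixels, n_layers = 2000, 3
y = (rng.random(n_pixels) < 0.05).astype(int)
X = rng.normal(size=(n_pixels, n_layers))
X[y == 1] += 1.5  # shift positives so there is a signal to learn

# stratify=y keeps the class proportions similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Probability of the positive class for each held-out sample.
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"Test ROC AUC: {auc:.2f}")
```

With rare positive classes, a threshold-free score such as ROC AUC is usually more informative than plain accuracy, which can look high even when every pixel is predicted negative.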

Setup

There are a few things you'll need to follow the tutorial:

A working Python installation (Anaconda or Miniconda)
The Geospatial ML tutorial conda environment installed
A web browser that works with Jupyter notebooks (basically anything except Internet Explorer)

To get set up, please do the following.

Windows users: When you see "terminal" in the instructions, this means the "Anaconda Prompt" program for you.

Step 1

Install a Python distribution:

In this tutorial we will be using the Anaconda Python distribution along with the conda package manager. If you already have Anaconda or Miniconda installed, you can skip this step.

If not, please follow Matt Hall's video tutorial from Transform2020 (YouTube instructions).

Step 2

Create the t22-mon-ml-models conda environment:

Download the environment.yml file from here (right-click and select "Save page as" or similar)
Make sure that the file is called environment.yml. Windows sometimes adds a .txt to the end, which you should remove
Open a terminal (Anaconda Prompt if you are running Windows). The following steps should be done in the terminal
Navigate to the folder that has the downloaded environment file
Create the conda environment by running conda env create --file environment.yml (this will download and install all of the packages used in the tutorial)

Step 3

Download the zipped data set from our public Google Drive. It should look like the following screenshot.

Once downloaded, unzip the data set and copy it to your working directory of choice

Step 4

Start JupyterLab:

Windows users: Make sure you set a default browser that is not Internet Explorer.
Activate the conda environment: conda activate t22-mon-ml-models
Start the JupyterLab server: jupyter lab
Jupyter should open in your default web browser. We'll start from here in the tutorial and create a new notebook together.
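Once a notebook is open, a quick check like the sketch below can confirm that the environment contains the libraries listed in the prerequisites. The list of import names is an assumption based on that list (scikit-learn imports as sklearn, imbalanced-learn as imblearn); the actual environment.yml may include more packages.

```python
from importlib.util import find_spec

# Import names assumed from the prerequisites list, not read from environment.yml.
REQUIRED = ["numpy", "matplotlib", "geopandas", "rasterio", "sklearn", "imblearn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All tutorial packages found.")
```

If anything is reported missing, the most likely cause is that JupyterLab was started without first activating the t22-mon-ml-models environment.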

IF EVERYTHING ELSE FAILS

If you really can't get things to work on your computer, you can run the code online through Google Colab (you will need a Google account). A starter notebook that installs all the tutorial dependencies and downloads the tutorial data can be found here:

https://colab.research.google.com/drive/1jAW8A4hDdFn4An3I3jtVJiTxzNn08oRU?usp=sharing

To save a copy of the Colab notebook to your own account, click "Open in playground mode" and then "Save to Drive". You might be interested in this tutorial for an overview of Google Colab.

Data License

All data presented in this tutorial were derived from open data sets made available through Mineral Resources Tasmania and Geoscience Australia.

LICENCE CONDITIONS

By exporting this data you accept and comply with the terms and conditions set out below:

Creative Commons Attribution 3.0 Australia

You are free to:

Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material for any purpose, even commercially.

Under the following terms:

Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Acknowledgments

This tutorial borrowed HEAVILY from Santiago Soler, Andrea Balza Morales and Agustina Pesce's superb Harmonica tutorial from Transform2021, also documented on GitHub here: https://github.com/fatiando/transform21.
