
A short tutorial for creating a species distribution model using QGIS, R, and MaxEnt.


DimEvil/tutorial-qgis-maxent

 
 


A Short Species Distribution Modeling Tutorial

This repository is a fork of @dondealban's tutorial and contains a short tutorial for creating a species distribution model using QGIS, R, and MaxEnt. The original tutorial was prepared for the skills training sessions during the lab retreat of the Applied Plant Ecology Lab, Department of Biological Sciences, National University of Singapore, held on 25-28 September 2017 in Malacca, Malaysia. This version has been adapted for the QGIS, R and MaxEnt part of the data use workshop organized by the Belgian Biodiversity Platform.


Files

All files used and produced in this tutorial can be downloaded here, in case an exercise does not work. We suggest you try the exercises yourself first.

Scripts

R script Papilio

R script Maxent_example

Introduction

The presentation can be accessed here

Workflow

[Figure: workflow overview]

Download and Installation

Software

For this tutorial, download and install QGIS, R and RStudio, OpenRefine and MaxEnt, all of which are free and open-source software. For QGIS and R, download the versions compatible with your computer's operating system. MaxEnt is a Java application and runs on any operating system with Java installed. The procedures shown in this tutorial use a Windows platform, but they should be applicable to other operating systems.

Data

MaxEnt requires two types of data: species occurrence records and environmental covariates. Occurrence records are the geographic points (i.e. coordinates) where the species was observed, while environmental covariates are data layers containing continuous or categorical values such as temperature, precipitation and land cover (for details see Pearson, 2007). To run the model in MaxEnt, species occurrences must be in comma-separated values (CSV) format and covariates must be rasters in Arc/Info ASCII Grid (.asc) format.
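To make the raster format concrete, here is a minimal Python sketch that writes a small grid as Arc/Info ASCII Grid text. The header keywords (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value) are the standard ones for this format; the function name, the example values and the -9999 NoData value are illustrative choices, not something MaxEnt prescribes.

```python
def to_ascii_grid(values, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Return an Arc/Info ASCII Grid (.asc) as text.

    values: list of rows; the first row is the northernmost one, as in the file.
    None marks a missing cell and is written as the NODATA value.
    """
    nrows = len(values)
    ncols = len(values[0])
    if any(len(row) != ncols for row in values):
        raise ValueError("all rows must have the same number of columns")
    header = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xllcorner}",  # x of the lower-left corner of the grid
        f"yllcorner {yllcorner}",  # y of the lower-left corner of the grid
        f"cellsize {cellsize}",    # cell width = cell height, in map units
        f"NODATA_value {nodata}",
    ]
    body = [" ".join(str(nodata if v is None else v) for v in row)
            for row in values]
    return "\n".join(header + body) + "\n"


# Usage: a tiny 2 x 3 grid with made-up values
print(to_ascii_grid([[1, 2, 3], [4, None, 6]], 4.0, 50.7, 0.5))
```

The six header lines are followed by one text line per raster row, which is why every layer in a MaxEnt run must agree on these header values.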

  1. Species occurrence data. The species occurrence records are the geographic point locations or coordinates of species observations. For this exercise, we use data downloaded from GBIF: this dataset (249 KB, CSV file) of Papilio machaon occurrences in Flanders, Belgium. The dataset only holds occurrences between 1960 and 1990, to match the period covered by the WorldClim climate data.

    • create a folder named [csv] for the occurrence data
    • in GBIF, query for species: Papilio machaon
    • in GBIF, query for country or area: Belgium
    • in GBIF, query for year: between 1960 and 1990
    • download the occurrences, name the file Papilio_machaon and place it in the [csv] folder

You can also find the file here.

  2. Environmental predictors. The environmental covariates consist of raster data containing either continuous or categorical values such as precipitation, temperature and elevation. We will use the WorldClim raster datasets. WorldClim is a set of gridded global climate data layers that can be used for mapping and ecological modeling. For this exercise, we will use WorldClim v1.4 Current conditions (interpolations of observed data from 1960-1990). We need the highest resolution available, provided at 30 arc-seconds (~1 km); see Hijmans et al. (2005) for more information about the climate data layers. The WorldClim 0.5 arc-minute dataset for Europe (bio_16.zip) can be downloaded here for present-day data. You will need to convert these layers from .bil format to .tif format. You can do this by opening each .bil file in QGIS and saving it as a .tif file.
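As a rough check on the "~1 km" figure, the sketch below converts a cell size in arc-seconds into kilometres. It assumes a spherical Earth with a mean radius of 6371 km, which is an approximation (the WorldClim grids use WGS84 latitude/longitude), and the latitude of about 51°N for Flanders is our own rounded value.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation


def cell_size_km(arc_seconds, latitude_deg):
    """Approximate (north-south, east-west) size in km of a lat/lon grid cell."""
    cell_deg = arc_seconds / 3600.0
    ns_km = math.radians(cell_deg) * EARTH_RADIUS_KM
    # Meridians converge towards the poles, so east-west size shrinks with cos(latitude)
    ew_km = ns_km * math.cos(math.radians(latitude_deg))
    return ns_km, ew_km


for lat in (0, 51):
    ns, ew = cell_size_km(30, lat)
    print(f"lat {lat}: {ns:.2f} km north-south, {ew:.2f} km east-west")
```

A 30 arc-second cell is about 0.93 km from north to south everywhere, but at about 51°N it is only about 0.58 km wide, so "~1 km" is a nominal figure rather than a true square kilometre.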

One variable frequently used in niche modeling is altitude (elevation), which is not included in the WorldClim dataset. Instead, we can use the GTOPO30 dataset, which provides elevation at a similar resolution (30 arc-seconds). The GTOPO30 data can be downloaded here, from USGS EarthExplorer; you need to create a login before you can download the data.

  • create a folder named [rasters] for your raster data
  • unzip the bio_16.zip file into this folder
  • download the elevation data from EarthExplorer, or alternatively find it here:

[Figure: data preparation]

To prepare the datasets, we also need boundary data. We can use the administrative boundary vector data from GADM, the Database of Global Administrative Areas. On GADM's Download page, select "Belgium" and "Shapefile" from the Country and Format drop-down menus, respectively, and click the download link provided (~22 MB, ZIP file).

Because most of the Papilio occurrences are situated within Flanders (and we can only use the occurrences we have), we will use the administrative boundary of Flanders in this exercise. Of course, you can also create your own boundary (training region).

For this exercise, we will use this file:

  • download the FlandersWGS84 file into your [rasters] folder

Now you can easily draw a polygon to use as the training area for niche modeling. The training area should contain all the occurrences you intend to use for niche modeling. Save this polygon as a shapefile (.shp) so you can use it to clip the environmental raster data correctly.

Study Area

Belgium

Belgium (/ˈbɛldʒəm/ BEL-jəm), officially the Kingdom of Belgium, is a country in Western Europe bordered by France, the Netherlands, Germany and Luxembourg. A small and densely populated country, it covers an area of 30,528 square kilometres (11,787 sq mi) and has a population of more than 11 million. Straddling the cultural boundary between Germanic and Latin Europe, Belgium is home to two main linguistic groups: the Dutch-speaking, mostly Flemish community, which constitutes about 59 percent of the population, and the French-speaking, mostly Walloon population, which comprises about 40 percent of all Belgians. Additionally, there is a small ~1 percent group of German speakers who live in the East Cantons.

Flanders

Flanders (Dutch: Vlaanderen [ˈvlaːndərə(n)], French: Flandre [flɑ̃dʁ], German: Flandern [ˈflandɐn]) is the Dutch-speaking northern portion of Belgium, although there are several overlapping definitions, including ones related to culture, language, politics and history. It is one of the communities, regions and language areas of Belgium. The demonym associated with Flanders is Fleming, while the corresponding adjective is Flemish. The official capital of Flanders is Brussels, although the Brussels Capital Region has an independent regional government, and the government of Flanders only oversees the community aspects of Flemish life there, such as (Flemish) culture and education.

Note: the study area should never depend on country borders. It needs to contain validated occurrences of the species being modeled.

Prepare Datasets

The environmental layers we downloaded cover a very large extent (Europe). To reduce file sizes for niche modeling, we clip the raster files with a polygon, so that we only keep the environmental data inside the area we need. We propose two options: draw your own training region, based on the available records and knowledge of the species, or use the extent (bounding box) of the Flanders shapefile to clip the raster files (faster, but less relevant). We recommend drawing your own training area when doing niche modeling.

Preparing the Study area

Option 1: Draw your training area polygon in QGIS to clip the raster files (recommended)

  • open your (cleaned) occurrence layer in QGIS
  • open your WGS84 Flanders shapefile in QGIS (as a reference)
  • in the Layer menu, choose Create Layer > New Shapefile Layer
  • in the resulting window, select Polygon as the geometry type, click OK and name the layer Species_name_TrainingRegionFlanders

[Figure: creating a new polygon shapefile layer]

  • select your new layer in the Layers panel (it becomes highlighted)
  • click the Toggle Editing button
  • click the Add Polygon Feature button
  • draw your training area

Every time you left-click on the map, a vertex is created, and QGIS draws lines connecting the vertices to form a polygon. If you misclick, don't worry: you can move vertices to the right place afterwards.

  • right-click anywhere on the map to finish the polygon, leave the id field blank and click OK
  • use the Vertex Tool (called the Node Tool in older QGIS versions) to edit your training area, if needed
  • save your edits to the shapefile

[Figure: drawing the training area polygon]

Option 2: Create a Flanders bounding box in QGIS to clip the raster files (faster)

  1. First, we will create subsets of the environmental rasters to focus the modeling on our study area. To do this, we create a polygon shapefile covering the extent of the study area and use it to clip all the raster layers. Follow these steps in QGIS:
  • in QGIS, load the vlaanderen_wgs84.shp shapefile by adding a vector layer from the Layer > Add Layer > Add Vector Layer... menu. This displays the municipal-level administrative boundaries. Make sure the shapefile you use has the correct coordinate reference system (CRS), in most cases WGS84 (EPSG:4326).

[Figure: Flanders shapefile loaded in QGIS]

Next, we will create a polygon from the extent of the Flanders shapefile we have just loaded.

  • go to the Vector > Research Tools > Polygon from Layer Extent menu.
  • under the Input Layer drop-down menu, select the vlaanderen_wgs84.shp layer.
  • under the Extent output line, select Save to File from the menu to save the file in your working directory. Then click Run to create another shapefile called box1.shp, which consists of a polygon covering the extent of the study area.
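Conceptually, a layer-extent polygon is the rectangle spanned by the minimum and maximum coordinates of all features. A minimal Python sketch, assuming the vertices are (longitude, latitude) pairs in WGS84; the function name and the optional buffer argument are illustrative extras, not settings of the QGIS tool.

```python
def extent_polygon(points, buffer=0.0):
    """Return the bounding-box polygon (closed ring) of a set of (x, y) points.

    buffer: optional margin added on every side (illustrative extra).
    """
    if not points:
        raise ValueError("need at least one point")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xmin, xmax = min(xs) - buffer, max(xs) + buffer
    ymin, ymax = min(ys) - buffer, max(ys) + buffer
    # Counter-clockwise ring; the first vertex is repeated to close it
    return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]


# Hypothetical vertices, roughly in the Flanders area (lon, lat)
print(extent_polygon([(2.55, 51.09), (5.91, 51.0), (4.4, 50.69), (4.6, 51.5)]))
```

This also shows why the bounding box is "less relevant" than a drawn training area: it includes every cell between the extreme coordinates, whether or not the species could reach it.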

[Figure: Polygon from Layer Extent dialog]

We have now created a bounding box of Flanders, which can be used as the polygon to clip the raster files.

[Figure: bounding box of Flanders]

Clipping raster files (environmental data) with the training region or bounding box

[Figure: Clip Raster with Polygon]

  • next, go to the Processing > Toolbox menu, which opens the Processing Toolbox panel.
  • search for the Clip raster with polygon function under the SAGA geoalgorithms and select it. This opens the Clip Raster with Polygon dialog box.
  • under the Input drop-down menu, click ... and navigate through your working directory to select one of the raster layers, say biol1_210.tif.
  • under the Polygons input line, select from the drop-down menu either the box1.shp shapefile or the training region shapefile you created. The rasters will be clipped to this polygon.

You can also find the processed files here:

  • box1 (don't forget to unzip)

  • TrainingArea (don't forget to unzip)

  • under the Clipped output line, click ... and select Save to File from the menu to save the clipped raster in your working directory under the same file name (in a separate output folder, so the original is not overwritten). The output file type is .tif; later we will convert this file to .asc, because that is the file format MaxEnt requires. Then click Run to generate the clipped raster file, biol1_210.tif.

    Repeat this process for all other raster layers.
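To see what clipping to a bounding box does to a grid, here is a minimal Python sketch that works out which block of cells of an ASCII grid covers a given box, and the header of the clipped grid. It assumes the grid and the box share the same CRS; the function names are hypothetical. It only cuts a rectangular window: a polygon clip such as the SAGA tool's typically also sets cells outside an irregular training polygon to NoData, which this sketch does not do.

```python
import math


def clip_window(header, bbox):
    """Find the block of cells of an ASCII grid that covers a bounding box.

    header: dict with ncols, nrows, xllcorner, yllcorner, cellsize.
    bbox: (xmin, ymin, xmax, ymax) in the same coordinate system as the grid.
    Rows are counted from the top (north), as in the .asc file.
    Returns (row0, row1, col0, col1, new_header); row1 and col1 are exclusive.
    """
    cs = header["cellsize"]
    x0 = header["xllcorner"]
    ytop = header["yllcorner"] + header["nrows"] * cs  # northern edge
    xmin, ymin, xmax, ymax = bbox
    col0 = max(0, math.floor((xmin - x0) / cs))
    col1 = min(header["ncols"], math.ceil((xmax - x0) / cs))
    row0 = max(0, math.floor((ytop - ymax) / cs))
    row1 = min(header["nrows"], math.ceil((ytop - ymin) / cs))
    if col0 >= col1 or row0 >= row1:
        raise ValueError("bounding box does not overlap the grid")
    # The clipped grid keeps the same cell size; only its origin and size change
    new_header = dict(header, ncols=col1 - col0, nrows=row1 - row0,
                      xllcorner=x0 + col0 * cs, yllcorner=ytop - row1 * cs)
    return row0, row1, col0, col1, new_header


def clip_values(values, window):
    """Cut the cell values (list of rows, north first) to the window."""
    row0, row1, col0, col1, _ = window
    return [row[col0:col1] for row in values[row0:row1]]
```

Because the window is snapped outward to whole cells, the clipped grid keeps the original cell alignment; layers clipped with the same box from identically aligned inputs therefore end up with identical headers.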

Converting the .tif files into .asc files

MaxEnt is very strict about the format of its input files. All raster layers must have exactly the same resolution (cell size) and bounding box, and must be in .asc format. For the WorldClim data this is no problem, since all layers come from the same source. If you want to use altitude in your model, however, the topography data need some extra tweaking, because the various conversions introduce rounding errors.

First, let's convert the clipped raster data, which is in .tif format to the *.asc format.

  • in QGIS, go to Raster > Conversion > Translate (Convert Format)
  • choose the .tif you want to convert as the input layer
  • under the output layer, name your file and choose .asc (Arc/Info ASCII Grid) as the extension
  • make sure your target SRS is EPSG:4326
  • click OK

[Figures: converting the rasters with Translate]

If you want to use the altitude.asc file in your model, open the translated altitude.asc file in a text editor such as Notepad++ and make sure its bounding box (the ncols, nrows, xllcorner, yllcorner and cellsize header values) is identical to that of the other climate variables.

[Figure: altitude.asc header opened in a text editor]
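Checking headers by eye works for a few layers; for many layers, a small script can compare them. Below is a minimal Python sketch that reads only the header lines of each .asc file and reports any layer whose ncols, nrows, xllcorner, yllcorner or cellsize differs from the first one. The function names and the 1e-6 tolerance are assumptions (MaxEnt itself may be stricter), and a grid that uses xllcenter/yllcenter instead of corner coordinates is simply reported as missing xllcorner/yllcorner.

```python
from itertools import islice

# Keywords that may appear in the header of an Arc/Info ASCII Grid
HEADER_KEYS = {"ncols", "nrows", "xllcorner", "yllcorner",
               "xllcenter", "yllcenter", "cellsize", "nodata_value"}
# Values that must agree across all layers used in one MaxEnt run
MUST_MATCH = ("ncols", "nrows", "xllcorner", "yllcorner", "cellsize")


def read_asc_header(text):
    """Parse the header lines of .asc text into a dict with lower-case keys."""
    header = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or parts[0].lower() not in HEADER_KEYS:
            break  # first row of cell values reached
        header[parts[0].lower()] = float(parts[1])
    return header


def read_asc_header_file(path):
    """Read only the first lines of a (possibly large) .asc file."""
    with open(path) as f:
        return read_asc_header("".join(islice(f, 6)))


def header_mismatches(headers, tol=1e-6):
    """Compare every grid with the first one and list the differences."""
    names = list(headers)
    ref_name, ref = names[0], headers[names[0]]
    problems = []
    for name in names[1:]:
        h = headers[name]
        for key in MUST_MATCH:
            if key not in h or key not in ref:
                problems.append(f"{name}: {key} missing")
            elif abs(h[key] - ref[key]) > tol:
                problems.append(f"{name}: {key}={h[key]} but {ref_name} has {ref[key]}")
    return problems
```

For example, build a dict with `read_asc_header_file` for every .asc file in your [rasters] folder and print what `header_mismatches` returns; an empty list means the headers agree.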

Extract the occurrence points of the species we are interested in modeling

For niche modeling in MaxEnt we need species occurrence records. They should be in CSV format, with a comma (",") as the separator. This file can be prepared with many tools; we recommend R, OpenRefine or LibreOffice. Using MS Excel might cause problems, as the encoding of .csv files saved by Excel is not always accepted by MaxEnt.
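If a file exported from a spreadsheet is rejected, common culprits are a semicolon separator (Excel often uses ";" in locales, such as Belgium, where the comma is the decimal separator) or a leading byte-order mark. The Python sketch below rewrites such text as plain comma-separated CSV; the function name and the semicolon default are assumptions to adapt to your file. It does not convert decimal commas in coordinates (e.g. 50,85), which would need separate handling.

```python
import csv
import io


def normalise_csv(text, delimiter=";"):
    """Rewrite delimited text as comma-separated CSV (returned as a string).

    A leading UTF-8 byte-order mark, which Excel adds when saving as
    "CSV UTF-8", is removed first.
    """
    text = text.lstrip("\ufeff")
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    # csv.writer quotes any field that itself contains a comma
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

To use it on a file, read the file as text (for example with `encoding="utf-8-sig"`, which also drops the byte-order mark) and write the returned string to a new file with UTF-8 encoding.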

  1. Make sure that all the occurrences used for niche modeling lie within the training area and that the file contains no duplicates.
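The "within the training area" check is a point-in-polygon test. QGIS lets you do it visually in the steps that follow; as an illustration, here is the classic even-odd (ray casting) test in Python. It assumes the training area is a single polygon without holes, given as a list of (longitude, latitude) vertices in the same CRS as the occurrences; the function names are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Even-odd (ray casting) test: is the point (x, y) inside the ring?

    ring: list of (x, y) vertices; repeating the first vertex is optional.
    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def split_occurrences(points, ring):
    """Separate (x, y) occurrences into those inside and outside the ring."""
    inside = [p for p in points if point_in_polygon(p[0], p[1], ring)]
    outside = [p for p in points if not point_in_polygon(p[0], p[1], ring)]
    return inside, outside
```

The `outside` list is the set of records to inspect and, if they are genuinely outside the training area, to remove.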

First, we will clean our occurrences in QGIS.

  • open QGIS
  • load the needed layers:
    • the shapefile (.shp)
    • the occurrences (.csv)

[Figure: shapefile and occurrences loaded in QGIS]

  • save your .csv as a shapefile:

    • right-click on the layer
    • choose Save As... and select ESRI Shapefile as the format
    • choose the right folder
    • save it as desired_name_shape
    • tick the option to add the saved file to the map
  • remove occurrences from the shapefile:
    • click the Edit (Toggle Editing) button
    • click the tool to select features with a single click
    • select the erroneous point
    • delete the point (Del)
    • save your shapefile as a .csv (this is the file we will use for further processing)

[Figures: saving and cleaning the occurrence layer]

Further preparing your dataset in R

  • Inspect the occurrence data using tools like R or OpenRefine. For this exercise, let us model the distribution of Papilio machaon as observed in Flanders.

  • To select the species from the CSV file, you can use this R script.

R script Papilio

  • open RStudio

  • start a new project in your maxent-Qgis directory

  • create a new R Markdown file

  • paste the script into this file

  • make sure your folder structure is correct:

    • [NichemodelingWorkshop]
      • [csv], with a [processed] subfolder
      • [rasters]
      • [script]
      • [Qgis]
      • [shapefiles]

  • run the script

    • the script basically reads the file you downloaded from GBIF
    • filters the data from 1960 to 1990 (this was already done in your GBIF download, it's an extra measure)
    • removes duplicate values
    • selects the columns needed for Maxent (scientificName, decimalLatitude and decimalLongitude)
    • and writes a csv file in a folder named processed
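For readers who prefer Python, the same steps can be sketched with the standard library. This is not the tutorial's R script, only an illustration under a few assumptions: the GBIF "simple" download is tab-delimited (despite its .csv extension) and has scientificName, decimalLatitude, decimalLongitude and year columns, and duplicates are records with the same species and coordinates. The output columns follow the order MaxEnt expects in a samples file: species, then longitude (x), then latitude (y).

```python
import csv
import io


def gbif_to_maxent(text, first_year=1960, last_year=1990, delimiter="\t"):
    """Return MaxEnt-ready CSV text (species, longitude, latitude)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    seen = set()
    rows = []
    for rec in reader:
        year = (rec.get("year") or "").strip()
        lat = (rec.get("decimalLatitude") or "").strip()
        lon = (rec.get("decimalLongitude") or "").strip()
        if not (year and lat and lon):
            continue  # skip records without a year or coordinates
        if not first_year <= int(year) <= last_year:
            continue  # outside the WorldClim reference period
        # keep genus + species epithet, dropping authorship ("Linnaeus, 1758")
        species = " ".join(rec["scientificName"].split()[:2])
        key = (species, float(lon), float(lat))
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        rows.append(key)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["species", "longitude", "latitude"])
    writer.writerows(rows)
    return out.getvalue()
```

Column order matters more than column names here: MaxEnt reads the second column as x and the third as y, so swapping latitude and longitude would place every point in the wrong location.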

Further preparing your data in OpenRefine

In OpenRefine:

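  • start a new project
  • choose the file downloaded from GBIF
  • remove unneeded columns (all except scientificName, decimalLatitude and decimalLongitude):
    • in the All column menu, choose Edit columns > Re-order / remove columns, or
    • remove every column manually
  • clean scientificName (only the genus and species name are needed)
  • remove duplicates:
    • add a helper column that combines decimalLatitude and decimalLongitude (Edit column > Add column based on this column), so that only records with identical coordinates count as duplicates; blanking down on decimalLatitude alone would also remove records that share a latitude but differ in longitude
    • sort on the helper column and choose Reorder rows permanently
    • on the helper column, choose Edit cells > Blank down
    • select the blanked-out cells with a text facet on the helper column
    • in the All column menu, choose Edit rows > Remove all matching rows
    • remove the helper column
  • export the data as a CSV file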

[Figure: removing columns in OpenRefine]
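The sort, blank-down and remove sequence in OpenRefine keeps only the first row of each run of identical key values. The Python sketch below mimics that logic (it is an illustration, not OpenRefine code; the function name and sample coordinates are hypothetical) and shows why the key should be the full coordinate pair rather than latitude alone.

```python
def blank_down_dedupe(rows, key):
    """Sort rows by key, then keep only the first row of each run of equal keys.

    This mirrors: sort, Edit cells > Blank down, then remove the blanked rows.
    """
    result = []
    previous = object()  # sentinel that never equals a real key
    for row in sorted(rows, key=key):
        k = key(row)
        if k != previous:
            result.append(row)
        previous = k
    return result


# Hypothetical (latitude, longitude) records: two distinct points share a latitude
rows = [(50.85, 4.35), (50.85, 4.40), (50.85, 4.35), (51.0, 3.7)]
print(blank_down_dedupe(rows, key=lambda r: r[0]))           # latitude only
print(blank_down_dedupe(rows, key=lambda r: (r[0], r[1])))   # full coordinates
```

With latitude alone as the key, the distinct point (50.85, 4.40) is lost; with both coordinates, only the true duplicate is removed.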

Further preparing your data in a worksheet

  • Clean your dataset:
    • remove all occurrences that do not pertain to Flanders
    • remove all suspicious occurrences (e.g. occurrences in the sea)
    • check that the occurrences have the coordinate precision you need
    • keep only scientificName, decimalLatitude and decimalLongitude

You can find a processed version of the occurrences file here

  2. We are almost ready to create our first species distribution model. But before we do that, load all of the clipped environmental rasters and the species occurrence file in QGIS:
  • Load the clipped environmental raster layers by adding them from the Layer > Add Layer > Add Raster Layer... menu. Remember that these are the .asc files.

    [Figure: clipped environmental rasters loaded in QGIS]

  • Load the species occurrence CSV file by adding it from the Layer > Add Layer > Add Delimited Text Layer... menu.

    • In the Create a Layer from a Delimited Text File dialog box, select the CSV file by navigating to it in your working directory. Once it is opened, the species records and their coordinates are shown in the lower part of the dialog box. In the X field and Y field drop-down menus, select the longitude and latitude columns (e.g. decimalLongitude and decimalLatitude), respectively.

Model Species Distributions

We are now ready to create our first species distribution model using MaxEnt.

  1. Open MaxEnt and load the occurrences and Environmental Layers by navigating to the respective directories of those files. Ensure that the tick boxes of all files are checked, and that the Environmental Layers files are all 'Continuous' types.

[Figure: MaxEnt main window]

  2. Also, in the main MaxEnt window, tick the check boxes or select the following options:

    • Linear/Quadratic/Product/Threshold/Hinge features
    • Create response curves
    • Make pictures of predictions
    • Do jackknife to measure variable importance
    • Output format: Logistic
    • Output file type: asc

Leave the other (advanced) settings at their defaults for now. Then click Run and wait for the processing to finish.
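The same run can also be scripted, since MaxEnt accepts its settings as command-line flags in batch mode. The sketch below only builds the argument list in Python and does not start MaxEnt. The flag names follow MaxEnt's batch-mode conventions as we understand them, while the function name, paths and memory setting are placeholders; check everything against the Help of your MaxEnt version. Setting the output format explicitly matters because recent versions default to cloglog rather than logistic output. Feature-class flags are left out, so MaxEnt's own defaults apply to them.

```python
def maxent_command(jar, layers_dir, samples_csv, output_dir, memory_mb=2048):
    """Build a MaxEnt batch-mode command mirroring the GUI options above."""
    return [
        "java", f"-Xmx{memory_mb}m", "-jar", str(jar),
        f"environmentallayers={layers_dir}",  # folder with the .asc layers
        f"samplesfile={samples_csv}",         # species, longitude, latitude CSV
        f"outputdirectory={output_dir}",      # must already exist
        "responsecurves",                     # Create response curves
        "pictures",                           # Make pictures of predictions
        "jackknife",                          # Jackknife variable importance
        "outputformat=logistic",
        "outputfiletype=asc",
        "autorun",                            # start without waiting for the GUI
    ]


# Hypothetical paths; run with: subprocess.run(cmd, check=True)
cmd = maxent_command("maxent.jar", "rasters", "csv/processed/papilio.csv", "output")
print(" ".join(cmd))
```

Scripting the run makes it easy to repeat the model with the same settings, for example after cleaning the occurrences again.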

  3. Once MaxEnt has finished processing:

    • Load the resulting ASC file in QGIS from the Layer > Add Layer > Add Raster Layer... menu. Then change the styling of the raster layer via the Layer > Properties... menu, or by double-clicking the layer in the Layers panel, so that it looks similar to the image below, which shows the logistic output of MaxEnt's species distribution model.

    [Figure: logistic output of the MaxEnt species distribution model, styled in QGIS]

Congratulations! You have now made your first species distribution model using QGIS, R, and MaxEnt.

Further Reading

To learn more about MaxEnt, such as analysing and interpreting its outputs or adjusting model settings, the following materials are suggested for further reading:

Elith, J., Graham, C.H., Anderson, R.P., Dudík, M., Ferrier, S., Guisan, A., et al. (2006) Novel methods improve prediction of species’ distributions from occurrence data. Ecography, 29, 129–151. (DOI)

Merow, C., Smith, M.J. & Silander, J.A. (2013) A practical guide to MaxEnt for modeling species’ distributions: what it does, and why inputs and settings matter. Ecography, 36, 1058–1069. (DOI)

Morales, N.S., Fernandez, I.C. & Baca-Gonzalez, V. (2017) MaxEnt’s parameter configuration and small samples: are we paying attention to recommendations? A systematic review. PeerJ, 5, e3093. (DOI)

Phillips, S.J., Anderson, R.P., Dudík, M., Schapire, R.E. & Blair, M.E. (2017) Opening the black box: an open-source release of Maxent. Ecography, 40, 887–893. (DOI)

Phillips, S.J., Anderson, R.P. & Schapire, R.E. (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190, 231–259. (DOI)

Yackulic, C.B., Chandler, R., Zipkin, E.F., Royle, J.A., Nichols, J.D., Campbell Grant, E.H. & Veran, S. (2013) Presence-only modeling using MAXENT: when can we trust the inferences? Methods in Ecology and Evolution, 4, 236–243. (DOI)

References

Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, P.G. & Jarvis, A. (2005) Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 25, 1965–1978. (DOI)

Ramos, L.T., Torres, A.M., Pulhin, F.B. & Lasco, R.D. (2011) Developing a georeferenced database of selected threatened forest tree species in the Philippines. Philippine Journal of Science, 141, 165–177. (PDF)

Stattersfield, A.J., Crosby, M., Long, A.J. & Wege, D.C. (1998) Endemic Bird Areas of the World: Priorities for Biodiversity Conservation. The Burlington Press, Ltd., Cambridge, United Kingdom.

License

Creative Commons Attribution 4.0 International CC BY 4.0. Please note the Disclaimer of Warranties and Limitation of Liability under Section 5 of this license as follows:

a. Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You.

b. To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You.

c. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability.

Want to Contribute?

In case you wish to contribute or suggest changes, please feel free to fork this repository. Commit your changes and submit a pull request. Thanks.
