class: title-slide # Session 6: Introduction to geospatial data<span style="display:block; margin-top: 10px ;"></span> ## ### ### <!-- Can also separate the various components of the extra argument 'params', eg as in ### , , MSc in Epidemiology --> --- layout: true .my-footer[ .alignleft[ © Pirani | Blangiardo ] .aligncenter[ MSc in Epidemiology ] .alignright[ , NA ] ] --- # Learning objectives After this lecture you should be able to <span style="display:block; margin-top: 40px ;"></span> - Describe (statistically) the different geospatial data types <span style="display:block; margin-top: 40px ;"></span> - Describe how geospatial data are represented in R <span style="display:block; margin-top: 40px ;"></span> - Describe Coordinate Reference Systems (Geographical and Projected CRS) <span style="display:block; margin-top: 40px ;"></span> The topics covered in this lecture are presented in: - Chapter 2 of the book **Geocomputation with R** https://r.geocompx.org/index.html --- # Outline <span style="display:block; margin-top: 30px ;"></span> 1\. [The different types of spatial data](#spatialData) <span style="display:block; margin-top: 30px ;"></span> 2\. [Spatial data in R](#SpatialR) <span style="display:block; margin-top: 30px ;"></span> 3\. [Coordinate Reference Systems ](#CRS) --- name: spatialData <span style="display:block; margin-top: 250px ;"></span> .myblue[.center[.huge[ **What are the different types of spatial data?**]]] --- # Terminology - The data are seen as being a realization of a stochastic process, that is, of a set of random numbers each of which is associated with a spatial location. - A spatial process in `\(d\)` dimensions is denoted as: `$$\{Z(\bm{s}): \bm{s} \in \mathcal{D} \subset \mathbb{R}^d \}$$` where - `\(Z\)` is the attribute we observe (e.g. temperature, number of sudden infant deaths etc.) <span style="display:block; margin-top: 10px ;"></span> - `\(\bm{s}\)` is the location where `\(Z\)` is observed (e.g. coordinates such as latitude and longitude) <span style="display:block; margin-top: 10px ;"></span> - `\(\mathcal{D}\)` is the domain, and it is called the **index set** = possible locations <span style="display:block; margin-top: 70px ;"></span> - Symbols are as follows: `\(\{\}\)` a set (a collection of elements); `\(\in\)` element of or belongs to; `\(\subset\)` subset of; `\(\mathbb{R}\)` real numbers set --- # Type of spatial data Cressie (1993) distinguishes three types of spatial data, based on the nature of the spatial domain `\(\mathcal{D}\)`: <span style="display:block; margin-top: 20px ;"></span> - .red[Areal data (also known as lattice data)]: `\(\mathcal{D}\)` is fixed (of regular or irregular shape) and partitioned into a finite number of areal units (e.g. census tract, pixels) with well-defined boundaries. <span style="display:block; margin-top: 20px ;"></span> - .red[Geostatistical (or point-referenced) data]: `\(\mathcal{D}\)` is a continuous fixed set. By continuous we mean that `\(Z(\bm{s})\)` can be observed everywhere within `\(\mathcal{D}\)`. By fixed we mean that the points in `\(\mathcal{D}\)` are non-stochastic. <span style="display:block; margin-top: 20px ;"></span> - .red[Point pattern data]: `\(\mathcal{D}\)` is itself random. Its index set gives the locations of random **events** that are the spatial point pattern. --- # Example of areal data (irregular lattice) [1] Data can be either irregularly aligned or gridded, and occur in the form of aggregated observation over areas. <span style="display:block; margin-top: 20px ;"></span> .pull-left[<center><img src=./img/HDI_Africa.png width='100%' title=''></center> .center[Human Development Index (HDI) for Africa. HDI values range from 0 (low development) to 1 (high development). Source: World Bank Open Data.] ] .pull-right[ - Lattice data involves data measured at different areas, e.g., neighbourhoods, cities, provinces, states, etc. <span style="display:block; margin-top: 20px ;"></span> - .red[The question of primary interest] is related to spatial variability of the phenomenon under study. <span style="display:block; margin-top: 40px ;"></span> - Spatial dependence appears because neighbour areas will show similar values of the variable of interest. <span style="display:block; margin-top: 40px ;"></span> - The *sites* `\(\boldsymbol{s} \in \mathcal{D}\)` in this case are actually the areas themselves. ] --- # Example of regular lattice data [2] <center><img src=./img/Getis-Ord.jpg width='40%' title=''></center> .center[Regular lattice: Getis-Ord remote sensing example data (from package `spdep`).] --- # Example of geostatistical data [1] Zinc measured in the top soil in a flood plain along the river Meuse (obtained from `sp` `R` package)
--- # Example of geostatistical data [2] .center[Average air temperature in degree Celsius in Brazil (March-November 2018)] .pull-left[ <center><img src=./img/Rplot_Temp_Dry2018.jpeg width='80%' title=''></center> .center[Observed] ] .pull-right[ <center><img src=./img/Rplot_Temp_Pred_Dry2018.jpeg width='57%' title=''></center> .center[Predicted] ] <span style="display:block; margin-top: 20px ;"></span> - The .red[question of primary interest] concerns the reconstruct a surface from noisy observations taken at a finite number of spatial locations. --- # Examples of point pattern data [1] .pull-left[ - In 1854, cholera hit the city of London. No one knew where the disease started. So, British physician Snow started mapping the outbreak. - But it wasn’t just the disease; he also mapped out roads, property boundaries, and water lines, discovering that cholera cases were only along one water line. - .blue[Breakthrough that connected geography to public health safety]. - The .red[question of primary interest] is whether, and if so where and when, statistically unusual local concentrations of cases occur. ] .pull-right[ <center><img src=./img/SnowMap.jpeg width='70%' title=''></center> .center[Snow's map of the 1854 London cholera outbreak (obtained from `HistData` `R` package)] <span style="display:block; margin-top: 10px ;"></span> ] --- # Examples of point pattern data [2] <center><img src=./img/Earthquakes_Italy.jpg width='38%' title=''></center> .center[Location of earthquakes in Italy 1900-2017 (moment magnitude scale).] <span style="display:block; margin-top: 10px ;"></span> Other examples in which points are the location of an .red[event] of interest: - Location of crimes - Location of trees --- name: spatialR <span style="display:block; margin-top: 250px ;"></span> .myblue[.center[.huge[ **Spatial Data in R**]]] --- # Spatial data in R: Vectors The are two fundamental distinctions: spatial .red[vector] data and .red[raster]. - .red[Vector] represents the world using .red[points], .red[lines] and .red[polygons] or combinations of those, where: <span style="display:block; margin-top: 20px ;"></span> - *Point*, is a single point location, such as a geocoded address or a temperature sensor or the location of a bus stop; <span style="display:block; margin-top: 20px ;"></span> - *Line*, is a set of ordered points, connected by straight line segments such as route travel or connections between locations; <span style="display:block; margin-top: 20px ;"></span> - *Polygon*, is an area, marked by one or more enclosing lines such as local authority districts or census tracts. <span style="display:block; margin-top: 30px ;"></span> .pull-left[ ``` r > library(leaflet) > popup = "Imperial College" > leaflet() %>% + setView(lng = -0.1769, lat = 51.4983, zoom = 2.5) %>% + addProviderTiles("NASAGIBS.ViirsEarthAtNight2012")%>% + addMarkers(lng = -0.1769, lat = 51.4983, + popup = popup) ``` ] .pull-right[
] --- # Spatial data in R: Rasters - .red[Raster]: typically divides the surface into cells (also called pixels) of constant size. The cell of one raster hold a value, which might be numeric or categorical. <center><img src=./img/raster.jpg width='35%' title=''></center> <span style="display:block; margin-top: 20px ;"></span> - Examples of continuous and categorical raster data. <center><img src=./img/Raster_data.jpg width='39%' title=''></center> .center[Source: Lovelace, Nowosad, and Muenchow (2019), Section 2.3.] --- # Some useful R packages for spatial classes and visualization - `sf` is a key package for spatial vector data (points, lines, polygons etc.). It refers to a formal standard that describes how objects in the real world can be represented in computers, with emphasis on the spatial geometry of these objects <span style="display:block; margin-top: 10px ;"></span> - `sp` precedes `sf`. `sp` was first release on CRAN in 2005 and provides classes and methods to create points, lines, polygons, and grids and to operate on them. Some R packages still relay on the `sp` package <span style="display:block; margin-top: 10px ;"></span> - `terra` contains classes and methods representing raster objects (but it has also its own vector data classes). It allows raster data to be loaded, processed and saved. <span style="display:block; margin-top: 10px ;"></span> - `stars` contains classes and methods representing spatio-temporal data (raster and vector data cubes) <span style="display:block; margin-top: 10px ;"></span> - `ggplot2`, `tmap`, `leaflet`, `mapview` and `gganimate` for visualization and maps --- # `sp` and `sf` objects ``` r > library(spData); library(sf); library(tidyverse) > > # world is a sf object containing a world map data > # from Natural Earth with a few variables from World Bank > world = st_read(system.file("shapes/world.gpkg", package="spData")) > plot(world["pop"]) # plot world population > > world_sp = as(world, "Spatial") # from sf to sp object > class(world_sp) # "SpatialPolygonsDataFrame" > > world_sf = st_as_sf(world_sp) # from sp to sf object > class(world_sf) # "sf" "data.frame" ``` <center><img src=./img/World_pop.jpeg width='29%' title=''></center> --- # `sp` and `sf` `sf` works well with the `tidyverse` collection of R packages. For example, functions can be combined using the pipe operator `\(\%>\%\)` given that both packages are loaded ``` r > # Select and plot information for a single attribute > world %>% select(lifeExp) %>% plot() ``` <center><img src=./img/LifeExp.jpeg width='45%' title=''></center> --- # `sp` and `sf` `sf` object includes spatial metadata like the coordinate reference system (CRS), which are stored in a list column. We can extract and plot only the geometry with the function `st_geometry()` ``` r > # Extract geometry > worlg_geo = st_geometry(world) > # Extract and plot out only the geometries > world %>% st_geometry() %>% plot() ``` <center><img src=./img/World_geom.jpeg width='45%' title=''></center> --- name: CRS <span style="display:block; margin-top: 250px ;"></span> .myblue[.center[.huge[ **Coordinate Reference Systems**]]] --- # Coordinate Reference Systems (CRS) - A CRS defines how the spatial elements of the data relate to the surface of the Earth. - There are two main types of CRS: 1. .red[Geographic coordinate reference systems (GCRS)]: coordinate systems that identify any location on the Earth's surface using longitude and latitude, with units in decimal degrees or degrees. 2. .red[Projected coordinate reference systems (PCRS)]: coordinate systems that provide various mechanisms to project maps of the Earth's ellipsoid shape onto a two-dimensional Cartesian coordinate plan, with units on feet or meter. Projected coordinate systems are referred to as map projections. <center><img src=./img/GCS_PCS.jpg width='45%' title=''></center> <span style="display:block; margin-top: 10px ;"></span> .center[Source: https://www.esri.com/arcgis-blog/products/arcgis-pro/mapping/gcs_vs_pcs/] --- # GCRS [1] Geographic CRS refers to a point on the Earth by its longitude and latitude values, which are angles measured from the Earth's center to a point on the Earth's surface. - The .blue[longitude] of a point on the Earth’s surface is the angle west or east of the prime meridian to another meridian that passes through that point. Longitude values range from -180 degrees when running west to 180 degrees when running east. Lines of Longitude are also called meridians. - The .blue[latitude] of a point on Earth's surface is the angle between the equator and the line that passes through that point and the center of the Earth. Latitude values are measured relative to the equator (0 degrees) and range from -90 degrees at the south pole to 90 degrees at the north pole. Lines of latitude are called parallels. <center><img src=./img/LongLat3.jpg width='25%' title=''></center> <span style="display:block; margin-top: 15px ;"></span> .center[Source: https://annefou.github.io/metos_python/03-crs_proj/] --- # GCRS [2] - Obviously we cannot actually measure these angles, but we can estimate them. To do so, we need a model of the shape of the Earth. Such a model is called as the .red[datum]. - Datum contains information on what **ellipsoid** is used to approximate the Earth's shape and the relationship between the coordinate system and location on the Earth's surface; it also describes the origin of a coordinate system. - There are two types of datum: geocentric and local <center><img src=./img/datum.jpg width='43%' title=''></center> .center[Source: Lovelace, Nowosad, and Muenchow (2019), Section 2.4.] --- # PCRS [1] - Projected CRS are based on a GCRS and rely on map projections to convert the three-dimensional surface of the Earth into `\(x\)` and `\(y\)` values. - The transition leads to some distortions: there is not one best projection. Some projections can be used for a map of the whole world; other projections are appropriate for small areas only. - There are three main projection types: conic, cylindrical, and planar: <center><img src=./img/Proj_CRS.jpg width='40%' title=''></center> <span style="display:block; margin-top: 15px ;"></span> .center[Source: https://www.earthdatascience.org/courses/earth-analytics/spatial-data-r/geographic-vs-projected-coordinate-reference-systems-UTM/] --- # PCRS [2] - Projected CRS is optimized to best represent the shape, and/or the rotation, and/or the area of features in a data set. - There is not a single system that does a great job at optimizing all these elements: shape, rotation and area. <center><img src=./img/PCRS.jpg width='50%' title=''></center> .center[Source: https://ihatecoordinatesystems.com/] <span style="display:block; margin-top: 35px ;"></span> A funny and explanatory YouTube video on the trade off between GCRS and PCRS is at this link: https://www.youtube.com/watch?v=kIID5FDi2JQ --- # Re-projecting Re-projecting is the process of changing the representation of locations from one CRS to another. When should data be transformed? - The projections of locations on the Earth into a two-dimensional plane are distortions, the projection that is best for an application may be different from the projection associated with the data we import. In these cases, data can be re-projected. - Another case is when two objects with different CRSs must be compared or combined. Thus, in the case of dealing with multiple data, re-projection permits to transform all data to a common CRS. - In R, to transform data to a different projection, we can use `st_transform()` function of the `sf` package. --- # CRS in R (before 2020) - Spatial R packages support a wide range of CRSs and they use the `PROJ` library to perform conversions between cartographic projections. - Before 2020, there are two ways of defining a CRS: - the .blue[proj4string] definition, which specifies attributes such as the projection, the ellipsoid and the datum, for example the WGS84 longitude and latitude projection is specified as: `\(+proj=longlat +ellps=WGS84 +datum=WGS84 +no\_defs\)` - the .blue[EPSG] numeric code (EPSG stands for European Petroleum Survey Group). The code refers to only one, well-defined coordinate reference system, for example the EPSG code of the WGS84 projection is 4326 (Note that the EPSG code could not be available for a particular coordinate system; https://spatialreference.org/ref/epsg/) --- # CRS in R (after 2020) - There are new developments at level of `PROJ` (https://proj.org/), and shift from `proj4string` to the `OGC WKT2` representation (OGC stands for Open Geospatial Consortium, and WKT stands for Well-known Text formats) - The CRS is represented by: - a set of mathematical rules for specifying how coordinates are to be assigned to points; - a set of parameters defining the position of the origin, the scale, and the orientation of a coordinate system (datum) - There is capability of direct transformations from CRS to CRS - The use of `proj4string` is discouraged (some `proj4string` string elements are not longer supported; also only three ellipsoids can be used). `proj4string` still can be used for coordinate operations (transformations between CRS). - In `sf`, we can retrieve the CRS of an object using `st_crs()` --- # Example ``` r > library(spData) > library(sf) > # Extract the CRS information from the sf object nz (New Zealand) > st_crs(nz) # here EPSG:2193 > > # Re-project > nz_wgs = st_transform(nz, 4326) > st_crs(nz_wgs) # now EPSG:4326 > # Extract the CRS information from the sf object world > st_crs(world) #EPSG:4326 > > # Re-project using the CRS of the sf object world > nz_wgs = st_transform(nz, st_crs(world)) > st_crs(nz_wgs) > > # Remove the CRS from nz_wgs > nz_wgs_NULL_crs = st_set_crs(nz_wgs, NA) > > # Plot > par(mfrow = c(1, 3)) > plot(st_geometry(nz)) > plot(st_geometry(nz_wgs)) > plot(st_geometry(nz_wgs_NULL_crs)) ``` --- # Example Plots of the results: <span style="display:block; margin-top: 30px ;"></span> <center><img src=./img/NZ.jpg width='70%' title=''></center> --- # References Cressie, N. (1993). _Statistics for Spatial Data_. John Wiley & Sons, Inc. Lovelace, R., J. Nowosad, and J. Muenchow (2019). _Geocomputation with R_. Chapman and Hall/CRC.