Introduction: The power to “roll your own” maps

In 2005 a group of R developers created the R package sp to extend R with classes and methods for spatial data (Pebesma and Bivand, 2005). Since then, hundreds of packages have been created to assist in analyzing and visualizing spatial data. Given the myriad of GIS software that already exists, created by ESRI as well as many other companies, what is the advantage of conducting geospatial analysis in R? Here’s a comparison of GIS and R provided by Robert Hijmans (UC Davis):

GIS R
Visual interaction Data & model focused
Data management Analysis
Geometric operations Attributes as important
Standard workflows Creativity & innovation
Single map production Many (simpler) maps
Click, click, click, & click Repeatability (single script)
Speed of execution Speed of development
Typically expensive Powerfull and free

This short class is designed to introduce geospatial analysis in R. To do this we will focus on the building blocks as well as some packages that facilitate easy analysis and visualization. The primary topics that I will focus on are:

  1. Point Data
  2. Polygon Data
  3. Understanding and using datums
  4. Geo-Coding
  5. Introducing the ggmap package
  6. Advanced Visualization Topics

If you need to conduct a quick review of R, I recommend that you look at two introductory lessons that we’ve developed for West Point cadets:

Before you begin conducting spatial analysis in R, make sure that you install three necessary packages: ggmap, Rcpp, and sp. This can be done in the packages tab in the lower right quadrant of RStudio. It can also be done on the command line by typing install.packages('ggmap') , install.packages('Rcpp'), and install.packages('sp')

Point Data

Point data is the simplest type of geospatial data. Spatial point data is used represent the spatial nature of events. Examples of point data include the location of a customer’s iPhone purchases in business, the location of a crime in law enforcement, the location of attacks in the military, or the location of infrastructure in engineering. Point data means that an entity is represented by an x coordinate and a y coordinate in space. In data, this is most often done by recording a longitude and latitude. We are going to import a data set that you should already be familiar with: the Houston Crime data set. You can import this data with the command:

library(ggmap)
library(Rcpp)
data(crime)  ##After running this commend, you shoud see 'crime' in your working environment.

Now let’s explore the data. Let’s first see what the column names are:

names(crime)
##  [1] "time"     "date"     "hour"     "premise"  "offense"  "beat"    
##  [7] "block"    "street"   "type"     "suffix"   "number"   "month"   
## [13] "day"      "location" "address"  "lon"      "lat"

We see that we have some time fields, type of offense fields, as well as some spatial fields. In addition to the street address, we actually have the longitude and the latitude. We can get a little more information on the columns with the command:

str(crime)
## 'data.frame':    86314 obs. of  17 variables:
##  $ time    : POSIXt, format: "2010-01-01 01:00:00" "2010-01-01 01:00:00" ...
##  $ date    : chr  "1/1/2010" "1/1/2010" "1/1/2010" "1/1/2010" ...
##  $ hour    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ premise : chr  "18A" "13R" "20R" "20R" ...
##  $ offense : Factor w/ 7 levels "aggravated assault",..: 4 6 1 1 1 3 3 3 3 3 ...
##  $ beat    : chr  "15E30" "13D10" "16E20" "2A30" ...
##  $ block   : chr  "9600-9699" "4700-4799" "5000-5099" "1000-1099" ...
##  $ street  : chr  "marlive" "telephone" "wickview" "ashland" ...
##  $ type    : chr  "ln" "rd" "ln" "st" ...
##  $ suffix  : chr  "-" "-" "-" "-" ...
##  $ number  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ month   : Ord.factor w/ 8 levels "january"<"february"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ day     : Ord.factor w/ 7 levels "monday"<"tuesday"<..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ location: chr  "apartment parking lot" "road / street / sidewalk" "residence / house" "residence / house" ...
##  $ address : chr  "9650 marlive ln" "4750 telephone rd" "5050 wickview ln" "1050 ashland st" ...
##  $ lon     : num  -95.4 -95.3 -95.5 -95.4 -95.4 ...
##  $ lat     : num  29.7 29.7 29.6 29.8 29.7 ...

Now that we’ve learned a little bit about the data, let’s explore the spatial component of our data. To do this, I usually start by plotting it against a plane white background as simple x and y coordinates. To do this in R, we simply type:

plot(crime$lon,crime$lat,xlab="lon",ylab="lat")

This plot shows us immediately that we have some outliers that seem to be pretty far away from Houston, Texas.

Now let’s showcase the power of the ggmap package by plotting this same data on an image from Google Maps. First, let’s learn how to get a map from Google Maps. The following code gets and plots a map of Houston from Google Maps:

qmap("houston", zoom = 13)  #This command retrieves and plots a map of Houston