Interactive Plots and Maps in R

Introduction

When we work with data, there is often a geospatial component: the locations. Most of us have used static maps to reveal information that other plots cannot, and interactive maps can bring geographic information to life and lead to new insights. Interactivity can take many forms; the most common and useful is the ability to pan around and zoom into any part of a geographic dataset overlaid on an interactive or ‘slippy’ web map that provides context.

R has several packages that support interactive features, including maps. Bachelier et al. (2019) developed the leaflet.minicharts package, which provides functions for overlaying small interactive charts on maps created with the leaflet package (Cheng, Karambelkar, and Xie 2018). These charts can represent variables associated with geographical points.

In this post, I will illustrate, step by step, how to overlay pie and bar charts on an interactive map. I will also show how to display a single variable, with its value, on an interactive map. Before we start, let’s load the packages we will use in this session.

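library(leaflet)             # interactive web maps
library(leaflet.minicharts)  # mini charts drawn on leaflet maps
library(sf)                  # reading and handling spatial data
library(tidyverse)           # dplyr, tibble and the pipe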

Once the packages are loaded, we first create a basemap using the leaflet package (Cheng, Karambelkar, and Xie 2018). Leaflet maps are created with the leaflet() function, which returns a leaflet map object that can be piped to other leaflet functions. This allows map layers and control settings to be added one step at a time. Let’s make an interactive leaflet basemap:

tilesURL = "http://server.arcgisonline.com/ArcGIS/rest/services/Canvas/World_Light_Gray_Base/MapServer/tile/{z}/{y}/{x}"

basemap = leaflet(width = "100%", height = "800px") %>%
  addTiles(tilesURL)

basemap
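
Because each leaflet function returns the map object, further settings can be chained onto the same pipeline. The sketch below is an illustration rather than part of the original workflow: it centres the view roughly on Tanzania with setView(). The coordinates (longitude 35, latitude -6), the zoom level and the name basemap_tz are my own assumptions and may need adjusting.

```r
library(leaflet)

# Same tile service as in the basemap above
tiles_url <- "http://server.arcgisonline.com/ArcGIS/rest/services/Canvas/World_Light_Gray_Base/MapServer/tile/{z}/{y}/{x}"

# Assumed approximate centre of Tanzania and zoom level (not from the original post)
basemap_tz <- leaflet(width = "100%", height = "800px") %>%
  addTiles(tiles_url) %>%
  setView(lng = 35, lat = -6, zoom = 6)
```

Typing basemap_tz at the console displays the map, and it can be used in place of basemap in the calls that follow.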

Census Data

The power of R is that you can pull data, manipulate it, run analyses, and visualize the results, all in one open source framework. Here we want to know whether the proportion of married people is similar across all regions of Tanzania. To answer this question, let us focus on the 2002 census data.

The population dataset is a simple polygon feature with demographic information for each region in Tanzania. To use the dataset in R, we first need to import it into our session using the st_read function from the sf package (Pebesma 2018).

region = st_read("NBS correct/Tanzania_Region_EA_2002_region/Tanzania_Region_EA_2002_region.shp")
region %>% glimpse()
Rows: 26
Columns: 41
$ REG_CODE   <fct> 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, ...
$ REGNAME    <fct> Dodoma, Arusha, Kilimanjaro, Tanga, Morogoro, Pwani, Dar...
$ MALE       <dbl> 817428, 631402, 663593, 792646, 872497, 430015, 1238964,...
$ FEMALE     <dbl> 872164, 649527, 711501, 843018, 879727, 440261, 1221383,...
$ TOTAL      <dbl> 1689592, 1280929, 1375094, 1635664, 1752224, 870276, 246...
$ NUMBER     <dbl> 382231, 291568, 302026, 362689, 391374, 204161, 606481, ...
$ AVERAGE    <dbl> 4.731137, 4.926079, 5.051941, 5.156647, 4.999909, 5.9064...
$ SINGLE     <dbl> 1020592, 825219, 871438, 1002201, 1070773, 507153, 15483...
$ MARIED     <dbl> 534183, 387568, 394934, 491846, 443728, 269349, 713216, ...
$ UNKNOWN    <dbl> 134817, 68142, 108722, 141617, 237723, 93774, 198770, 10...
$ TANZANIANS <dbl> 1688825, 1274821, 1371788, 1629810, 1750413, 866569, 244...
$ OTHER      <dbl> 553, 3764, 1390, 3166, 1214, 595, 10446, 275, 422, 400, ...
$ KENYANS    <dbl> 103, 1879, 1591, 1417, 153, 119, 2085, 38, 50, 28, 106, ...
$ BURUNDIANS <dbl> 14, 46, 94, 108, 65, 36, 321, 2, 3, 5, 14, 26, 19, 45489...
$ MOZAMBIQUA <dbl> 2, 13, 49, 979, 209, 2904, 3241, 4465, 14222, 3122, 6, 3...
$ UGANDANS   <dbl> 4, 125, 52, 24, 12, 5, 324, 2, 1, 4, 10, 17, 14, 28, 8, ...
$ ZAMBIA     <dbl> 5, 31, 32, 60, 56, 5, 294, 0, 7, 8, 19, 1698, 0, 8, 1467...
$ RWANDIANS  <dbl> 7, 112, 51, 59, 15, 19, 111, 0, 2, 1, 0, 1, 9, 26, 97, 4...
$ MALAWI     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ DRC        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ AGE0_4     <dbl> 277780, 206230, 187057, 255003, 265219, 125381, 286574, ...
$ AGE5_9     <dbl> 255083, 197187, 203969, 243476, 244823, 117402, 266668, ...
$ AGE10_14   <dbl> 216767, 158561, 201580, 221967, 219390, 105338, 255954, ...
$ AGE15_19   <dbl> 173879, 137621, 141676, 164689, 180383, 89810, 286289, 7...
$ AGE20_24   <dbl> 145815, 129684, 105641, 132184, 163702, 76732, 337427, 6...
$ AGE25_29   <dbl> 125287, 111805, 94017, 126739, 147537, 69607, 300915, 64...
$ AGE30_34   <dbl> 108138, 86552, 84675, 104565, 118091, 56256, 221665, 518...
$ AGE35_39   <dbl> 80814, 66531, 68802, 82111, 91372, 42223, 149588, 42059,...
$ AGE40_44   <dbl> 68243, 48302, 58563, 66119, 73779, 34905, 108492, 34958,...
$ AGE45_49   <dbl> 50210, 35996, 46643, 51135, 56251, 27304, 74938, 26516, ...
$ AGE50_54   <dbl> 43978, 27415, 41893, 46307, 49529, 26825, 58589, 27826, ...
$ AGE55_59   <dbl> 32535, 17735, 28795, 30920, 34656, 18604, 33122, 19336, ...
$ AGE60_64   <dbl> 33497, 17103, 30458, 34044, 35351, 22450, 28465, 21143, ...
$ AGE65PLUS  <dbl> 77566, 40207, 81325, 76405, 72141, 57439, 51661, 44477, ...
$ AREA_KM2   <dbl> 42478.4079, 38364.2440, 13384.4470, 28210.9019, 69011.61...
$ SEXRATIO   <dbl> 1.2, 1.0, 1.0, 1.0, 1.4, 1.3, 1.4, 1.0, 0.9, 1.0, 1.1, 1...
$ POPDENS    <dbl> 1670.4824, 8293.2027, 5166.4552, 2711.4646, 3693.1305, 1...
$ ECONACTIV  <dbl> 939962, 718951, 782488, 915218, 1022792, 522155, 1651151...
$ VOTINGPOP  <dbl> 835618, 636377, 697481, 816401, 914577, 468278, 1479408,...
$ DEPENDANT  <dbl> 2732.6, 2209.9, 2429.6, 2309.3, 4237.8, 1833.4, 9244.9, ...
$ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((36.85407 -6..., MULTIPOLYGO...

The glimpse reveals the internal structure of the dataset. It is a simple feature with polygons representing the geographical boundaries of 26 regions in Tanzania. Each region carries demographic information collected in the 2002 Housing and Population Census. We are only interested in the number of people by gender and marital status, so we pick those columns with the select function from the dplyr package (Wickham et al. 2018). Before selecting the variables, we clean the column names with the clean_names() function from the janitor package (Firke 2020). Note that the married column is spelled MARIED in the source data, so it becomes maried after cleaning. Finally, the data are transformed to WGS84 longitude and latitude (EPSG:4326), the coordinate system leaflet expects.

region = region %>% 
  janitor::clean_names() %>%  
  select(regname,male,female,single, maried) %>% 
  st_transform(4326)
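
To get a feel for the question before mapping anything, the married share can be computed directly. The sketch below uses only the first three regions, with values copied from the glimpse output above. Taking the share among people recorded as single or married (leaving out the UNKNOWN column) is my own assumption about how to define the proportion, and the names marital and share_married are illustrative.

```r
# Values copied from the glimpse() output above (first three regions)
marital <- data.frame(
  regname = c("Dodoma", "Arusha", "Kilimanjaro"),
  single  = c(1020592, 825219, 871438),
  maried  = c(534183, 387568, 394934)
)

# Share of married people among those recorded as single or married
# (assumed definition; the UNKNOWN column is left out)
marital$share_married <- marital$maried / (marital$single + marital$maried)

# Round for display
marital$share_married <- round(marital$share_married, 3)
marital
```

For these three regions the share falls between roughly 31 and 34 percent, with Dodoma the highest of the three.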

Since each chart needs a single point per region, the st_point_on_surface function from the sf package was used to convert each polygon to a point (Pebesma 2018). Unlike a true centroid, this point is guaranteed to lie within the polygon. The longitude and latitude of these points were then extracted using the st_coordinates function and combined with the attribute information to produce a tibble. Finally, the total population in each region was computed by summing the number of males and females.

region.tb = region %>% 
  st_point_on_surface() %>% 
  st_coordinates() %>% 
  as_tibble() %>% 
  rename(lon = 1, lat = 2) %>% 
  bind_cols(region %>% st_drop_geometry()) %>%
  mutate(total = male+female)
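
The choice of st_point_on_surface over st_centroid matters for irregular shapes. As a small sketch on a made-up C-shaped polygon (hypothetical planar coordinates, no CRS, not taken from the census data), the geometric centroid falls in the gap of the C, while the point on surface stays inside the shape:

```r
library(sf)

# A hypothetical C-shaped polygon: a 3 x 3 square with a notch cut from the right
c_shape <- st_polygon(list(rbind(
  c(0, 0), c(3, 0), c(3, 1), c(1, 1),
  c(1, 2), c(3, 2), c(3, 3), c(0, 3), c(0, 0)
)))
poly <- st_sfc(c_shape)

centroid   <- st_centroid(poly)          # centre of mass, may fall outside
on_surface <- st_point_on_surface(poly)  # always lies on the polygon

# Does each point fall on the polygon?
centroid_inside   <- st_intersects(centroid, poly, sparse = FALSE)[1, 1]
on_surface_inside <- st_intersects(on_surface, poly, sparse = FALSE)[1, 1]
c(centroid = centroid_inside, on_surface = on_surface_inside)
```

For regions with indented coastlines or islands, this helps keep each chart anchored over the region it describes.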

We now add to the basemap a pie chart for each region showing the split between single and married people. We also scale the width of the pie charts so that their area is proportional to the total population of the region.

# Palette for the two categories: single, married
colors2 <- c("#3093e5", "#fcba50")

basemap %>%
  addMinicharts(lng = region.tb$lon, 
                lat = region.tb$lat, 
                type = "pie", 
                chartdata = region.tb[, c("single", "maried")], 
                colorPalette = colors2, 
                width = 60 * sqrt(region.tb$total) / sqrt(max(region.tb$total)), 
                transitionTime = 0)
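
The square roots in the width argument are what make area, rather than diameter, proportional to population. A quick check in base R, using the TOTAL values of three regions from the glimpse output (so the maximum here is taken over these three only, not all 26 regions), might look like this:

```r
# Totals for three regions, copied from the TOTAL column in the glimpse() output
total <- c(Dodoma = 1689592, Arusha = 1280929, Kilimanjaro = 1375094)

max_width <- 60  # width in pixels of the largest pie, as in the call above

# Width (diameter) scaled by the square root of the population
width <- max_width * sqrt(total) / sqrt(max(total))

# Area of a circle with that diameter
area <- pi * (width / 2)^2

# Area per person is the same for every region,
# so area is proportional to the total population
area / total
```

Scaling the width by the raw totals instead would make the area grow with the square of the population and exaggerate the differences between regions.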

Now let’s represent the marital status using bar charts.

basemap %>%
  addMinicharts(lng = region.tb$lon, 
                lat = region.tb$lat, 
                type = "bar", 
                chartdata = region.tb[, c("single", "maried")], 
                colorPalette = colors2, width = 25, height = 65)

Representing a single variable

leaflet.minicharts was designed to represent multiple variables at once, but you may still want to use it for a single variable. In the next example, we represent the total population of each region in 2002. When the data passed to addMinicharts contain a single column, it is automatically drawn as a circle whose area is proportional to the corresponding value. In the example we also use the showLabels parameter to display rounded values of the variable inside the circles.

basemap %>%
  addMinicharts(lng = region.tb$lon, 
                lat = region.tb$lat, 
                # type = "bar", 
                chartdata = region.tb$total, 
                showLabels = TRUE, 
                width = 55)

References

Bachelier, Veronique, Jalal-Edine Zawam, Benoit Thieurmel, and Francois Guillem. 2019. Leaflet.minicharts: Mini Charts for Interactive Maps. https://CRAN.R-project.org/package=leaflet.minicharts.

Cheng, Joe, Bhaskar Karambelkar, and Yihui Xie. 2018. Leaflet: Create Interactive Web Maps with the JavaScript 'Leaflet' Library. https://CRAN.R-project.org/package=leaflet.

Firke, Sam. 2020. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.

Pebesma, Edzer. 2018. Sf: Simple Features for R. https://CRAN.R-project.org/package=sf.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2018. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.