For Easter, I wanted to have a look at the number of hares in Germany. Wild hare populations have been rapidly declining over the last 10 years but during the last three years they have at least been stable.
This plot shows the 16 federal states of Germany. Their fill color shows the proportion of hares per square kilometer in 2015/2016. The black area of the pie charts shows the percentage of the total number of hares (not corrected by area) for each federal state in regard to the total number of hares in all of Germany during that time. The size of the hares in the background reflects the total number of hares in the eight federal states with the highest percentages.
Below, you can find the code I used to produce this plot.
Hare population numbers
The German hunter’s association (Deutscher Jagdverband) publishes population numbers for a variety of common species of wild game, including hares. The most recent numbers are for the year 2015/16.
I downloaded the pdf and extracted the table on page 1 with the tabulizer package.
library(tidyverse)
#devtools::install_github("ropensci/tabulizer")
library(tabulizer)
txt <- extract_tables("2015-16 Jahresstrecke Feldhase.pdf")
hares <- txt[[1]] %>%
as.data.frame(stringsAsFactors = FALSE)
# first row contains column names, remove from df and set as colnames
hares_df <- data.frame(hares[-1 , ])
colnames(hares_df) <- gsub(" ", "", as.character(c("bundesland", unlist(hares[1, -1]))))
# remove the spaces in the numbers and make numeric
hares_df <- apply(hares_df, 2, function(x) gsub(" ", "", x))
hares_df[, -1] <- apply(hares_df[, -1], 2, function(x) as.numeric(x))
hare_final <- as.data.frame(hares_df, stringsAsFactors = FALSE)
The map
I downloaded the ESRI Shapefile of German federal states from the Bundesamt für Kartographie und Geodäsie, Frankfurt am Main, 2011 and read it in with the rgdal package.
library(maptools)
ger <- rgdal::readOGR(dsn = "shp", layer = "vg2500_bld")
## OGR data source with driver: ESRI Shapefile
## Source: "shp", layer: "vg2500_bld"
## with 16 features
## It has 6 fields
I then convert the shapefile into a dataframe for plotting. This allowes the plotting of polygons in the shape of ech federal state with ggplot.
library(plyr)
ger_df <- fortify(ger)
ger@data$id <- rownames(ger@data)
ger_df_final <- join(ger_df, ger@data, by = "id")
ger_df_final$GEN <- as.character(ger_df_final$GEN)
ger_df_final$GEN <- gsub("\xfc", "ü", ger_df_final$GEN)
ger_df_final$GEN <- gsub("ü", "ue", ger_df_final$GEN)
To match the names of the federal states in the map dataframe with the hare population table, I assign a corresponding column to the latter.
hare_final2 <- filter(hare_final, bundesland != "gesamt")
hare_final2$GEN <- as.factor(c("Baden-Wuerttemberg", "Bayern", "Berlin", "Brandenburg", "Bremen", "Hamburg", "Hessen", "Mecklenburg-Vorpommern", "Niedersachsen", "Nordrhein-Westfalen", "Rheinland-Pfalz", "Saarland", "Sachsen", "Sachsen-Anhalt", "Schleswig-Holstein", "Thueringen"))
colnames(hare_final2)[2:12] <- paste0("Y_", gsub("/", "_", colnames(hare_final)[2:12]))
hare_final2[, 2:12] <- apply(hare_final2[, 2:12], 2, function(x) as.numeric(x))
Area
The hare population numbers are given as absolute numbers but I want them normalized by area. The German statistics portal (http://www.statistik-portal.de) provides this data on their website. I use the XML package to scrape it from there.
library(XML)
table = readHTMLTable("http://www.statistik-portal.de/Statistik-Portal/de_jb01_jahrtab1.asp", header = TRUE, which = 1, stringsAsFactors = FALSE)
table$V1 <- gsub("ü", "ü", table$V1)
colnames(table) <- c("GEN", "area", "pop_total", "pop_male", "pop_female", "inh_p_km2")
# numbers are in German format, so need to convert them to English decimal notation
table[, -1] <- apply(table[, -1], 2, function(x) as.numeric(gsub(",", ".", gsub("\\.", "", x))))
table <- table[-17, ]
table$GEN <- as.factor(table$GEN)
I then divide each population number by area (in km2).
hare_final2$GEN <- as.character(hare_final2$GEN)
table$GEN <- hare_final2$GEN
hare_final3 <- hare_final2 %>%
left_join(table, by = "GEN")
hare_final3[, 2:12] <- apply(hare_final3[, 2:12], 2, function(x) x / hare_final3$area)
This final table I then join with the map dataframe.
map <- left_join(ger_df_final, hare_final3, by = "GEN")
The background map
The background map shows Germany’s federal states (as polygons) colored according to how many hares there were relative to the area in km2.
I define the mapping theme for ggplot, so that I have no axes, grids, etc. and a transparent background.
map_theme <- list(theme(panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
panel.background = element_rect(fill = "transparent", color = NA),
plot.background = element_rect(fill = "transparent", color = NA),
panel.border = element_blank(),
axis.line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
plot.title = element_text(size = 18)))
The background of each polygon is also filled with a hare image with the hare’s size being proportional to the percentage of total hare numbers of each federal state compared to the total number of hares in Germany in 2015/16.
To achieve this, I first downloaded an open-source image of a rabbit and read it with the png package. This image I then want to plot with gplot’s annotation_custom() function, so I need to convert the image to a raster.
library(png)
bunny <- readPNG("rabbit-297212_1280.png") #https://pixabay.com
library(grid)
g <- rasterGrob(bunny, interpolate = TRUE)
I am plotting hares only for the 8 federal states with percentage above 1, because the other ones would be too small on the plot.
For each of these 8 federal states, I am plotting the hare image in relative size by scaling the xmax and ymax values according to the percentage values for each state. The image canvas always has a size of 15 by 15, so I am centering the images at 7.5. I then save each image as a png with transparent background.
val <- round(filter(hare_final_pie, GEN == "Bayern")$percent, digits = 0) * 0.5 / 2
df <- data.frame(xmin = 7.5 - val,
xmax = 7.5 + val,
ymin = 7.5 - val,
ymax = 7.5 + val)
qplot(0:15, 0:15, geom = "blank") +
annotation_custom(g, xmin = df$xmin, xmax = df$xmax, ymin = df$ymin, ymax = df$ymax) +
map_theme
ggsave("my_image_Bayern.png", bg = "transparent")
val <- round(filter(hare_final_pie, GEN == "Niedersachsen")$percent, digits = 0) * 0.5 / 2
df <- data.frame(xmin = 7.5 - val,
xmax = 7.5 + val,
ymin = 7.5 - val,
ymax = 7.5 + val)
qplot(0:15, 0:15, geom = "blank") +
annotation_custom(g, xmin = df$xmin, xmax = df$xmax, ymin = df$ymin, ymax = df$ymax) +
map_theme
ggsave("my_image_Niedersachsen.png", bg = "transparent")
val <- round(filter(hare_final_pie, GEN == "Nordrhein-Westfalen")$percent, digits = 0) * 0.5 / 2
df <- data.frame(xmin = 7.5 - val,
xmax = 7.5 + val,
ymin = 7.5 - val,
ymax = 7.5 + val)
qplot(0:15, 0:15, geom = "blank") +
annotation_custom(g, xmin = df$xmin, xmax = df$xmax, ymin = df$ymin, ymax = df$ymax) +
map_theme
ggsave("my_image_Nordrhein-Westfalen.png", bg = "transparent")
val <- round(filter(hare_final_pie, GEN == "Schleswig-Holstein")$percent, digits = 0) * 0.5 / 2
df <- data.frame(xmin = 7.5 - val,
xmax = 7.5 + val,
ymin = 7.5 - val,
ymax = 7.5 + val)
qplot(0:15, 0:15, geom = "blank") +
annotation_custom(g, xmin = df$xmin, xmax = df$xmax, ymin = df$ymin, ymax = df$ymax) +
map_theme
ggsave("my_image_Schleswig-Holstein.png", bg = "transparent")
val <- round(filter(hare_final_pie, GEN == "Baden-Wuerttemberg")$percent, digits = 0) * 0.5 / 2
df <- data.frame(xmin = 7.5 - val,
xmax = 7.5 + val,
ymin = 7.5 - val,
ymax = 7.5 + val)
qplot(0:15, 0:15, geom = "blank") +
annotation_custom(g, xmin = df$xmin, xmax = df$xmax, ymin = df$ymin, ymax = df$ymax) +
map_theme
ggsave("my_image_Baden-Wuerttemberg.png", bg = "transparent")
val <- round(filter(hare_final_pie, GEN == "Hessen")$percent, digits = 0) * 0.5 / 2
df <- data.frame(xmin = 7.5 - val,
xmax = 7.5 + val,
ymin = 7.5 - val,
ymax = 7.5 + val)
qplot(0:15, 0:15, geom = "blank") +
annotation_custom(g, xmin = df$xmin, xmax = df$xmax, ymin = df$ymin, ymax = df$ymax) +
map_theme
ggsave("my_image_Hessen.png", bg = "transparent")
val <- round(filter(hare_final_pie, GEN == "Rheinland-Pfalz")$percent, digits = 0) * 0.5 / 2
df <- data.frame(xmin = 7.5 - val,
xmax = 7.5 + val,
ymin = 7.5 - val,
ymax = 7.5 + val)
qplot(0:15, 0:15, geom = "blank") +
annotation_custom(g, xmin = df$xmin, xmax = df$xmax, ymin = df$ymin, ymax = df$ymax) +
map_theme
ggsave("my_image_Rheinland-Pfalz.png", bg = "transparent")
val <- round(filter(hare_final_pie, GEN == "Brandenburg")$percent, digits = 0) * 0.5 / 2
df <- data.frame(xmin = 7.5 - val,
xmax = 7.5 + val,
ymin = 7.5 - val,
ymax = 7.5 + val)
qplot(0:15, 0:15, geom = "blank") +
annotation_custom(g, xmin = df$xmin, xmax = df$xmax, ymin = df$ymin, ymax = df$ymax) +
map_theme
ggsave("my_image_Brandenburg.png", bg = "transparent")
Next, I follow the instructions and code from a StackOverflow post: Each image is now read in again and converted to a dataframe for plotting. Because I only want to fill within the polygon borders, I am restricting the points to those, that fall in these shapes. I need to use the “groups” column here, because this is the column that was used for plotting the map polygons.
#http://stackoverflow.com/questions/28206611/adding-custom-image-to-geom-polygon-fill-in-ggplot
# converting raster image to plottable data.frame
library(sp)
ggplot_rasterdf <- function(color_matrix, bottom = 0, top = 1, left = 0, right = 1) {
require("dplyr")
require("tidyr")
if (dim(color_matrix)[3] > 3) hasalpha <- T else hasalpha <- F
outMatrix <- matrix("#00000000", nrow = dim(color_matrix)[1], ncol = dim(color_matrix)[2])
for (i in 1:dim(color_matrix)[1])
for (j in 1:dim(color_matrix)[2])
outMatrix[i, j] <- rgb(color_matrix[i,j,1], color_matrix[i,j,2], color_matrix[i,j,3], ifelse(hasalpha, color_matrix[i,j,4], 1))
colnames(outMatrix) <- seq(1, ncol(outMatrix))
rownames(outMatrix) <- seq(1, nrow(outMatrix))
as.data.frame(outMatrix) %>% mutate(Y = nrow(outMatrix):1) %>% gather(X, color, -Y) %>%
mutate(X = left + as.integer(as.character(X))*(right - left)/ncol(outMatrix), Y = bottom + Y*(top - bottom)/nrow(outMatrix))
}
my_image <- readPNG("my_image_Bayern.png")
groups <- as.character(unique(filter(map, GEN == "Bayern")$group))
my_image_dat <- ggplot_rasterdf(my_image,
left = min(map[map$group %in% groups,]$long),
right = max(map[map$group %in% groups,]$long),
bottom = min(map[map$group %in% groups,]$lat),
top = max(map[map$group %in% groups,]$lat) )
bayern <- my_image_dat[point.in.polygon(my_image_dat$X, my_image_dat$Y,
map[map$group %in% groups,]$long,
map[map$group %in% groups,]$lat) %>% as.logical,]
my_image <- readPNG("my_image_Niedersachsen.png")
groups <- as.character(unique(filter(map, GEN == "Niedersachsen")$group))
my_image_dat <- ggplot_rasterdf(my_image,
left = min(map[map$group %in% groups,]$long),
right = max(map[map$group %in% groups,]$long),
bottom = min(map[map$group %in% groups,]$lat),
top = max(map[map$group %in% groups,]$lat) )
niedersachsen <- my_image_dat[point.in.polygon(my_image_dat$X, my_image_dat$Y,
map[map$group %in% groups,]$long,
map[map$group %in% groups,]$lat) %>% as.logical,]
my_image <- readPNG("my_image_Nordrhein-Westfalen.png")
groups <- as.character(unique(filter(map, GEN == "Nordrhein-Westfalen")$group))
my_image_dat <- ggplot_rasterdf(my_image,
left = min(map[map$group %in% groups,]$long),
right = max(map[map$group %in% groups,]$long),
bottom = min(map[map$group %in% groups,]$lat),
top = max(map[map$group %in% groups,]$lat) )
nrw <- my_image_dat[point.in.polygon(my_image_dat$X, my_image_dat$Y,
map[map$group %in% groups,]$long,
map[map$group %in% groups,]$lat) %>% as.logical,]
my_image <- readPNG("my_image_Schleswig-Holstein.png")
groups <- as.character(unique(filter(map, GEN == "Schleswig-Holstein")$group))
my_image_dat <- ggplot_rasterdf(my_image,
left = min(map[map$group %in% groups,]$long),
right = max(map[map$group %in% groups,]$long),
bottom = min(map[map$group %in% groups,]$lat),
top = max(map[map$group %in% groups,]$lat) )
sh <- my_image_dat[point.in.polygon(my_image_dat$X, my_image_dat$Y,
map[map$group %in% groups,]$long,
map[map$group %in% groups,]$lat) %>% as.logical,]
my_image <- readPNG("my_image_Baden-Wuerttemberg.png")
groups <- as.character(unique(filter(map, GEN == "Baden-Wuerttemberg")$group))
my_image_dat <- ggplot_rasterdf(my_image,
left = min(map[map$group %in% groups,]$long),
right = max(map[map$group %in% groups,]$long),
bottom = min(map[map$group %in% groups,]$lat),
top = max(map[map$group %in% groups,]$lat) )
bw <- my_image_dat[point.in.polygon(my_image_dat$X, my_image_dat$Y,
map[map$group %in% groups,]$long,
map[map$group %in% groups,]$lat) %>% as.logical,]
my_image <- readPNG("my_image_Hessen.png")
groups <- as.character(unique(filter(map, GEN == "Hessen")$group))
my_image_dat <- ggplot_rasterdf(my_image,
left = min(map[map$group %in% groups,]$long),
right = max(map[map$group %in% groups,]$long),
bottom = min(map[map$group %in% groups,]$lat),
top = max(map[map$group %in% groups,]$lat) )
hessen <- my_image_dat[point.in.polygon(my_image_dat$X, my_image_dat$Y,
map[map$group %in% groups,]$long,
map[map$group %in% groups,]$lat) %>% as.logical,]
my_image <- readPNG("my_image_Rheinland-Pfalz.png")
groups <- as.character(unique(filter(map, GEN == "Rheinland-Pfalz")$group))
my_image_dat <- ggplot_rasterdf(my_image,
left = min(map[map$group %in% groups,]$long),
right = max(map[map$group %in% groups,]$long),
bottom = min(map[map$group %in% groups,]$lat),
top = max(map[map$group %in% groups,]$lat) )
rp <- my_image_dat[point.in.polygon(my_image_dat$X, my_image_dat$Y,
map[map$group %in% groups,]$long,
map[map$group %in% groups,]$lat) %>% as.logical,]
my_image <- readPNG("my_image_Brandenburg.png")
groups <- as.character(unique(filter(map, GEN == "Brandenburg")$group))
my_image_dat <- ggplot_rasterdf(my_image,
left = min(map[map$group %in% groups,]$long),
right = max(map[map$group %in% groups,]$long),
bottom = min(map[map$group %in% groups,]$lat),
top = max(map[map$group %in% groups,]$lat) )
brandenburg <- my_image_dat[point.in.polygon(my_image_dat$X, my_image_dat$Y,
map[map$group %in% groups,]$long,
map[map$group %in% groups,]$lat) %>% as.logical,]
Now, I can plot the background plot. I also set the x- and y-limits now, as well as an empty title because I want to overlay the pie charts onto this plot and need to have the same coordinates and margins for this. I am also defining the legend position to be on the top right, so that I can plot the legend of the pie charts on the bottom right.
p1 <- ggplot() +
map_theme +
geom_polygon(data = map, aes(long, lat, group = group, fill = Y_2015_16)) +
geom_path(data = map, aes(long, lat, group = group), color = "white", size = 0.5) +
scale_fill_gradientn(colors = rev(terrain.colors(10)), limits = c(0, max(as.numeric(hare_final3$Y_2015_16)))) +
coord_cartesian(ylim = c(47, 55.5), xlim = c(5, 15.5)) +
theme(legend.justification = "top") +
labs(
title = " ",
fill = "# / km2"
) +
geom_tile(data = bayern, aes(x = X, y = Y), fill = bayern$color) +
geom_tile(data = niedersachsen, aes(x = X, y = Y), fill = niedersachsen$color) +
geom_tile(data = nrw, aes(x = X, y = Y), fill = nrw$color) +
geom_tile(data = sh, aes(x = X, y = Y), fill = sh$color) +
geom_tile(data = bw, aes(x = X, y = Y), fill = bw$color) +
geom_tile(data = hessen, aes(x = X, y = Y), fill = hessen$color) +
geom_tile(data = rp, aes(x = X, y = Y), fill = rp$color) +
geom_tile(data = brandenburg, aes(x = X, y = Y), fill = brandenburg$color)
The pie plot
Over each federal state, I want to plot a pie chart that shows the percentage of the total hare population that falls on each federal state. I use the scatterpie package for that. The coordinates for each pie should be the center points of each federal state, which I determined with the following code from a StackOverflow post.
library(scatterpie)
# http://stackoverflow.com/questions/10368180/plotting-pie-graphs-on-map-in-ggplot
getLabelPoint <- function(county) {Polygon(county[c('long', 'lat')])@labpt}
centroids <- by(map, map$GEN, getLabelPoint)
centroids <- do.call("rbind.data.frame", centroids) %>%
tibble::rownames_to_column()
names(centroids) <- c("GEN", "long", "lat")
I then calculate the percentages and join them with the centroid coordinates. I am also adjusting the positions of the pie charts for some federal states so that they don’t overlap with other pie charts or with the background images.
hare_final_pie <- hare_final2 %>%
select(one_of(c("GEN", "Y_2015_16"))) %>%
mutate(percent = round(Y_2015_16 / sum(Y_2015_16) * 100, digits = 2),
rest = 100 - percent) %>%
left_join(centroids, by = "GEN")
hare_final_pie[hare_final_pie$GEN == "Brandenburg", "long"] <- 14
hare_final_pie[hare_final_pie$GEN == "Brandenburg", "lat"] <- 52
hare_final_pie[hare_final_pie$GEN == "Niedersachsen", "long"] <- 8
hare_final_pie[hare_final_pie$GEN == "Niedersachsen", "lat"] <- 52.8
hare_final_pie[hare_final_pie$GEN == "Schleswig-Holstein", "long"] <- 10.5
hare_final_pie[hare_final_pie$GEN == "Schleswig-Holstein", "lat"] <- 54
hare_final_pie[hare_final_pie$GEN == "Hamburg", "long"] <- 10.2
hare_final_pie[hare_final_pie$GEN == "Berlin", "long"] <- 13.6
hare_final_pie[hare_final_pie$GEN == "Nordrhein-Westfalen", "long"] <- 6.8
hare_final_pie[hare_final_pie$GEN == "Hessen", "long"] <- 9.2
hare_final_pie[hare_final_pie$GEN == "Hessen", "lat"] <- 50.9
hare_final_pie[hare_final_pie$GEN == "Rheinland-Pfalz", "long"] <- 7
hare_final_pie[hare_final_pie$GEN == "Rheinland-Pfalz", "lat"] <- 50
hare_final_pie[hare_final_pie$GEN == "Baden-Wuerttemberg", "long"] <- 9.5
hare_final_pie[hare_final_pie$GEN == "Baden-Wuerttemberg", "lat"] <- 49
hare_final_pie[hare_final_pie$GEN == "Bayern", "long"] <- 11
hare_final_pie[hare_final_pie$GEN == "Bayern", "lat"] <- 49.6
Now, I am plottin the second plot with only the pie charts. Theoretically, one could plot these pie charts directly on top of another ggplot but here this doesn’t work because I am using conflicting fill attributes: one with a continuous scale for the polygons and another with a categorical scale for the pie charts. Therefore, I am overlaying the two plots instead.
hare_final_pie$radius <- 0.2
p2 <- ggplot() +
geom_scatterpie(data = hare_final_pie, aes(x = long, y = lat, group = GEN, r = radius), cols = colnames(hare_final_pie)[3:4]) +
map_theme +
coord_cartesian(ylim = c(47, 55.5), xlim = c(5, 15.5)) +
theme(legend.justification = "bottom") +
scale_fill_manual(values = c("black", "transparent")) +
labs(
title = "German hare populations in 2015/16 by federal state",
fill = ""
)
grid.newpage()
pushViewport(viewport(layout = grid.layout(1, 1 ,widths = unit(1 ,"npc"))))
print(p1 + theme(legend.position = "right"), vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(p2 + theme(legend.position = "right"), vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
sessionInfo()
## R version 3.3.3 (2017-03-06)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Sierra 10.12.3
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scatterpie_0.0.7 png_0.1-7 plyr_1.8.4 maptools_0.9-2
## [5] sp_1.2-4 dplyr_0.5.0 purrr_0.2.2 readr_1.1.0
## [9] tidyr_0.6.1 tibble_1.3.0 ggplot2_2.2.1 tidyverse_1.1.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.10 forcats_0.2.0 tools_3.3.3 digest_0.6.12
## [5] jsonlite_1.4 lubridate_1.6.0 evaluate_0.10 nlme_3.1-131
## [9] gtable_0.2.0 lattice_0.20-35 psych_1.7.3.21 DBI_0.6-1
## [13] rgdal_1.2-6 yaml_2.1.14 parallel_3.3.3 haven_1.0.0
## [17] xml2_1.1.1 stringr_1.2.0 httr_1.2.1 knitr_1.15.1
## [21] hms_0.3 rprojroot_1.2 R6_2.2.0 readxl_0.1.1
## [25] foreign_0.8-67 rmarkdown_1.4 udunits2_0.13 tweenr_0.1.5
## [29] modelr_0.1.0 reshape2_1.4.2 magrittr_1.5 units_0.4-3
## [33] MASS_7.3-45 backports_1.0.5 scales_0.4.1 htmltools_0.3.5
## [37] rvest_0.3.2 assertthat_0.2.0 mnormt_1.5-5 ggforce_0.1.1
## [41] colorspace_1.3-2 labeling_0.3 stringi_1.1.5 lazyeval_0.2.0
## [45] munsell_0.4.3 broom_0.4.2