I create learning content for the web: online courses, data visualizations, and tutorials.
November 25th, 2018
Tidytuesday is a weekly data project initiated by the R4DS Online Learning Community. The idea behind tidytuesday is that individuals practice data analysis with the tidyverse packages.
This time I tried to tackle the tidytuesday on malaria. Here is the script.
Malaria is still a global disease. It is caused by Plasmodium parasites that are transmitted through the bites of infected mosquitoes, and most malaria deaths occur in Africa.
Let us load the packages and the data first:
library(tidyverse)
library(countrycode)
library(maps) # provides the iso3166 lookup table used below
library(grid)
library(gridExtra)
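The malaria_deaths.csv file has some unwieldy column names. Let's change them to meaningful ones. Note that the fourth column holds age-standardized death rates per 100,000 people rather than absolute death counts; I keep the short name deaths for convenience.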
malaria <- read_csv("malaria_deaths.csv") %>%
setNames(c("country", "code", "year", "deaths"))
How many countries are in the dataset?
malaria %>%
count(country)
# A tibble: 228 x 2
country n
<chr> <int>
1 Afghanistan 27
2 Albania 27
3 Algeria 27
4 American Samoa 27
5 Andean Latin America 27
6 Andorra 27
7 Angola 27
8 Antigua and Barbuda 27
9 Argentina 27
10 Armenia 27
# ... with 218 more rows
The dataset contains 228 entities with 27 data points each, one per year. Not all of them are countries: some, such as Andean Latin America, are regional aggregates. I am curious to see which countries suffer the most from malaria, summed over all years:
(plagued_countries <- malaria %>%
group_by(country) %>%
summarise(
deaths = sum(deaths)
) %>%
arrange(desc(deaths)) %>%
head)
# A tibble: 6 x 2
country deaths
<chr> <dbl>
1 Sierra Leone 4905.
2 Burkina Faso 4581.
3 Uganda 3977.
4 Equatorial Guinea 3853.
5 Cote d'Ivoire 3796.
6 Nigeria 3669.
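The ranking above sums the values over all years. Because every entity has the same number of rows (27), ranking by the sum or by the mean per year gives the same order; the mean simply divides each total by 27. Here is a minimal base-R sketch of that on made-up toy data (not the malaria dataset), using aggregate() in place of group_by() and summarise():

```r
# Toy data in the same long format as `malaria` (country, year, deaths).
# The numbers are made up for illustration; they are not from the dataset.
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(2014:2016, times = 2),
  deaths  = c(10, 20, 30, 5, 5, 5)
)

# Sum over all years: one total per country (what the ranking above uses)
totals <- aggregate(deaths ~ country, data = toy, FUN = sum)

# Mean over all years: an average value per year for each country
per_year <- aggregate(deaths ~ country, data = toy, FUN = mean)

# Rank countries by their total, largest first
totals[order(-totals$deaths), ]
```

With equal row counts per country, both orderings agree; they would only diverge if some countries had missing years.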
Clearly, African countries suffer the most. It would be interesting to see what this looks like globally. I'd also like to see how malaria deaths changed between 1995 and 2010:
world <- map_data("world")
malaria_some_years <- malaria %>%
filter(year %in% c(1995, 2000, 2005, 2010))
deaths_per_year <- malaria_some_years %>%
inner_join(iso3166 %>% select(a3, mapname), by = c(code = "a3")) %>%
left_join(world, by = c(country = "region"))
ggplot(deaths_per_year, aes(long, lat, map_id = mapname,
fill = deaths)) +
geom_map(map = world) +
scale_fill_gradient(low = "blue", high = "red") +
theme_void() +
coord_map(xlim = c(-180, 180)) +
labs(fill = "Deaths per 100,000") +
facet_wrap(~ year, ncol = 2)
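The first join above is an inner join, so any row whose code has no match in the a3 column of iso3166 is dropped before mapping. Regional aggregates such as Andean Latin America presumably have no ISO code and therefore disappear from the maps. The sketch below shows the same behaviour with base R's merge() on small tables; the table names and all values are made-up assumptions, not the real data:

```r
# Hypothetical stand-ins: rows in the shape of `malaria` and a lookup table
# in the shape of iso3166's a3/mapname columns. All values are made up.
toy_deaths <- data.frame(
  country = c("Sierra Leone", "Uganda", "Andean Latin America"),
  code    = c("SLE", "UGA", NA),
  deaths  = c(180, 150, 1)
)
lookup <- data.frame(
  a3      = c("SLE", "UGA", "NOR"),
  mapname = c("Sierra Leone", "Uganda", "Norway")
)

# Inner join on the ISO code (merge() defaults to all = FALSE):
# only codes present in both tables survive, so the aggregate without a
# code and the lookup row without malaria data are both dropped
joined <- merge(toy_deaths, lookup, by.x = "code", by.y = "a3")
joined
```

dplyr's inner_join() keeps the same rows; merge() is used here only to keep the sketch free of package dependencies.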
The situation did not improve drastically, and the worst-affected regions remain Africa and Southeast Asia. Let us have a closer look at Africa:
deaths_africa_per_year <- deaths_per_year %>%
  # Map ISO3 codes to continents; more robust than matching map names
  mutate(continent = countrycode(code,
                                 origin = "iso3c",
                                 destination = "continent")) %>%
  filter(continent == "Africa")
ggplot(deaths_africa_per_year, aes(long, lat, map_id = mapname,
fill = deaths)) +
geom_map(map = world) +
scale_fill_gradient(low = "blue", high = "red") +
theme_void() +
coord_map(xlim = c(-80, 80)) +
labs(fill = "Deaths per 100,000") +
facet_wrap(~ year, ncol = 2)
Again, there is some improvement, but not on a massive scale. West and Central Africa in particular seem to be strongly affected by malaria.
A line chart might be a better option to see how the deaths by malaria changed over time:
malaria %>%
group_by(year) %>%
summarise(deaths = sum(deaths)) %>%
ggplot(aes(x = year, y = deaths)) +
geom_line() +
scale_y_continuous(limits = c(0, 4500)) +
scale_x_continuous(breaks = seq(1990, 2016, by = 5)) +
labs(title = "Malaria deaths per year worldwide",
y = "Deaths",
x = "Year")
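Because the deaths column holds age-standardized rates per 100,000 people, adding it up across entities, as the line chart above does, does not yield a number of deaths. One way to combine rates is to weight them by population. The sketch below uses two hypothetical countries with made-up rates and populations; it ignores the age standardization, so treat the combined value as an approximation:

```r
# Hypothetical numbers: two countries with made-up rates (per 100,000 people)
# and made-up population sizes.
rate       <- c(country_a = 200, country_b = 10)
population <- c(country_a = 1e6, country_b = 9e6)

# Summing rates mixes units and ignores population size
summed_rate <- sum(rate)                                  # 210

# Expected deaths per country = rate / 100,000 * population
deaths_est <- rate / 1e5 * population                     # 2000 and 900

# Combined rate for both countries together, per 100,000 people
combined_rate <- sum(deaths_est) / sum(population) * 1e5  # 29

# The same value as a population-weighted mean of the rates
weighted.mean(rate, population)
```

The summed rate (210) is dominated by the small, heavily affected country, while the combined rate (29) reflects how many people live in each place.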
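Indeed, there is an improvement, but not by a wide margin. The summed value currently sits at about 2400; because it adds up rates across all entities, including regional aggregates, it shows the trend but is not the number of people who die from malaria each year. How did the most plagued countries change over the past 25 years?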
malaria %>%
  filter(country %in% plagued_countries$country) %>%
  arrange(country, year) %>%
  # Compute year-over-year changes within each country, so that lag()
  # never compares one country's first year with another country's last year
  group_by(country) %>%
  mutate(improvement = deaths - lag(deaths)) %>%
  ungroup() %>%
  ggplot(aes(x = year, y = improvement, colour = country)) +
  geom_line() +
  ylab("Change from previous year (lower = better)")
There were some improvements over the past 10 years, but not many. Equatorial Guinea in particular shows no steady change in malaria deaths.
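The year-over-year change only makes sense if lag() looks back within the same country. On an ungrouped, sorted table, the first year of one country is compared with the last year of the previous one. Here is a small base-R sketch of both variants on made-up data, with ave() playing the role of group_by(country):

```r
# Made-up values in the long format of `malaria` (not real data)
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(2014:2016, times = 2),
  deaths  = c(30, 25, 27, 100, 90, 80)
)
toy <- toy[order(toy$country, toy$year), ]

# Ungrouped lag: the first year of B is compared with the last year of A
toy$change_ungrouped <- toy$deaths - c(NA, head(toy$deaths, -1))

# Grouped lag: ave() applies the function within each country separately,
# so each country's first year gets NA instead of a spurious difference
toy$change_grouped <- ave(toy$deaths, toy$country,
                          FUN = function(x) c(NA, diff(x)))
toy
```

The ungrouped version reports a jump of 73 for country B in 2014, which is only the gap between two different countries; the grouped version correctly leaves that row as NA. In dplyr, the equivalent is calling group_by(country) before mutate(improvement = deaths - lag(deaths)).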