Tidytuesday is a weekly data project initiated by the R4DS Online Learning Community. The idea behind tidytuesday is that individuals practice data analysis with the tidyverse packages.

This time I tried to tackle the tidytuesday on malaria. Here is the script.

Malaria is still a global disease. It is caused by parasites that are transmitted through the bites of infected mosquitoes, and most of the people who suffer from it live in Africa.

Let us load the packages and the data first:

library(tidyverse)
library(countrycode)
library(grid)
library(gridExtra)

The malaria_deaths.csv file has some weird column names. Let’s change those to meaningful column names.

malaria <- read_csv("malaria_deaths.csv") %>%
  setNames(c("country", "code", "year", "deaths"))
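
A side note on the renaming step: setNames() assigns names purely by position, so it would silently mislabel columns if the column order in the file ever changed. Here is a minimal sketch of a name-based alternative with rename(). The tibble and its long column name are made up for illustration and are not necessarily what the real file contains:

```r
library(dplyr)

# Toy data with a deliberately unwieldy column name (hypothetical)
raw <- tibble(
  Entity = c("Angola", "Albania"),
  Code = c("AGO", "ALB"),
  Year = c(1990, 1990),
  `Deaths from malaria (long description)` = c(200.5, 0)
)

# rename() maps new_name = old_name, so the column order does not matter
tidy <- raw %>%
  rename(
    country = Entity,
    code = Code,
    year = Year,
    deaths = `Deaths from malaria (long description)`
  )

names(tidy)
```

The price is that the old names have to be spelled out, which is exactly what makes a mismatch fail loudly instead of quietly.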

Let us see how many countries are in the dataset:

malaria %>%
  count(country)
# A tibble: 228 x 2
   country                  n
   <chr>                <int>
 1 Afghanistan             27
 2 Albania                 27
 3 Algeria                 27
 4 American Samoa          27
 5 Andean Latin America    27
 6 Andorra                 27
 7 Angola                  27
 8 Antigua and Barbuda     27
 9 Argentina               27
10 Armenia                 27
# ... with 218 more rows
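
Entries such as "Andean Latin America" in the output are regions rather than countries. One way to separate the two could be the code column. The sketch below uses made-up rows and assumes that regions have no country code (NA), which I have not verified for the real file:

```r
library(dplyr)

# Made-up rows; the NA code for the region is an assumption about the real data
toy <- tibble(
  country = c("Angola", "Angola", "Andean Latin America", "Albania"),
  code = c("AGO", "AGO", NA, "ALB"),
  year = c(1990, 1991, 1990, 1990),
  deaths = c(200, 190, 1, 0)
)

# Number of distinct entities overall
n_entities <- n_distinct(toy$country)

# Number of distinct entities that carry a country code
n_with_code <- toy %>%
  filter(!is.na(code)) %>%
  summarise(n = n_distinct(country)) %>%
  pull(n)

c(entities = n_entities, with_code = n_with_code)
```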

The dataset holds 228 entities with 27 data points each. Not all of them are countries: entries such as "Andean Latin America" are regions. I am curious to see which countries suffer the worst from malaria in terms of deaths summed over all years:

(plagued_countries <- malaria %>%
  group_by(country) %>%
  summarise(
    deaths = sum(deaths)
  ) %>%
  arrange(desc(deaths)) %>%
  head)
# A tibble: 6 x 2
  country           deaths
  <chr>              <dbl>
1 Sierra Leone       4905.
2 Burkina Faso       4581.
3 Uganda             3977.
4 Equatorial Guinea  3853.
5 Cote d'Ivoire      3796.
6 Nigeria            3669.
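
The numbers above are sums over all years in the data. To get a figure per year instead, the same summarise() call can return the mean alongside the total. A minimal sketch with invented numbers (countries "A" and "B" are placeholders):

```r
library(dplyr)

# Invented data: two placeholder countries, three years each
toy <- tibble(
  country = c("A", "A", "A", "B", "B", "B"),
  year = c(1990, 1991, 1992, 1990, 1991, 1992),
  deaths = c(10, 20, 30, 5, 5, 5)
)

ranking <- toy %>%
  group_by(country) %>%
  summarise(
    total_deaths = sum(deaths),  # summed over all years
    mean_deaths = mean(deaths)   # average per year
  ) %>%
  arrange(desc(mean_deaths))

ranking
```

Since every entity has the same number of data points, both columns produce the same ranking here; only the scale differs.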

Clearly the African countries suffer the most. It would be interesting to see what that looks like globally. Also, I’d like to see how the deaths by malaria changed between 1995 and 2010:

world <- map_data("world")

malaria_some_years <- malaria %>%
  filter(year %in% c(1995, 2000, 2005, 2010))

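# iso3166 ships with the maps package, which library(tidyverse) does not attach,
# so it is referenced explicitly here
deaths_per_year <- malaria_some_years %>%
  inner_join(maps::iso3166 %>% select(a3, mapname), by = c(code = "a3")) %>%
  left_join(world, by = c(country = "region"))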

ggplot(deaths_per_year, aes(long, lat, map_id = mapname,
                            fill = deaths)) +
  geom_map(map = world) +
  scale_fill_gradient(low = "blue", high = "red") +
  theme_void() +
  coord_map(xlim = c(-180, 180)) +
  labs(fill = "Deaths per year") +
  facet_wrap(~ year, ncol = 2)
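
One thing to keep in mind with the inner_join() used for the map data: rows whose code has no match in the lookup table are dropped without any message. anti_join() is a quick way to see what gets lost. The sketch below uses a made-up lookup table as a stand-in for the ISO code table, and the NA code for the region is an assumption:

```r
library(dplyr)

# Made-up rows; the region without a code is an assumption
toy <- tibble(
  country = c("Angola", "Andean Latin America"),
  code = c("AGO", NA),
  deaths = c(200, 1)
)

# Hypothetical lookup table standing in for the ISO code table
lookup <- tibble(
  a3 = c("AGO", "ALB"),
  mapname = c("Angola", "Albania")
)

# Rows that survive the join
kept <- inner_join(toy, lookup, by = c(code = "a3"))

# Rows that the inner join silently discards
dropped <- anti_join(toy, lookup, by = c(code = "a3"))

dropped
```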

The situation did not improve drastically. And the worst regions are Africa and South-East Asia. Let us have a closer look at Africa:

# Name the new column directly instead of renaming the auto-generated one
deaths_africa_per_year <- deaths_per_year %>%
  mutate(
    continent = countrycode(mapname,
                            origin = "country.name",
                            destination = "continent")) %>%
  filter(continent == "Africa")

ggplot(deaths_africa_per_year, aes(long, lat, map_id = mapname, 
                                   fill = deaths)) +
  geom_map(map = world) +
  scale_fill_gradient(low = "blue", high = "red") +
  theme_void() +
  coord_map(xlim = c(-80, 80)) +
  labs(fill = "Deaths per year") +
  facet_wrap(~ year, ncol = 2)
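
The Africa filter relies on countrycode() to translate country names into continents. A small standalone sketch of that lookup, with three country names I picked for illustration:

```r
library(countrycode)

# Country names chosen for illustration
countries <- c("Nigeria", "Uganda", "India")

# Translate English country names into continent names
continents <- countrycode(countries,
                          origin = "country.name",
                          destination = "continent")

continents
```

Names that cannot be matched come back as NA together with a warning, so it is worth checking for NA values before filtering on the continent.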

Again, there is some improvement, but not on a massive scale. The western part of Central Africa in particular seems to be strongly affected by malaria.

A line chart might be a better option to see how the deaths by malaria changed over time:

malaria %>%
  group_by(year) %>%
  summarise(deaths = sum(deaths)) %>%
  ggplot(aes(x = year, y = deaths)) +
  geom_line() +
  scale_y_continuous(limits = c(0, 4500)) +
  scale_x_continuous(breaks = seq(1990, 2016, by = 5)) +
  labs(title = "Malaria deaths per year worldwide",
       y = "Deaths",
       x = "Year")

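Indeed, it improved, but not by a strong margin. The summed value currently sits at about 2400. Keep in mind that this adds up every entity in the dataset, regions such as "Andean Latin America" included, so it works as a trend indicator rather than as a worldwide head count. How did the most plagued countries improve over the past 25 years?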

malaria %>%
  group_by(year, country) %>%
  summarise(deaths = sum(deaths)) %>%
  arrange(country, year) %>%
  filter(country %in% plagued_countries$country) %>%
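  # compute the year-over-year change within each country, so the first year
  # of one country is not compared with the last year of the previous one
  group_by(country) %>%
  mutate(
    improvement = deaths - lag(deaths)
  ) %>%
  ungroup() %>%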
  ggplot(aes(x = year, y = improvement, colour = country)) +
  geom_line() +
  ylab("Improvements (lower = better)")
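
A caveat on year-over-year differences: when several countries are stacked in one data frame, lag() only stays within a country if the data is grouped by country. Otherwise the first year of one country is compared with the last year of the country before it. A minimal sketch with invented numbers (countries "A" and "B" are placeholders):

```r
library(dplyr)

# Invented data: two placeholder countries, two years each
toy <- tibble(
  country = c("A", "A", "B", "B"),
  year = c(1990, 1991, 1990, 1991),
  deaths = c(100, 90, 10, 12)
)

changes <- toy %>%
  arrange(country, year) %>%
  # ungrouped: the first row of B is compared with the last row of A
  mutate(change_ungrouped = deaths - lag(deaths)) %>%
  # grouped: lag() restarts for every country
  group_by(country) %>%
  mutate(change_grouped = deaths - lag(deaths)) %>%
  ungroup()

changes
```

In the ungrouped column the first row of B shows a drop of 80 that never happened; in the grouped column that row is NA, which is what we want.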

There were some improvements over the past 10 years, but not much really. Equatorial Guinea in particular doesn’t show a steady change in malaria deaths.