January 5th, 2019
This week's TidyTuesday is about #rstats tweets. The dataset comes from Mike Kearney and contains tweets on #rstats and #tidytuesday from 2009 to 2018, spread over more than 80 columns. I decided to concentrate on #rstats; these are the questions I wanted to answer.
library(tidyverse)
library(lubridate)
library(hrbrthemes)
library(tidytext)
library(cowplot)
theme_set(theme_ipsum())
tweets <- read_rds("rstats_tweets.rds")
I had no idea who was tweeting about #rstats when I first looked at the dataset. It turned out that most of the top “people” are not real people but organizations, since they tweet about #rstats all the time. Have a look:
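# Nest the tweets per account, count them, and keep the 20 most active accounts
top_contributors <- tweets %>%
  group_by(screen_name) %>%
  nest() %>%
  # nest() keeps the grouping in tidyr >= 1.0; drop it so slice() acts on the whole table
  ungroup() %>%
  mutate(
    n_tweets = map_dbl(data, nrow)
  ) %>%
  arrange(desc(n_tweets)) %>%
  slice(1:20)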
# Unnest the tweets again and derive the hour and weekday of each tweet
top_contributors_unnested <- top_contributors %>%
  unnest(data) %>%
  mutate(
    hour = hour(created_at),
    # week_start = 1 orders the weekday factor from Monday to Sunday
    wday = wday(created_at, label = TRUE, week_start = 1)
  )
top_contributors_unnested %>%
  count(screen_name, wday, hour) %>%
  mutate(
    # Reverse the levels so Monday ends up at the top of the y axis, whatever the locale
    wday = fct_rev(wday)
  ) %>%
  ggplot(aes(x = hour, y = wday)) +
  geom_tile(aes(fill = n), color = "white") +
  scale_fill_gradient(low = "#E55D87", high = "#5FC3E4") +
  facet_wrap(~ screen_name) +
  theme_ipsum() +
  theme(plot.subtitle = element_text(vjust = 1),
        plot.caption = element_text(vjust = 1),
        axis.title = element_text(colour = "beige"),
        axis.text = element_text(colour = "white"),
        axis.text.x = element_text(size = 11.5),
        axis.text.y = element_text(size = 11.5),
        plot.title = element_text(colour = "beige"),
        legend.text = element_text(colour = "beige"),
        legend.title = element_text(colour = "beige"),
        plot.background = element_rect(fill = "black"),
        strip.text.x = element_text(colour = "white", face = "bold"),
        legend.background = element_rect(fill = "black",
                                         colour = "black")) +
  labs(title = "When do the top 20 tweeters tweet?",
       x = "Hour",
       y = "Wday",
       fill = "Number of posts")
Judging by the hours at which they tweet, pranayroy01, revodavid, RLangTip, and thinkR_fr seem to be real people. Most tweets come from CranberriesFeed and Rbloggers.
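If only the ranking is needed, and not the nested tweets themselves, a top list can also be sketched with count(). The toy data and its screen names below are made up for illustration; only the column name screen_name is taken from the dataset.

```r
library(dplyr)

# Toy stand-in for the tweets data; the screen names are hypothetical
toy_tweets <- tibble(
  screen_name = c("bot_a", "bot_a", "bot_a", "person_b", "person_b", "person_c")
)

# One row per account, sorted by number of tweets, keeping the top 2
top_toy <- toy_tweets %>%
  count(screen_name, sort = TRUE, name = "n_tweets") %>%
  slice(1:2)

top_toy
```

The nested version used above keeps every tweet of the top accounts at hand for the later unnest step, which this shortcut does not.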
The dataset contains coordinates for only a small share of the #rstats tweets. Still, I found about 1000 locations and plotted them on a world map:
longitude_latitude <- tweets %>%
  select(screen_name, geo_coords, favorite_count) %>%
  mutate(
    latitude = geo_coords %>% map_dbl(~ .x[1]),
    longitude = geo_coords %>% map_dbl(~ .x[2])
  ) %>%
  drop_na(longitude, latitude)
world <- map_data("world") %>%
  filter(region != "Antarctica")
p <- ggplot() +
  geom_map(data = world, map = world,
           aes(long, lat, group = group, map_id = region),
           fill = "#dbdbdb", color = "#7f7f7f") +
  scale_y_continuous(breaks = c()) +
  scale_x_continuous(breaks = c()) +
  labs(x = "", y = "") +
  # Latitudes only run from -90 to 90
  coord_map(xlim = c(-180, 180),
            ylim = c(-90, 90)) +
  geom_point(data = longitude_latitude,
             aes(x = longitude,
                 y = latitude,
                 size = favorite_count),
             alpha = .8,
             color = "#E55D87") +
  theme(
    plot.background = element_rect(fill = "black")
  )
ggdraw(p) +
  theme(
    plot.background = element_rect(fill = "black")
  )
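The coordinate extraction used for the map can be checked on a small example. This is a sketch on made-up data; it assumes, as the code above does, that geo_coords is a list column holding a latitude/longitude pair per tweet, with NA values where no location is available.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Toy stand-in: two located tweets and one without coordinates (values are made up)
toy_tweets <- tibble(
  screen_name = c("a", "b", "c"),
  geo_coords = list(c(47.37, 8.54), c(NA_real_, NA_real_), c(50.88, 4.70))
)

toy_coords <- toy_tweets %>%
  mutate(
    latitude = map_dbl(geo_coords, ~ .x[1]),   # first element of each pair
    longitude = map_dbl(geo_coords, ~ .x[2])   # second element of each pair
  ) %>%
  drop_na(longitude, latitude)                 # drop tweets without a location

toy_coords
```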
Well, #rstats is dominated by the United States and the European countries. Some people tweet from India, South America, and China. Let me have a closer look at Europe:
europe_countries <- c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus",
                      "Czech Republic", "Denmark", "Estonia", "Finland", "France",
                      "Germany", "Greece", "Hungary", "Ireland", "Italy", "Latvia",
                      "Lithuania", "Luxembourg", "Malta", "Netherlands", "Poland",
                      "Portugal", "Romania", "Slovakia", "Slovenia", "Spain",
                      "Sweden", "UK", "Switzerland")
europe <- world %>%
  filter(region %in% europe_countries)
p <- ggplot() +
  geom_map(data = europe, map = europe,
           aes(long, lat, group = group, map_id = region),
           fill = "#dbdbdb", color = "#7f7f7f") +
  scale_y_continuous(breaks = c()) +
  scale_x_continuous(breaks = c()) +
  labs(x = "", y = "") +
  coord_map(xlim = c(-20, 40),
            ylim = c(30, 70)) +
  geom_point(data = longitude_latitude,
             aes(x = longitude,
                 y = latitude,
                 size = favorite_count),
             alpha = .8,
             color = "#E55D87") +
  # "none" hides the size legend; FALSE is deprecated in recent ggplot2 versions
  guides(size = "none") +
  theme(
    plot.background = element_rect(fill = "black"),
    plot.subtitle = element_text(vjust = 1),
    plot.caption = element_text(vjust = 1),
    legend.text = element_text(colour = "beige"),
    legend.title = element_text(colour = "beige"),
    legend.background = element_rect(colour = "black"),
    legend.position = "bottom") +
  labs(x = NULL, y = NULL)
ggdraw(p) +
  theme(
    plot.background = element_rect(fill = "black")
  )
Interestingly, Switzerland and Belgium have quite a reputation for popular tweets on #rstats. Is it because DataCamp has one of its headquarters there?
There must have been some trends in the most popular hashtags over the past years. Let’s see:
Indeed, tidyverse gained ground in 2018; before that year, it was not in the top 10. In the years prior, ggplot2 was the new hot thing in #rstats. In terms of other programming languages, JavaScript made it into the top 10 this year. There seems to be an everlasting debate about #rstats and #python, since python crops up again and again in the top 10 hashtags. In 2018, tensorflow was a hot topic, I guess because there is an R interface by now.
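A yearly hashtag ranking of this kind could be computed roughly as follows. This is a sketch on made-up toy data, not the code behind the chart discussed here; it assumes that hashtags is a list column holding one character vector per tweet, and it keeps the top 2 per year instead of the top 10 to stay small.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy stand-in for the tweets data (timestamps and hashtags are made up)
toy_tweets <- tibble(
  created_at = ymd_hms(c("2017-03-01 10:00:00", "2017-06-01 12:00:00",
                         "2018-02-01 09:00:00", "2018-05-01 18:00:00")),
  hashtags = list(c("rstats", "ggplot2"), c("rstats", "ggplot2", "python"),
                  c("rstats", "tidyverse"), c("rstats", "tidyverse", "python"))
)

top_hashtags <- toy_tweets %>%
  mutate(year = year(created_at)) %>%
  select(year, hashtags) %>%
  unnest(hashtags) %>%                     # one row per tweet and hashtag
  mutate(hashtags = tolower(hashtags)) %>% # treat #Rstats and #rstats alike
  filter(hashtags != "rstats") %>%         # every tweet carries it, so drop it
  count(year, hashtags) %>%
  arrange(year, desc(n)) %>%
  group_by(year) %>%
  slice(1:2) %>%                           # top 2 hashtags per year
  ungroup()

top_hashtags
```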
tweets %>%
  select(created_at) %>%
  mutate(
    # floor_date() keeps every tweet in its own calendar month
    month = created_at %>% floor_date("month")
  ) %>%
  count(month) %>%
  ggplot(aes(month, n)) +
  geom_area(fill = "#E55D87") +
  labs(
    x = "Month",
    y = "Number of tweets",
    title = "Number of #rstats tweets from 2010 to 2018"
  ) +
  theme_ipsum() +
  theme(plot.subtitle = element_text(vjust = 1),
        plot.caption = element_text(vjust = 1),
        axis.title = element_text(colour = "beige"),
        axis.text = element_text(colour = "white"),
        axis.text.x = element_text(size = 11.5),
        axis.text.y = element_text(size = 11.5),
        plot.title = element_text(colour = "beige"),
        legend.text = element_text(colour = "beige"),
        legend.title = element_text(colour = "beige"),
        plot.background = element_rect(fill = "black"),
        strip.text.x = element_text(colour = "white",
                                    face = "bold"),
        legend.background = element_rect(fill = "black",
                                         colour = "black"))
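How tweets are assigned to months depends on the rounding function. A minimal sketch with two made-up timestamps shows the difference between the lubridate functions floor_date() and round_date():

```r
library(lubridate)

# Two hypothetical tweets from January 2018, one early and one late in the month
stamps <- ymd_hms(c("2018-01-03 08:00:00", "2018-01-28 20:00:00"))

floored <- floor_date(stamps, "month")  # start of the month each tweet falls in
rounded <- round_date(stamps, "month")  # nearest month boundary

floored
rounded
```

With floor_date() both tweets count towards January, whereas round_date() moves the late one to February, which shifts monthly counts by up to half a month.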
#rstats has become more and more popular over the past years. The number of #rstats tweets has increased steadily, and the trend shows no sign of reversing any time soon. Let’s hope it stays that way.