This week's TidyTuesday is about #rstats tweets. The dataset comes from Mike Kearney and contains tweets on #rstats and #tidytuesday from 2009 to 2018, spread over more than 80 columns. I decided to concentrate on #rstats. Here are the questions I wanted to answer.

library(tidyverse)
library(lubridate)
library(hrbrthemes)
library(tidytext)
library(cowplot)
theme_set(theme_ipsum())

tweets <- read_rds("rstats_tweets.rds")

At what time of the day do the top 20 tweeters tweet?

I had no idea who was tweeting about #rstats when I first looked at the dataset. It turned out that most of these “people” are not individuals but organizations, since they tweet about #rstats all the time. Have a look:

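# Nest each account's tweets, count them, and keep the 20 most active accounts
top_contributors <- tweets %>% 
  group_by(screen_name) %>%
  nest() %>%
  # nest() keeps the grouping in tidyr >= 1.0; without ungroup(),
  # slice(1:20) would act per account and keep every account
  ungroup() %>%
  mutate(
    n_tweets = data %>% map_dbl(~ nrow(.))
  ) %>%
  arrange(desc(n_tweets)) %>%
  slice(1:20)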
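
The same ranking can also be sketched without nesting. The example below is a minimal illustration on made-up data, not the code used for the plots: `toy_tweets` and its values are hypothetical, and it assumes dplyr 1.0 or later for `slice_head()`.

```r
library(dplyr)

# Toy stand-in for the tweets data frame (hypothetical values).
toy_tweets <- tibble(
  screen_name = c("a", "a", "a", "b", "b", "c"),
  text = paste("tweet", 1:6)
)

# Rank accounts by number of tweets and keep the two most active ones.
top_names <- toy_tweets %>%
  count(screen_name, sort = TRUE, name = "n_tweets") %>%
  slice_head(n = 2)

# Keep only the tweets written by those accounts.
top_tweets <- toy_tweets %>%
  semi_join(top_names, by = "screen_name")
```

With the real data, `n = 20` would give the top 20 accounts, and `top_tweets` would play the role of the unnested data frame.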

top_contributors_unnested <- top_contributors %>%
  unnest(data) %>%  # name the list-column explicitly; a bare unnest() is deprecated
  mutate(
    hour = hour(created_at),
    wday = wday(created_at, label = TRUE)
  )

top_contributors_unnested %>%
  select(screen_name, wday, hour) %>%
  count(screen_name, wday, hour) %>%
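  mutate(
    # Order the levels Sunday, Saturday, ..., Monday so that Monday ends up at
    # the top of the y axis. Indexing levels(wday) avoids hard-coding
    # locale-specific labels such as the German "So", "Sa", ...; this assumes
    # lubridate's default week start (Sunday first).
    wday = fct_relevel(wday, levels(wday)[c(1, 7:2)])
  ) %>%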
  ggplot(aes(x = hour, y = wday)) +
  geom_tile(aes(fill = n), color = "white") +
  scale_fill_gradient(low = "#E55D87", high = "#5FC3E4") +
  facet_wrap(~ screen_name) + 
  theme_ipsum() +
  theme(plot.subtitle = element_text(vjust = 1), 
    plot.caption = element_text(vjust = 1), 
    axis.title = element_text(colour = "beige"), 
    axis.text = element_text(colour = "white"), 
    axis.text.x = element_text(size = 11.5), 
    axis.text.y = element_text(size = 11.5), 
    plot.title = element_text(colour = "beige"), 
    legend.text = element_text(colour = "beige"), 
    legend.title = element_text(colour = "beige"), 
    plot.background = element_rect(fill = "black"), 
    strip.text.x = element_text(colour = "white", face = "bold"),
    legend.background = element_rect(fill = "black", 
        colour = "black")) +
  labs(title = "When do the top 20 tweeters tweet?",
     x = "Hour", 
     y = "Weekday", 
     fill = "Number of posts")

Judging by the hours at which they tweet, pranayroy01, revodavid, RLangTip, and thinkR_fr seem to be real people. Most tweets come from CranberriesFeed and Rbloggers.
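
One caveat when reading tweeting hours: `hour()` and `wday()` report values in the time zone of the timestamp. The sketch below assumes `created_at` is stored in UTC (which is how rtweet usually returns it) and uses a hypothetical timestamp and a hypothetical home time zone to show how much the picture can shift.

```r
library(lubridate)

# Hypothetical timestamp, assumed to be stored in UTC.
created_at <- ymd_hms("2018-06-15 22:30:00", tz = "UTC")

# Hour and weekday as stored.
hour_utc <- hour(created_at)
wday_utc <- wday(created_at)  # 1 = Sunday with the default week start

# The same instant seen from an assumed home time zone of the account.
local_time <- with_tz(created_at, "Europe/Paris")
hour_local <- hour(local_time)
wday_local <- wday(local_time)
```

Here a late Friday evening tweet in UTC becomes a tweet shortly after midnight on Saturday in Paris, so activity patterns are best compared with the account's likely location in mind.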

Where do most tweeters post from?

Only a few tweets in the dataset carry coordinates. Still, I found about 1000 locations and plotted them on a world map:

longitude_latitude <- tweets %>%
  select(screen_name, geo_coords, favorite_count) %>%
  mutate(
    latitude = geo_coords %>% map_dbl(~ .x[1]),
    longitude = geo_coords %>% map_dbl(~ .x[2])
  ) %>%
  drop_na(longitude, latitude) 
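
The extraction step above can be tried on a small example. This is a sketch on made-up data: it assumes, as the code above does, that each `geo_coords` entry is a length-two numeric vector with latitude first and longitude second, and that tweets without a location hold a pair of `NA` values.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Toy stand-in for the geo_coords list-column (hypothetical values).
toy <- tibble(
  screen_name = c("a", "b", "c"),
  geo_coords = list(c(47.37, 8.54), c(NA_real_, NA_real_), c(50.88, 4.70))
)

coords <- toy %>%
  mutate(
    latitude = map_dbl(geo_coords, 1),   # first element of each pair
    longitude = map_dbl(geo_coords, 2)   # second element of each pair
  ) %>%
  drop_na(longitude, latitude)           # drop tweets without a location
```
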

world <- map_data("world") %>%
  filter(region != "Antarctica")

p <- ggplot() + 
  geom_map(data = world, map = world,
             aes(long, lat, group = group, map_id = region),
             fill = "#dbdbdb", color = "#7f7f7f") + 
  scale_y_continuous(breaks=c()) +
  scale_x_continuous(breaks=c()) +
  labs(x = "", y = "") +
  coord_map(xlim = c(-180, 180),
            ylim = c(-200, 200)) +
  geom_point(data = longitude_latitude, 
             aes(x = longitude,
                 y = latitude,
                 size = favorite_count),
             alpha = .8,
             color = "#E55D87") +
  theme(
    plot.background = element_rect(fill = "black")
  )

ggdraw(p) +
  theme(
    plot.background = element_rect(fill = "black")
  )

Well, #rstats is dominated by the United States and the European countries. Some people tweet from India, South America, and China. Let me have a closer look at Europe:

europe_countries <- c("Austria","Belgium","Bulgaria","Croatia","Cyprus",
                   "Czech Republic","Denmark","Estonia","Finland","France",
                   "Germany","Greece","Hungary","Ireland","Italy","Latvia",
                   "Lithuania","Luxembourg","Malta","Netherlands","Poland",
                   "Portugal","Romania","Slovakia","Slovenia","Spain",
                   "Sweden","UK", "Switzerland")

europe <- world %>%
  filter(region %in% europe_countries)

p <- ggplot() + 
  geom_map(data = europe, map = europe,
             aes(long, lat, group = group, map_id = region),
             fill = "#dbdbdb", color = "#7f7f7f") + 
  scale_y_continuous(breaks=c()) +
  scale_x_continuous(breaks=c()) +
  labs(x = "", y = "") +
  coord_map(xlim = c(-20, 40),
            ylim = c(30, 70)) +
  geom_point(data = longitude_latitude, 
             aes(x = longitude,
                 y = latitude,
                 size = favorite_count),
             alpha = .8,
             color = "#E55D87") +
  guides(size = "none") +  # FALSE is deprecated here since ggplot2 3.3.4
  theme(
    plot.background = element_rect(fill = "black"),
    plot.subtitle = element_text(vjust = 1), 
    plot.caption = element_text(vjust = 1), 
    legend.text = element_text(colour = "beige"), 
    legend.title = element_text(colour = "beige"), 
    legend.background = element_rect(colour = "black"),
    legend.position = "bottom") +
  labs(x = NULL, y = NULL)

ggdraw(p) +
  theme(
    plot.background = element_rect(fill = "black")
  )

Interestingly, Switzerland and Belgium have quite a reputation when it comes to popular tweets on #rstats. Is it because DataCamp has one of its headquarters there?

What were the top 10 hashtags mentioned in #rstats tweets in the past years?

There must have been some trends in the most popular hashtags in the past years. Let’s see:

Indeed, tidyverse gained ground in 2018; before that year, it was not in the top 10. In the years prior, ggplot2 was the new hot thing in #rstats. In terms of other programming languages, javascript made it into the top 10 this year. There seems to be an everlasting debate about #rstats and #python, since python crops up again and again in the top 10 hashtags. In 2018, tensorflow was a hot topic, I guess because there is an R interface by now.
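
The code behind this ranking is not shown here. One way such a ranking could be computed is sketched below on made-up data. It assumes the dataset has a `hashtags` list-column with one character vector per tweet (as rtweet data usually does); `toy_tweets`, its values, and the column name `uses` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy stand-in for the tweets data frame (hypothetical values).
toy_tweets <- tibble(
  created_at = ymd_hms(
    c("2017-03-01 10:00:00", "2017-05-02 11:00:00",
      "2018-01-03 12:00:00", "2018-02-04 13:00:00"),
    tz = "UTC"
  ),
  hashtags = list(
    c("rstats", "ggplot2"),
    c("rstats", "ggplot2", "python"),
    c("rstats", "tidyverse"),
    c("rstats", "Tidyverse", "python")
  )
)

top_hashtags <- toy_tweets %>%
  mutate(year = year(created_at)) %>%
  unnest(hashtags) %>%                       # one row per hashtag
  mutate(hashtag = tolower(hashtags)) %>%    # treat "Tidyverse" and "tidyverse" alike
  filter(hashtag != "rstats") %>%            # every tweet has it, so it carries no signal
  count(year, hashtag, name = "uses") %>%
  group_by(year) %>%
  slice_max(uses, n = 1, with_ties = FALSE) %>%  # n = 10 for a top 10
  ungroup()
```
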

How often have people tweeted about #rstats in the past years?

tweets %>%
  select(created_at) %>%
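  mutate(
    # floor_date() assigns each tweet to its own calendar month;
    # round_date() would push tweets from the second half of a month into the next one
    month = created_at %>% floor_date("month")
  ) %>%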
  count(month) %>%
  ggplot(aes(month, n)) +
  geom_area(fill = "#E55D87") +
  labs(
    x = "Month",
    y = "Number of tweets",
    title = "Number of #rstats tweets from 2010 to 2018"
  ) +
  theme_ipsum() +
  theme(plot.subtitle = element_text(vjust = 1), 
    plot.caption = element_text(vjust = 1), 
    axis.title = element_text(colour = "beige"), 
    axis.text = element_text(colour = "white"), 
    axis.text.x = element_text(size = 11.5), 
    axis.text.y = element_text(size = 11.5), 
    plot.title = element_text(colour = "beige"), 
    legend.text = element_text(colour = "beige"), 
    legend.title = element_text(colour = "beige"), 
    plot.background = element_rect(fill = "black"), 
    strip.text.x = element_text(colour = "white", 
                                face = "bold"),
    legend.background = element_rect(fill = "black", 
        colour = "black"))

#rstats has become more and more popular in the past years. There has been a steady increase in the number of #rstats tweets, and the trend shows no sign of reversing any time soon. Let’s hope it stays that way.
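
A note on monthly binning with lubridate: rounding and flooring a timestamp to the month can give different results. The sketch below uses a hypothetical timestamp to show the difference.

```r
library(lubridate)

# Hypothetical tweet time in the second half of January.
stamp <- ymd_hms("2018-01-20 09:00:00", tz = "UTC")

rounded <- round_date(stamp, "month")  # nearest month boundary: 1 February
floored <- floor_date(stamp, "month")  # start of the tweet's own month: 1 January
```

For counts per calendar month, flooring keeps every tweet in the month it was actually posted.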