A common problem for many people who make graphs with ggplot2 is plotting areas under a curve. I guess the main reason is that plotting an area under a curve works somewhat differently from plotting histograms, bar charts, or line charts: usually you don't have a data set to map, only a function. Let’s dig in.

A very neat way to solve the problem is stat_function, which lets you plot arbitrary functions. For example, we might plot a simple polynomial on top of an empty plot:

ggplot(NULL, aes(x = c(-20, 20))) +
  stat_function(fun = function(x) x^3,
                geom = "line")

You have to give ggplot at least the boundaries of your x axis; here, aes(x = c(-20, 20)) does that job. The most important argument of stat_function is fun, the function to plot. You can either write your own function from scratch (as in the example above) or use a predefined one, such as dnorm, the density of the normal distribution:

ggplot(NULL, aes(x = c(-3, 3))) +
  stat_function(fun = dnorm,
                geom = "line")
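Under the hood, stat_function does not draw a truly continuous curve: it evaluates fun at a grid of evenly spaced x values (101 by default, controlled by the n argument) and hands those points to the geom. A minimal sketch for inspecting these points with ggplot2's layer_data(), assuming ggplot2 3.3.0 or later, where stat_function can also take the range through its own xlim argument instead of an x aesthetic:

```r
library(ggplot2)

# Standard normal density; the range comes from stat_function's xlim argument
p <- ggplot() +
  stat_function(fun = dnorm, geom = "line", xlim = c(-3, 3))

# layer_data() returns the points ggplot2 computed for the first layer
points <- layer_data(p, 1)
head(points[, c("x", "y")])

# With a coarse grid (n = 7) the evaluation becomes visible as a jagged line
p_coarse <- ggplot() +
  stat_function(fun = dnorm, geom = "line", xlim = c(-3, 3), n = 7)
p_coarse
```

If a curve looks edgy, for example a very narrow density, increasing n is usually the fix.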

Now that we know what the fun argument does, let’s look at the geom argument. With geom, you tell ggplot how to draw your function. The most common choices are probably "area", "line", or "point". Since we are interested in areas under the curve, let’s use "area":

ggplot(NULL, aes(x = c(-3, 3))) +
  stat_function(fun = dnorm,
                geom = "area")

That’s better. But let’s fill it with another color. I like steelblue:

ggplot(NULL, aes(x = c(-3, 3))) +
  stat_function(fun = dnorm,
                geom = "area",
                fill = "steelblue")
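The fill argument colors the area itself. If you also want an outline or a semi-transparent area, the colour and alpha arguments are set the same way. A small sketch, using ggplot() with stat_function's own xlim argument instead of the empty-plot trick above (this assumes ggplot2 3.3.0 or later); grey20 and 0.6 are arbitrary choices:

```r
library(ggplot2)

# fill colors the area, colour draws its outline, alpha makes the fill semi-transparent
p_fill <- ggplot() +
  stat_function(fun = dnorm, geom = "area", xlim = c(-3, 3),
                fill = "steelblue", colour = "grey20", alpha = 0.6)
p_fill
```

A semi-transparent fill becomes handy once several areas overlap, as in the t-distribution example further down.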

But now you might say: that’s all fine, but some functions take arguments. Where should I put them? For this, stat_function comes with the args argument, a list of additional arguments that are passed on to fun:

ggplot(NULL, aes(x = c(20, 180))) +
  stat_function(fun = dnorm,
                geom = "area",
                fill = "steelblue",
                args = list(
                  mean = 100,
                  sd = 20
                ))

See how the values on the x axis changed: the curve is now centered on the mean of 100. I also adjusted the range in the x aesthetic to match.
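Passing parameters through args should give the same result as wrapping the function yourself in an anonymous function that fixes those parameters. A quick sketch that compares the computed heights of both variants with layer_data(); the object names are illustrative only, and ggplot2 3.3.0 or later is assumed for the xlim argument without data:

```r
library(ggplot2)

# Variant 1: parameters passed through args
p_args <- ggplot() +
  stat_function(fun = dnorm, geom = "area", fill = "steelblue",
                xlim = c(20, 180), args = list(mean = 100, sd = 20))

# Variant 2: parameters fixed inside an anonymous function
p_wrapped <- ggplot() +
  stat_function(fun = function(x) dnorm(x, mean = 100, sd = 20),
                geom = "area", fill = "steelblue", xlim = c(20, 180))

# Both layers are evaluated on the same grid, so the heights should agree
same <- all.equal(layer_data(p_args)$y, layer_data(p_wrapped)$y)
same
```

args keeps the function and its parameters separate, which is convenient when you reuse one function with several parameter sets; the wrapper is more flexible when you also want to transform the result.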

But what about a specific area under the curve rather than the whole curve?

True, sometimes we are only interested in the area under a curve for a specific range of x values. For that, we need a simple twist: the xlim argument of stat_function, which restricts the range over which the function is drawn:

ggplot(NULL, aes(x = c(-3, 3))) +
  stat_function(fun = dnorm,
                geom = "area",
                fill = "steelblue",
                xlim = c(0, 3))

Oh, that didn’t quite work: ggplot dropped the negative values from my x axis. We can easily fix that by adding xlim(), which, unlike the xlim argument of stat_function, sets the limits of the whole x axis:

ggplot(NULL, aes(x = c(-3, 3))) +
  stat_function(fun = dnorm,
                geom = "area",
                fill = "steelblue",
                xlim = c(0, 3)) +
  xlim(-3, 3)

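Wow, it worked. We have an area under a curve. Let’s get fancier: we draw the left part of the standard normal distribution as a line from -10 to 0 and shade the right part. Since xlim(-3, 3) limits the x axis, ggplot2 drops the part of the line below -3 (and warns about the removed rows):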

ggplot(NULL, aes(x = c(-3, 3))) +
  stat_function(fun = dnorm,
                geom = "line",
                xlim = c(-10, 0)) +
  stat_function(fun = dnorm,
                geom = "area",
                fill = "steelblue",
                xlim = c(0, 3)) +
  xlim(-3, 3)
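The shaded region is a probability: for the standard normal distribution, the area between 0 and 3 equals pnorm(3) - pnorm(0). A small sketch that computes this value in two ways with base R and, as one possible way to show it, writes it onto the plot with annotate(); the label position is an arbitrary choice, and the line here starts at -3 so that nothing is cut off by the axis limits:

```r
library(ggplot2)

# Area under the standard normal density between 0 and 3
area_cdf <- pnorm(3) - pnorm(0)
area_num <- integrate(dnorm, lower = 0, upper = 3)$value  # numerical cross-check

p_shaded <- ggplot() +
  stat_function(fun = dnorm, geom = "line", xlim = c(-3, 0)) +
  stat_function(fun = dnorm, geom = "area", fill = "steelblue",
                xlim = c(0, 3)) +
  xlim(-3, 3) +
  annotate("text", x = 1.5, y = 0.45,
           label = sprintf("Area = %.4f", area_cdf))
p_shaded
```

Using the cumulative distribution function (pnorm, pt, ...) is exact and cheap; integrate() is a fallback for functions without a closed-form CDF.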

Let’s make a more complicated plot. Imagine you want to show your students how the critical t value of a t-distribution changes with the degrees of freedom. Here, qt(0.95, df) gives the one-sided critical value for α = .05:

t_critical_df_5 <- qt(0.95, df = 5)
t_critical_df_25 <- qt(0.95, df = 25)

ggplot(NULL, aes(x = c(-4, 4))) +
  # t-distribution with 5 degrees of freedom
  #   Non-significant area
  stat_function(fun = dt,
                geom = "line",
                xlim = c(-4, t_critical_df_5),
                args = list(
                  df = 5
                )) +
  #   Significant area
  stat_function(fun = dt,
                geom = "area",
                xlim = c(t_critical_df_5, 4),
                alpha = .2,
                fill = "orange",
                args = list(
                  df = 5
                )) +
  # t-distribution with 25 degrees of freedom
  #   Non-significant area
  stat_function(fun = dt,
                geom = "line",
                xlim = c(-4, t_critical_df_25),
                args = list(
                  df = 25
                )) +
  #   Significant area
  stat_function(fun = dt,
                geom = "area",
                xlim = c(t_critical_df_25, 4),
                alpha = .2,
                fill = "steelblue",
                args = list(
                  df = 25
                )) +
  xlim(-4, 4)
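The two pairs of layers above differ only in the degrees of freedom and the fill color, so they can also be generated in a loop. A sketch under the same assumption as above (one-sided critical values at α = .05, via qt(0.95, ...)); the helper t_layers is hypothetical, not part of ggplot2, and the code relies on ggplot2 accepting a list of layers on the right-hand side of +:

```r
library(ggplot2)

# Hypothetical helper: line up to the critical value, shaded area beyond it
t_layers <- function(df, fill, lower = -4, upper = 4, level = 0.95) {
  t_crit <- qt(level, df = df)
  list(
    stat_function(fun = dt, geom = "line", xlim = c(lower, t_crit),
                  args = list(df = df)),
    stat_function(fun = dt, geom = "area", xlim = c(t_crit, upper),
                  alpha = .2, fill = fill, args = list(df = df))
  )
}

dfs   <- c(5, 25)
fills <- c("orange", "steelblue")

# Map() returns one list of two layers per df; flatten it into one list
p_t <- ggplot() +
  unlist(Map(t_layers, df = dfs, fill = fills), recursive = FALSE) +
  xlim(-4, 4)
p_t

# Critical values shrink towards the normal quantile as df grows
crit <- c(setNames(qt(0.95, df = dfs), paste0("df_", dfs)),
          normal = qnorm(0.95))
round(crit, 3)
```

Adding a third distribution is then just a matter of extending dfs and fills.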

And that’s all the magic. I hope that next time it will be easier for you to plot areas under curves.