
November 19th, 2018

A common problem for many who work with graphs in ggplot2 is plotting the area under a curve. I guess the main reason is that plotting areas under a curve works somewhat differently than plotting histograms, bar charts, or line charts. Let’s dig in.

A very neat way of solving the problem is to use the stat_function function. stat_function allows you to plot arbitrary functions. We might for example plot a simple polynomial on top of an empty plot:

```
ggplot(NULL, aes(x = c(-20, 20))) +
  stat_function(fun = function(x) { x^3 },
                geom = "line")
```

You have to provide ggplot with at least the boundaries of your x axis, otherwise it has no range over which to evaluate the function. The central argument is **fun**. You can either write your own function from scratch (like in the example above) or use a predefined function, such as dnorm:

```
ggplot(NULL, aes(x = c(-3, 3))) +
  stat_function(fun = dnorm,
                geom = "line")
```
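As a side note, here is a minimal sketch of another way to hand the x boundaries to ggplot: put the two values into a tiny data frame and map its column to x. The data frame and the name `p_df` are my own choices for illustration. The result should look the same as the plot above, and this variant does not rely on pairing NULL data with a length-two aesthetic, which may be handy if your ggplot2 version is picky about that.

```r
library(ggplot2)

# Assumption: a two-row data frame holding only the x boundaries.
x_range <- data.frame(x = c(-3, 3))

# stat_function evaluates dnorm on a grid spanning the range of x
# (101 points by default).
p_df <- ggplot(x_range, aes(x = x)) +
  stat_function(fun = dnorm,
                geom = "line")

p_df
```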

Now that we know what the **fun** argument does, let’s look at the **geom** argument. With geom you tell ggplot how your function should be visualized. The most common visuals are probably an *area*, a *line*, or *points*. Since we are interested in areas under the curve, let’s do that:

```
ggplot(NULL, aes(x = c(-3, 3))) +
  stat_function(fun = dnorm,
                geom = "area")
```

That’s better. But let’s plot it with another color. I like steelblue:

```
ggplot(NULL, aes(x = c(-3, 3))) +
  stat_function(fun = dnorm,
                geom = "area",
                fill = "steelblue")
```
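The other geoms work the same way. As a small sketch, here is the curve drawn with points. I also set the **n** argument of stat_function, which controls at how many x values the function is evaluated (the default is 101). The value 20 and the name `p_points` are arbitrary choices of mine, and I pass the x boundaries through a small data frame here.

```r
library(ggplot2)

# Evaluate dnorm at only 20 evenly spaced x values and draw them as points.
p_points <- ggplot(data.frame(x = c(-3, 3)), aes(x = x)) +
  stat_function(fun = dnorm,
                geom = "point",
                n = 20)

p_points
```

A small n makes it easy to see that ggplot does not draw the function itself but a grid of evaluated points, which matters once you start restricting the x range.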

But now you might say: that’s all fine, but some functions take arguments, so where should I put these? For this, stat_function comes with the **args** argument:

```
ggplot(NULL, aes(x = c(20, 180))) +
  stat_function(fun = dnorm,
                geom = "area",
                fill = "steelblue",
                args = list(
                  mean = 100,
                  sd = 20
                ))
```

See how the values on the x axis changed? I also adjusted the x aesthetic so that the axis covers the new distribution.
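If you prefer, the same thing can be written without **args** by wrapping dnorm in a small anonymous function. This is only a sketch to show that both routes should give the same curve. The names `base_plot`, `p_args`, and `p_wrapped` are mine, and I again pass the x boundaries through a small data frame.

```r
library(ggplot2)

base_plot <- ggplot(data.frame(x = c(20, 180)), aes(x = x))

# Route 1: pass the extra arguments through args.
p_args <- base_plot +
  stat_function(fun = dnorm,
                geom = "area",
                fill = "steelblue",
                args = list(mean = 100, sd = 20))

# Route 2: fix the arguments inside an anonymous function.
p_wrapped <- base_plot +
  stat_function(fun = function(x) dnorm(x, mean = 100, sd = 20),
                geom = "area",
                fill = "steelblue")

p_wrapped
```

The **args** route keeps the function and its parameters separate, which is convenient when you want to reuse the same layer with different parameter values.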

Sometimes we are only interested in the area under a curve for specific values on the x axis. For that, we need a simple twist, the **xlim** argument of stat_function:

```
ggplot(NULL, aes(x = c(-3, 3))) +
  stat_function(fun = dnorm,
                geom = "area",
                fill = "steelblue",
                xlim = c(0, 3))
```

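Oh, that didn’t work. Somehow ggplot ignored the negative values on my x axis and only shows the range from 0 to 3. But we can easily fix that by adding the xlim() function to the plot, which sets the limits of the whole x axis: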

```
ggplot(NULL, aes(x = c(-3, 3))) +
  stat_function(fun = dnorm,
                geom = "area",
                fill = "steelblue",
                xlim = c(0, 3)) +
  xlim(-3, 3)
```

Wow, it worked. We have an area under a curve. Let’s get fancier and also draw the left part of my standard normal distribution, but as a line from -3 to 0:

```
ggplot(NULL, aes(x = c(-3, 3))) +
  # Left half of the curve as a plain line
  stat_function(fun = dnorm,
                geom = "line",
                xlim = c(-3, 0)) +
  # Right half of the curve as a filled area
  stat_function(fun = dnorm,
                geom = "area",
                fill = "steelblue",
                xlim = c(0, 3)) +
  xlim(-3, 3)
```
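If you want a number to go with the picture, the shaded area from 0 to 3 is simply a probability. Here is a quick sketch in base R that computes it in two ways, once from the cumulative distribution function and once by numerical integration of the density. The variable names are mine.

```r
# Area under the standard normal curve between 0 and 3, via the CDF.
area_cdf <- pnorm(3) - pnorm(0)

# Cross-check: numerically integrate the density over the same interval.
area_integral <- integrate(dnorm, lower = 0, upper = 3)$value

# Both values round to 0.4987.
round(c(cdf = area_cdf, integral = area_integral), 4)
```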

Let’s make a more complicated plot. Imagine you want to show your students how the critical t value of a t-distribution changes with different degrees of freedom:

```
# One-sided critical t values at the 95% level
t_critical_df_5 <- qt(0.95, df = 5)
t_critical_df_25 <- qt(0.95, df = 25)

ggplot(NULL, aes(x = c(-4, 4))) +
  # t-distribution with 5 degrees of freedom
  # Non-significant area
  stat_function(fun = dt,
                geom = "line",
                xlim = c(-4, t_critical_df_5),
                args = list(
                  df = 5
                )) +
  # Significant area
  stat_function(fun = dt,
                geom = "area",
                xlim = c(t_critical_df_5, 4),
                alpha = .2,
                fill = "orange",
                args = list(
                  df = 5
                )) +
  # t-distribution with 25 degrees of freedom
  # Non-significant area
  stat_function(fun = dt,
                geom = "line",
                xlim = c(-4, t_critical_df_25),
                args = list(
                  df = 25
                )) +
  # Significant area
  stat_function(fun = dt,
                geom = "area",
                xlim = c(t_critical_df_25, 4),
                alpha = .2,
                fill = "steelblue",
                args = list(
                  df = 25
                )) +
  xlim(-4, 4)
```
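The plot above repeats the same two layers for each distribution. As a sketch of how you might shorten it, the pair of layers can be wrapped in a small helper function that returns a list, since ggplot accepts a list of layers on the right-hand side of `+`. The helper `t_tail_layers` and its parameters are hypothetical names of my own, not part of ggplot2, and I pass the x boundaries through a small data frame.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): returns the line layer and the
# shaded upper-tail layer for a t-distribution with `df` degrees of freedom.
t_tail_layers <- function(df, fill, level = 0.95, limits = c(-4, 4)) {
  t_critical <- qt(level, df = df)
  list(
    # Non-significant part as a line
    stat_function(fun = dt,
                  geom = "line",
                  xlim = c(limits[1], t_critical),
                  args = list(df = df)),
    # Significant part as a shaded area
    stat_function(fun = dt,
                  geom = "area",
                  xlim = c(t_critical, limits[2]),
                  alpha = .2,
                  fill = fill,
                  args = list(df = df))
  )
}

p_t <- ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
  t_tail_layers(df = 5, fill = "orange") +
  t_tail_layers(df = 25, fill = "steelblue") +
  xlim(-4, 4)

p_t
```

Adding a third distribution is then a single extra line, for example `t_tail_layers(df = 100, fill = "darkgreen")`.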

And that’s all the magic. I hope that next time it will be easier for you to plot areas under curves.