Introduction to Statistics (MAT/SST 115.03 2008S)

Stacked Bar Graphs in R


Stacked Bar Graphs (also called Segmented Bar Graphs) are a data visualization technique that can be useful for studying two-way tables. In a stacked bar plot, we use one bar for each value of the explanatory variable (as in simple bar plots). However, the bar is segmented into multiple parts, one for each value of the response variable. In Workshop Statistics, Edition 3, stacked bar graphs are introduced in Topic 6.

In R, you make stacked bar graphs using the barplot function. However, instead of providing barplot with a vector to plot, you provide it with a matrix built from your data frame. (How do you build the matrix? You call as.matrix on the frame. The reason is that barplot accepts a vector or a matrix, but not a data frame.) The barplot function plots each column of the matrix as one bar, with one segment per row. So, you first create your table using whatever method you find easiest.
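If you don't have the data handy, here is a minimal sketch of one way to build such a table by hand. The counts and the column names (Liberal, Moderate, Conservative) are made up for illustration; they are not the Activity 6-1 data. The name spending_matrix is also just a placeholder.

```r
# Hypothetical counts for illustration only; NOT the Activity 6-1 data.
# Each column is one value of the explanatory variable,
# each row is one value of the response variable.
EnvironmentSpending <- data.frame(
  Liberal      = c(30, 10, 2),
  Moderate     = c(25, 15, 5),
  Conservative = c(15, 18, 9),
  row.names    = c("Too Little", "About Right", "Too Much")
)

# barplot wants a matrix, so convert the frame.
spending_matrix <- as.matrix(EnvironmentSpending)
print(spending_matrix)
```

The row names carry over to the matrix, which is what lets us reuse them for a legend later.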

Once you've read in the data, the basic command is fairly straightforward.

barplot(as.matrix(EnvironmentSpending))

Of course, this is R, so there are dozens of options to make the barplot more interesting. Here are a few that might be helpful.

As in the past, main="Title" puts a title on the graph.

It's usually helpful if a stacked bar plot includes a legend that explains what each part of a stacked bar represents. The legend=... option (short for legend.text) adds the legend. As you might expect, we'll use a vector of strings for that legend. You can create the vector directly.

barplot(as.matrix(EnvironmentSpending),
 legend=c("Too Little", "About Right", "Too Much")
)

If we've labeled the rows of our table, we can also use the vector of row names.

barplot(as.matrix(EnvironmentSpending),
 legend=rownames(EnvironmentSpending)
)

We can even recolor the parts of the stacked bar graph using the col=... option. Once again, we provide a vector of strings that name colors, such as c("green","grey","red"), with one color for each row of the table. You can look up possible names using colors().
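Since colors() returns a long vector of names, it can be easier to search it than to read it. Here is a small sketch of two ways to do that; the variable names wanted and green_names are just placeholders.

```r
# The colors we plan to use.
wanted <- c("green", "grey", "red")

# Check that R knows each name (TRUE for every valid name).
print(wanted %in% colors())

# List the known color names that contain "green".
green_names <- grep("green", colors(), value = TRUE)
print(head(green_names))
```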

Putting it all together, we get the following as a relatively complete bar graph for Activity 6-1.

barplot(as.matrix(EnvironmentSpending),
  main="Political Perspectives on Environment Spending",
  legend=rownames(EnvironmentSpending),
  col=c("green","grey","red")
)

Better Legends

Unfortunately, the designers of R seem to have chosen a bad default place for the legend: the top-right corner of the plot, where it can overlap the rightmost bar. In this particular example, it may obscure the separation between two parts of the stacked bar.

The solution is to tell R a bit more about how you expect the graph to look. In particular, you can use xlim=c(lower,upper) to specify the horizontal “limits” of the image and width=num to specify the width of each bar. (A single width value has no visible effect unless you also give xlim.) For example, if we have three bars, we might make each one two units wide and allow nine units of space, which leaves empty room on the right for the legend.

barplot(as.matrix(EnvironmentSpending),
  main="Political Perspectives on Environment Spending",
  legend=rownames(EnvironmentSpending),
  col=c("green","grey","red"),
  xlim=c(0,9), width=2
)
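To see why nine units leaves room for the legend, it can help to look at where R actually puts the bars. The sketch below uses made-up counts (not the Activity 6-1 data) and relies on the fact that barplot returns the horizontal midpoint of each bar. The name mids is just a placeholder.

```r
# Hypothetical counts for illustration only; NOT the Activity 6-1 data.
EnvironmentSpending <- data.frame(
  Liberal      = c(30, 10, 2),
  Moderate     = c(25, 15, 5),
  Conservative = c(15, 18, 9),
  row.names    = c("Too Little", "About Right", "Too Much")
)

# Draw the graph and keep the bar midpoints that barplot returns.
mids <- barplot(as.matrix(EnvironmentSpending),
  legend=rownames(EnvironmentSpending),
  col=c("green","grey","red"),
  xlim=c(0,9), width=2
)

# Each bar extends one unit (half of width=2) to either side of its midpoint.
print(mids)
print(max(mids) + 1)  # right edge of the last bar
```

With barplot's default gap of 0.2 bar widths between bars, the midpoints should come out at about 1.4, 3.8, and 6.2, so the last bar ends near 7.2 and the stretch from there to 9 is free for the legend.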

Alternate Displays

You may also find it useful to place the parts for one value (column) side by side as separate bars, rather than stacked on top of each other. For that, you use the parameter beside=TRUE. For example,

barplot(as.matrix(EnvironmentSpending), beside=TRUE)
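Here is a slightly fuller sketch of the side-by-side version, again with made-up counts rather than the Activity 6-1 data. It adds the legend and colors from earlier and saves what barplot returns in a placeholder variable, mids.

```r
# Hypothetical counts for illustration only; NOT the Activity 6-1 data.
EnvironmentSpending <- data.frame(
  Liberal      = c(30, 10, 2),
  Moderate     = c(25, 15, 5),
  Conservative = c(15, 18, 9),
  row.names    = c("Too Little", "About Right", "Too Much")
)

# One group of bars per column, one bar per row within each group.
mids <- barplot(as.matrix(EnvironmentSpending),
  beside=TRUE,
  legend=rownames(EnvironmentSpending),
  col=c("green","grey","red")
)

# With beside=TRUE, barplot returns a matrix of midpoints:
# one row per response value, one column per group.
print(dim(mids))
```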

It is worth trying this alternative at least once, just to see the difference.

Samuel A. Rebelsky, rebelsky@grinnell.edu

Copyright (c) 2007-8 Samuel A. Rebelsky.

This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.