How to make bubble charts in ggplot2

FlowingData has a great tutorial on making bubble charts in R. Bubble charts are like x-y scatterplots with an additional value mapped to the size of each dot (or “bubble”).

The tutorial produces a clean and professional-looking plot.  When working in R, however, there are often many ways to do a single task.  My preferred tool for this task is ggplot2.

With the same dataset FlowingData used in their example, I used ggplot2 code to create the bubble chart:

# Updated so the following will run in ggplot2 1.0.0 (thanks, Ulrich!)
library(ggplot2)
crime <- read.csv("http://datasets.flowingdata.com/crimeRatesByState2005.tsv", header=TRUE, sep="\t")
ggplot(crime, aes(x=murder, y=burglary, size=population, label=state)) +
  geom_point(colour="white", fill="red", shape=21) +
  # the guide argument belongs on the scale; ggplot() itself ignores it
  scale_size_area(max_size=15, guide="none") +
  scale_x_continuous(name="Murders per 1,000 population", limits=c(0,12)) +
  scale_y_continuous(name="Burglaries per 1,000 population", limits=c(0,1250)) +
  geom_text(size=4) +
  theme_bw()

One of my favorite things about ggplot2 is the flexible and consistent framework:

  • scale_size_area() automatically scales the bubbles so that their areas (rather than their radii) reflect differences in the data. However, I still made an arbitrary decision by specifying the maximum bubble size (max_size=15).
  • the aesthetics defined in the main ggplot() command are inherited by every subsequent layer unless overridden. (If I hadn’t specified size=4 in geom_text, the text would instead have used size=population. Similarly, I specified the colours in geom_point to produce red bubbles without affecting the text colour.)
  • theme_bw() removes the default grey background (which I often prefer).
  • using shape 21 (instead of the default circle) allows me to set the outline of the shape to white and fill the circle in red.
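
To make the area-versus-radius point concrete, here is a minimal base-R sketch of area-proportional sizing. It illustrates the idea only and is not ggplot2's internal code; bubble_size() is a hypothetical helper name and the input values are made up.

```r
# Area-proportional sizing: the linear size (diameter) grows with the
# square root of the value, so the bubble's AREA grows with the value itself.
# bubble_size() is a hypothetical helper, not a ggplot2 function.
bubble_size <- function(values, max_size = 15) {
  max_size * sqrt(values / max(values))
}

pop <- c(1, 4, 16)          # made-up values, each 4 times the previous one
sizes <- bubble_size(pop)   # linear sizes: 3.75, 7.5 and 15

# Each 4-fold increase in value doubles the linear size,
# which means a 4-fold increase in area.
sizes[-1] / sizes[-length(sizes)]
```

If the radius were mapped directly to the value instead, a 4-fold difference in population would show up as a 16-fold difference in area, which exaggerates the larger bubbles.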

This is what we get:

Not a bad start, but it could benefit from carefully repositioning the labels. Also, I’m pretty sure these rates are per 100,000 (not 1,000); otherwise, the average person in NC would be a victim of burglary 1.2 times per year. (The FD post has since been updated.)

I also started the y-axis at 0. I’m not sure what these relationships are supposed to mean, but I think it helps emphasize that there is roughly a 4-fold difference in state burglary rates, and that low-crime areas still have some crime.
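
As a rough sketch of the zero-baseline idea: besides passing limits to scale_y_continuous() as in the code above, ggplot2's expand_limits() can pull an axis down to zero without naming an upper bound. The data frame below is made up purely for illustration and is not taken from the crime dataset.

```r
library(ggplot2)

# Made-up values for illustration only.
toy <- data.frame(x = c(2, 5, 9),
                  y = c(400, 700, 1100),
                  n = c(1, 4, 16))

p_zero <- ggplot(toy, aes(x = x, y = y, size = n)) +
  geom_point(colour = "white", fill = "red", shape = 21) +
  scale_size_area(max_size = 15) +
  expand_limits(y = 0) +   # make sure the y-axis includes 0
  theme_bw()

# print(p_zero) draws the chart
```

One difference worth keeping in mind: limits set on a scale drop any points that fall outside them, whereas expand_limits() only ever widens the axis.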

In the comments of the FlowingData post, someone posted code that handles much of the text repositioning in R.  It is always especially satisfying when there is a way to do all the manipulation through code, without directly “touching” the data.
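
The commenter's code is not reproduced here. As a much simpler starting point, and assuming a small made-up data frame, geom_text() accepts a vjust argument that shifts every label away from the centre of its bubble. This is only a sketch of a uniform offset, not a full label-placement solution.

```r
library(ggplot2)

# Made-up values and labels for illustration only.
toy <- data.frame(x = c(2, 5, 9),
                  y = c(400, 700, 1100),
                  n = c(1, 4, 16),
                  label = c("A", "B", "C"))

p_labels <- ggplot(toy, aes(x = x, y = y, size = n, label = label)) +
  geom_point(colour = "white", fill = "red", shape = 21) +
  scale_size_area(max_size = 15) +
  geom_text(size = 4, vjust = -1.5) +   # negative vjust moves labels upward
  theme_bw()

# print(p_labels) draws the chart
```

Because the offset is the same for every label, large bubbles may still overlap their text; per-label adjustments would need extra columns in the data or manual tuning.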

Here is another example of a bubble plot from our recent paper:

Maenner & Durkin 2010

Edits: Reference for above figure: Maenner MJ, Durkin MS. Trends in the prevalence of autism on the basis of special education data. Pediatrics. 2010;126(5):e1018–e1025.

In the time since I posted this, ggplot2 has been updated, and the original code now produces an error. The code has been updated to work in ggplot2 0.9.1.

The last figure was made 3+ years ago. I’ve updated the code to produce essentially the same plot with ggplot2 0.9.3.1:

library(ggplot2)
library(MASS)  # provides rlm(), used by stat_smooth(method="rlm")
# asd_data: the data frame behind the figure (not included in this post)
ggplot(asd_data, aes(x=prev2003, y=asd_diff, weight=denom2003, colour=octile, size=denom2003)) +
  geom_point(alpha=0.8) +
  scale_size_area(name="2002 District\nElementary School\nPopulation", breaks=c(250, 500, 1000, 10000, 50000), max_size=20) +
  stat_smooth(method="rlm", size=0.5, colour="black", alpha=0.4, level=0.95) +
  scale_colour_brewer(palette="Spectral", type="qual", name="2002 Autism\nPrevalence Octile") +
  coord_equal(ratio=1/2) +
  guides(colour = guide_legend(override.aes = list(alpha = 1))) +
  ggtitle("Figure 4. Change in Autism Prevalence between 2002 and 2008 vs Baseline (2002) Prevalence,\nWisconsin Elementary School Districts (with weighted linear best-fit line and 95% confidence band)") +
  scale_x_continuous("2002 Autism Prevalence (per 1,000)") +
  scale_y_continuous("Change in Autism Prevalence (per 1,000) between 2002 and 2008")

8 thoughts on “How to make bubble charts in ggplot2”

  1. It’s cool that you’re using ggplot2, but it would be even cooler if you cited it in your publications. Citations make it easier for me to convince my colleagues that this is important and useful work and that I should keep doing it.

    • Hadley,

      Thanks for the comment–while I’ve only had one paper come out since using ggplot2 (so far), it has always been my intention to properly cite it. Although my paper mentions ggplot2 in the methods, the editor removed the citation (and ignored my request to put it back in the proofs). I don’t think that the editor understood that ggplot2 was separate from R (despite my suggestions).

      I will try to follow up with the editor again and ask if they can correct that citation. I’m sorry for this misunderstanding–I really admire your work and I understand the importance of having one’s work properly cited.

  2. Figure 3. Changes in special education….looks nice. However, I’m not sure I understand the story you’re trying to tell. Can you please give me the full ref of the paper to read it? Thanks

  3. When I try to reproduce your code, I got this message:

    Error in continuous_scale("size", "area", area_pal(range), ...) :
    unused argument(s) (to = c(1, 25))
    >

    this is the whole code:

    > ggplot(crime, aes(x=murder, y=burglary, size=population, label=state, shape=21))+
    + geom_point(colour="white", fill="red")+scale_area(to=c(1,25))+
    + scale_x_continuous(name="Murders per 1,000 population", limits=c(0,12))+
    + scale_y_continuous(name="Burglaries per 1,000 population",limits=c(0,1250))+
    + geom_text(size=4)+theme_bw()
    Error in continuous_scale("size", "area", area_pal(range), ...) :
    unused argument(s) (to = c(1, 25))
    >

  4. Ricardo,

    Thanks for raising these issues–
    I’ve included a full reference to the paper, and I’ve updated the code to run on ggplot2 0.9.1. The previous code was written for an older version (probably ~0.8.9 or so). The “to=” argument is now “range=”, and the “shape” argument has been relocated to geom_point.

    The updated code runs for me as of 5/25/2012.

  5. Hi Matthew,

    Thanks for updating this. The 3 figures in your paper are very informative (and a delight to the eye)…have you ever considered posting the code? I, and all the ggplot2 novices out there would have good examples to learn from.

    Cheers

    This would be a great contribution to all the novices out there like myself

    • Great, I’m glad it worked for you. I like your suggestion for putting some examples together for people who want to make these kind of figures–perhaps I could do it this summer. Hopefully you’ve found the ggplot2 site, which does a great job covering the basics.
