Barplots and position adjustments in ggplot2

Data Visualisation with R

A barplot with geom_bar() with a categorical variable

library(ggplot2)
library(palmerpenguins)
ggplot(penguins, aes(x = island)) +
  geom_bar()

  • If you have a categorical variable, then you usually want to study the frequency of its categories.
  • Here geom_bar() uses its default stat = "count", which computes the frequency of each category for you.
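To see what stat = "count" produces, you can inspect the data that ggplot2 computes for the layer with layer_data(). This is a minimal sketch that uses a small hypothetical data frame rather than the penguins data; the island labels are only illustrative.

```r
library(ggplot2)

# Hypothetical toy data (not the penguins data): one row per observation
df <- data.frame(island = c("Biscoe", "Dream", "Dream", "Torgersen", "Biscoe", "Biscoe"))

p <- ggplot(df, aes(x = island)) +
  geom_bar()  # stat = "count" is the default for geom_bar()

# layer_data() returns the data computed for the layer, including the counts
counts <- layer_data(p)
counts[, c("x", "count")]

# The counts match a plain frequency table of the same variable
table(df$island)
```

The bar heights are simply the number of rows per category, which is why no y aesthetic is needed.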

A barplot with geom_bar() with a discrete numerical variable

library(dplyr)
penguins %>% 
  # for demonstration only: relabel 2009 as 2012 to create a gap on the x-axis
  mutate(year = ifelse(year == 2009, 2012, year)) %>% 
  ggplot(aes(x = year)) +
  geom_bar()

  • If you supply a numerical variable, the x-axis uses a continuous scale: the bars are placed at their numeric values, so the gap between 2008 and 2012 is visible.
  • If you want to study each level of a discrete variable, you may want to convert it to a factor instead, with x = factor(year).
  • When the variable is a factor or character, the bars are equally spaced and each label corresponds to one level.
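The difference between a numeric and a factor x can be checked by comparing the bar positions that ggplot2 computes. This is a minimal sketch on hypothetical toy years (not the penguins data), assuming only ggplot2's layer_data().

```r
library(ggplot2)

# Hypothetical toy data with a gap in the years (2009 to 2011 missing)
df <- data.frame(year = c(2007, 2007, 2008, 2012, 2012, 2012))

p_num <- ggplot(df, aes(x = year)) + geom_bar()          # continuous x scale
p_fac <- ggplot(df, aes(x = factor(year))) + geom_bar()  # discrete x scale

# Bar centres sit at the numeric values on the continuous scale ...
x_num <- as.numeric(layer_data(p_num)$x)
# ... but at equally spaced positions 1, 2, 3 for the factor
x_fac <- as.numeric(layer_data(p_fac)$x)
x_num
x_fac
```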

A barplot with geom_col()

  • Sometimes your input data may already contain pre-computed counts.
penguins_summary <- penguins %>% group_by(sex) %>% tally() 

penguins_summary
# A tibble: 3 × 2
  sex        n
  <fct>  <int>
1 female   165
2 male     168
3 <NA>      11
  • In this case, you don’t need stat = "count" to do the counting for you; use geom_col() instead, which takes the values in n directly as the bar heights.
ggplot(penguins_summary, 
       aes(x = sex, y = n)) +
  geom_col()

  • This is essentially shorthand for geom_bar(stat = "identity"), where stat = "identity" means the values are used as supplied, without any statistical transformation.
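The equivalence can be checked by comparing bar heights from three routes: geom_bar() on raw rows, geom_col() on pre-computed counts, and geom_bar(stat = "identity") on the same counts. This sketch uses a small hypothetical data frame and dplyr's count(), which does the same job as group_by(sex) %>% tally().

```r
library(ggplot2)
library(dplyr)

# Hypothetical raw data: one row per individual
raw <- data.frame(sex = c("female", "male", "male", "female", "male"))

# Pre-computed counts, with the count stored in column n
counts <- raw %>% count(sex)

# Bar heights (y) computed by each approach
h_bar   <- layer_data(ggplot(raw, aes(x = sex)) + geom_bar())$y
h_col   <- layer_data(ggplot(counts, aes(x = sex, y = n)) + geom_col())$y
h_ident <- layer_data(ggplot(counts, aes(x = sex, y = n)) +
                        geom_bar(stat = "identity"))$y

h_bar
h_col
h_ident
```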

A stacked barplot with geom_col()

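penguins %>% 
  group_by(species, sex, year) %>% 
  tally() %>% 
  ggplot(aes(year, n, fill = sex, group = year, color = species)) +
  # thick outline coloured by species (linewidth needs ggplot2 >= 3.4.0)
  geom_col(position = "stack", linewidth = 8) +
  # thin black outline drawn on top of the same bars
  geom_col(position = "stack", linewidth = 1, color = "black")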

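  • By default, geom_col() uses position = "stack", so bars that share the same x value are stacked on top of one another.
  • Mapping group = year overrides the default grouping, which would otherwise come from the discrete fill and colour aesthetics, and this changes the order in which the segments are stacked (try running the code without group = year to compare).
  • The first geom_col() layer draws a thick outline coloured by species; the second draws a thin black outline over the same bars.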
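Stacking can be seen in the ymin and ymax values that ggplot2 computes: each segment starts where the previous one ends, so the top of each stack is the total for that x. This is a sketch on hypothetical tallied counts, not the penguins data.

```r
library(ggplot2)

# Hypothetical tallied data: two sexes stacked at each year
df <- data.frame(
  year = c(2007, 2007, 2008, 2008),
  sex  = c("female", "male", "female", "male"),
  n    = c(10, 12, 15, 9)
)

p <- ggplot(df, aes(year, n, fill = sex)) +
  geom_col()  # position = "stack" is the default for geom_col()

d <- layer_data(p)
# Each segment spans [ymin, ymax]; the highest ymax per year is the stack total
d[order(d$x, d$ymin), c("x", "fill", "ymin", "ymax")]
```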

A grouped barplot with geom_col()

penguins %>% 
  group_by(sex, species, year) %>% 
  tally() %>% 
  ggplot(aes(sex, n, fill = species)) +
  geom_col(color = "black", position = "dodge")

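  • position = "dodge" recalculates the x values so that bars sharing an x value sit side by side, one slot per group (here, one per species), instead of being stacked. Because the tallied data have one row per year for each sex and species, the year bars for a species land in the same slot and overlap rather than add up.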
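The dodged positions can be inspected with layer_data(). In this sketch on hypothetical counts (one row per sex and species), the two species bars within each sex are shifted to either side of the category centre and each bar gets half of the default 0.9 width.

```r
library(ggplot2)

# Hypothetical counts: one row per sex and species
df <- data.frame(
  sex     = c("female", "female", "male", "male"),
  species = c("Adelie", "Gentoo", "Adelie", "Gentoo"),
  n       = c(5, 3, 4, 6)
)

p <- ggplot(df, aes(sex, n, fill = species)) +
  geom_col(color = "black", position = "dodge")

d <- layer_data(p)
x <- as.numeric(d$x)
widths <- as.numeric(d$xmax) - as.numeric(d$xmin)

x       # dodged bar centres
widths  # each bar takes a share of the category width
```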

Another grouped barplot with geom_col()

penguins %>% 
  group_by(sex, species, year) %>% 
  tally() %>% 
  ggplot(aes(sex, n, fill = species, group = year)) +
  geom_col(color = "black", position = "dodge2")

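  • With group = year, position = "dodge" would dodge by year instead of by species, so the species bars within each year would overlap. position = "dodge2" recalculates the x values differently: every bar that shares an x value gets its own slot, so each species and year combination is drawn as a separate bar.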
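One way to see the difference is to count the distinct bar positions each adjustment produces. This sketch uses hypothetical counts for two sexes, two species and two years (eight rows): "dodge" only separates the two year groups, so bars overlap, while "dodge2" gives all eight bars their own position.

```r
library(ggplot2)

# Hypothetical tallied data: every sex x species x year combination
df <- expand.grid(
  sex     = c("female", "male"),
  species = c("Adelie", "Gentoo"),
  year    = c(2007, 2008)
)
df$n <- c(5, 4, 3, 6, 4, 5, 2, 7)  # hypothetical counts

base <- ggplot(df, aes(sex, n, fill = species, group = year))

d_dodge  <- layer_data(base + geom_col(color = "black", position = "dodge"))
d_dodge2 <- layer_data(base + geom_col(color = "black", position = "dodge2"))

# Number of distinct bar centres produced by each position adjustment
n_dodge  <- length(unique(round(as.numeric(d_dodge$x), 6)))
n_dodge2 <- length(unique(round(as.numeric(d_dodge2$x), 6)))
c(dodge = n_dodge, dodge2 = n_dodge2)
```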

Stacked percentage barplot with geom_col()

penguins %>% 
  group_by(species, sex, year) %>% 
  tally() %>% 
  ggplot(aes(sex, n, fill = species, group = year)) +
  geom_col(color = "black", position = "fill")

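  • position = "fill" stacks the bars and rescales each stack to a total height of 1, so the y-axis shows proportions. This is handy when you want to compare the composition (here, the species mix) across the different x values rather than the raw counts.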
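Since the y-axis after position = "fill" runs from 0 to 1, it can be labelled as percentages with scales::percent (the scales package is installed alongside ggplot2). This sketch uses hypothetical counts and checks that each stack sums to 1.

```r
library(ggplot2)

# Hypothetical counts: one row per sex and species
df <- data.frame(
  sex     = c("female", "female", "male", "male"),
  species = c("Adelie", "Gentoo", "Adelie", "Gentoo"),
  n       = c(6, 2, 3, 3)
)

p <- ggplot(df, aes(sex, n, fill = species)) +
  geom_col(color = "black", position = "fill") +
  scale_y_continuous(labels = scales::percent)  # show 0-1 as 0%-100%

d <- layer_data(p)
heights <- as.numeric(d$ymax) - as.numeric(d$ymin)  # share of each segment
tops <- as.numeric(tapply(as.numeric(d$ymax), as.numeric(d$x), max))

heights
tops  # every stack reaches 1
```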

Summary

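  • geom_bar() counts the rows for each x value (stat = "count"), which suits categorical or discrete variables.
  • geom_col() uses pre-computed values, such as counts, directly as bar heights (stat = "identity").
  • The position argument (e.g. "stack", "dodge", "dodge2", "fill") controls how bars that share an x value are arranged; it works in both geom_bar() and geom_col().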