ISSS608 Visual Analytics & Applications Coursework
by Wilson Tan

Week 2: Hands-on Exercise

Disclaimer

Hands-on exercises are for my own practice and are ungraded. Thus, the plots and write-ups may be unrefined and poorly labelled.

Load dataset

library(tidyverse)  # provides read_csv() and ggplot2
exam_data <- read_csv("data/data_02/Exam_data.csv")
Rows: 322 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): ID, CLASS, GENDER, RACE
dbl (3): ENGLISH, MATHS, SCIENCE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Why use ggrepel?

When a plot has many data points, annotating it with plain ggplot2 becomes difficult because the labels overlap one another and hide the points:


ggplot(data=exam_data, 
       aes(x= MATHS, 
           y=ENGLISH)) +
  geom_point() +
  geom_smooth(method=lm, 
              linewidth=0.5) +  
  geom_label(aes(label = ID), 
             hjust = .5, 
             vjust = -.5) +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100)) +
  ggtitle("English scores versus Maths scores for Primary 3")

To use ggrepel, we just need to replace geom_text() with geom_text_repel() and geom_label() with geom_label_repel(). The labels are then pushed away from each other and from the data points.

Example of using ggrepel

ggplot(data=exam_data, 
       aes(x= MATHS, 
           y=ENGLISH)) +
  geom_point() +
  geom_smooth(method=lm, 
              linewidth=0.5) +  
  geom_label_repel(aes(label = ID), 
                   fontface = "bold") +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100)) +
  ggtitle("English scores versus Maths scores for Primary 3")
Warning: ggrepel: 317 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
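
The ggrepel warning means that most labels were dropped because they could not be placed without overlapping. Two ways I could respond are to raise max.overlaps, or to label only the points I care about. Below is a minimal sketch of both. It assumes a small synthetic data frame (toy_data, a hypothetical stand-in for exam_data) and an arbitrary cut-off of 85 marks, neither of which comes from the exercise itself.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical stand-in for exam_data so the sketch runs without the CSV
set.seed(1234)
toy_data <- data.frame(
  ID      = sprintf("Student%03d", 1:60),
  MATHS   = round(runif(60, 20, 100)),
  ENGLISH = round(runif(60, 20, 100))
)

# Option 1: ask ggrepel to keep every label, however crowded the plot gets
p_all <- ggplot(toy_data, aes(x = MATHS, y = ENGLISH)) +
  geom_point() +
  geom_label_repel(aes(label = ID), max.overlaps = Inf)

# Option 2: label only the students above an (arbitrary) cut-off
top_scorers <- subset(toy_data, MATHS >= 85 & ENGLISH >= 85)
p_top <- ggplot(toy_data, aes(x = MATHS, y = ENGLISH)) +
  geom_point() +
  geom_label_repel(data = top_scorers, aes(label = ID))
```

Option 1 removes the warning but can still give a cluttered plot, so Option 2 is probably the better fit for a data set with 322 students.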

Themes! Themes! Cool themes! From the ggthemes package

While ggplot2 has some built-in themes such as theme_gray(), theme_bw(), theme_classic(), theme_dark(), theme_light(), theme_linedraw(), theme_minimal(), and theme_void(), we can also use some cool themes from ggthemes.

  • theme_gray()
  • The Economist
  • WSJ
  • Old Excel
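
A theme is added to a plot like any other ggplot2 component. The sketch below shows roughly how the four variants listed above could be produced. It assumes that the labels correspond to theme_economist(), theme_wsj() and theme_excel() from ggthemes (my guess from the names), and it uses a hypothetical toy_data data frame in place of exam_data.

```r
library(ggplot2)
library(ggthemes)

# Hypothetical stand-in for the MATHS column of exam_data
set.seed(1234)
toy_data <- data.frame(MATHS = round(runif(100, 0, 100)))

base_plot <- ggplot(toy_data, aes(x = MATHS)) +
  geom_histogram(bins = 20, boundary = 100,
                 color = "grey25", fill = "grey90") +
  ggtitle("Distribution of Maths scores")

# One plot per theme; only the theme call differs between them
themed_plots <- list(
  gray      = base_plot + theme_gray(),       # ggplot2 default
  economist = base_plot + theme_economist(),  # ggthemes
  wsj       = base_plot + theme_wsj(),        # ggthemes
  excel     = base_plot + theme_excel()       # ggthemes
)
```

Printing any element, for example themed_plots$economist, draws that version of the histogram.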

Using the hrbrthemes package

hrbrthemes focuses on typographic elements, such as where labels are placed and which fonts are used.


ggplot(data=exam_data, 
             aes(x = MATHS)) +
  geom_histogram(bins=20, 
                 boundary = 100,
                 color="grey25", 
                 fill="grey90") +
  ggtitle("Distribution of Maths scores") +
  theme_ipsum(axis_title_size = 18,
              base_size = 15,
              grid = "")

Tips

  • axis_title_size sets the font size of the axis titles

  • base_size sets the base font size, which the axis tick labels follow by default

  • grid controls which panel grid lines are drawn. It accepts TRUE, FALSE, or a string combining X, x, Y and y, e.g. "XY"
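
To see what the grid argument does, here is a minimal sketch that keeps only the horizontal grid lines. As far as I understand the hrbrthemes documentation, upper-case letters refer to major grid lines and lower-case letters to minor ones, so treat that reading as an assumption. toy_data is again a hypothetical stand-in for exam_data.

```r
library(ggplot2)
library(hrbrthemes)

# Hypothetical stand-in for the MATHS column of exam_data
set.seed(1234)
toy_data <- data.frame(MATHS = round(runif(100, 0, 100)))

p_ipsum <- ggplot(toy_data, aes(x = MATHS)) +
  geom_histogram(bins = 20, boundary = 100,
                 color = "grey25", fill = "grey90") +
  ggtitle("Distribution of Maths scores") +
  theme_ipsum(axis_title_size = 18,  # font size of the axis titles
              base_size = 15,        # base font size
              grid = "Y")            # assumed: major horizontal grid lines only
```

Swapping "Y" for "" (as in the code above the tips) should remove the grid lines altogether.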

patchwork! Patching multiple graphs together

Imagine that you have multiple graphs:

p1 <- ggplot(data=exam_data, 
             aes(x = MATHS)) +
  geom_histogram(bins=20, 
                 boundary = 100,
                 color="grey25", 
                 fill="grey90") + 
  coord_cartesian(xlim=c(0,100)) +
  ggtitle("Distribution of Maths scores")
p2 <- ggplot(data=exam_data, 
             aes(x = ENGLISH)) +
  geom_histogram(bins=20, 
                 boundary = 100,
                 color="grey25", 
                 fill="grey90") +
  coord_cartesian(xlim=c(0,100)) +
  ggtitle("Distribution of English scores")
p3 <- ggplot(data=exam_data, 
             aes(x= MATHS, 
                 y=ENGLISH)) +
  geom_point() +
  geom_smooth(method=lm, 
              linewidth=0.5) +  
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100)) +
  ggtitle("English scores versus Maths scores for Primary 3")

You can place two graphs side by side with the + operator:


p1 + p2

Or combine three of them using the following operators:

  • “|” operator to place the plots side by side

  • “/” operator to stack one on top of another

  • “()” operator to define the sequence of plotting

And also add the following:

  • plot_annotation(), which can tag the figures automatically through its tag_levels argument
  • inset_element(), which will add another plot based on your specified position (not demonstrated)

((p1 / p2) | p3) + 
  plot_annotation(tag_levels = 'A') & theme_economist()
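
Since inset_element() is not demonstrated above, here is a minimal sketch of how it might be used to place a histogram inside a scatterplot. The data frame (toy_data), the plot names and the position values are all hypothetical choices of mine; the positions are given as fractions of the main plot's panel, which I believe is the default.

```r
library(ggplot2)
library(patchwork)

# Hypothetical stand-in for the MATHS and ENGLISH columns of exam_data
set.seed(1234)
toy_data <- data.frame(
  MATHS   = round(runif(100, 0, 100)),
  ENGLISH = round(runif(100, 0, 100))
)

hist_maths <- ggplot(toy_data, aes(x = MATHS)) +
  geom_histogram(bins = 20, boundary = 100,
                 color = "grey25", fill = "grey90")

scatter <- ggplot(toy_data, aes(x = MATHS, y = ENGLISH)) +
  geom_point() +
  coord_cartesian(xlim = c(0, 100), ylim = c(0, 100))

# Put the histogram in the top-left corner of the scatterplot
with_inset <- scatter +
  inset_element(hist_maths,
                left = 0.02, bottom = 0.7,
                right = 0.5, top = 1)
```

Printing with_inset draws the scatterplot with the small histogram overlaid; adjusting the four position values moves or resizes the inset.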