ISSS608 Visual Analytics & Applications Coursework
by Wilson Tan

Week 1: Hands-on Exercise

Disclaimer

Hands-on exercises are for my own practice and are ungraded. Thus, the plots and write-ups may be unrefined and poorly labelled.

Let’s explore the ggplot2 package in R!

Load dataset

library(tidyverse)  # provides read_csv() (readr) and ggplot() (ggplot2)

exam_data <- read_csv("data/data_01/Exam_data.csv")
Rows: 322 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): ID, CLASS, GENDER, RACE
dbl (3): ENGLISH, MATHS, SCIENCE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Comparing R graphics with ggplot2

  • R graphics

hist(exam_data$MATHS)

  • ggplot2

ggplot(data=exam_data, aes(x = MATHS)) +
  geom_histogram(bins=10, 
                 boundary = 100,
                 color="black", 
                 fill="grey") +
  ggtitle("Distribution of Maths scores")

ggplot2 has a more complicated syntax than base R graphics, but offers many more customization options (such as the bin count, colours and title set above) to help you make your data visualizations beautiful.

Exploring ggplot2

  • Bar chart

ggplot(data = exam_data,
       aes(x = RACE)) +
  geom_bar()

  • Dot plot

ggplot(data = exam_data,
       aes(x = MATHS)) +
  geom_dotplot(dotsize = 0.5,
               binwidth = 2.5) +
  scale_y_continuous(NULL,
                     breaks = NULL)

  • Histogram

ggplot(data = exam_data,
       aes(x = MATHS,
           fill = GENDER)) +
  geom_histogram(bins = 20,
                 color = "grey20") +
  scale_fill_manual(values = c("pink", "steelblue"))

  • Kernel density

ggplot(data = exam_data,
       aes(x = MATHS,
           color = GENDER)) +
  geom_density()

  • Box plot

ggplot(data = exam_data,
       aes(y = MATHS,
           x = GENDER,
           fill = GENDER)) +
  geom_boxplot(notch = TRUE) +
  scale_fill_manual(values = c("pink", "steelblue"))

  • Violin plot

ggplot(data = exam_data,
       aes(y = MATHS,
           x = GENDER)) +
  geom_violin()

  • Scatterplot

ggplot(data = exam_data,
       aes(x = MATHS,
           y = ENGLISH)) +
  geom_point() +
  coord_cartesian(xlim = c(0, 100),
                  ylim = c(0, 100))

Some other elements…

Combining geom objects + stat

ggplot(data = exam_data,
       aes(y = MATHS,
           x = GENDER,
           fill = GENDER)) +
  geom_boxplot(notch = TRUE) +
  geom_point(position = "jitter",
             size = 0.5) +
  scale_fill_manual(values = c("pink", "steelblue")) +
  stat_summary(geom = "point",
               fun = "mean",
               colour = "red",
               size = 4) +
  theme_classic()

Theme used: Classic
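
To see what the red points and the notches in the plot above represent, the numbers can be reproduced by hand in base R. This is only a minimal sketch: `toy` is a small made-up data frame standing in for `exam_data`, and `notch_limits()` is a hypothetical helper. It assumes that `stat_summary(fun = "mean")` marks the per-group mean and that the notches span median ± 1.58 × IQR / √n, as described in the `geom_boxplot()` documentation.

```r
# Toy data for illustration only; these scores are made up, not from Exam_data.csv
toy <- data.frame(
  GENDER = rep(c("Female", "Male"), each = 5),
  MATHS  = c(60, 70, 80, 90, 100, 40, 50, 60, 70, 80)
)

# Per-group mean: the value that stat_summary(fun = "mean") marks with a point
group_means <- tapply(toy$MATHS, toy$GENDER, mean)

# Hypothetical helper: notch limits as median +/- 1.58 * IQR / sqrt(n)
notch_limits <- function(x) {
  half_width <- 1.58 * IQR(x) / sqrt(length(x))
  c(lower = median(x) - half_width, upper = median(x) + half_width)
}

# One column per group, with rows "lower" and "upper"
notches <- sapply(split(toy$MATHS, toy$GENDER), notch_limits)

print(group_means)
print(round(notches, 2))
```

When the notch intervals of two groups do not overlap, that is commonly read as a hint that their medians differ.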

Scatterplot with best fit line!

ggplot(data = exam_data,
       aes(x = MATHS,
           y = ENGLISH)) +
  geom_point() +
  geom_smooth(method = lm,
              linewidth = 0.5)
`geom_smooth()` using formula = 'y ~ x'
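
The line drawn by `geom_smooth(method = lm)` is an ordinary least squares fit of `y ~ x`, so its intercept and slope can be recovered by calling `lm()` directly. Here is a minimal sketch, using a made-up `toy` data frame in place of `exam_data` (the values are hypothetical and chosen so the fit is easy to check by hand):

```r
# Toy data for illustration only; not taken from Exam_data.csv
toy <- data.frame(
  MATHS   = c(20, 40, 60, 80, 100),
  ENGLISH = c(35, 45, 50, 65, 70)
)

# Same model form that geom_smooth() reports in its message: y ~ x
fit <- lm(ENGLISH ~ MATHS, data = toy)

# Named vector with "(Intercept)" and the slope for "MATHS"
coefs <- coef(fit)

# Predicted ENGLISH score for a MATHS score of 50, read off the fitted line
pred_50 <- predict(fit, newdata = data.frame(MATHS = 50))

print(coefs)
print(pred_50)
```

Swapping `toy` for `exam_data` should give the coefficients of the line in the plot above. The grey band around that line is the confidence interval that `geom_smooth()` draws by default (`se = TRUE`).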

Working with facets

ggplot(data = exam_data,
       aes(x = MATHS)) +
  geom_histogram(bins = 20) +
  facet_wrap(~ CLASS) +
  theme_minimal()

Theme used: Minimal
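
`facet_wrap(~ CLASS)` draws one panel per distinct value of `CLASS`. The same partition can be inspected in base R with `split()`. This is a rough sketch on a made-up `toy` data frame; the class labels and scores are hypothetical:

```r
# Toy data for illustration only; class labels and scores are made up
toy <- data.frame(
  CLASS = c("3A", "3A", "3B", "3B", "3B", "3C"),
  MATHS = c(85, 90, 70, 65, 75, 50)
)

# One subset of scores per distinct CLASS value, analogous to one facet panel each
panels <- split(toy$MATHS, toy$CLASS)

# Number of panels and number of observations in each panel
n_panels <- length(panels)
panel_sizes <- lengths(panels)

print(n_panels)
print(panel_sizes)
```

Checking the panel sizes this way can help when a facet looks sparse, since a histogram of only a few students is hard to interpret.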

And that’s it for Week 1!