My journey (so far) with R package development

Johané Nienkemper-Swanepoel


Centre for Multi-Dimensional Data Visualisation (MuViSU).
Department of Statistics and Actuarial Science, Stellenbosch University.


UCT seminar series: 13 October 2025

Some general tips

  • CRUCIAL: GitHub and Git for version control, project management and collaboration.

    • Improves workflow and makes individual scripts easier to maintain.

    • Promotes open-source software.

    • Creates visibility.

    • Makes it easy to receive valuable feedback from users (experts, non-technical users, students).

  • Happy Git and GitHub for the useR: Jenny Bryan and fellow contributors.

  • R Packages: Hadley Wickham and Jennifer Bryan

  • R Forwards: Partners with community groups to advance inclusive and open-source software and technologies.

  • R Forwards package development: workshop resources

Package 1

moveEZ - pronounced move easy

An R package for animated biplots.

Ganey R & Nienkemper-Swanepoel J (2025). moveEZ: Animated Biplots. R package version 1.1.1, https://CRAN.R-project.org/package=moveEZ.

Background

  • Consider a dataset \({\bf{X}}\) comprising \(n\) observations and \(p\) continuous variables, along with an additional variable representing “time”.

  • A natural approach is to construct separate biplots for each level of the time variable, enabling the user to explore how samples and variable relationships evolve across time.

  • However, when the time variable includes many levels, this quickly results in an overwhelming number of biplots.

  • The goal of moveEZ is to address this challenge by animating a single biplot across the levels of the time variable, allowing for dynamic visualisation of temporal or sequential changes in the data.

Biplots

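  • Singular Value Decomposition: \(\underset{n \times p}{{\bf{X}}} = \underset{n \times p}{{\bf{U}}}\,\underset{p \times p}{{\bf{D}}}\,\underset{p \times p}{{\bf{V}}'}\)
  • Representing samples: \(\underset{n \times p}{{\bf{Z}}} = \underset{n \times p}{{\bf{X}}}\,\underset{p \times p}{{\bf{V}}}\)
  • Representing variables: \(\underset{p \times p}{{\bf{V}}}\)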
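The identities above can be checked numerically. The following is a minimal base-R sketch on simulated data (an assumption; it uses neither the climate data nor biplotEZ): because \({\bf{V}}'{\bf{V}} = {\bf{I}}\), the sample coordinates \({\bf{Z}} = {\bf{X}}{\bf{V}}\) equal \({\bf{U}}{\bf{D}}\). A two-dimensional biplot plots the rows of the first two columns of \({\bf{Z}}\) as points, and the rows of the first two columns of \({\bf{V}}\) give the direction of each variable axis.

```r
set.seed(1)
# Simulated data: n = 50 observations on p = 4 continuous variables,
# centred and scaled to unit variance
X <- scale(matrix(rnorm(200), nrow = 50, ncol = 4))

# Singular value decomposition X = U D V'
s <- svd(X)
U <- s$u
D <- diag(s$d)
V <- s$v

# Sample coordinates Z = XV; because V'V = I this equals UD
Z <- X %*% V

# Two-dimensional biplot: rows of Z[, 1:2] are the sample points and
# rows of V[, 1:2] give the direction of each variable axis
Z2 <- Z[, 1:2]
V2 <- V[, 1:2]
```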
p × p

Climate Data

  • Period: 1950 to 2020 (10-year increments)
  • Regions: IPCC regions map
  • IPCC: Intergovernmental Panel on Climate Change
  • Aggregated monthly measurements:
    1. Accumulated precipitation (AccPrec)
    2. Daily evaporation (DailyEva)
    3. Mean temperature (Temp)
    4. Soil moisture (SoilMois)
    5. Wind speed (Wind)
    6. Standardised precipitation index 6-month (SPI6)
Iturbide et al. 2020. An update of IPCC climate reference regions for subcontinental analysis of climate model data: definition and aggregated datasets. Earth System Science Data 12(4):2959-2970.

Source: https://sites.ualberta.ca/~ahamann/data/climateaf.html

Climate Data

PCA biplot

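library(biplotEZ)
library(moveEZ)  # assumed to provide the Africa_climate data and the moveplot functions
# PCA biplot of the standardised climate variables, coloured by region
bp <- biplot(Africa_climate, scaled = TRUE) |> 
  PCA(group.aes = Africa_climate$Region) |> 
  samples(opacity = 0.8, 
          col = scales::hue_pal()(10)) |>  # one colour per region
  axes(col = "black") |> 
  plot()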

Fixed Variable Frame

A biplot is first constructed using the full dataset \({\bf{X}}\), and the animation is achieved by slicing the observations according to the “time” variable. In this approach, the variable axes remain fixed and only the sample points are animated over time.
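A minimal base-R sketch of the fixed-frame idea, on simulated data with a hypothetical Year column (this is not moveEZ's implementation): the SVD is computed once on the full data, so the axes \({\bf{V}}\) are shared by every frame, and each frame is simply the subset of rows of \({\bf{Z}}\) belonging to one time level.

```r
set.seed(2)
# Simulated data: 3 time points with 20 observations each on 4 variables
dat <- data.frame(Year = rep(c(1950, 1960, 1970), each = 20),
                  matrix(rnorm(240), ncol = 4))
X <- scale(as.matrix(dat[, -1]))

# One SVD of the full data: the variable axes V are computed once and stay fixed
V <- svd(X)$v[, 1:2]
Z <- X %*% V

# Animation frames: slice the sample coordinates by the time variable
frames <- split(as.data.frame(Z), dat$Year)
sapply(frames, nrow)
```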

[Figure: the samples \({\bf{Z}}\) are sliced by time while the variables \({\bf{V}}\) stay fixed]

moveplot()

Starting from the PCA biplot object bp created above, moveplot() animates the sample points over time. It is called in a pipe and takes several key arguments:

  • time.var: Specifies the name of the variable in the dataset that represents the temporal or sequential dimension. Here, this is “Year”.

  • group.var: Indicates a grouping variable used for colour-coding. Here, this is “Region”.

  • hulls: A logical argument that determines whether to display individual sample points (FALSE) or to draw convex hulls around each group (TRUE).

  • scale.var: A numeric value used to scale the variable vectors.

  • move: A logical argument that controls whether the biplot is animated. If TRUE, the sample points are animated across time; if FALSE, the function returns a faceted plot showing a static biplot for each time level.

  • shadow: A logical argument that controls whether samples from earlier time points remain visible as time moves forward, leaving a trail of previous states. Only applies when move = TRUE and hulls = FALSE.
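When hulls = TRUE, each group is summarised by its convex hull rather than by individual points. A minimal base-R sketch of that computation for one time slice, using simulated coordinates and a hypothetical Region column: grDevices::chull() returns the indices of the points on the hull, which could then be drawn with polygon(). This illustrates the geometry only and is not moveEZ's code.

```r
set.seed(3)
# Simulated 2-D sample coordinates for one time slice, with a grouping variable
pts <- data.frame(x = rnorm(30), y = rnorm(30),
                  Region = rep(c("A", "B", "C"), each = 10))

# For each group, keep only the points on its convex hull (in hull order)
hulls <- lapply(split(pts, pts$Region), function(g) {
  g[grDevices::chull(g$x, g$y), c("x", "y")]
})

# Number of hull vertices per group
sapply(hulls, nrow)
```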

moveplot()

bp |> moveplot(time.var = "Year", group.var = "Region", hulls = FALSE, move = FALSE)
# Object of class biplot, based on 960 samples and 9 variables.
# 6 numeric variables.
# 3 categorical variables.

moveplot()

bp |> moveplot(time.var = "Year", group.var = "Region", hulls = TRUE, move = TRUE)

moveplot()

bp |> moveplot(time.var = "Year", group.var = "Region", hulls = FALSE, move = TRUE, shadow = TRUE)

Dynamic Frame

Separate biplots are constructed for each time slice of the data. Both the sample points and variable axes evolve over time, resulting in a fully dynamic animation that reflects temporal changes in the underlying data structure.


[Figure: each time slice \({\bf{X}}_t\), \(t = 1, \ldots, 8\), gives its own samples \({\bf{Z}}_t\) and variables \({\bf{V}}_t\)]

moveplot2()

  • The moveplot2() function extends the animation to both the sample points and the variable axes.

  • Unlike moveplot(), which keeps the variable axes fixed, moveplot2() constructs a separate biplot for each time slice, allowing both components to evolve over time.

  • The function shares the same arguments as moveplot(). The move argument determines whether the output is an animation or a set of static facets for the samples and variables; setting move = TRUE produces an animated biplot in which both the samples and the variables transition across time.
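In contrast to the fixed frame, the dynamic frame fits one SVD per time slice. A minimal base-R sketch on simulated slices (hypothetical data, not moveEZ internals) shows that each slice then carries its own \({\bf{Z}}_t\) and \({\bf{V}}_t\). Because singular vectors are only determined up to sign, consecutive frames can appear mirrored even when the structure changes little, which is why reflections may be needed to align the biplots.

```r
set.seed(4)
years <- c(1950, 1960, 1970)
# Simulated data: 20 observations on 4 variables for each year
slices <- lapply(years, function(y) matrix(rnorm(80), ncol = 4))
names(slices) <- years

# Dynamic frame: a separate SVD for each time slice gives its own Z_t and V_t
biplots <- lapply(slices, function(X) {
  X <- scale(X)
  s <- svd(X)
  list(Z = X %*% s$v[, 1:2], V = s$v[, 1:2])
})

# The variable axes now differ from one time point to the next
biplots[["1950"]]$V
biplots[["1960"]]$V
```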

moveplot2()

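Additional arguments allow the biplots to be aligned by reflection where needed: align.time gives the time point at which to align, and reflect gives the type of reflection.

Available options for reflect are: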

  • "x" – Reflect about the x-axis

  • "y" – Reflect about the y-axis

  • "xy" – Reflect about both axes

Both align.time and reflect can be vectors when alignment is needed at multiple time points, with each entry in reflect corresponding to the time point at the same position in align.time.
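The reflection options amount to sign changes of the two biplot coordinates. Reflecting about the x-axis maps \((x, y)\) to \((x, -y)\), reflecting about the y-axis maps it to \((-x, y)\), and reflecting about both axes maps it to \((-x, -y)\). A minimal sketch with a hypothetical helper reflect_coords() and hypothetical frame coordinates, pairing the time points and reflections position by position as described above; it is not moveEZ's internal code.

```r
# Hypothetical helper: reflect 2-D coordinates as in the "x", "y" and "xy" options
reflect_coords <- function(coords, type = c("x", "y", "xy")) {
  type <- match.arg(type)
  signs <- switch(type,
                  x  = c(1, -1),   # about the x-axis: negate the second coordinate
                  y  = c(-1, 1),   # about the y-axis: negate the first coordinate
                  xy = c(-1, -1))  # about both axes: negate both coordinates
  sweep(coords, 2, signs, `*`)
}

# Hypothetical frames: the same two points, (1, 3) and (2, 4), at three time points
Z <- matrix(c(1, 2, 3, 4), ncol = 2)
frames <- list(`1950` = Z, `1960` = Z, `1970` = Z)

# Vectors of time points and reflections, matched position by position
align_time  <- c("1950", "1970")
reflections <- c("x", "xy")
for (i in seq_along(align_time)) {
  frames[[align_time[i]]] <- reflect_coords(frames[[align_time[i]]], reflections[i])
}
```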

moveplot2()

bp |> moveplot2(time.var = "Year", group.var = "Region", hulls = TRUE, move = FALSE)
# Object of class biplot, based on 960 samples and 9 variables.
# 6 numeric variables.
# 3 categorical variables.

moveplot2()

Here, the biplot is aligned at the 1950 time point by reflection about the x-axis.

bp |> 
  moveplot2(time.var = "Year", group.var = "Region", hulls = TRUE, move = TRUE, align.time = "1950", reflect = "x")

Dynamic Frame: configurative matching

  • As before, separate biplots are constructed for each time slice of the data.

  • Now, the biplots are aligned according to a specific target:

    • Default: the average of the separate biplot coordinates is calculated and used as a target.

    • Option: specify a target (e.g. a specific year)

  • Generalised orthogonal Procrustes Analysis (GPA) is used to transform each biplot to the target:

    • translation, scaling, rotation and reflection
  • The result is an aligned animation that exposes subtle changes over time.
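A minimal single-pass sketch of aligning several configurations to a common target in base R, using the hypothetical helpers procrustes_to() and align_to_target(). Each configuration is centred (translation) and then rotated, reflected and scaled onto the target through the SVD of \({\bf{B}}'{\bf{A}}\). When no target is given, the average of the centred configurations is used. Full GPA typically iterates, re-averaging and re-fitting under a size constraint until convergence. This sketch deliberately stops after one pass and is not moveEZ's implementation. In the usage example, three copies of one configuration that differ only by reflection, rotation and scale coincide after alignment.

```r
# Hypothetical helper: orthogonal Procrustes fit of B to target A
# (translation, rotation/reflection and isotropic scaling)
procrustes_to <- function(A, B) {
  A0 <- sweep(A, 2, colMeans(A))
  B0 <- sweep(B, 2, colMeans(B))
  dec <- svd(crossprod(B0, A0))
  (sum(dec$d) / sum(B0^2)) * B0 %*% dec$u %*% t(dec$v)
}

# Hypothetical helper: align every configuration to a target,
# by default the average of the centred configurations (single pass)
align_to_target <- function(configs, target = NULL) {
  configs <- lapply(configs, function(X) sweep(X, 2, colMeans(X)))
  if (is.null(target)) target <- Reduce(`+`, configs) / length(configs)
  lapply(configs, function(B) procrustes_to(target, B))
}

set.seed(5)
base <- matrix(rnorm(20), ncol = 2)
rot90 <- matrix(c(0, 1, -1, 0), 2)
configs <- list(base,                       # original
                base %*% diag(c(1, -1)),    # reflected about the x-axis
                2 * base %*% rot90)         # rotated by 90 degrees and enlarged
aligned <- align_to_target(configs)
```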

GPA methodology

  • Image 1: \({\bf{A}}\) is the target visualisation
  • Image 2: \({\bf{B}}\) is the testee visualisation; both configurations are already centred, so no translation is required
  • Image 3: The coordinates of \({\bf{B}}\) are reflected
  • Image 4: The coordinates of \({\bf{B}}\) are rotated and scaled
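The four steps can be written out directly. The following is a minimal base-R sketch using a hypothetical helper procrustes_fit() and simulated configurations. First, centre both configurations (translation). Then take the SVD \({\bf{B}}'{\bf{A}} = {\bf{P}}{\bf{\Phi}}{\bf{Q}}'\) and use \({\bf{T}} = {\bf{P}}{\bf{Q}}'\) as the optimal orthogonal transformation, which may include a reflection. Finally, scale by \(s = \mathrm{tr}({\bf{\Phi}})/\mathrm{tr}({\bf{B}}'{\bf{B}})\). These are the standard orthogonal Procrustes formulas; the code is a sketch, not the packaged implementation. The example builds \({\bf{B}}\) as a shifted, rotated, reflected and shrunken copy of \({\bf{A}}\), so the fit should recover the centred \({\bf{A}}\) exactly.

```r
# Hypothetical helper: fit testee B to target A by translation,
# rotation/reflection and isotropic scaling (orthogonal Procrustes)
procrustes_fit <- function(A, B) {
  # Translation: centre both configurations
  A0 <- sweep(A, 2, colMeans(A))
  B0 <- sweep(B, 2, colMeans(B))
  # Rotation/reflection: T = PQ' from the SVD of B0'A0
  dec <- svd(crossprod(B0, A0))
  Tmat <- dec$u %*% t(dec$v)
  # Scaling: s = tr(Phi) / tr(B0'B0)
  s <- sum(dec$d) / sum(B0^2)
  list(fitted = s * B0 %*% Tmat, rotation = Tmat, scale = s)
}

set.seed(7)
A <- matrix(rnorm(20), ncol = 2)                 # target configuration
theta <- pi / 6
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2)
B <- 0.5 * A %*% rot %*% diag(c(1, -1)) + 3      # shifted, rotated, reflected, shrunk copy
fit <- procrustes_fit(A, B)
fit$scale                                        # 2, undoing the factor 0.5
```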
Borg, I. & Groenen, P. 2005. Modern Multidimensional Scaling. 2nd ed. United States of America: Springer. (Page 433)

moveplot3()

This function shares the same arguments as moveplot() and moveplot2(), with the addition of the target argument:

  • target = NULL - use the average of available biplots
  • target = Africa_climate_target - use of a specific target dataset

To illustrate the use of a fixed target, consider Africa_climate_target, which contains the climate data for the year 1989 and has the same variables and number of observations as each time slice of Africa_climate:

moveplot3()

bp |> moveplot3(time.var = "Year", group.var = "Region", hulls = TRUE, move = TRUE, target = NULL)

moveplot3()

bp |> moveplot3(time.var = "Year", group.var = "Region", hulls = TRUE, move = TRUE, target = Africa_climate_target)

Quantifying the movement

In conjunction with moveplot3(), five measures of comparison can be calculated to quantify the difference between the target display and the display for each time point:

  • Procrustes Statistic (PS)
  • Congruence Coefficient (CC)
  • Absolute Mean Bias (AMB)
  • Mean Bias (MB)
  • Root Mean Squared Bias (RMSB)
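The exact formulas used by moveEZ are not given here, so the following is only a sketch under assumed, commonly used definitions. PS is taken as the standardised Procrustes statistic \(1 - \mathrm{tr}({\bf{\Phi}})^2 / (\mathrm{tr}({\bf{A}}'{\bf{A}})\,\mathrm{tr}({\bf{B}}'{\bf{B}}))\) for centred configurations, where \({\bf{\Phi}}\) holds the singular values of \({\bf{B}}'{\bf{A}}\). CC is Tucker's congruence coefficient of the inter-point distances. The bias measures are the mean, mean absolute and root mean squared differences between the target coordinates and the Procrustes-fitted coordinates. Under these assumed definitions, MB is zero by construction whenever both configurations are centred, because positive and negative differences cancel. The helper compare_configs() and the data are hypothetical.

```r
# Hypothetical helper; the formulas below are assumed, not taken from moveEZ
compare_configs <- function(A, B) {
  # Centre both configurations
  A0 <- sweep(A, 2, colMeans(A))
  B0 <- sweep(B, 2, colMeans(B))
  # Orthogonal Procrustes fit of B0 to A0 (rotation/reflection and scaling)
  dec <- svd(crossprod(B0, A0))
  B_fit <- (sum(dec$d) / sum(B0^2)) * B0 %*% dec$u %*% t(dec$v)
  # Inter-point distances, which do not depend on rotation or reflection
  dA <- as.vector(dist(A0))
  dB <- as.vector(dist(B0))
  diffs <- A0 - B_fit
  c(PS   = 1 - sum(dec$d)^2 / (sum(A0^2) * sum(B0^2)),  # 0 = perfect fit
    CC   = sum(dA * dB) / sqrt(sum(dA^2) * sum(dB^2)),  # 1 = proportional distances
    AMB  = mean(abs(diffs)),
    MB   = mean(diffs),                                 # 0 when both are centred
    RMSB = sqrt(mean(diffs^2)))
}

set.seed(2025)
A <- matrix(rnorm(40), ncol = 2)                                     # target
B <- A %*% diag(c(-1, 1)) + matrix(rnorm(40, sd = 0.1), ncol = 2)   # reflected, noisy copy
round(compare_configs(A, B), 4)
```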

These measures can be extracted as tables or line graphs with evaluation():

results <- bp |> moveplot3(time.var = "Year", group.var = "Region", hulls = TRUE, move = FALSE, 
                target = Africa_climate_target) |> evaluation()

Table of evaluation measures

# 
# 
#         Target vs. 1950
# -----  ----------------
# PS               0.2112
# CC               0.9556
# AMB              0.4976
# MB               0.0000
# RMSB             0.6549
# 
#         Target vs. 1960
# -----  ----------------
# PS               0.1738
# CC               0.9559
# AMB              1.6285
# MB               0.0000
# RMSB             2.3374
# 
# 
#         Target vs. 1970
# -----  ----------------
# PS               0.2047
# CC               0.9521
# AMB              1.6469
# MB               0.0000
# RMSB             2.3450
# 
#         Target vs. 1980
# -----  ----------------
# PS               0.1570
# CC               0.9604
# AMB              1.5816
# MB               0.0000
# RMSB             2.3185
# 
# 
#         Target vs. 1990
# -----  ----------------
# PS               0.1698
# CC               0.9603
# AMB              1.6250
# MB               0.0000
# RMSB             2.3322
# 
#         Target vs. 2000
# -----  ----------------
# PS               0.2472
# CC               0.9451
# AMB              1.6976
# MB               0.0000
# RMSB             2.3489
# 
# 
#         Target vs. 2010
# -----  ----------------
# PS               0.1618
# CC               0.9635
# AMB              1.6034
# MB               0.0000
# RMSB             2.3178
# 
#         Target vs. 2020
# -----  ----------------
# PS               0.1277
# CC               0.9712
# AMB              1.5778
# MB               0.0000
# RMSB             2.2826

moveplot3()

results$bias.plot

moveplot3()

results$fit.plot

Package 2

GPAbin - pronounced G - P - A - bin

An R package to unify multiple biplot visualisations into a single display.

Nienkemper-Swanepoel J (2025). GPAbin: Unifying Multiple Biplot Visualisations into a Single Display. R package version 1.1.1, https://CRAN.R-project.org/package=GPAbin.

Pipeline of functions

  • missmi: This function produces a list of elements to be used when producing a GPAbin biplot.

  • impute: Choose between four available multiple imputation strategies in R.

  • DRT: Multiple correspondence analysis (MCA) is performed on the multiple imputed datasets.

  • GPAbin: Combines multiple configurations from dimension reduction solutions applied to multiple imputed data sets.

  • biplFig: Creates a biplot; the current version (1.1.1) supports MCA biplots.

  • evalMeas: Calculates measures of comparison based on distances between two configurations in two dimensions.
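For readers unfamiliar with the DRT step, MCA can be computed as a correspondence analysis of the indicator (dummy) matrix. The following base-R sketch on simulated categorical data shows that computation only. It is not GPAbin's implementation, and it does not reproduce the imputation or GPA steps. In the GPAbin pipeline, one such configuration would be obtained per imputed data set before the configurations are combined.

```r
set.seed(6)
# Simulated categorical data: 40 observations on 3 variables
df <- data.frame(a = sample(c("low", "mid", "high"), 40, replace = TRUE),
                 b = sample(c("yes", "no"), 40, replace = TRUE),
                 c = sample(c("x", "y", "z"), 40, replace = TRUE))

# Indicator matrix: one 0/1 column per observed category of each variable
ind <- do.call(cbind, lapply(df, function(f) {
  sapply(sort(unique(f)), function(l) as.numeric(f == l))
}))

# Correspondence analysis of the indicator matrix
P <- ind / sum(ind)
r_mass <- rowSums(P)                  # row masses
c_mass <- colSums(P)                  # column masses
S <- diag(1 / sqrt(r_mass)) %*% (P - r_mass %o% c_mass) %*% diag(1 / sqrt(c_mass))
dec <- svd(S)

# Two-dimensional principal coordinates of the observations
F2 <- diag(1 / sqrt(r_mass)) %*% dec$u[, 1:2] %*% diag(dec$d[1:2])
```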

Acknowledgements

We acknowledge the support and contributions of:

  • Dianne Cook - Monash University

    • NGA(MaSS) - funding for this collaboration and visit
  • Emi Tanaka - slide inspiration

  • Ursula Laa - slide inspiration