Johané Nienkemper-Swanepoel
GPAbin - pronounced G - P - A - bin
An R package to unify multiple biplot visualisations into a single display.
Nienkemper-Swanepoel J (2025). GPAbin: Unifying Multiple Biplot Visualisations into a Single Display. R package version 1.1.1, https://CRAN.R-project.org/package=GPAbin.
Missing data mechanisms (MDMs):
Missing Completely At Random (MCAR): Missingness is independent of both the observed and the missing values.
Missing At Random (MAR): Missingness depends on the observed values, but not on the missing values.
Missing Not At Random (MNAR): Missingness depends on the missing values themselves, and possibly on the observed values as well.
Focus placed on MAR scenarios - assumed to be the standard occurrence in practice.
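To make the three mechanisms concrete, the following base R sketch deletes values from a toy categorical variable under each of them. The variable names (`x1`, `x2`), the helper `make_missing()` and the deletion probabilities are hypothetical choices for illustration; they are not taken from the GPAbin package or its simulation studies.

```r
set.seed(1)
n <- 1000

# Toy data: x1 stays fully observed, x2 receives the missing values
x1 <- factor(sample(c("a", "b"), n, replace = TRUE))
x2 <- factor(sample(c("low", "high"), n, replace = TRUE))

# Probability that x2 is deleted, per sample, under each mechanism
p_mcar <- rep(0.20, n)                      # MCAR: constant, ignores the data
p_mar  <- ifelse(x1 == "a", 0.35, 0.05)     # MAR: depends on observed x1 only
p_mnar <- ifelse(x2 == "high", 0.35, 0.05)  # MNAR: depends on x2 itself

# Hypothetical helper: set each entry to NA with its own probability
make_missing <- function(x, p) {
  x[runif(length(x)) < p] <- NA
  x
}

x2_mcar <- make_missing(x2, p_mcar)
x2_mar  <- make_missing(x2, p_mar)
x2_mnar <- make_missing(x2, p_mnar)

# Missingness rate of x2 within each level of the observed variable x1
rate_mcar_by_x1 <- tapply(is.na(x2_mcar), x1, mean)
rate_mar_by_x1  <- tapply(is.na(x2_mar),  x1, mean)

# Missingness rate of x2 within each level of its own (true) value
rate_mnar_by_x2 <- tapply(is.na(x2_mnar), x2, mean)
```

Under MAR the missingness rate differs between the levels of `x1`, which can be checked from the observed data. Under MNAR the rate differs between the true levels of `x2`, which would not be visible in practice because those values are the ones that are missing.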
Two general approaches to impute missing values in multiple variables:
Joint modelling (JM): The same imputation model is used for all variables.
Multiple imputation with multiple correspondence analysis (MIMCA)
Multilevel joint modelling multiple imputation (jomo)
Dirichlet process mixture of products of multinomial distribution model (DPMPM)
Fully conditional specification (FCS) / sequential regression / chained equations: A separate imputation model is specified for each variable, conditional on the other variables.
If \(\mathbf{X}\) is categorical with \(n\) rows (samples) and \(p\) columns (variables) with a total of \(q\) category levels:
\[\mathbf{R}^{-\frac{1}{2}}\mathbf{G}\mathbf{C}^{-\frac{1}{2}} = \mathbf{U}\mathbf{ \Sigma}\mathbf{V}^\prime\]
\(\mathbf{G}\Rightarrow\) indicator matrix of \(\mathbf{X}\) with \(n\) samples and \(q\) columns.
\(\mathbf{R}^{-\frac{1}{2}}\), \(\mathbf{C}^{-\frac{1}{2}}\Rightarrow\) diagonal matrices of row and column weights of \(\mathbf{G}\), respectively.
Generally, plot the first two columns of:
\(\mathbf{U\Sigma}\) (principal coordinates) for the sample coordinates and
the first two columns of \(\mathbf{V}\) (standard coordinates) for the category level point coordinates.
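The decomposition above can be sketched in base R on a small made-up data set. The data frame `X` and all object names are hypothetical, and the row and column weights are taken to be the row and column sums of \(\mathbf{G}\), which is an assumption about the weighting. Without centring, the first singular value of the scaled indicator matrix is always 1 and carries no information, so this sketch skips it and plots the next two dimensions; the package itself may handle this step differently.

```r
# Toy categorical data: n = 6 samples, p = 3 variables, q = 7 category levels
X <- data.frame(
  v1 = factor(c("a", "b", "a", "c", "b", "a")),
  v2 = factor(c("x", "x", "y", "y", "x", "y")),
  v3 = factor(c("m", "n", "n", "m", "m", "n"))
)

# Indicator matrix G (n x q): one 0/1 column per category level
G <- do.call(cbind, lapply(names(X), function(v) {
  levs <- levels(X[[v]])
  Z <- outer(as.character(X[[v]]), levs, "==") * 1
  colnames(Z) <- paste(v, levs, sep = ".")
  Z
}))

# Diagonal weight matrices R^(-1/2) and C^(-1/2) from row and column sums of G
R_inv_sqrt <- diag(1 / sqrt(rowSums(G)))
C_inv_sqrt <- diag(1 / sqrt(colSums(G)))

# Singular value decomposition of the scaled indicator matrix
dec <- svd(R_inv_sqrt %*% G %*% C_inv_sqrt)

# The first singular triple is the trivial solution (singular value 1),
# so dimensions 2 and 3 are used for the two-dimensional display
keep <- 2:3
Z_coord   <- dec$u[, keep] %*% diag(dec$d[keep])  # samples: U Sigma
CLP_coord <- dec$v[, keep]                        # category levels: V
rownames(CLP_coord) <- colnames(G)

# Plot samples and category level points in one display
plot(rbind(Z_coord, CLP_coord), type = "n", xlab = "Dim 1", ylab = "Dim 2", asp = 1)
points(Z_coord, pch = 16)
text(CLP_coord, labels = rownames(CLP_coord), col = "blue")
```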
GPAbin
missmi: This function produces a list of elements to be used when producing a GPAbin biplot.
impute: Choose between four available multiple imputation strategies in R.
DRT: Multiple correspondence analysis (MCA) is performed on the multiple imputed datasets.
GPAbin: Combines multiple configurations from dimension reduction solutions applied to multiple imputed data sets.
biplFig: Creates a biplot. Current version (1.1.1): MCA biplot.
evalMeas: Calculates measures of comparison based on distances between two configurations in two dimensions.
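The combination step can be illustrated with a minimal generalised Procrustes sketch in base R: each configuration is centred, rotated towards the current centroid configuration, and the centroid is updated until it stabilises. The functions `rotate_to()` and `gpa()` are hypothetical helpers written for this illustration and are not the package's implementation; in particular, this sketch uses orthogonal rotations only and ignores scaling.

```r
# Rotate (or reflect) A so that it matches B as closely as possible
# in the least squares sense: Q = U V' from the SVD of A'B
rotate_to <- function(A, B) {
  s <- svd(t(A) %*% B)
  A %*% (s$u %*% t(s$v))
}

# Minimal generalised Procrustes analysis with the centroid as target
gpa <- function(configs, tol = 1e-10, max_iter = 100) {
  # Centre every configuration at the origin
  configs <- lapply(configs, function(A) sweep(A, 2, colMeans(A)))
  target <- Reduce(`+`, configs) / length(configs)
  for (i in seq_len(max_iter)) {
    configs <- lapply(configs, rotate_to, B = target)
    new_target <- Reduce(`+`, configs) / length(configs)
    converged <- sum((new_target - target)^2) < tol
    target <- new_target
    if (converged) break
  }
  list(aligned = configs, centroid = target)
}

# Illustration: three noisy, rotated copies of one 10 x 2 configuration
set.seed(42)
base <- matrix(rnorm(20), ncol = 2)
rot <- function(theta) {
  matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
}
configs <- lapply(c(0, 0.8, 2.1), function(th) {
  base %*% rot(th) + matrix(rnorm(20, sd = 0.01), ncol = 2)
})

fit <- gpa(configs)
# fit$centroid plays the role of the single combined configuration
```

After alignment the three configurations nearly coincide, so their centroid is a sensible single summary of the separate solutions.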
data(missdat)
imp.method: choose between c("MIMCA", "jomo", "DPMPM", "mice").
m: number of imputations.
data(implist)
implist: an object of multiple imputations to use for illustration of the algorithm.
method: in the current version only MCA is available.
Borg, I. & Groenen, P. 2005. Modern Multidimensional Scaling. 2nd ed. United States of America: Springer. (Page 433)
Solid green triangles: testee (completed) category level points.
Solid red squares: target (centroid configuration) category level points.
G.target: the default is NULL to utilise the centroid coordinates of the m imputations.
Z.col, CLP.col: Colour of sample coordinates and category level points, respectively.
Z.pch, CLP.pch: Plotting character of sample coordinates and category level points, respectively.
Z.cex, CLP.cex: Size of plotting character for sample points and category level points, respectively.
Orthogonal Procrustes Analysis of complete MCA biplot (target) vs. GPAbin biplot (testee):
Procrustes Statistic (PS): between 0 (good) and 1 (bad).
Absolute Mean Bias (AMB): low (good) compared to other AMB values.
Root Mean Squared Bias (RMSB): low (good) compared to other RMSB values.
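A rough base R sketch of these comparisons follows. The Procrustes statistic is computed as the normalised residual sum of squares after fitting the testee to the target by translation, rotation and scaling, which lies between 0 and 1. The formulas used here for AMB and RMSB (mean absolute and root mean squared coordinate differences after fitting) are assumptions made for illustration, as is the helper name `procrustes_measures()`; the package's exact definitions may differ.

```r
# Hypothetical helper: fit testee to target and summarise the discrepancy
procrustes_measures <- function(target, testee) {
  # Remove translation by centring both configurations
  X <- sweep(target, 2, colMeans(target))
  Y <- sweep(testee, 2, colMeans(testee))

  # Optimal orthogonal rotation and scaling of Y towards X
  s <- svd(t(Y) %*% X)
  Q <- s$u %*% t(s$v)
  scale_factor <- sum(s$d) / sum(Y^2)
  fitted <- scale_factor * Y %*% Q

  # Procrustes statistic: 0 = perfect match, 1 = no match
  ps <- 1 - sum(s$d)^2 / (sum(X^2) * sum(Y^2))

  # ASSUMED definitions of the bias measures (not confirmed by the source)
  diffs <- X - fitted
  list(PS = ps, AMB = mean(abs(diffs)), RMSB = sqrt(mean(diffs^2)))
}

# Illustration: the testee is a rotated, scaled and shifted copy of the target
set.seed(7)
target <- matrix(rnorm(20), ncol = 2)
theta <- 1.2
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
shift <- matrix(c(3, -1), nrow(target), 2, byrow = TRUE)
testee_exact <- 2 * (target %*% rot) + shift
testee_noisy <- testee_exact + matrix(rnorm(20, sd = 0.5), ncol = 2)

exact <- procrustes_measures(target, testee_exact)  # all measures close to 0
noisy <- procrustes_measures(target, testee_noisy)  # all measures larger
```

Because PS is normalised it can be read on its own, whereas AMB and RMSB depend on the scale of the coordinates and are only meaningful relative to other values computed on the same target.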
Evaluation measures based on response profiles:
Similarity Percentage (SP): between 0 (bad) and 1 (good). Coordinates of category levels in closest proximity to sample coordinates per variable.
Response Pattern Recovery (RPR): between 0 (bad) and 1 (good). The number of recovered response profiles predicted from the GPAbin biplot compared to the true response profiles.
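The idea behind both measures can be sketched in base R: for every sample and every variable, the category level point closest to the sample point is taken as the predicted response. The helper `predict_profiles()`, the toy coordinates and the two summary proportions below are hypothetical and represent only one possible reading of SP and RPR, not the package's definitions.

```r
# Predict, per variable, the level whose point lies closest to each sample
# Z: n x 2 sample coordinates; CLP: q x 2 category level point coordinates
# clp_var / clp_level: variable and level label of each row of CLP
predict_profiles <- function(Z, CLP, clp_var, clp_level) {
  # Squared Euclidean distances between samples (rows) and level points (cols)
  D <- outer(rowSums(Z^2), rowSums(CLP^2), "+") - 2 * Z %*% t(CLP)
  vars <- unique(clp_var)
  pred <- matrix(NA_character_, nrow(Z), length(vars),
                 dimnames = list(NULL, vars))
  for (v in vars) {
    idx <- which(clp_var == v)
    nearest <- max.col(-D[, idx, drop = FALSE], ties.method = "first")
    pred[, v] <- clp_level[idx][nearest]
  }
  pred
}

# Toy biplot: two variables with two levels each, three samples
CLP <- rbind(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))
clp_var   <- c("v1", "v1", "v2", "v2")
clp_level <- c("a", "b", "x", "y")
Z <- rbind(c(-0.9, -0.8), c(0.8, 0.7), c(0.2, -0.5))

pred  <- predict_profiles(Z, CLP, clp_var, clp_level)
truth <- cbind(v1 = c("a", "b", "a"), v2 = c("x", "y", "x"))

# Proportion of individual responses predicted correctly
entry_match <- mean(pred == truth)
# Proportion of samples whose complete response profile is recovered
profile_match <- mean(rowSums(pred == truth) == ncol(truth))
```

In this toy example the third sample is assigned level "b" instead of "a" on `v1`, so five of the six individual responses and two of the three complete profiles are recovered.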
compdat: Complete data matrix representing the input data of missmi(). This only applies to simulated data.
CRUCIAL: GitHub and Git for version control, project management and collaboration.
Improves workflow and the process of maintaining individual scripts.
Promotes open-source software.
Creates visibility.
Valuable to receive feedback from users (experts, non-technical users, students)
Happy Git and GitHub for the useR: Jenny Bryan and fellow contributors.
R Packages: Hadley Wickham and Jennifer Bryan.
R Forwards: Partners with community groups to advance inclusive and open-source software and technologies.
R Forwards package development: resources of workshops.
ggplot2 additions to package.
I acknowledge the support and contributions of:
Prof. Niël le Roux and Prof. Sugnet Lubbe:
Dianne Cook - Monash University
Emi Tanaka - slide inspiration
Ursula Laa - slide inspiration