Plotting PCA (Principal Component Analysis) (2024)

This document explains PCA, clustering, LFDA and MDS related plotting using {ggplot2} and {ggfortify}.

{ggfortify} lets {ggplot2} know how to interpret PCA objects. After loading {ggfortify}, you can use the ggplot2::autoplot function for stats::prcomp and stats::princomp objects.

library(ggfortify)
df <- iris[1:4]
pca_res <- prcomp(df, scale. = TRUE)
autoplot(pca_res)

A PCA result contains only numeric values. To color points by a non-numeric column of the original data, pass the original data with the data keyword and name the column with the colour keyword. Use help(autoplot.prcomp) (or help(autoplot.*) for other objects) to check the available options.

autoplot(pca_res, data = iris, colour = 'Species')
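
To see why the data keyword is needed, it can help to look inside the PCA object. The following is a minimal sketch using only base R; the variable name scores is introduced here for illustration and is not part of the original example.

```r
# Fit the same PCA as in the text (base R only).
pca_res <- prcomp(iris[1:4], scale. = TRUE)

# The scores matrix is purely numeric: one row per observation,
# one column per principal component.
scores <- pca_res$x
dim(scores)
is.numeric(scores)

# 'Species' is not part of the PCA result. It exists only in the
# original data, which is why autoplot() needs data = iris to colour by it.
"Species" %in% colnames(scores)
"Species" %in% colnames(iris)
```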

Passing label = TRUE draws a label for each data point, taken from the row names.

autoplot(pca_res, data = iris, colour = 'Species', label = TRUE, label.size = 3)
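
Because the labels come from row names, what you see depends on the data set. The sketch below, using base R only, compares the row names of iris with those of USArrests; the name pca_states is hypothetical and only used here.

```r
# iris has default row names, so labels would be "1", "2", ...
head(rownames(iris))

# USArrests has meaningful row names (US state names).
head(rownames(USArrests))

# prcomp() carries the row names over to the scores matrix,
# so they remain available for labelling.
pca_states <- prcomp(USArrests, scale. = TRUE)
identical(rownames(pca_states$x), rownames(USArrests))
```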

Passing shape = FALSE draws the plot without points. In this case, labels are turned on unless otherwise specified.

autoplot(pca_res, data = iris, colour = 'Species', shape = FALSE, label.size = 3)

Passing loadings = TRUE draws eigenvectors.

autoplot(pca_res, data = iris, colour = 'Species', loadings = TRUE)
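
For a prcomp object, the eigenvectors are stored in the rotation element. As a rough check, under the assumption that the PCA was fitted with scale. = TRUE as above, they should match the eigenvectors of the correlation matrix up to sign. The names loadings and eig are introduced here for illustration.

```r
pca_res <- prcomp(iris[1:4], scale. = TRUE)

# One column per principal component, one row per original variable.
loadings <- pca_res$rotation
loadings

# Each eigenvector has unit length.
colSums(loadings^2)

# With scale. = TRUE the PCA is based on the correlation matrix.
# Signs of eigenvectors are arbitrary, so compare absolute values.
eig <- eigen(cor(iris[1:4]))
all.equal(abs(unname(loadings)), abs(eig$vectors))
all.equal(pca_res$sdev^2, eig$values)
```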

You can attach eigenvector labels and change some options.

autoplot(pca_res, data = iris, colour = 'Species', loadings = TRUE, loadings.colour = 'blue', loadings.label = TRUE, loadings.label.size = 3)

By default, each component is scaled in the same way as in a standard biplot. You can disable the scaling by specifying scale = 0.

autoplot(pca_res, scale = 0)
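
The sketch below illustrates what standard biplot scaling does to the scores, following the convention used by stats::biplot.prcomp. It is an assumption that {ggfortify} applies the same convention internally; the names scale_power, lam and scaled_scores are hypothetical.

```r
pca_res <- prcomp(iris[1:4], scale. = TRUE)

choices <- 1:2        # components shown on the x and y axes
scale_power <- 1      # biplot default; 0 disables the scaling

# Scaling factor per component: standard deviation times sqrt(n).
n <- nrow(pca_res$x)
lam <- pca_res$sdev[choices] * sqrt(n)
lam <- if (scale_power != 0) lam^scale_power else 1

# Divide each score column by its scaling factor.
scaled_scores <- t(t(pca_res$x[, choices]) / lam)

# With scale_power = 1, both components end up with the same spread,
# whereas the raw scores have spread equal to pca_res$sdev.
apply(scaled_scores, 2, sd)
apply(pca_res$x[, choices], 2, sd)
```

Setting scale_power to 0 in this sketch leaves the scores untouched, which corresponds to scale = 0 in the call above.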

{ggfortify} supports stats::factanal objects in the same manner as PCA objects. The available options are the same as for PCA.

Important: You must specify the scores option when calling factanal so that scores are calculated (the default, scores = "none", computes no scores). Otherwise, plotting will fail.

d.factanal <- factanal(state.x77, factors = 3, scores = 'regression')
autoplot(d.factanal, data = state.x77, colour = 'Income')
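
A quick way to check whether a factanal fit can be plotted is to look at its scores element. This is a small sketch using base R only; fa_none and fa_reg are names made up for this example.

```r
# Without the scores option, no scores are stored in the result.
fa_none <- factanal(state.x77, factors = 3)
is.null(fa_none$scores)

# With scores = "regression", one row per observation and
# one column per factor is available for plotting.
fa_reg <- factanal(state.x77, factors = 3, scores = "regression")
dim(fa_reg$scores)
head(fa_reg$scores)
```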

autoplot(d.factanal, label = TRUE, label.size = 3, loadings = TRUE, loadings.label = TRUE, loadings.label.size = 3)

{ggfortify} supports the stats::kmeans class. You must explicitly pass the original data to the autoplot function via the data keyword, because a kmeans object does not store the original data. The result is automatically colored by cluster.

set.seed(1)
autoplot(kmeans(USArrests, 3), data = USArrests)
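
The following sketch shows what a kmeans object actually contains and how the cluster assignment lines up with the original rows. It uses base R only; km and clustered are illustrative names.

```r
set.seed(1)
km <- kmeans(USArrests, 3)

# The result holds assignments and centers, but not the input data.
names(km)
"data" %in% names(km)

# One cluster label per row of the original data, in the same order.
length(km$cluster) == nrow(USArrests)

# Joining the labels back to the data by hand, which is roughly the
# information autoplot() needs when you pass data = USArrests.
clustered <- cbind(USArrests, cluster = factor(km$cluster))
head(clustered)
```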

autoplot(kmeans(USArrests, 3), data = USArrests, label = TRUE, label.size = 3)

{ggfortify} supports the cluster::clara, cluster::fanny and cluster::pam classes, as well as cluster::silhouette. Because these objects contain the original data as one of their elements, there is no need to pass the original data explicitly.

library(cluster)
autoplot(clara(iris[-5], 3))
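
As a quick check of that claim, the sketch below fits all three models and looks for the stored data. It assumes the default keep.data = TRUE of these functions; x and fits are names introduced for this example.

```r
library(cluster)

x <- iris[-5]
fits <- list(
  clara = clara(x, 3),
  fanny = fanny(x, 3),
  pam   = pam(x, 3)
)

# Each fit carries its input data in the 'data' element.
sapply(fits, function(fit) !is.null(fit$data))

# The stored data has the same shape as the input.
sapply(fits, function(fit) dim(fit$data))
```

If you fit one of these models with keep.data = FALSE, the data element is not stored, so you would likely need to supply the data yourself.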

Specifying frame = TRUE in autoplot for stats::kmeans and cluster::* objects draws a convex hull around each cluster.

autoplot(fanny(iris[-5], 3), frame = TRUE)

If you want a probability ellipse, {ggplot2} 1.0.0 or later is required. Pass any value supported by the type keyword of ggplot2::stat_ellipse via the frame.type option.

autoplot(pam(iris[-5], 3), frame = TRUE, frame.type = 'norm')
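
ggplot2::stat_ellipse documents "t", "norm" and "euclid" as values for type. As a sketch, assuming your {ggfortify} version forwards frame.type unchanged as described above, a multivariate t-distribution ellipse could be requested like this (p_t is a name used only here):

```r
library(ggfortify)
library(cluster)

# 't' assumes a multivariate t-distribution, which is less sensitive
# to outlying points than 'norm' (multivariate normal).
p_t <- autoplot(pam(iris[-5], 3), frame = TRUE, frame.type = 't')
p_t
```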

If you want a silhouette plot, pass a silhouette object to the autoplot function.

autoplot(silhouette(pam(iris[-5], 3L)))
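
The silhouette object itself is a small numeric matrix, so the values behind the plot can be inspected directly. This is a minimal sketch using only the {cluster} package; sil and avg_width are illustrative names.

```r
library(cluster)

sil <- silhouette(pam(iris[-5], 3L))

# One row per observation: assigned cluster, nearest other cluster,
# and the silhouette width of that observation.
colnames(sil)
head(sil)

# The average silhouette width is a common single-number summary;
# values closer to 1 indicate better separated clusters.
avg_width <- mean(sil[, "sil_width"])
avg_width
```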

For more information on silhouette plots and how they can be used, see the base R example, the scikit-learn example, and the original paper.

The {lfda} package supports a set of Local Fisher Discriminant Analysis methods. You can use autoplot to plot the analysis result in the same manner as for PCA.

library(lfda)
# Local Fisher Discriminant Analysis (LFDA)
model <- lfda(iris[-5], iris[, 5], r = 3, metric = "plain")
autoplot(model, data = iris, frame = TRUE, frame.colour = 'Species')

# Semi-supervised Local Fisher Discriminant Analysis (SELF)
model <- self(iris[-5], iris[, 5], beta = 0.1, r = 3, metric = "plain")
autoplot(model, data = iris, frame = TRUE, frame.colour = 'Species')

Before Plotting

Even though MDS functions return a matrix or a list (not a specific class), {ggfortify} can infer the underlying class from the list attributes and perform autoplot.

NOTE: Inference from a matrix is not supported.

NOTE: {ggfortify} can plot a stats::dist instance as a heatmap.

autoplot(eurodist)

Plotting Classical (Metric) Multidimensional Scaling

stats::cmdscale performs classical MDS and returns the point coordinates as a matrix, so you cannot use autoplot in this case. However, if eig = TRUE, add = TRUE, or x.ret = TRUE is specified, stats::cmdscale returns a list instead of a matrix. In these cases, {ggfortify} can infer how to plot it via autoplot. Refer to help(cmdscale) to check what these options do.

autoplot(cmdscale(eurodist, eig = TRUE))
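
The difference between the two return types is easy to verify in base R. In this sketch, mds_matrix and mds_list are names chosen for illustration.

```r
# Default call: a plain matrix of coordinates.
mds_matrix <- cmdscale(eurodist)
is.matrix(mds_matrix)

# With eig = TRUE: a list whose 'points' element holds the coordinates.
mds_list <- cmdscale(eurodist, eig = TRUE)
is.list(mds_list)
names(mds_list)
dim(mds_list$points)
```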

Specify label = TRUE to plot labels.

autoplot(cmdscale(eurodist, eig = TRUE), label = TRUE, label.size = 3)

Plotting Non-metric Multidimensional Scaling

MASS::isoMDS and MASS::sammon perform non-metric MDS and return a list that contains the point coordinates. Thus, autoplot can be used.

NOTE: Behind the scenes, autoplot.matrix is called to plot MDS. See help(autoplot.matrix) to check the available options.

library(MASS)
autoplot(isoMDS(eurodist), colour = 'orange', size = 4, shape = 3)
## initial value 7.505733
## final value 7.505688
## converged
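
To inspect what these functions return before plotting, you can look at the list elements directly. This sketch uses only {MASS}; trace = FALSE suppresses the progress messages shown above, and iso and sam are illustrative names.

```r
library(MASS)

iso <- isoMDS(eurodist, trace = FALSE)
sam <- sammon(eurodist, trace = FALSE)

# Both results are lists with a 'points' matrix and a 'stress' value.
names(iso)
names(sam)

# One row per city, two columns for the 2-D configuration.
dim(iso$points)
dim(sam$points)
```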

Passing shape = FALSE draws the plot without points. In this case, labels are turned on unless otherwise specified.

autoplot(sammon(eurodist), shape = FALSE, label.colour = 'blue', label.size = 3)
## Initial stress : 0.01705
## stress after 10 iters: 0.00951, magic = 0.500
## stress after 20 iters: 0.00941, magic = 0.500

Article information

Author: Gregorio Kreiger
