Principal Component Analysis (PCA)

Principal Component Analysis for dimensionality reduction — the directions of maximal variance via eigenvectors/SVD, choosing k by explained variance, and why scaling matters — with code.

8 min read · Reviewed May 2026

1. Big Picture

Principal Component Analysis (PCA) is the standard linear technique for dimensionality reduction: it finds a new set of axes — the principal components — ordered by how much variance in the data they capture, then keeps only the top few. You compress high-dimensional data into a low-dimensional representation that preserves as much variation as possible, useful for visualization, denoising, decorrelation, and speeding up downstream models.

The components are orthogonal directions of maximal variance, and they're exactly the eigenvectors of the data's covariance matrix (equivalently, the right singular vectors from an SVD of the centered data). The frame to hold: center the data, find the directions of greatest variance, and project onto the top-k of them. Interviewers check that you know what PCA maximizes, the eigenvector/SVD connection, how to choose k, and why feature scaling matters.

2. Intuition + Visual

Imagine a cloud of points stretched mostly along one diagonal. PCA rotates the coordinate system so the first new axis points along the direction of greatest spread, the second along the next-greatest (orthogonal to the first), and so on. Projecting onto the first few axes keeps the structure that matters and discards directions where the data barely varies.

flowchart LR
    X["Data (n × d)"] --> C["Center: subtract the mean"]
    C --> COV["Covariance matrix (d × d)"]
    COV --> E["Eigen / SVD: components by variance"]
    E --> P["Project onto top-k components -> (n × k)"]

The first principal component is the single direction that, if you projected all points onto it, would preserve the most variance — equivalently, the line minimizing total squared reconstruction error.
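The equivalence between "most variance preserved" and "least reconstruction error" can be checked numerically. The sketch below is illustrative only: the 2-D cloud, the 45° stretch direction, and the 1° angle grid are assumptions chosen for the demo, not part of the walkthrough. It sweeps candidate unit directions and records both quantities for each.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-D cloud: wide along one axis, narrow along the other,
# then rotated so the long axis points along the 45-degree diagonal.
latent = rng.normal(size=(400, 2)) * np.array([3.0, 0.5])
theta0 = np.pi / 4
R = np.array([[np.cos(theta0), -np.sin(theta0)],
              [np.sin(theta0),  np.cos(theta0)]])
X = latent @ R.T
Xc = X - X.mean(axis=0)                      # center first

# Candidate unit directions, one per degree in [0, 180].
angles = np.linspace(0.0, np.pi, 181)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Variance of the data projected onto each candidate direction.
proj_var = (Xc @ dirs.T).var(axis=0, ddof=1)

# Total squared error when each point is replaced by its projection.
recon_err = np.array([((Xc - np.outer(Xc @ w, w)) ** 2).sum() for w in dirs])

best_angle = angles[np.argmax(proj_var)]
print(f"max-variance angle: {np.degrees(best_angle):.0f} deg")
print(f"min-error angle:    {np.degrees(angles[np.argmin(recon_err)]):.0f} deg")
```

Both criteria pick the same direction because, for centered data, the squared length of each point splits into its squared projection plus its squared residual, so increasing one term decreases the other.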

3. The Math

Given centered data $X \in \mathbb{R}^{n \times d}$ (mean subtracted), the covariance matrix is

$$\Sigma = \frac{1}{n - 1} X^\top X$$

PCA finds the orthonormal directions $w$ that maximize projected variance $w^\top \Sigma\, w$ subject to $\lVert w \rVert = 1$. The solution is the eigenvectors of $\Sigma$, ordered by eigenvalue:

$$\Sigma\, w_i = \lambda_i\, w_i$$
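One way to see why the maximizer is an eigenvector is the standard Lagrange-multiplier argument, sketched here for the first component only (the multiplier $\lambda$ enforces the unit-norm constraint):

```latex
\begin{aligned}
\mathcal{L}(w, \lambda) &= w^\top \Sigma\, w - \lambda\,(w^\top w - 1) \\
\nabla_w \mathcal{L} &= 2\,\Sigma\, w - 2\,\lambda\, w = 0
  \;\Longrightarrow\; \Sigma\, w = \lambda\, w \\
w^\top \Sigma\, w &= \lambda\, w^\top w = \lambda
\end{aligned}
```

Every stationary point is therefore an eigenvector, and the projected variance at that point equals its eigenvalue, so the maximum is attained at the eigenvector with the largest eigenvalue. Later components follow from the same argument restricted to directions orthogonal to the ones already chosen.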

Each eigenvalue $\lambda_i$ is the variance captured by component $i$, so the fraction of variance explained by the top $k$ components is $\sum_{i=1}^{k}\lambda_i \big/ \sum_{j=1}^{d}\lambda_j$, which is the standard way to choose $k$.

In practice PCA is computed via the SVD of $X = U S V^\top$: the principal components are the columns of $V$, and the singular values relate to variance by $\lambda_i = s_i^2 / (n-1)$. SVD is preferred for numerical stability (it avoids forming $X^\top X$).

Scaling matters: PCA is variance-driven, so features on larger scales dominate. Standardize features first unless they're already comparable.
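The relation between eigenvalues and singular values can be checked numerically. This is a minimal sketch on synthetic correlated data (an assumption for the demo); it computes the spectrum both ways and compares directions up to sign, since eigenvectors and singular vectors are only defined up to a sign flip.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # synthetic correlated data
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Route 1: eigendecomposition of the covariance matrix.
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]             # reorder to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data, no covariance matrix formed.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vals = S**2 / (n - 1)

same_vals = np.allclose(eigvals, svd_vals)

# |cosine| between matching directions should be 1 (sign is arbitrary).
cosines = np.abs(np.sum(eigvecs * Vt.T, axis=0))
same_dirs = np.allclose(cosines, 1.0, atol=1e-6)

print("eigenvalues match:", same_vals)
print("directions match up to sign:", same_dirs)
```

The direction comparison assumes the eigenvalues are distinct, which holds for generic data like this; with repeated eigenvalues the two routes may return different bases for the same subspace.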

4. Implementation
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # correlated 5-D data

def pca(X, k):
    Xc = X - X.mean(axis=0)                             # 1. center
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)   # 2. SVD
    components = Vt[:k]                                 # 3. top-k directions (k × d)
    projected = Xc @ components.T                       # 4. project -> (n × k)
    explained = (S**2) / (S**2).sum()                   # variance ratio per component
    return projected, components, explained[:k]

proj, comps, var = pca(X, k=2)
print(f"shape {proj.shape}, variance explained by top 2: {var.sum():.2%}")
```
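The implementation above takes k as given. A hedged sketch of picking k from a cumulative explained-variance target follows; `choose_k` and its `target` parameter are hypothetical names introduced for illustration, and 0.95 is just a commonly used threshold, not a rule.

```python
import numpy as np

def choose_k(X, target=0.95):
    """Smallest k whose cumulative explained-variance ratio reaches `target`.

    Hypothetical helper for illustration; not part of any library.
    """
    Xc = X - X.mean(axis=0)
    S = np.linalg.svd(Xc, compute_uv=False)   # singular values only
    ratios = S**2 / (S**2).sum()
    cumulative = np.cumsum(ratios)
    # First index where the running total reaches the target, as a count.
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(S)), cumulative         # clamp in case of rounding near 1.0

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # synthetic correlated data
k, cumulative = choose_k(X, target=0.95)
print(f"k = {k}, cumulative ratios = {np.round(cumulative, 3)}")
```

Only the singular values are needed to choose k, so `compute_uv=False` skips computing the singular vectors.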
5. Interview Questions
  1. Conceptual: What does PCA maximize, and what are the principal components? (It finds orthogonal directions that maximize projected variance — the eigenvectors of the covariance matrix, ordered by eigenvalue.)
  2. Implementation: Why compute PCA via SVD instead of eigendecomposition of the covariance? (SVD on the centered data is more numerically stable and avoids explicitly forming XᵀX, which can lose precision.)
  3. Applied: How do you choose the number of components k? (By cumulative explained variance — keep enough components to reach a target, e.g. 95%, using the eigenvalue/singular-value ratios.)
  4. Systems-level: Why must you scale features before PCA? (PCA is variance-driven, so a feature on a larger scale dominates the components regardless of importance — standardize first.)
  5. Failure modes: When does PCA fail or mislead, e.g. vs. t-SNE? (It only captures linear structure and global variance; nonlinear manifolds or cluster structure may need t-SNE/UMAP, which preserve local neighborhoods instead.)
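The scaling answer can be made concrete with a small experiment. The two features below (a height in metres and an income in currency units) are made-up, independent, and chosen only to differ in scale; the helper name `first_component` is likewise an assumption for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two independent illustrative features on very different scales.
height_m = rng.normal(1.7, 0.1, size=n)          # std around 0.1
income = rng.normal(50_000, 10_000, size=n)      # std around 10,000
X = np.column_stack([height_m, income])

def first_component(X):
    """Return the first principal direction and its explained-variance ratio."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0], (S**2 / (S**2).sum())[0]

# PCA on raw features: the large-scale column dominates.
raw_dir, raw_ratio = first_component(X)

# PCA on standardized features: each column has unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
std_dir, std_ratio = first_component(Z)

print("raw first component:         ", np.round(raw_dir, 4), f"ratio {raw_ratio:.4f}")
print("standardized first component:", np.round(std_dir, 4), f"ratio {std_ratio:.4f}")
```

On the raw data the first component points almost entirely along the income axis and appears to explain nearly all the variance, purely because of units. After standardizing, neither feature dominates and the explained variance is split roughly evenly, which is what independent features should give.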
6. Retrieval Check

From memory: list the four PCA steps (center, covariance/SVD, top-k, project), state what eigenvalues represent, and explain why scaling matters. Check against Stage 3.
