PCA Explained (ML Interview)

Principal Component Analysis for dimensionality reduction — the directions of maximal variance via eigenvectors/SVD, choosing k by explained variance, and why scaling matters — with code.

1. Big Picture

Principal Component Analysis (PCA) is the standard linear technique for dimensionality reduction: it finds a new set of axes — the principal components — ordered by how much variance in the data they capture, then keeps only the top few. You compress high-dimensional data into a low-dimensional representation that preserves as much variation as possible, useful for visualization, denoising, decorrelation, and speeding up downstream models.

The components are orthogonal directions of maximal variance, and they're exactly the eigenvectors of the data's covariance matrix (equivalently, the right singular vectors from an SVD). The frame to hold: center the data, find the directions of greatest variance, and project onto the top-k of them. Interviewers check that you know what PCA maximizes, the eigenvector/SVD connection, how to choose k, and why feature scaling matters.

2. Intuition + Visual

Imagine a cloud of points stretched mostly along one diagonal. PCA rotates the coordinate system so the first new axis points along the direction of greatest spread, the second along the next-greatest (orthogonal to the first), and so on. Projecting onto the first few axes keeps the structure that matters and discards directions where the data barely varies.

The first principal component is the single direction that, if you projected all points onto it, would preserve the most variance — equivalently, the line minimizing total squared reconstruction error.
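The equivalence between "most projected variance" and "least reconstruction error" can be checked numerically. The sketch below is illustrative only: the 2-D cloud, its 45-degree orientation, and the one-degree grid of candidate directions are all assumptions made for the demo, not part of the original material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D cloud: wide along one axis, narrow along the other,
# then rotated so the long axis points along the 45-degree diagonal.
X = rng.normal(size=(400, 2)) @ np.diag([3.0, 0.5])
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = X @ R.T
Xc = X - X.mean(axis=0)                      # center before measuring variance

# Candidate unit directions, one per degree over a half circle.
angles = np.linspace(0.0, np.pi, 181)
W = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # (181, 2)

proj = Xc @ W.T                              # scalar projection of each point on each direction
variance = proj.var(axis=0, ddof=1)          # projected variance per direction

# Squared distance from each point to its projection: ||x||^2 - (x . w)^2, summed over points.
recon_err = (Xc ** 2).sum() - (proj ** 2).sum(axis=0)

best_var = int(np.argmax(variance))          # direction with the most variance
best_err = int(np.argmin(recon_err))         # direction with the least reconstruction error

print(f"max-variance direction:  {np.degrees(angles[best_var]):.0f} degrees")
print(f"min-error direction:     {np.degrees(angles[best_err]):.0f} degrees")
```

Both searches pick the same direction because the total squared norm of the centered data is fixed: whatever variance a direction captures is subtracted from the reconstruction error.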

3. The Math

Given centered data $X \in \mathbb{R}^{n \times d}$ (mean subtracted), the covariance matrix is

$$\Sigma = \frac{1}{n - 1} X^\top X$$

PCA finds orthonormal directions $w$ that maximize the projected variance $w^\top \Sigma\, w$ subject to $\lVert w \rVert = 1$, with each later direction also constrained to be orthogonal to the earlier ones. The solutions are the eigenvectors of $\Sigma$, ordered by decreasing eigenvalue:

$$\Sigma\, w_i = \lambda_i\, w_i$$
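One standard way to see why the maximizer must be an eigenvector is a Lagrange-multiplier argument. The text does not spell this out, so treat the following as a brief sketch, with $\lambda$ introduced here as the multiplier for the unit-norm constraint:

```latex
\begin{aligned}
\mathcal{L}(w, \lambda) &= w^\top \Sigma\, w - \lambda\,(w^\top w - 1) \\
\nabla_w \mathcal{L} &= 2\,\Sigma\, w - 2\,\lambda\, w = 0
  \;\Longrightarrow\; \Sigma\, w = \lambda\, w \\
w^\top \Sigma\, w &= \lambda\, w^\top w = \lambda
\end{aligned}
```

The last line shows that the projected variance at any stationary point equals its eigenvalue, so the largest eigenvalue gives the first component and the remaining eigenvectors follow in decreasing order.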

Each eigenvalue $\lambda_i$ is the variance captured by component $i$, so the fraction of variance explained by the top $k$ components is $\sum_{i=1}^{k}\lambda_i \big/ \sum_{j=1}^{d}\lambda_j$. This ratio is the standard way to choose $k$.

In practice PCA is computed via the SVD $X = U S V^\top$: the principal components are the columns of $V$, and the singular values relate to variance by $\lambda_i = s_i^2 / (n-1)$. SVD is preferred for numerical stability: explicitly forming $X^\top X$ squares the condition number of $X$, so the smallest components lose precision.

Scaling matters: PCA is variance-driven, so features on larger scales dominate the components. Standardize features first unless they are already on comparable scales.
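The two routes described above, eigendecomposition of the covariance matrix and SVD of the centered data, can be compared directly. This is a minimal sketch on synthetic data (the random 5-D dataset is an assumption for illustration); it checks the relation $\lambda_i = s_i^2 / (n-1)$ and that the directions agree up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # synthetic correlated data
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Route 1: eigendecomposition of the covariance matrix.
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]            # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data (singular values come back descending).
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_variances = S ** 2 / (n - 1)

# The variances should match, and each eigenvector should equal the
# corresponding right singular vector up to a sign flip.
same_variances = np.allclose(eigvals, svd_variances)
alignment = np.abs(np.sum(eigvecs * Vt.T, axis=0))   # |w_i . v_i| for each i
same_directions = np.allclose(alignment, 1.0)

print("variances match: ", same_variances)
print("directions match:", same_directions)
```

The sign ambiguity is expected: if $w$ is an eigenvector, so is $-w$, which is why the comparison uses the absolute value of the dot product.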

4. Implementation

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # correlated 5-D data

def pca(X, k):
    Xc = X - X.mean(axis=0)                  # 1. center
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # 2. SVD
    components = Vt[:k]                       # 3. top-k directions (k × d)
    projected = Xc @ components.T            # 4. project -> (n × k)
    explained = (S**2) / (S**2).sum()        # variance ratio per component
    return projected, components, explained[:k]

proj, comps, var = pca(X, k=2)
print(f"shape {proj.shape}, variance explained by top 2: {var.sum():.2%}")
```
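The implementation above takes `k` as an input. A possible companion sketch picks `k` from a cumulative explained-variance target and then maps the projection back to the original space. The helper name `choose_k` and the 95% target are assumptions for illustration, not part of the original code.

```python
import numpy as np

def choose_k(X, target=0.95):
    """Return the smallest k whose cumulative explained-variance ratio reaches target."""
    Xc = X - X.mean(axis=0)
    S = np.linalg.svd(Xc, compute_uv=False)          # singular values only
    cumulative = np.cumsum(S ** 2 / (S ** 2).sum())
    # First index where the cumulative ratio is >= target, converted to a count.
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(S)), cumulative                # clamp in case of round-off at 100%

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # same synthetic data as above

k, cumulative = choose_k(X, target=0.95)

# Project onto the top-k directions, then reconstruct in the original space.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
Z = (X - mean) @ Vt[:k].T                    # (n, k) compressed representation
X_hat = Z @ Vt[:k] + mean                    # (n, d) reconstruction

# Fraction of total variance lost by keeping only k components.
lost = ((X - X_hat) ** 2).sum() / ((X - mean) ** 2).sum()

print(f"k = {k}, cumulative variance kept = {cumulative[k - 1]:.2%}, lost = {lost:.2%}")
```

The lost fraction equals one minus the cumulative explained variance at `k`, which ties the reconstruction-error view of PCA to the explained-variance rule for choosing `k`.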

5. Interview Questions

Conceptual: What does PCA maximize, and what are the principal components? (It finds orthogonal directions that maximize projected variance — the eigenvectors of the covariance matrix, ordered by eigenvalue.)
Implementation: Why compute PCA via SVD instead of eigendecomposition of the covariance? (SVD on the centered data is more numerically stable and avoids explicitly forming XᵀX, which can lose precision.)
Applied: How do you choose the number of components k? (By cumulative explained variance — keep enough components to reach a target, e.g. 95%, using the eigenvalue/singular-value ratios.)
Systems-level: Why must you scale features before PCA? (PCA is variance-driven, so a feature on a larger scale dominates the components regardless of importance — standardize first.)
Failure modes: When does PCA fail or mislead, e.g. vs. t-SNE? (It only captures linear structure and global variance; nonlinear manifolds or cluster structure may need t-SNE/UMAP, which preserve local neighborhoods instead.)
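The scaling question above is easy to demonstrate. In this sketch the three features are hypothetical: two are strongly correlated and carry the shared structure, while the third is independent noise that happens to be recorded in much larger units.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical features: a and b are correlated (correlation about 0.9);
# c is unrelated to both but measured on a scale 1000 times larger.
a = rng.normal(size=n)
b = 0.9 * a + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n)
c = 1000.0 * rng.normal(size=n)
X = np.column_stack([a, b, c])

def first_component(X):
    """Return the first principal direction of X (up to sign)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

raw_pc1 = first_component(X)                 # PCA on the raw features

# Standardize each feature to zero mean and unit variance, then repeat.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
std_pc1 = first_component(X_std)

print("PC1 weights, raw:         ", np.round(np.abs(raw_pc1), 3))
print("PC1 weights, standardized:", np.round(np.abs(std_pc1), 3))
```

On the raw data the first component is almost entirely the large-scale feature `c`. After standardization it loads on `a` and `b`, the pair that actually shares structure.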

6. Retrieval Check

From memory: list the four PCA steps (center, covariance/SVD, top-k, project), state what eigenvalues represent, and explain why scaling matters. Check against Stage 3.
