Principle Component Analysis for Image Clustering

Portfolio AI News Blog Newsletter Signup

2025-10-05

Principle Component Analysis PCA is a Machine Learning method to reduce the Dimension and Complexity of Data, Re-Balance the Dataset on the most Important Valuable features. With PCA, we can easily Cluster Our Data to find closely similiar and widely different data point.

PCA Principle Component Analysis for an Image dataset including 6 different steps:

Flatten the 2D Image to Single Row Vector format
Center the Data against its Mean Value
Calculate the Covariance Matrix
Calculate and Rank the Eigen-value and Eigen-matrix from the Covariance Matrix
Perform PCA Dot Product between 2 or 3 highest value Eigen-Matrix Pricipe Components and the original Dataset
Plot PCA result to find Dataset Clustering

import numpy as np
import scipy.sparse.linalg

###
### Flatten 2D Image to Single Row vector
###

imgs_flatten = np.array([im.reshape(-1) for im in imgs])

###
### Center the Data
###

def center_data(Y):
    mean_vector = np.mean(Y, axis=0)                          ### Find Mean Vector of Features for all 55 samples
    print("Mean Vector\n",mean_vector)
    mean_matrix = np.repeat(mean_vector, Y.shape[0])          ### Generate 1 single row 55x4096 Mean Matrix from Mean Vector
    # print("Matrix Shape ", mean_matrix.shape)
    mean_matrix = np.reshape(mean_matrix, Y.shape, order="F") ### Generate 55 rows 55x4096 Mean Matrix from Mean Vector
    X = Y - mean_matrix                                       ### Center All Datapoint by Removing Mean Matrix from Y
    return X

X = center_data(imgs_flatten)

###
### Calculate Covariance Matrix
###

def get_cov_matrix(X):
    ### Getting Covariance Matrix by Cal Dot Product of XT and X 
    ### and divide by number of observation -1 
    obs_n = X.shape[0]-1      
    cov_matrix = np.dot(X.T, X)/obs_n
    return cov_matrix

cov_matrix = get_cov_matrix(X)


###
### Calculate and Rank EIGEN-VALUE and EIGEN-VECTOR
###

eigenvals, eigenvecs = scipy.sparse.linalg.eigsh(cov_matrix, k=55)
eigenvals = eigenvals[::-1]       ### Invert the Order of Eigenvals
eigenvecs = eigenvecs[:,::-1]     ### Invert the Order of Eigenvecs

###
### Perform PCA for Most VALUABLE Princple Components of the Dataset
###

Xred2 = perform_PCA(X, eigenvecs,2)
def perform_PCA(X, eigenvecs, k):
    V = eigenvecs[:, :k] ## Select most valuable K component
    Xred = np.dot(X, V)  ## Perform Dot product on the Centered Data 
    return Xred

### By Plotting the PCA Result we find the high similar or widely different images

Want to Receive Updates On Fastest AI Models, Successful AI Startups and New Hiring Candidates. Subscribe To My Newsletters

Dystill Vision