Subspace Clustering with Active Learning

Hankui Peng

Supervisors: Nicos Pavlidis, Idris Eckley

Industrial partner: Data Science Campus, Office for National Statistics

3rd May 2019




    1. Introduction

    2. Subspace Clustering

    3. Active Learning

    4. Experiments

    5. Conclusions & Future Work


[Figure: clustering in the classic setting vs. the subspace setting]


  • Image segmentation

  • Digit recognition

and more...

  • face recognition
  • motion segmentation
  • text analysis

Subspace Clustering



The subspace clustering problem is that of finding the subspaces, their dimensions, a basis for each subspace, and the segmentation of the data.


$X$: $N$ by $P$ data matrix.

$N$: number of data objects.

$P$: full data dimensionality.

$K$: number of clusters.

$q$: subspace dimensionality.

K-Subspace Clustering (KSC)

KSC is a $K$-means-like algorithm that iterates between updating the subspace locations and updating the cluster assignments, in order to minimise the overall reconstruction error:

$$L(\textbf{X},\Omega)= \sum_{k=1}^{K}\sum_{\textbf{x}\in\mathcal{X}_{k} } \left\|\textbf{x}-V_{k}V_{k}^{T}\textbf{x} \right\|_{2}^{2},$$


where $V_{k}$ is the $P\times q$ basis matrix for the $k$th subspace,

$\mathcal{X}_{k}$ is the set of data that belong to the $k$th subspace, and

$\Omega=\left\{\omega_{1},\ldots,\omega_{N} \right\}$ is the set of cluster assignments, where $\omega_{i}\in\left\{1,\ldots,K \right\}$ for $i\in\left\{1,\ldots,N \right\}$.

For data that lie in $q$-dimensional subspaces, a local optimum can be found by iterating between the following:

Subspace formation

Computing the singular value decomposition (SVD) of the data assigned to each cluster,

$$X_{k}=U_{k}\Sigma_{k}V_{k}^{T}, \quad \text{for } k=1,\ldots,K,$$

gives the basis matrix $V_{k}$ (the top $q$ right singular vectors) of the $k$th subspace.

Assignment update

Assigning each data object to its closest subspace:

$$\omega_{i}=\arg\min_{k=1,\ldots,K} \left\|\textbf{x}_{i}-V_{k}V_{k}^{T}\textbf{x}_{i} \right\|_{2}^{2}, \quad \text{for } i=1, \ldots, N.$$

Stop the iterative procedure when the objective function value plateaus.
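The two alternating steps above can be sketched in a few lines of NumPy (a minimal illustration; the function name, random initialisation, and stopping tolerance are choices made here, not prescribed by KSC):

```python
import numpy as np

def ksc(X, K, q, n_iter=100, tol=1e-10, seed=0):
    """K-subspace clustering (KSC): alternate between fitting a q-dimensional
    linear subspace to each cluster via the SVD and reassigning every point
    to the subspace with the smallest reconstruction error."""
    rng = np.random.default_rng(seed)
    N, P = X.shape
    labels = rng.integers(0, K, size=N)            # random initial assignment
    V = [np.linalg.qr(rng.standard_normal((P, q)))[0] for _ in range(K)]
    prev_loss = np.inf
    for _ in range(n_iter):
        # Subspace formation: top-q right singular vectors of each cluster
        for k in range(K):
            Xk = X[labels == k]
            if len(Xk) >= q:                       # keep old basis for tiny clusters
                V[k] = np.linalg.svd(Xk, full_matrices=False)[2][:q].T
        # Assignment update: squared residual of every point to every subspace
        resid = np.stack([((X - X @ Vk @ Vk.T) ** 2).sum(axis=1) for Vk in V],
                         axis=1)                   # N x K
        labels = resid.argmin(axis=1)
        loss = resid.min(axis=1).sum()
        if prev_loss - loss < tol:                 # objective has plateaued
            break
        prev_loss = loss
    return labels, V, loss
```

On noiseless data drawn from a single $q$-dimensional subspace, the reconstruction error drops to machine precision; with several subspaces and a random start it converges quickly but, as noted below, may reach a local minimum.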

Visual Example




  • Converges in a small number of iterations


  • Can easily get stuck in local minima

  • Sensitive to initialisations

Active Learning



Active learning improves the performance of machine learning algorithms through actively querying data labels.

  • Uncertainty sampling

  • Query by committee

  • Expected model change

Our proposed active learning framework borrows ideas from both uncertainty sampling and expected model change.

The Framework


  • The removal of a misclassified data object from its currently allocated cluster would result in a large decrease in the reconstruction error of that cluster;

  • The addition of a misclassified data object to its true cluster would incur a small increase in the reconstruction error of that cluster.


  • Query stage: querying the true classes of those data objects that are potentially misclassified;

  • Update stage: updating the subspaces and, correspondingly, the labels of other potentially misclassified data objects.

Query Stage


Influence of data deletion

The influence of data deletion $U_{1}(\textbf{x}_{s},\omega_{s})$ is measured by the decrease in reconstruction error:

$$U_{1}(\textbf{x}_{s},\omega_{s})=\sum_{\textbf{x}\in\mathcal{X}_{\omega_{s}}}L(\textbf{x},\omega_{s})-\sum_{\textbf{x}\in\mathcal{X}_{\omega_{s}}\backslash\left\{\textbf{x}_{s}\right\}}\tilde{L}_{-}(\textbf{x},\omega_{s}),$$

where
  • $\textbf{x}_{s}$ is the data object to be removed from its allocated cluster $\omega_{s}$;

  • $\tilde{L}_{-}(\textbf{x}_{s},\omega_{s})$ denotes the reconstruction error of the perturbed subspace (after $\textbf{x}_{s}$ is removed).

Query Stage

Influence of data addition

The influence of data addition $U_{2}(\textbf{x}_{s},k)$ is measured by the increase in reconstruction error:

$$U_{2}(\textbf{x}_{s},k)=\sum_{\textbf{x}\in\mathcal{X}_{k}\cup\left\{\textbf{x}_{s}\right\}}\tilde{L}_{+}(\textbf{x},k)-\sum_{\textbf{x}\in\mathcal{X}_{k}}L(\textbf{x},k),$$

where
  • $\textbf{x}_{s}$ is the data object to be added to cluster $k$ ($k\in\left\{1,\ldots,K \right\}$);

  • $\tilde{L}_{+}(\textbf{x}_{s},k)$ denotes the reconstruction error of the perturbed subspace (after $\textbf{x}_{s}$ is added).

Query Stage

Combining the influences of data deletion and addition, we select the most informative data object to query as
$$\textbf{x}_{s}^{*}=\arg\max_{\textbf{x}_{s}\in\mathcal{X}_{U}} \left\{U_{1}(\textbf{x}_{s},\omega_{s})-U_{2}(\textbf{x}_{s},\omega^{*}_{s}) \right\},$$
where
$$\omega^{*}_{s}=\arg\min_{\substack{k\in\left\{1,\ldots,K \right\}\\k\neq\omega_{s}}}L(\textbf{x}_{s},k)$$
is considered the most likely true label for $\textbf{x}_{s}$.
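A brute-force rendering of this query criterion, recomputing every reconstruction error exactly by refitting the SVD (the helper names are illustrative; the perturbation analysis that follows avoids this recomputation cost):

```python
import numpy as np

def recon_error(Xk, q):
    """Reconstruction error of the best q-dimensional subspace fit to Xk:
    the sum of its squared discarded singular values."""
    s = np.linalg.svd(Xk, compute_uv=False)
    return (s[q:] ** 2).sum()

def most_informative(X, labels, q, K, unqueried):
    """Pick the unqueried point maximising U1 - U2, where U1 is the error
    drop when the point leaves its cluster and U2 the error rise when it
    joins its closest alternative cluster (the likely true label)."""
    # Current basis of each cluster: top-q right singular vectors
    V = [np.linalg.svd(X[labels == k], full_matrices=False)[2][:q].T
         for k in range(K)]
    best_i, best_score = None, -np.inf
    for i in unqueried:
        k0 = labels[i]
        own = np.flatnonzero(labels == k0)
        # U1: decrease in the allocated cluster's error when x_i is removed
        U1 = recon_error(X[own], q) - recon_error(X[own[own != i]], q)
        # omega*: the closest other subspace, taken as the likely true label
        resid = [((X[i] - Vk @ (Vk.T @ X[i])) ** 2).sum() if k != k0 else np.inf
                 for k, Vk in enumerate(V)]
        k_star = int(np.argmin(resid))
        members = np.flatnonzero(labels == k_star)
        # U2: increase in that cluster's error when x_i is added
        U2 = recon_error(X[np.append(members, i)], q) - recon_error(X[members], q)
        if U1 - U2 > best_score:
            best_i, best_score = i, U1 - U2
    return best_i
```

On noiseless data with a single mislabelled point, the criterion singles that point out: removing it shrinks its allocated cluster's error substantially while adding it to its true cluster costs nothing.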

Perturbation Analysis on the Eigenproblem

Approximation through convergent power series ($n\gg l$):

  • Perturbed covariance: $S(\epsilon)=S+\epsilon S^{(1)}+\epsilon^{2}S^{(2)}+\cdots$

  • Perturbed eigenvalue: $\lambda(\epsilon)=\lambda+\epsilon\lambda^{(1)}+\epsilon^{2}\lambda^{(2)}+\cdots$

  • Perturbed eigenvector: $\textbf{v}(\epsilon)=\textbf{v}+\epsilon\textbf{v}^{(1)}+\epsilon^{2}\textbf{v}^{(2)}+\cdots$

We can also deduce the perturbed forms (deletion / addition) of the covariance matrix as:

$$S_{I_{-}}=S+\frac{l}{n-l}\left[(S-S_{I})-(\bar{\textbf{x}}_{I}-\bar{\textbf{x}})(\bar{\textbf{x}}_{I}-\bar{\textbf{x}})^{T} \right]-\frac{l^{2}}{(n-l)^{2}}(\bar{\textbf{x}}_{I}-\bar{\textbf{x}})(\bar{\textbf{x}}_{I}-\bar{\textbf{x}})^{T},$$

$$S_{I_{+}}=S+\frac{l}{n+l}\left[(S_{I}-S)+(\bar{\textbf{x}}_{I}-\bar{\textbf{x}})(\bar{\textbf{x}}_{I}-\bar{\textbf{x}})^{T} \right]-\frac{l^{2}}{(n+l)^{2}}\left(\bar{\textbf{x}}_{I}-\bar{\textbf{x}} \right)\left(\bar{\textbf{x}}_{I}-\bar{\textbf{x}} \right)^{T}.$$
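The deletion update $S_{I_{-}}$ can be checked against a direct recomputation of the covariance on the remaining points; a small NumPy sketch (assuming the $\frac{1}{n}$-normalised covariance $S=\frac{1}{n}\sum(\textbf{x}-\bar{\textbf{x}})(\textbf{x}-\bar{\textbf{x}})^{T}$):

```python
import numpy as np

def cov(M):
    """Covariance with 1/n normalisation: S = (1/n) sum (x - xbar)(x - xbar)^T."""
    D = M - M.mean(axis=0)
    return D.T @ D / len(M)

def delete_update(S, S_I, xbar, xbar_I, n, l):
    """Perturbed covariance S_{I-} after deleting l points (mean xbar_I,
    covariance S_I) from n points (mean xbar, covariance S)."""
    d = (xbar_I - xbar)[:, None]
    return (S + l / (n - l) * ((S - S_I) - d @ d.T)
            - l**2 / (n - l)**2 * (d @ d.T))
```

For example, deleting 5 of 200 random points and recomputing the covariance directly on the remaining 195 reproduces the updated matrix to machine precision.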

Perturbation Analysis on the Eigenproblem

  • Equating the first-order coefficients in $S(\epsilon)\textbf{v}(\epsilon)=\lambda(\epsilon)\textbf{v}(\epsilon)$ gives us:
\begin{aligned} \lambda^{(1)}=\textbf{v}^{T}S^{(1)}\textbf{v}. \end{aligned}
  • This is the Rayleigh quotient of $S^{(1)}$ with the normalised eigenvector $\textbf{v}$.

  • For $l=1$, the first-order approximation of the $k$th ($k\in\left\{1,\ldots,P \right\}$) perturbed eigenvalue can be written as

\begin{aligned} \lambda_{k}(\epsilon)&=\lambda_{k}+\epsilon\lambda_{k}^{(1)}+\mathcal{O}(\epsilon^{2})\\ &\approx\lambda_{k}+\epsilon \textbf{v}_{k}^{T}S^{(1)}\textbf{v}_{k}\\ &= \lambda_{k}+\frac{1}{n+1} \textbf{v}_{k}^{T}\left(\left(\bar{\textbf{x}}-\textbf{x}_{i} \right)\left(\bar{\textbf{x}}-\textbf{x}_{i} \right)^{T}-S \right)\textbf{v}_{k}\\ &=\frac{1}{n+1}\alpha_{ki}^2+\frac{n}{n+1} \lambda_{k}, \end{aligned}

where $\alpha_{ki}=\textbf{v}_{k}^{T}(\textbf{x}_{i}-\bar{\textbf{x}})$.
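The first-order rule $\lambda_{k}(\epsilon)\approx\lambda_{k}+\epsilon\,\textbf{v}_{k}^{T}S^{(1)}\textbf{v}_{k}$ underlying this expansion can be sanity-checked numerically for a generic symmetric perturbation (matrix sizes and tolerances here are arbitrary):

```python
import numpy as np

def first_order_eigvals(S, E, eps):
    """First-order prediction lambda_k(eps) ≈ lambda_k + eps * v_k^T E v_k,
    where (lambda_k, v_k) are the eigenpairs of S and E plays the role of
    the perturbation S^(1)."""
    lam, V = np.linalg.eigh(S)
    # v_k^T E v_k for every k at once
    rayleigh = np.einsum('ik,ij,jk->k', V, E, V)
    return lam + eps * rayleigh
```

Comparing against an exact eigendecomposition of $S+\epsilon S^{(1)}$, the prediction error is of order $\epsilon^{2}$, as the expansion promises.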

Perturbation Analysis on the Eigenproblem

The perturbed form of the influence of single data deletion:

\begin{equation} \begin{aligned} U_{1}(\textbf{x}_{s},\omega_{s}) &=\sum_{\textbf{x}\in\mathcal{X}_{\omega_{s}}}L(\textbf{x},\omega_{s})-\sum_{\textbf{x}\in\mathcal{X}_{\omega_{s}}\backslash\left\{\textbf{x}_{s}\right\}}\tilde{L}_{-}(\textbf{x},\omega_{s})\\ &=\sum_{k=q+1}^{P}\left(\frac{1}{n-1}\alpha_{ki}^{2}-\frac{1}{n-1}\lambda_{k} \right). \end{aligned} \end{equation}

The perturbed form of the influence of single data addition:

\begin{equation} \begin{aligned} U_{2}(\textbf{x}_{s},\omega) &=\sum_{\textbf{x}\in\mathcal{X}_{\omega}\cup\left\{\textbf{x}_{s} \right\}}\tilde{L}_{+}(\textbf{x},\omega)-\sum_{\textbf{x}\in\mathcal{X}_{\omega}}L(\textbf{x},\omega)\\ &=\sum_{k=q+1}^{P}\left\{\frac{1}{n+1}\alpha_{ki}^2-\frac{1}{n+1}\lambda_{k} \right\}. \end{aligned} \end{equation}

Visual Example (Cont'd)



Update Stage

$K$ Subspace Clustering with Constraints (KSCC)

Constrained objective:

$$g(\mathcal{V})=\sum_{\textbf{x}_{u}\in\mathcal{X}_{U}}\left\{\min_{m\in\left\{1,\ldots,K \right\}}\left\|\textbf{x}_{u}-V_{m}V_{m}^{T}\textbf{x}_{u} \right\|_{2}^{2} \right\}+\min_{\substack{P_{n}\in\mathcal{P}(K)\\n\in\left\{1,\ldots,K! \right\}}}\left\{\sum_{l=1}^{K} \left\|X_{l}-V_{P_{nl}}V_{P_{nl}}^{T}X_{l} \right\|_{2}^{2} \right\},$$


  • $\mathcal{V}$ is the set of all basis matrices;

  • $\mathcal{X}_{U}$ denotes the set of unqueried data objects;

  • $\mathcal{P}(K)$ is the set of all possible permutations of $\left\{1,\ldots, K \right\}$.

$K$ Subspace Clustering with Constraints (KSCC)

KSCC algorithm

  • Stage 1: fitting subspaces

  • Stage 2: updating assignment

  • Stage 3: satisfying constraints by solving

$$P_{n}^{*} =\arg \min_{\substack{P_{n}\in\mathcal{P}(K)\\n\in\left\{1,\ldots,K! \right\}}}\left\{\sum_{l=1}^{K} \left\|X_{l}-V_{P_{nl}}V_{P_{nl}}^{T}X_{l} \right\|_{2}^{2} \right\},$$

through exhaustive search ($\mathcal{O}(K!)$) when $K\leq 5$; or

through the Hungarian algorithm ($\mathcal{O}(K^{3})$) when $K>5$.

*We can show that the constrained objective decreases monotonically throughout iterations of KSCC.
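Stage 3 can be sketched as follows for small $K$ using the exhaustive search (group and basis variables are illustrative):

```python
import itertools
import numpy as np

def match_subspaces(X_groups, V):
    """Stage 3 of KSCC by exhaustive search over permutations: assign each
    queried group X_l to a distinct subspace basis V_m so that the total
    reconstruction error is minimised."""
    K = len(V)
    # K x K matrix of reconstruction errors ||X_l - V_m V_m^T X_l||^2
    cost = np.array([[((Xl - Xl @ Vm @ Vm.T) ** 2).sum() for Vm in V]
                     for Xl in X_groups])
    rows = np.arange(K)
    best = min(itertools.permutations(range(K)),
               key=lambda p: cost[rows, list(p)].sum())
    return list(best), cost[rows, list(best)].sum()
```

For larger $K$, `scipy.optimize.linear_sum_assignment` solves the same $K\times K$ assignment problem in $\mathcal{O}(K^{3})$ via the Hungarian algorithm.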


Experiments

Strategies for comparison

  • SCAL-A: querying data with the least influence upon adding to their most likely cluster

  • SCAL-D: querying data with the largest influence upon deleting from their allocated cluster

  • SCAL: the combination of SCAL-A and SCAL-D

  • MaxResid: querying data with the maximum projection distance to their allocated cluster

  • MinMargin: querying data with the minimum margin in projection distances

  • Random: querying data by random sampling

Visual Example (Cont'd)


Comparing different strategies (from left to right: Random, MinMargin, SCAL):

Visual Example (Cont'd)


Faces Example


  • Clustering faces ($K=3$ subjects) under varying illumination conditions

  • 64 images of each subject, in 2016-dimensional space

  • Images of one subject lie close to a low-dimensional subspace of dimension 9

Faces Example


Conclusions & Future Work



  • Active strategies based on changes in reconstruction error improve clustering performance quickly

Future work

  • Explore other approaches to robustify the KSC algorithm

  • Understand the interplay between the rate of improvement and the noise level, and between the rate of improvement and the angles between subspaces

  • Extend the current framework to incorporate other types of subspace algorithms (e.g. spectral methods)


References

  • P. S. Bradley and O. L. Mangasarian, “K-plane clustering,” Journal of Global Optimization, vol. 16, no. 1, pp. 23–32, 2000.

  • J. Lipor and L. Balzano, “Margin-based active subspace clustering,” in 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), IEEE, 2015, pp. 377–380.

  • J. Lipor and L. Balzano, “Leveraging union of subspace structure to improve constrained clustering,” in Proceedings of the 34th International Conference on Machine Learning, vol. 70, JMLR.org, 2017, pp. 2130–2139.

  • H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly, vol. 2, no. 1–2, pp. 83–97, 1955.