# Classic Setting¶ # Subspace Setting¶ # Motivation¶

• Image segmentation # Motivation¶

• Digit recognition # and more...¶

• face recognition
• motion segmentation
• text analysis

# Subspace Clustering¶

#### Definition¶

The subspace clustering problem refers to the problem of finding the subspaces, their dimensions, a basis for each subspace, and the segmentation of the data.

#### Notation¶

$X$: $N$ by $P$ data matrix.

$N$: number of data objects.

$P$: full data dimensionality.

$K$: number of clusters.

$q$: subspace dimensionality.

# K-Subspace Clustering (KSC)¶

KSC is a $K$-means like algorithm that iterates between updating subspace locations and updating cluster assignment, in order to minimise the overall reconstruction error:

$$L(\textbf{X},\Omega)= \sum_{k=1}^{K}\sum_{\textbf{x}\in\mathcal{X}_{k} } \left\|\textbf{x}-V_{k}V_{k}^{T}\textbf{x} \right\|_{2}^{2},$$

where

$V_{k}$ is the $P\times q$ basis matrix for the $k$th subspace,

$\mathcal{X}_{k}$ is the set of data that belong to the $k$th subspace, and

$\Omega=\left\{\omega_{1},\ldots,\omega_{N} \right\}$ is the set of cluster assignments, where $\omega_{i}\in\left\{1,\ldots,K \right\}$ for $i\in\left\{1,\ldots,N \right\}$.

For data that lie in $q$-dimensional subspaces, a local optimum can be found by iterating between the following:

### Subspace formation¶

Conducting SVD decomposition:

$$X_{k}=U_{k}\Sigma_{k}V_{k}^{T}, \text{for } k=\left\{1,\ldots,K\right\},$$

gives us the basis matrix $V_{k}$ of the $k$th subspace.

### Assignment update¶

Assigning each data object to its closest subspace:

$$\omega_{i}=\text{arg}\underset{k=1,\ldots,K}{\text{ min}} \left\|\textbf{x}_{i}-V_{k}V_{k}^{T}\textbf{x}_{i} \right\|_{2}^{2}, \text{for } i=\left\{1, \ldots, N\right\}.$$

Stop the iterative procedure when the objective function value plateaus.

## Visual Example • Converges in very few number of iterations

### Drawbacks¶

• Can easily get stuck in local minima

• Sensitive to initialisations

# Active Learning¶

Active learning improves the performance of machine learning algorithms through actively querying data labels.

• Uncertainty sampling

• Query by committee

• Expected model change

Our proposed AL framework borrows ideas from both uncertainty sampling and expected model change.

# The Framework¶

Assumptions:

• The removal of a misclassified data object from its currently allocated cluster would result in a large decrease in the reconstruction error of that cluster;

• The addition of a misclassified data object to its true cluster would incur small increase in the reconstruction error of that cluster.

Strategy:

• Query stage: querying the true classes of those data objects that are potentially misclassified;

• Update stage: update the subspaces and correspondingly the labels of other potentially misclassified data objects.

# Query Stage¶

### Influence of data deletion¶

The influence of data deletion $U_{1}(\textbf{x}_{s},\omega_{s})$ is measured by the decrease in reconstruction error:

$$U_{1}(\textbf{x}_{s},\omega_{s})=\sum_{\textbf{x}\in\mathcal{X}_{\omega_{s}}}L(\textbf{x},\omega_{s})-\sum_{\textbf{x}\in\mathcal{X}_{\omega_{s}}\backslash\left\{\textbf{x}_{s}\right\}}\tilde{L}_{-}(\textbf{x},\omega_{s}),$$

where

• $\textbf{x}_{s}$ is the data object to be removed from its allocated cluster $\omega_{s}$;

• $\tilde{L}_{-}(\textbf{x}_{s},\omega_{s})$ denotes the reconstruction error of the perturbed subspace (after $\textbf{x}_{s}$ is removed).

# Query Stage¶

The influence of data addition $U_{2}(\textbf{x}_{s},k)$ is measured by the increase in reconstruction error:

$$U_{2}(\textbf{x}_{s},k)=\sum_{\textbf{x}\in\mathcal{X}_{k}\cup\left\{\textbf{x}_{s}\right\}}\tilde{L}_{+}(\textbf{x},k)-\sum_{\textbf{x}\in\mathcal{X}_{k}}L(\textbf{x},k),$$

where

• $\textbf{x}_{s}$ is the data object to be added to cluster $k$ ($k\in\left\{1,\ldots,K \right\}$);

• $\tilde{L}_{+}(\textbf{x}_{s},k)$ denotes the reconstruction error of the perturbed subspace (after $\textbf{x}_{s}$ is added).

# Query Stage¶

Combining the influence of data deletion and addition together, we decide the most informative data object as: $$\textbf{x}_{s}^{*}=\arg\max_{\textbf{x}_{s}\in\mathcal{X}_{U}} \left\{U_{1}(\textbf{x}_{s},\omega_{s})-U_{2}(\textbf{x}_{s},\omega^{*}_{s}) \right\},$$

where

$$\omega^{*}_{s}=\arg\min_{\substack{k\in\left\{1,\ldots,K \right\}\\k\neq\omega_{s}}}L(\textbf{x}_{s},k),$$

is considered as the most likely true label for $\textbf{x}_{s}$.

## Perturbation Analysis on the Eigenproblem¶

Approximation through convergent power series ($n\gg l$):

• Perturbed covariance:
$$S(\epsilon)=S+S^{(1)}\epsilon+S^{(2)}\epsilon^{2}+\cdots,$$
• Perturbed eigenvalue:
$$\lambda(\epsilon)=\lambda+\alpha_{1}\epsilon+\alpha_{2}\epsilon^{2}+\cdots,$$
• Perturbed eigenvector:
$$\textbf{v}(\epsilon)=\textbf{v}+\boldsymbol{\psi}_{1}\epsilon+\boldsymbol{\psi}_{2}\epsilon^{2}+\cdots.$$

We can also deduce the perturbed forms (deletion / addition) of the covariance matrix as:

$S_{I_{-}}=S+\frac{l}{n-l}\left[(S-S_{I})-(\bar{\textbf{x}}_{I}-\bar{\textbf{x}})(\bar{\textbf{x}}_{I}-\bar{\textbf{x}})^{T} \right]-\frac{l^{2}}{(n-l)^{2}}(\bar{\textbf{x}}_{I}-\bar{\textbf{x}})(\bar{\textbf{x}}_{I}-\bar{\textbf{x}})^{T},$

$S_{I_{+}}=S+\frac{l}{n+l}\left[(S_{I}-S)-(\bar{\textbf{x}}_{I}+\bar{\textbf{x}})(\bar{\textbf{x}}_{I}+\bar{\textbf{x}})^{T} \right] +\frac{l^{2}}{(n+l)^{2}}\left(\bar{\textbf{x}}_{I}+\bar{\textbf{x}} \right)\left(\bar{\textbf{x}}_{I}+\bar{\textbf{x}} \right)^{T}.$

## Perturbation Analysis on the Eigenproblem¶

• Equating the coefficients in $S(\epsilon)\textbf{v}(\epsilon)=\lambda(\epsilon)\textbf{v}(\epsilon)$ gives us:
\begin{aligned} \alpha_{1}=\textbf{v}^{T}S^{(1)}\textbf{v}. \end{aligned}
• It is the Rayleigh quotient of $S^{(1)}$ and the normalised $\textbf{v}$.

• For $l=1$, the first order approximation of the $k$th ($k\in\left\{1,\ldots,K \right\}$) perturbed eigenvalue can be written as

\begin{aligned} \lambda_{k}(\epsilon)&=\lambda_{k}+\epsilon\lambda_{k}^{(1)}+\mathcal{O}(\epsilon^{2})\\ &\approx\lambda_{k}+\epsilon \textbf{v}_{k}^{T}S^{(1)}\textbf{v}_{k}\\ &= \lambda_{k}+\frac{1}{n+1} \textbf{v}_{k}^{T}\left(\left(\bar{\textbf{x}}-\textbf{x}_{i} \right)\left(\bar{\textbf{x}}-\textbf{x}_{i} \right)^{T}-S \right)\textbf{v}_{k}\\ &=\frac{1}{n+1}\alpha_{ki}^2-\frac{n}{n+1} \lambda_{k}, \end{aligned}

where $\alpha_{ki}=\textbf{v}_{k}^{T}(\textbf{x}_{i}-\bar{\textbf{x}})$.

## Perturbation Analysis on the Eigenproblem¶

The perturbed form of the influence of single data deletion:

\begin{equation} \begin{aligned} U_{1}(\textbf{x}_{s},\omega_{s}) &=\sum_{\textbf{x}\in\mathcal{X}_{\omega_{s}}}L(\textbf{x},\omega_{s})-\sum_{\textbf{x}\in\mathcal{X}_{\omega_{s}}\backslash\left\{\textbf{x}_{s}\right\}}\tilde{L}_{-}(\textbf{x},\omega_{s})\\ &=\sum_{k=q+1}^{P}\left(\frac{1}{n-1}\alpha_{ki}^{2}-\frac{1}{n-1}\lambda_{k} \right). \end{aligned} \end{equation}

The perturbed form of the influence of single data addition:

\begin{equation} \begin{aligned} U_{2}(\textbf{x}_{s},\omega) &=\sum_{\textbf{x}\in\mathcal{X}_{\omega}\cup\left\{\textbf{x}_{s} \right\}}\tilde{L}_{+}(\textbf{x},\omega)-\sum_{\textbf{x}\in\mathcal{X}_{\omega}}L(\textbf{x},\omega)\\ &=\sum_{k=q+1}^{P}\left\{\frac{1}{n+1}\alpha_{ki}^2-\frac{2n+1}{n+1}\lambda_{k} \right\}. \end{aligned} \end{equation}

## Visual Example (Cont'd)   # Update Stage¶

## $K$ Subspace Clustering with Constraints (KSCC)¶

#### Constrained objective:¶

$$g(\mathcal{V})=\sum_{\textbf{x}_{u}\in\mathcal{X}_{U}}\left\{\min_{m\in\left\{1,\ldots,K \right\}}\left\|\textbf{x}_{u}-V_{m}V_{m}^{T}\textbf{x}_{u} \right\|_{2}^{2} \right\}+\min_{\substack{P_{n}\in\mathcal{P}(K)\\n\in\left\{1,\ldots,K! \right\}}}\left\{\sum_{l=1}^{K} \left\|X_{l}-V_{P_{nl}}V_{P_{nl}}^{T}X_{l} \right\|_{2}^{2} \right\},$$

where

• $\mathcal{V}$ is the set of all basis matrices;

• $\mathcal{X}_{U}$ denotes the set of unqueried data objects;

• $\mathcal{P}(K)$ is the set of all possible permutations of $\left\{1,\ldots, K \right\}$.

## $K$ Subspace Clustering with Constraints (KSCC)¶

#### KSCC algorithm¶

• Stage 1: fitting subspaces

• Stage 2: updating assignment

• Stage 3: satisfying constraints by solving

$$P_{n}^{*} =\arg \min_{\substack{P_{n}\in\mathcal{P}(K)\\n\in\left\{1,\ldots,K! \right\}}}\left\{\sum_{l=1}^{K} \left\|X_{l}-V_{P_{nl}}V_{P_{nl}}^{T}X_{l} \right\|_{2}^{2} \right\},$$

through exhaustive search ($\mathcal{O}(K!)$) when $K<=5$; or

through the Hungarian algorithm ($\mathcal{O}(K^3)$) when $K>5$.

*We can show that the constrained objective decreases monotonically throughout iterations of KSCC.

# Strategies for comparison¶

• SCAL-A: querying data with the least influence upon adding to their most likely cluster

• SCAL-D: querying data with the largest influence upon deleting from their allocated cluster

• SCAL: the combination of SCAL-A and SCAL-D

• MaxResid: querying data with the maximum projection distance to their allocated cluster

• MinMargin: querying data with the minimum margin in projection distances

• Random: querying data by random sampling

## Visual Example (Cont'd)

Comparing different strategies (from left to right: Random, MinMargin, SCAL):

# Visual Example (Cont'd)¶ ## Faces Example - Clustering faces ($K=3$) under various lighting

- 64 faces for each face in 2016-dimensional space

- Images of one subject lie close to one low

dimensional subspace of dimension 9 ## Faces Example   # Conclusions & Future Work¶

### Conclusions¶

• Active strategies based on changes in reconstruction error improve the cluster performance quickly

### Future work¶

• Explore other approaches to robustify the KSC algorithm

• Understand the interplay between rate of improvement and noise level / between subspace angles

• Extend the current framework to incorporate other types of subspace algorithms (e.g. spectral methods)

• P. S. Bradley and O. L. Mangasarian, “K-plane clustering,” Journal of Global Optimization, vol. 16, no. 1, pp. 23–32, 2000.

• J. Lipor and L. Balzano, “Margin-based active subspace clustering,” in 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP). IEEE, 2015, pp. 377–380.

• J. Lipor and L. Balzano, “Leveraging union of subspace structure to improve constrained clustering,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017, pp. 2130–2139.

• H. W. Kuhn, “The hungarian method for the assignment problem,” Naval research logistics quarterly, vol. 2, no. 1-2, pp. 83–97, 1955.