See a map of the locations where I've given talks.

Representing and Clustering Text Data

January 29, 2020

PyData Lancaster, Storey, Lancaster, England, UK

This talk is a primer on text embedding and clustering. We explore the ideas behind a couple of state-of-the-art text embedding techniques, and provide an intuitive understanding of the mechanics of clustering.

Subspace Clustering with Active Learning

December 11, 2019

2019 IEEE International Conference on Big Data, Los Angeles, CA, US

Subspace clustering is a growing field of unsupervised learning that has gained much popularity in the computer vision community. Applications can be found in areas such as motion segmentation and face clustering. It assumes that the data originate from a union of subspaces, and clusters the data according to the subspaces they belong to. In practice, it is reasonable to assume that a limited number of labels can be obtained, potentially at a cost. Therefore, algorithms that can effectively and efficiently incorporate this information to improve the clustering model are desirable.

Pi-Minute Thesis

August 16, 2019

PhD Forum, STOR-i CDT, Lancaster University, Lancaster, England, UK

This is a short talk on what my PhD is about. The length of the talk is limited to $\pi$ minutes.

Subspace Clustering with Active Learning

May 03, 2019

PhD Forum, STOR-i CDT, Lancaster University, Lancaster, England, UK

We propose an active learning framework that is specifically designed for the setting of subspace clustering, and in particular for K-Subspace Clustering (KSC). KSC is a K-means-like algorithm that alternates between fitting subspaces and allocating data objects to their closest subspace. The simplicity and low computational cost of this algorithm have helped it gain much popularity in the family of subspace clustering algorithms. However, it is well known that KSC is very sensitive to the initialisation of cluster memberships and is prone to getting stuck in local minima.
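
The alternating scheme described above can be illustrated with a minimal sketch. This is not the implementation from the talk: the function name `k_subspace_clustering`, its parameters, the use of linear subspaces through the origin with a common dimension `dim`, the random initialisation, and the re-seeding of undersized clusters are all assumptions made for illustration.

```python
import numpy as np


def k_subspace_clustering(X, n_clusters, dim, n_iter=50, seed=0):
    """Illustrative K-subspace clustering (hypothetical helper, not the talk's code).

    X has shape (n_samples, n_features). Each cluster is modelled as a linear
    subspace through the origin of dimension `dim` (an assumption of this sketch).
    Requires n_iter >= 1, dim <= n_features and dim <= n_samples.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    # Random initial memberships; the result depends heavily on this choice.
    labels = rng.integers(n_clusters, size=n_samples)
    bases = []
    for _ in range(n_iter):
        # Step 1: fit a subspace to each cluster via a truncated SVD.
        bases = []
        for k in range(n_clusters):
            members = X[labels == k]
            if members.shape[0] < dim:
                # Assumption: re-seed an undersized cluster with random points.
                members = X[rng.choice(n_samples, size=dim, replace=False)]
            _, _, vt = np.linalg.svd(members, full_matrices=False)
            bases.append(vt[:dim].T)  # orthonormal basis, shape (n_features, dim)
        # Step 2: allocate each point to the subspace with the smallest residual.
        residuals = np.stack(
            [np.linalg.norm(X - X @ U @ U.T, axis=1) for U in bases], axis=1
        )
        new_labels = residuals.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # memberships are stable: a local minimum has been reached
        labels = new_labels
    return labels, bases
```

Running the sketch with different values of `seed` on the same data can give different partitions, which is the sensitivity to initialisation mentioned above.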

Subspace Clustering of Very Sparse High-Dimensional Data

December 11, 2018

2018 IEEE International Conference on Big Data, Workshop on Advances in High Dimensional Big Data, Seattle, WA, US

In this paper we consider the problem of clustering collections of very short texts using subspace clustering. This problem arises in many applications such as product categorisation, fraud detection, and sentiment analysis.

Clustering Amazon Web-Scraped Data

April 20, 2018

PhD Forum, STOR-i CDT, Lancaster University, Lancaster, England, UK

The Office for National Statistics (ONS) has encountered many challenges when experimenting with web-scraped price data for price index generation. These include the size of the data, the frequency at which the data arrive, and the quality of the data, which depends on the quality of the web-scrapers, to name just a few.