Posts by Collection

misc

Evolutionary Clustering Methods

Clustering is the process of grouping a set of unlabelled data objects (usually represented as vectors of measurements in a multidimensional space) into a number of clusters. The general objective of clustering is to partition the data objects so that objects within the same cluster are more similar to each other than to objects in different clusters. In some applications, we want not only a static clustering for a single time step but also a clustering of the data objects over an extended period of time. In that setting, we want to use the clustering information from previous time steps to help produce consistent clustering results for the current time step. Ideally, we aim to produce interpretable and efficient clustering results for a set of data objects that evolve over time.
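One simple way to reuse clustering information from the previous time step can be sketched with k-means: at each step the centroids are initialised from, and pulled back towards, the centroids found at the previous step. This is an illustrative sketch under stated assumptions (Euclidean data, a fixed number of clusters, and a hypothetical smoothing weight `alpha`), not necessarily the method discussed in the post; `evolutionary_kmeans` and `kmeans_step` are hypothetical names.

```python
import numpy as np


def kmeans_step(X, centroids):
    """One Lloyd iteration: assign points to the nearest centroid, then recompute means."""
    # Squared Euclidean distance from every point to every centroid, shape (n, k).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    new = centroids.copy()
    for j in range(len(centroids)):
        members = X[labels == j]
        if len(members) > 0:  # an empty cluster keeps its old centroid
            new[j] = members.mean(axis=0)
    return labels, new


def evolutionary_kmeans(snapshots, init_centroids, alpha=0.5, n_iter=20):
    """Cluster a sequence of snapshots, smoothing centroids over time.

    Centroids at each step are warm-started from the previous step and, after
    every Lloyd iteration, blended with it:
        c = (1 - alpha) * c_current_fit + alpha * c_previous_step
    At the first step, the initial centroids play the role of the previous step.
    """
    prev = np.asarray(init_centroids, dtype=float)
    history = []
    for X in snapshots:
        X = np.asarray(X, dtype=float)
        c = prev.copy()
        for _ in range(n_iter):
            _, c_fit = kmeans_step(X, c)
            c = (1 - alpha) * c_fit + alpha * prev
        labels, _ = kmeans_step(X, c)
        history.append((labels, c))
        prev = c
    return history


# Toy example: two well-separated blobs that drift slightly between two time steps.
rng = np.random.default_rng(0)
t0 = np.vstack([rng.normal([0.0, 0.0], 0.1, (20, 2)),
                rng.normal([5.0, 5.0], 0.1, (20, 2))])
t1 = t0 + 0.3  # hypothetical drift: every point moves by the same small shift
history = evolutionary_kmeans([t0, t1], init_centroids=[[0, 0], [5, 5]], alpha=0.5)
labels_t0, labels_t1 = history[0][0], history[1][0]
```

Because the centroids are carried over from one step to the next, cluster `j` at time t can be matched directly to cluster `j` at time t-1; larger values of `alpha` favour temporal consistency over fit to the current snapshot.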

Subspace Clustering With Application To Text Data

The Office for National Statistics (ONS) are experimenting with incorporating web-scraped data into the price index generating process. Clustering methods could be used to automate this process effectively and efficiently. Text data from the same category usually have a few terms in common, so they can be modelled as lying in the same subspace.

publications

Subspace Clustering of Very Sparse High-Dimensional Data

Published in 2018 IEEE International Conference on Big Data, 2018

In this paper we study the problem of clustering collections of very short texts using subspace clustering. This problem arises in many application areas such as product categorisation, fraud detection, and sentiment analysis. The main challenge lies in the fact that the vectorial representation of short texts is both high-dimensional, due to the large number of unique terms in the corpus, and extremely sparse, as each text contains a very small number of words with no repetition.
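To make the sparsity concrete, here is a minimal sketch assuming a plain bag-of-words representation with lower-casing and whitespace tokenisation (the paper's own preprocessing may differ). The product titles are made-up examples. Each text becomes a vector whose length is the vocabulary size, yet only a handful of entries are non-zero, so it is stored as a `{column: count}` dictionary.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map every unique lower-cased, whitespace-separated term to a column index."""
    vocab = {}
    for text in texts:
        for term in text.lower().split():
            vocab.setdefault(term, len(vocab))
    return vocab


def to_sparse(text, vocab):
    """Return only the non-zero entries of the term-count vector as {column: count}."""
    counts = Counter(text.lower().split())
    return {vocab[term]: n for term, n in counts.items() if term in vocab}


# Hypothetical short product titles, standing in for a real corpus.
texts = [
    "red cotton t-shirt",
    "blue denim jeans",
    "cotton summer dress",
    "wireless bluetooth headphones",
]
vocab = build_vocabulary(texts)
vectors = [to_sparse(text, vocab) for text in texts]

dim = len(vocab)                        # vector length = number of unique terms
nnz = sum(len(v) for v in vectors)      # non-zero entries actually stored
density = nnz / (dim * len(texts))      # fraction of the full matrix that is non-zero
print(dim, nnz, round(density, 3))      # prints: 11 12 0.273
```

On a toy corpus like this the density is still sizeable, but with a realistic vocabulary of many thousands of terms and only a few words per text, the fraction of non-zero entries becomes tiny, and every stored count is typically 1.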

Recommended citation: H. Peng, N. Pavlidis, I. Eckley and I. Tsalamanis, "Subspace Clustering of Very Sparse High-Dimensional Data", Proceedings of 2018 IEEE International Conference on Big Data, 2018.

Subspace Clustering with Active Learning

Published in 2019 IEEE International Conference on Big Data, 2019

Subspace clustering is a growing field of unsupervised learning that has gained much popularity in the computer vision community, with applications in areas such as motion segmentation and face clustering. It assumes that the data are drawn from a union of subspaces, and groups each data point according to the subspace it belongs to. In practice, it is reasonable to assume that a limited number of labels can be obtained, potentially at a cost. Algorithms that can effectively and efficiently incorporate this information to improve the clustering model are therefore desirable.

Recommended citation: H. Peng and N. Pavlidis, "Subspace Clustering with Active Learning", Proceedings of 2019 IEEE International Conference on Big Data, 2019.

talks

Clustering Amazon Web-Scraped Data

Published:

The Office for National Statistics (ONS) have encountered many challenges when experimenting with web-scraped price data for price index generation. These include the size of the data, the frequency at which the data arrive, and the quality of the data, which is limited by the quality of the web scrapers, to name just a few.

Subspace Clustering with Active Learning

Published:

We propose an active learning framework designed specifically for the setting of subspace clustering, and in particular for K-Subspace Clustering (KSC). KSC is a K-means-like algorithm that alternates between fitting subspaces and allocating data objects to their closest subspace. Its simplicity and low computational cost have made it popular within the family of subspace clustering algorithms. However, KSC is well known to be very sensitive to the initialisation of cluster memberships and prone to getting stuck in local minima.
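The alternating scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming every cluster is modelled by an affine subspace of the same dimension `dim`, fitted by SVD, and that points are assigned by squared orthogonal distance; linear subspaces through the origin are another common choice. The function names and the empty-cluster reseeding rule are hypothetical, and no active learning component is included.

```python
import numpy as np


def fit_subspace(X, dim):
    """Fit an affine subspace of dimension `dim`; return (mean, orthonormal basis)."""
    mu = X.mean(axis=0)
    # The leading right singular vectors of the centred data span the best-fit subspace.
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:dim].T  # basis shape: (n_features, dim)


def residuals(X, mu, basis):
    """Squared orthogonal distance from each row of X to the subspace mu + span(basis)."""
    Y = X - mu
    return ((Y - Y @ basis @ basis.T) ** 2).sum(axis=1)


def k_subspace_clustering(X, labels, k, dim, n_iter=50):
    """Alternate between fitting one subspace per cluster and reassigning points."""
    labels = np.asarray(labels).copy()
    for _ in range(n_iter):
        models = []
        for j in range(k):
            members = X[labels == j]
            if len(members) <= dim:
                # Hypothetical fallback: reseed a (nearly) empty cluster with random points.
                idx = np.random.default_rng(j).choice(len(X), dim + 1, replace=False)
                members = X[idx]
            models.append(fit_subspace(members, dim))
        # Distance of every point to every fitted subspace, shape (n_points, k).
        D = np.column_stack([residuals(X, mu, B) for mu, B in models])
        new_labels = D.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # memberships stopped changing: a (possibly local) minimum
        labels = new_labels
    return labels


# Toy data: 30 points on each of two non-intersecting lines in 3-D.
t = np.linspace(-5, 5, 30)
line_a = np.column_stack([t, np.zeros(30), np.zeros(30)])
line_b = np.column_stack([np.zeros(30), t, np.full(30, 3.0)])
X = np.vstack([line_a, line_b])
truth = np.array([0] * 30 + [1] * 30)

init = truth.copy()
init[0] = 1  # start from a slightly wrong partition
labels = k_subspace_clustering(X, init, k=2, dim=1)
```

The function takes the initial memberships as an argument, which is exactly where the sensitivity to initialisation comes in: a poor starting partition can lead the alternation to a local minimum instead of the true subspaces.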

Pi-Minute Thesis

Published:

This is a short talk on what my PhD is about. The length of the talk is limited to $\pi$ minutes.

Subspace Clustering with Active Learning

Published:

Subspace clustering is a growing field of unsupervised learning that has gained much popularity in the computer vision community, with applications in areas such as motion segmentation and face clustering. It assumes that the data are drawn from a union of subspaces, and groups each data point according to the subspace it belongs to. In practice, it is reasonable to assume that a limited number of labels can be obtained, potentially at a cost. Algorithms that can effectively and efficiently incorporate this information to improve the clustering model are therefore desirable.

Representing and Clustering Text Data

Published:

This talk is a primer on text embedding and clustering. We explore the ideas behind a couple of state-of-the-art text embedding techniques and provide an intuitive understanding of the mechanics of clustering.

teaching

MA124, Calculus II

Undergraduate course, Boston University, Department of Mathematics & Statistics, 2015

My role: Graduate Teaching Assistant

MATH101, Calculus

Undergraduate course, Lancaster University, Department of Mathematics & Statistics, 2018

My role: Graduate Teaching Assistant

MATH104, Statistics

Undergraduate course, Lancaster University, Department of Mathematics & Statistics, 2019

My role: Graduate Teaching Assistant

MATH101, Calculus

Undergraduate course, Lancaster University, Department of Mathematics & Statistics, 2019

My role: Graduate Teaching Assistant