Subspace Clustering With Application To Text Data

The Office for National Statistics (ONS) are experimenting with incorporating web-scraped data into the process of generating price indices. Clustering methods could help automate this process effectively and efficiently, for example by grouping scraped products into categories. Texts from the same category usually share a few terms, so their vector representations can be modelled as lying in a common low-dimensional subspace. We study the problem of grouping short text data using subspace clustering. Clustering short text arises in many application domains, such as sentiment analysis and product categorisation. One challenge in such tasks is that the vector representations of text are usually high-dimensional. Additionally, the texts describing online products are generally short, so each vector has only a few non-zero entries.
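The modelling assumption above can be illustrated with a minimal sketch. It assumes a plain bag-of-words representation, which the text does not specify. The product titles, the two categories and every name in the code (`TITLES`, `tokenize`, `bag_of_words`, `cosine`) are hypothetical toy data and helpers, not ONS data or the method studied here. The sketch only shows why short texts give sparse, high-dimensional vectors that are confined to a per-category subspace. It is not a subspace clustering algorithm.

```python
import math
from collections import Counter

# Hypothetical short product titles from two categories (toy data, not ONS data).
TITLES = [
    ("tea", "green tea bags 80"),
    ("tea", "earl grey tea bags"),
    ("tea", "loose leaf green tea"),
    ("milk", "semi skimmed milk 2 pint"),
    ("milk", "whole milk 4 pint"),
    ("milk", "skimmed milk 1 pint"),
]


def tokenize(text):
    """Lower-case whitespace tokenisation; a real pipeline would also clean and normalise."""
    return text.lower().split()


# Assumed representation: one dimension per distinct term across all titles.
VOCAB = sorted({tok for _, title in TITLES for tok in tokenize(title)})
INDEX = {term: i for i, term in enumerate(VOCAB)}


def bag_of_words(text):
    """Term-count vector of length len(VOCAB)."""
    vec = [0] * len(VOCAB)
    for tok, count in Counter(tokenize(text)).items():
        vec[INDEX[tok]] = count
    return vec


def cosine(u, v):
    """Cosine similarity; non-zero only if the two texts share at least one term."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


vectors = [(cat, bag_of_words(title)) for cat, title in TITLES]

# Short texts give sparse vectors: few non-zero entries per title.
nonzeros = [sum(1 for x in vec if x) for _, vec in vectors]

# A category's vectors lie in the coordinate subspace spanned by the terms
# that category uses; its dimension is the size of that term set.
category_terms = {}
for cat, vec in vectors:
    category_terms.setdefault(cat, set()).update(i for i, x in enumerate(vec) if x)
subspace_dim = {cat: len(terms) for cat, terms in category_terms.items()}

print("vocabulary size:", len(VOCAB))
print("non-zeros per title:", nonzeros)
print("subspace dimension per category:", subspace_dim)
```

On this toy data the vocabulary has 16 terms, each title uses at most 5 of them, and each category's titles span an 8-dimensional coordinate subspace. Titles within a category have positive cosine similarity because they share terms such as "tea" or "milk". Titles across categories have zero similarity. The two categories here share no terms at all. Real categories usually overlap in vocabulary, so their subspaces intersect, which is part of what makes the clustering problem hard.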