Page Not Found
Page not found. Your pixels are in another canvas.
A list of all the posts and pages found on the site. For the robots out there, an XML version is available for digesting as well.
About me
Published:
This is a page not in the main menu
Published:
"Like a Shakespearean sonnet that captures the very essence of love, or a painting that brings out the beauty of the human form that is far more than just skin deep, Euler's Equation reaches down into the very depths of existence."-- Keith Devlin </blockquote> Read more
Mathematics and Finance Hand in Hand
Published:
This week's blog is an effort to put together my recent random thoughts on the relationship between maths and the financial markets. In the last two decades or so, there has been a coup d'état in the financial trading world, where human decisions have been largely replaced by sophisticated computer systems.
A Peep into Kalman Filter
Published:
Deep insecurity about my non-mathy background urged me to look into the things that keep buzzing in my ears all the time. Today I decided to do some quick reading on the Kalman filter, and now it's time to write down my understanding. Bear with me, you "mathemagicians".
Hey robot, why are you so smart?
Published:
Are you being misguided here by the catchy name? Lucky you, this is going to be my last proper Gaussian Process (GP) post. Just to assure you, I did not use "robot" in the title merely to wave hands at you; I do intend to explain in this post how to use a GP to "teach" a robot to use its arm properly.
Boosting II: Gradient Boosting
Published:
Having briefly introduced AdaBoost in a previous post, today I want to explain another Boosting method called Gradient Boosting. In a broad sense, it is based on the same idea as AdaBoost: in every iteration we fit the residuals from the previous iteration. For regression problems, the ultimate goal is to make accurate predictions approximating the true value; for classification problems, the goal is to classify observations with the correct labels. A common way to measure the performance of a learning algorithm is through a loss function. In Gradient Boosting, we adopt $L(y, F(x))$ to denote a measure of distance between the true response value $y$ and an estimate or approximation $F(x)$. We can think of boosting as approximating an optimal $F^{*}(x)$ by a sequential additive expansion.
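As a minimal illustration of the "fit the residuals every iteration" idea, here is a least-squares gradient boosting loop with stump learners. All names and the toy data are my own sketch, not from the post; for squared loss the negative gradient is simply the residual $y - F(x)$, so each round fits a stump to the current residuals:

```python
import numpy as np

def fit_stump(x, r):
    """Best single-threshold regression stump for residuals r on a 1-D feature x."""
    best = None
    for s in np.unique(x):
        left, right = r[x <= s], r[x > s]
        if len(right) == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, s, left.mean(), right.mean())
    _, s, lv, rv = best
    return lambda z, s=s, lv=lv, rv=rv: np.where(z <= s, lv, rv)

def gradient_boost(x, y, n_rounds=100, lr=0.1):
    """Least-squares gradient boosting: each round fits a stump to the current
    residuals (the negative gradient of squared loss) and takes a small step."""
    base = y.mean()
    pred = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        h = fit_stump(x, y - pred)
        pred = pred + lr * h(x)
        stumps.append(h)
    return lambda z: base + lr * sum(h(z) for h in stumps)

# Fit a noise-free sine curve; the training error shrinks as rounds accumulate.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x)
F = gradient_boost(x, y)
```

The learning rate `lr` is the usual shrinkage trade-off: smaller steps need more rounds but generalise better.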
Boosting I: AdaBoost
Published:
Adaptive Boosting (AdaBoost) is one of the most commonly used Machine Learning methods for both classification and regression problems. It is an ensemble learning meta-algorithm, in the same family as Bagging and Random Forests, that can be applied on top of many other methods to improve performance. The idea is to combine a set of sequentially developed weak learners (rules of thumb) into a single best final classifier at the end. Let me set the scene in a binary classification setting.
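A bare-bones sketch of AdaBoost for that binary classification setting, with decision stumps as the weak learners, to make the "combine sequentially developed weak learners" idea concrete (all names and the toy data are illustrative, not from the post):

```python
import numpy as np

def adaboost(X, y, n_rounds=50):
    """Binary AdaBoost with axis-aligned decision stumps; labels y in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                       # observation weights, re-focused each round
    ensemble = []
    for _ in range(n_rounds):
        best = None
        for j in range(X.shape[1]):               # exhaustive stump search
            for s in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] <= s, 1, -1)
                    err = w[pred != y].sum()      # weighted training error
                    if best is None or err < best[0]:
                        best = (err, j, s, sign)
        err, j, s, sign = best
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))   # weight of this weak learner
        pred = sign * np.where(X[:, j] <= s, 1, -1)
        w *= np.exp(-alpha * y * pred)            # up-weight misclassified observations
        w /= w.sum()
        ensemble.append((alpha, j, s, sign))
    def classify(Z):
        score = sum(a * sg * np.where(Z[:, j] <= s, 1, -1) for a, j, s, sg in ensemble)
        return np.sign(score)
    return classify

# Toy problem: the label is the sign of x1 + x2. A single axis-aligned stump is
# weak on this diagonal boundary, but the weighted ensemble is much stronger.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
clf = adaboost(X, y)
```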
New lights shed on Regression
Published:
The best way to end my weekend is, well, bragging (no... blogging) about the new stuff I found over the weekend. Equipped with a basic understanding of what a Gaussian Process (GP) is from a previous masterclass, I decided to do some further reading in this fascinating area.
Not That Model Selection
Published:
Model selection? Model selection! Maybe not the 'model selection' in your mind, though. Actually, this blog is meant to be a memo on another masterclass we STOR-i students had today, following the previous Gaussian Processes masterclass that I also blogged about. This masterclass was given by Prof. Gerda Claeskens from KU Leuven, who introduced us to the main criteria used in model selection.
Musings about Soulmate
Published:
When can you find your soulmate (if you don't have one yet...), and how? It sounds like a philosophical and psychological question. Yes, of course the question can be answered from those mental/spiritual perspectives, but let's do some maths here. Okay, here is the plan: I'm going to address this question in the two following ways.
1. Drake Equation
Model-Free Reinforcement Learning
Published:
In one of my previous posts, I wrote briefly about Markov Decision Processes (MDP). Today, let's move into the area of reinforcement learning (RL), which is strongly linked to MDP in that it also deals with problems that require sequential decision making. The difference is that, instead of waiting until the end of the time horizon before we choose our policy (a set of actions specified for each time point at each state), we base our decisions on all the experience accumulated in the past. Real-life applications abound in a wide range of fields, including robotics, economics, and control theory. We call the decision maker in a system an agent. The idea of RL is to empower the agent with both retrodictive and predictive powers.
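To make "learning from accumulated experience" concrete, here is a minimal sketch of tabular Q-learning, a standard model-free RL algorithm, on a toy chain of states; the environment and all names are illustrative, not from the post:

```python
import numpy as np

def q_learning(n_states=5, n_episodes=500, lr=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a chain: actions 0/1 move left/right, and reaching
    the rightmost state yields reward 1 and ends the episode."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))
    for _ in range(n_episodes):
        s = 0
        while s != n_states - 1:
            if rng.random() < eps:                                    # explore
                a = int(rng.integers(2))
            else:                                                     # exploit (random tie-break)
                a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
            s2 = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # move Q(s, a) towards the sampled Bellman target, one transition at a time
            Q[s, a] += lr * (r + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q

Q = q_learning()
policy = np.argmax(Q[:-1], axis=1)   # greedy action in each non-terminal state
```

No transition model is ever written down: the agent improves its value estimates purely from the transitions it experiences, which is exactly what distinguishes model-free RL from solving the MDP directly.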
Bagging->Random Forests->Boosting
Published:
Today, I'm going to talk about a set of classification methods in machine learning, in the order the title above suggests. Keen readers may remember that I mentioned classification methods for image recognition briefly in one of my earlier posts. There seems to be an everlasting discussion in the machine learning community about the trade-off between prediction accuracy and model interpretability. The theme of today's post resides more on the side of model interpretability. Regardless of their not-so-self-evident names, Bagging, Random Forests, and Boosting are all branches of the unified family of tree-based methods.
Stochastic Programming Buzz
Published:
Having just immersed myself in another two-day masterclass on stochastic programming given by Prof. Stein Wallace, and with a book on Stochastic Programming written by him resting on my desk at arm's distance, I feel compelled to sort through my scribbled notes and write something on this.
Falling in Love with Gaussian Processes
Published:
Today we STOR-i students had our first masterclass of the year, on Gaussian Processes, given by a great speaker, Neil Lawrence, who specialises in Machine Learning. Gaussian process models are extremely flexible in that they allow us to place probability distributions over functions.
All Roads Lead to Rome - Image Recognition
Published:
Give you a picture and ask you to identify certain features in it: that's pretty easy. Give you two pictures and ask you to identify the common features in the two: that's fairly simple as well. What if you were given a stream of photos and asked to identify certain features in each one of them? A bit intimidating? Not that bad, with the help of some smart methodology and technology. Up till now, I've been made aware of at least two types of methods for image recognition problems (forgive me for being ignorant if you know more). Wavelet transforms can be applied to capture information in images, and classification methods are also widely used in image pattern recognition. These two methods are my focus today; I will talk briefly about what they do, how they work, and the differences between the two.
A Gentle Intro to MDP
Published:
Suffering from my 'memoryless property' a lot, with an MDP coursework alarm ringing in my head at the moment, and vaguely remembering a recent dynamic programming research talk from Chris Kirkbride, I decided to organise all that I know about MDP in this one blog post. Hopefully, after finishing this post, I'll have a clean and organised storage of MDP in my head; and hopefully, after reading it, you'll get something useful as well.
Time Series Primer/Revisited
Published:
This afternoon we had two soft introductory talks on Change Point Detection and Time Series Analysis (mainly in a discrete setting). To some extent, change point detection is a time series problem as well. Many times, people are interested in detecting changes over time and making good predictions for the future. Also, when one abrupt change happens, a natural question to ask is: is the change intrinsic, or is it just an outlier? These are interesting areas I'd like to explore further, but here let me refresh my memory with what I've done with time series so far.
What on earth is Statistical Learning?
Published:
Everyone is talking about statistical learning or machine learning, as if they are the sexiest terms on earth. Literally, does it have something to do with statistics? Or machines? It depends on the area you are in and the people you are talking to. According to the omniscient Wiki, statistical learning deals with the problem of finding a predictive function based on data. As for my understanding, statistical learning is no more than a tool that helps people better understand their data and, of course, make better predictions. Usually, people classify statistical learning into two categories: supervised and unsupervised.
misc
Evolutionary Clustering Methods
Clustering is the process of grouping a set of unlabelled data objects (usually represented as a vector of measurements in a multidimensional space) into a number of clusters. The general objective of clustering is to obtain a partitioning of the data objects such that data within the same cluster are more similar to each other compared to data in different clusters. In some applications, we not only want to obtain static clustering results for one time step, but we are also interested in clustering data objects for an extended period of time. We want to make use of the clustering information from previous time steps to help produce consistent clustering results for the current time step. Ideally, we aim to produce interpretable and efficient clustering results for a set of data objects that evolve over time.
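One simple way to realise this temporal-consistency idea is a sketch of evolutionary k-means, where each centre update trades off a snapshot cost (within-cluster distortion at the current time step) against a temporal cost (drift from the previous time step's centres). The function, the trade-off parameter `cp`, and the toy data are all illustrative assumptions, not a method from the text:

```python
import numpy as np

def evolutionary_kmeans(X, K, prev=None, cp=0.3, n_iter=20, seed=0):
    """One time step of evolutionary k-means: assign points to the nearest centre,
    then update each centre to minimise sum||x - c||^2 + cp * ||c - c_prev||^2."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), K, replace=False)] if prev is None else prev.copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            pts = X[labels == k]
            if len(pts) == 0:
                continue
            if prev is None:
                centres[k] = pts.mean(axis=0)
            else:
                # closed-form minimiser of the snapshot-plus-temporal objective
                centres[k] = (pts.sum(axis=0) + cp * prev[k]) / (len(pts) + cp)
    return centres, labels

# Second time step of a drifting two-blob stream: warm-start from last step's centres.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 2)) * 0.3 + np.array([0.5, 0.5])
B = rng.standard_normal((50, 2)) * 0.3 + np.array([5.5, 5.5])
X = np.vstack([A, B])
prev = np.array([[0.0, 0.0], [5.0, 5.0]])
centres, labels = evolutionary_kmeans(X, 2, prev=prev)
```

Larger `cp` makes the clustering smoother over time at the cost of tracking the current data less closely.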
Subspace Clustering with Application to Text Data
The Office for National Statistics (ONS) are experimenting with incorporating web-scraped data into the price index generating process. Clustering methods could be used to automate this process effectively and efficiently. Text data from the same category usually have a few terms in common, which can be modelled as from the same subspace.
publications
Subspace Clustering of Very Sparse High-Dimensional Data
Published in 2018 IEEE International Conference on Big Data, 2018
In this paper we study the problem of clustering collections of very short texts using subspace clustering. This problem arises in many application areas such as product categorisation, fraud detection, and sentiment analysis. The main challenge lies in the fact that the vectorial representation of short texts is both high-dimensional, due to the large number of unique terms in the corpus, and extremely sparse, as each text contains a very small number of words with no repetition.
Recommended citation: H. Peng, N. G. Pavlidis, I. A. Eckley and I. Tsalamanis, "Subspace Clustering of Very Sparse High-Dimensional Data", Proceedings of 2018 IEEE International Conference on Big Data, 2018.
Subspace Clustering with Active Learning
Published in 2019 IEEE International Conference on Big Data, 2019
Subspace clustering is a growing field of unsupervised learning that has gained much popularity in the computer vision community. Applications can be found in areas such as motion segmentation and face clustering. It assumes that data originate from a union of subspaces, and clusters the data depending on their corresponding subspaces. In practice, it is reasonable to assume that a limited amount of labels can be obtained, potentially at a cost. Therefore, algorithms that can effectively and efficiently incorporate this information to improve the clustering model are desirable.
Recommended citation: H. Peng and N. G. Pavlidis, "Subspace Clustering with Active Learning", Proceedings of 2019 IEEE International Conference on Big Data, 2019.
talks
Clustering Amazon Web-Scraped Data
Published:
The Office for National Statistics (ONS) have come across a lot of challenges when experimenting with using web-scraped price data for price index generation. The challenges include the size of the data, the frequency at which the data arrive, and data quality issues caused by the quality of the web-scrapers, just to name a few.
Subspace Clustering of Very Sparse High-Dimensional Data
Published:
In this paper we consider the problem of clustering collections of very short texts using subspace clustering. This problem arises in many applications such as product categorisation, fraud detection, and sentiment analysis.
Subspace Clustering with Active Learning
Published:
We propose an active learning framework that is especially designed to be beneficial in the setting of subspace clustering, and in particular K-Subspace Clustering (KSC). KSC is a K-means-like algorithm that alternates between fitting subspaces and allocating data objects to their closest subspace. The simplicity and low computational cost of this algorithm have helped it gain much popularity in the family of subspace clustering algorithms. However, it is well known that KSC is very sensitive to the initialisation of cluster memberships and is prone to getting stuck in local minima.
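A minimal sketch of the alternating KSC loop described above, for linear subspaces through the origin; the function names, the optional warm-start `init` argument, and the toy data are illustrative, not the talk's actual implementation:

```python
import numpy as np

def k_subspace(X, K, dim, init=None, n_iter=50, seed=0):
    """K-Subspace Clustering sketch: alternate between fitting a dim-dimensional
    linear subspace to each cluster via SVD, and reassigning each point to the
    subspace with the smallest reconstruction residual."""
    rng = np.random.default_rng(seed)
    labels = np.array(init) if init is not None else rng.integers(K, size=len(X))
    for _ in range(n_iter):
        bases = []
        for k in range(K):
            Xk = X[labels == k]
            if len(Xk) < dim:                       # guard against (near-)empty clusters
                Xk = X[rng.choice(len(X), dim, replace=False)]
            _, _, Vt = np.linalg.svd(Xk, full_matrices=False)
            bases.append(Vt[:dim])                  # top right-singular vectors = best-fit basis
        # residual of each point from each subspace: ||x - proj(x)||
        resid = np.stack([np.linalg.norm(X - X @ B.T @ B, axis=1) for B in bases], axis=1)
        new = resid.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels

# Two noisy lines through the origin in R^3, warm-started from partly corrupted
# labels: a good initialisation matters, which is exactly the sensitivity noted above.
rng = np.random.default_rng(2)
t = rng.uniform(0.5, 1.5, 100)
A = np.outer(t, [1.0, 0.0, 0.0]) + 0.01 * rng.standard_normal((100, 3))
B = np.outer(t, [0.0, 1.0, 0.0]) + 0.01 * rng.standard_normal((100, 3))
X = np.vstack([A, B])
init = np.repeat([0, 1], 100)
init[:10] = 1
out = k_subspace(X, 2, 1, init=init)
```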
Pi-Minute Thesis
Published:
This is a short talk on what my PhD is about. The length of the talk is limited to $\pi$ minutes.
Subspace Clustering with Active Learning
Published:
Subspace clustering is a growing field of unsupervised learning that has gained much popularity in the computer vision community. Applications can be found in areas such as motion segmentation and face clustering. It assumes that data originate from a union of subspaces, and clusters the data depending on their corresponding subspaces. In practice, it is reasonable to assume that a limited amount of labels can be obtained, potentially at a cost. Therefore, algorithms that can effectively and efficiently incorporate this information to improve the clustering model are desirable.
Representing and Clustering Text Data
Published:
This talk is a primer on text embedding and clustering. We explore the ideas behind a couple of state-of-the-art text embedding techniques, and provide an intuitive understanding of the mechanics of clustering.
Subspace Clustering and Active Learning with Constraints
Published:
This talk covers some of the work I did with my supervisor Dr Nicos G. Pavlidis during my PhD.
teaching
MA124, Calculus II
Undergraduate course, Boston University, Department of Mathematics & Statistics, 2015
My role: Graduate Teaching Assistant
MATH220, Linear Algebra II
Undergraduate course, Lancaster University, Department of Mathematics & Statistics, 2017
My role: Graduate Teaching Assistant
MATH230, Probability II
Undergraduate course, Lancaster University, Department of Mathematics & Statistics, 2017
My role: Graduate Teaching Assistant
MATH245, Computational Mathematics
Undergraduate course, Lancaster University, Department of Mathematics & Statistics, 2018
My role: Graduate Teaching Assistant
MSCI331, Data Mining for Direct Marketing and Finance
Undergraduate course, Lancaster University, Department of Management Science, 2018
My role: Graduate Teaching Assistant
Introduction to R Programming
Undergraduate course, Lancaster University, STOR-i Centre for Doctoral Training, 2018
My role: Lecturer
MATH101, Calculus
Undergraduate course, Lancaster University, Department of Mathematics & Statistics, 2018
My role: Graduate Teaching Assistant
MATH111, Numbers and Relations
Undergraduate course, Lancaster University, Department of Mathematics & Statistics, 2018
My role: Graduate Teaching Assistant
MATH102, Further Calculus
Undergraduate course, Lancaster University, Department of Mathematics & Statistics, 2018
My role: Graduate Teaching Assistant
MATH112, Discrete Mathematics
Undergraduate course, Lancaster University, Department of Mathematics & Statistics, 2018
My role: Graduate Teaching Assistant
MATH103, Probability I
Undergraduate course, Lancaster University, Department of Mathematics & Statistics, 2019
My role: Graduate Teaching Assistant
MSCI331, Data Mining for Direct Marketing and Finance
Undergraduate course, Lancaster University, Department of Management Science, 2019
My role: Graduate Teaching Assistant
MATH104, Statistics
Undergraduate course, Lancaster University, Department of Mathematics & Statistics, 2019
My role: Graduate Teaching Assistant
MATH105, Linear Algebra I
Undergraduate course, Lancaster University, Department of Mathematics & Statistics, 2019
My role: Graduate Teaching Assistant
MATH101, Calculus
Undergraduate course, Lancaster University, Department of Mathematics & Statistics, 2019
My role: Graduate Teaching Assistant
MNGT213, Data Analysis for Management
Undergraduate course, Lancaster University, Department of Management Science, 2019
My role: Graduate Teaching Assistant
MATH102, Further Calculus
Undergraduate course, Lancaster University, Department of Mathematics & Statistics, 2019
My role: Graduate Teaching Assistant
MSCI331, Data Mining for Direct Marketing and Finance
Undergraduate course, Lancaster University, Department of Management Science, 2020
My role: Graduate Teaching Assistant
MSCI526, Introduction to Intelligent Data Analysis
Postgraduate course, Lancaster University, Department of Management Science, 2020
My role: Graduate Teaching Assistant