Stochastic Programming Buzz

4 minute read

Published:

Just immersed in another two-day masterclass about stochastic programming given by Prof. Stein Wallace, and with a book on Stochastic Programming written by him rests on my desk at arm's distance, I feel compelled to sort through my scribbled notes and write something on this.

Real world problems almost always include some uncertainty, and stochastic programming is one way to go. At the core, stochastic programming is based on an optimization framework, but with some additional stochastic uncertainties to it. At the beginning of the masterclass, Stein talked about two important features in models: robustness and flexibility. I like the analogy he used to think of robustness and flexibility as tree and straw man. When a storm comes, a tree is either "robust" enough to withstand the strong wind or it breaks, whereas a straw man tend be flexible and sway with the wind without breaking itself. In stochastic programming language, we want some flexible treatment of the parameters, and at the same time preserve and strengthen the robustness of the model. To put in simple words, we want to find a solution that is feasible in most scenarios, and optimal in some.

In Two-Stage Stochastic Programming, we can think of the decisions as being made in two stages: irreversible decision (decision being made at present without observing what's gonna happen next), and recourse decision (decision that serves the purpose of making the situation better with post-observation after the irreversible decision has been made). Following the notation used by Wallace, the structure of such a problem can be outlined as follows:

$latex min \hspace{0.3cm} c^{T}x + Q(x) $

$latex s.t. \hspace{0.3cm} Ax=b, x \geqslant  0 $

where $latex Q(x)=\sum_{j}p^{j}Q(x,\xi^{j} ) $ and $latex Q(x,\xi)=min\left \{ q(\xi)^{T}y|W(\xi)y=h(\xi)-T(\xi)x; y\geqslant 0 \right \} $. Here, $latex c^{T}x $ represents the irreversible decision, and $latex Q(x) $ represents the recourse decision, which is an expectation of different scenarios. $latex p^{j} $ is the probability that the jth scenario $latex \xi^{j} $ happens. This is a clear representation which takes into account the possibility of all possible scenarios, but it will surely meet its computational bottleneck at some point as the size of scenarios grow. A workaround that then deals with this issue is by replacing the expected recourse function Q(x) with a new variable $latex \theta $ into our objective function. It then serves as the upper bound for Q(x) when we put Q(x) into one of the constraints. This way, instead of solving as many number of LPs as the number of scenarios, we instead only solve one optimization problem, which can be formulated as follows:

$latex min \hspace{0.3cm} c^{T}x + \theta $

$latex s.t. \hspace{0.3cm} Ax=b,$

$latex \hspace{0.3cm} \theta \geqslant Q(x), $

$latex \hspace{0.3cm} -\gamma_{k}^{T}x \geqslant \delta_{k}, \hspace{0.3cm} for k=1,...,K $

$latex \hspace{0.3cm} x \geqslant 0, $

The new formulation of the problem is being blind in a smart way in order to save unnecessary computational cost. As you may have realized, Q(x) still represents a large number of optimization problems no matter where we put them (either in the objective function or in the constraint matrix). Well, what if we drop they totally? The answer is, that doesn't sound so wild. If it was indeed such a nasty constraint, drop it, just solve the much nicer optimization problem left. After obtaining the solution for the much easier optimization problem, we can then check the feasibility of the nasty constraint, and see whether it complains or not. If $latex \hat{\theta}\geqslant Q(\hat{x}) $, we are lucky enough and can call it a day; else, more needs to be done.

Wallace mentioned that the L-Shaped decomposition algorithm can be applied to tackle these problems in the recourse stage without numerous function evaluations for Q(x). The idea behind this algorithm is to build a master problem in x, and then only evaluate the recourse function exactly as a subproblem.  Let's spare all the technicalities, the main trick here is

$latex Q(x)=\sum_{j}p^{j}Q(x,\xi^{j} )= \sum_{j} p^{j}q(\xi ^{j})^{T} y^{j} $

with j realizations, where $latex y^{j} $ is the optimal second stage solution yielding $latex Q(\hat{x},\xi^{j}) $. This extensive block structure has a L-shape, which gives the name for this algorithm. Two types of constraints (feasibility cuts, optimality cuts) should be sequentially applied before we obtain the optimal solution.

It seems to me this procedure still relies somewhat on how lucky you are, or how smart you are when choosing the right cut. Nevertheless, it does give us a huge step-up computationally. Keen readers can refer to this nice little lecture note on how to carry out the algorithm step by step.

References:

[1] Kall P. and S.W. Wallace, Stochastic Programming, pp. 147-164, 1994.

[2] A. Shapiro and A. Philpott, A Tutorial on Stochastic Programming, 2007.