Wednesday, June 12, 2013

The Signum Function in Mathematical Programs

In mathematics, the signum function is defined by\[ \DeclareMathOperator{\sgn}{sgn} \sgn(x)=\begin{cases} +1 & x>0\\ \phantom{+}0 & x=0\\ -1 & x\le0 \end{cases}. \]Occasionally, someone wants to create the equivalent of the signum function in a mathematical programming model. Since math programs are generally written using exclusively nonnegative variables, this effectively translates to tying a binary variable $z$ to the continuous variable $x$ so that $z=1\Longleftrightarrow x>0$. (If $x$ is unrestricted in sign, we use the difference $z^+-z^-$ of two binary variables $z^+$ and $z^-$, such that $z^+=1\Longleftrightarrow x>0$ and $z^- =1\Longleftrightarrow x<0$.) It turns out that this can be done in some cases but only approximated in others.

Suppose that $x$ and $z$ are components of some larger model\[ \begin{array}{lrcl} \mathrm{optimize} & f(x,y,z)\\ \mathrm{s.t.} & (x,y,z) & \in & \mathcal{P}\\ & x & \ge & 0\\ & y & \in & \mathcal{Y}\\ & z & \in & \{0,1\} \end{array} \]where $\mathcal{P}$ and the convex hull of $\mathcal{Y}$ are closed, convex polyhedra. These are the typical assumptions for a mathematical program. By requiring $\DeclareMathOperator{\conv}{conv} \conv(\mathcal{Y})$ rather than $\mathcal{Y}$ to be convex, I'm allowing $y$ to be an arbitrary mix of continuous and integer-valued variables. I'm also going to assume that the feasible region \[\mathcal{F}=\{(x,y,z)\in\mathcal{P}\,\left|\,x\ge 0,y\in\mathcal{Y},z\in \{0,1\}\right.\}\]is bounded. Models for real-world problems usually are, and arguably always should be, bounded. If $\mathcal{F}$ is unbounded, it should be possible to bound it by asserting liberal but finite bounds on $x$ and $y$.

Let $\gamma=\inf\{x\,\left|\,(x,y,z)\in\mathcal{F}, x\gt 0\}\right.\ge0$, and let $\Gamma=\sup\{x\,\left|\,(x,y,z)\in\mathcal{F}\right.\}<\infty$. Half the battle is solved by adding the constraint \[x\le \Gamma z,\]which enforces \[x>0 \implies z=1.\]The smaller $\Gamma$ is, the tighter the linear programming (LP) relaxation of the model is, and so (likely) the shorter the time required by the solver to optimize the model.

Now comes the tricky part. If $\gamma>0$ (and, more specifically, $\gamma$ is large enough to be distinguishable from 0 when the solver allows for rounding error), we can finish the job by adding the constraint \[x\ge\gamma z,\]which enforces\[x=0\implies z=0.\]The problematic case is when $\gamma=0$. If $\gamma=0$, we can find a feasible sequence $(x_n,y_n,z_n)$ such that $\lim_{n\rightarrow\infty}x_n=0$. Presumably $z_n=1$ for all $n$. Since $\conv(\mathcal{F})$ is closed and bounded (hence compact) and $\conv(\mathcal{Y})$ is closed, we can pass to a subsequence (which I will again write $(x_n,y_n,z_n)$ to avoid creating messy subscripts) such that $y_n\rightarrow y$ for some $y\in\conv(\mathcal{Y})$ and thus $(x_n,y_n,z_n)\rightarrow(0,y,1)\in\conv(\mathcal{F})$. With $(0,y,1)\in\conv(\mathcal{F})$ for some $y$, it is not possible to enforce\[x=0\implies z=0.\]The best we can do is to choose some $\epsilon>0$ (with $\epsilon$ large enough that the solver sees it as not within rounding error of zero), add the constraint\[x\ge\epsilon z,\]and in the process modify the model by eliminating all previously feasible solutions with $0\lt x \lt \epsilon$. Hopefully, we can choose $\epsilon$ so that the true optimum is not eliminated.

Incidentally, if we do not know $\gamma$ and $\Gamma$ a priori, and do not have at hand reasonable estimates $g$ and $G$ with $0\lt g\le \gamma$ and $\Gamma \le G\lt \infty$, we can obtain them (at some computational expense) by solving two LPs, first minimizing $x$ over $\conv(\mathcal{F})$ (to find $\gamma$ and then maximizing $x$ over $\conv(\mathcal{F})$ to find $\Gamma$.

No comments:

Post a Comment

Due to intermittent spamming, comments are being moderated. If this is your first time commenting on the blog, please read the Ground Rules for Comments. In particular, if you want to ask an operations research-related question not relevant to this post, consider asking it on Operations Research Stack Exchange.