Integer Programming

The LP Procedure

Integer Programming

Formulations of mathematical programs often require that some of the decision variables take only integer values. Consider the formulation

min c^Tx

subject to

$A x \geq, =, \leq b$
$\ell \leq x \leq u$
$x_{i \in {\cal S}}$ is integer

The set of indices S identifies those variables that must take only integer values. When S does not contain all of the integers between 1 and n, inclusive, problem (mip) is called a mixed-integer program. Otherwise, it is known as an integer program. Let x^opt(mip) denote an optimal solution to (mip) . An integer variable with bounds between 0 and 1 is also called a binary variable.

Specifying the Problem

An integer or mixed-integer problem can be solved with PROC LP. To solve this problem, you must identify the integer variables. You can do this with a row in the input data set that has the keyword INTEGER for the type variable. Any variable that has a nonmissing and nonzero value for this row is interpreted as an integer variable. It is important to note that integer variables must have upper bounds explicitly defined using the UPPERBD keyword. The values in the INTEGER row not only identify those variables that must be integral, but they also give an ordering to the integer variables that can be used in the solution technique.

You can follow the same steps to identify binary variables. For the binary variables, there is no need to supply any upper bounds.

Following the rules of sparse data input format, you can also identify individual integer or binary variables.

The Branch and Bound Technique

The branch and bound approach is used to solve integer and mixed-integer problems. The following discussion outlines the approach and explains how to use several options to control the procedure.

The branch and bound technique solves an integer program by solving a sequence of linear programs. The sequence can be represented by a tree, with each node in the tree being identified with a linear program that is derived from the problems on the path leading to the root of the tree. The root of the tree is identified with a linear program that is identical to (mip) except that S is empty. This relaxed version of (mip), lp(0), can be written as:

x^opt(0) = min c^Tx

subject to

$A x \geq, =, \leq b$
$\ell \leq x \leq u$

The branch and bound approach generates linear programs along the nodes of the tree using the following scheme. Consider x^opt(0), the optimal solution to lp(0). If x^opt(0)_i is integer for all $i\in{\cal S}$ , then x^opt(0) is optimal in (mip). Suppose for some $i\in{\cal S}$ , x^opt(0)_i is nonintegral. In that case, define two new problems (lp(1)) and (lp(2)) , descendents of the parent problem (lp(0)). The problem (lp(1)) is identical to (lp(0)) except for the additional constraint

$x_i\leq \lfloor x^{opt}(0)_i \rfloor$

and the problem (lp(2)) is identical to (lp(0)) except for the additional constraint

$x_i\geq \lceil x^{opt}(0)_i \rceil$

The notation $\lceil y \rceil$ means the smallest integer greater than or equal to y, and the notation $\lfloor y \rfloor$ means the largest integer less than or equal to y. Note that the two new problems do not have x^opt(0) as a feasible solution, but because the solution to (mip) must satisfy one of the preceding constraints, x^opt(mip)_i must satisfy one of the new constraints. The two problems thus defined are called active nodes in the branch and bound tree, and the variable i is called the branching variable.

Next, the algorithm chooses one of the problems associated with an active node and attempts to solve it using the dual simplex algorithm. The problem may be infeasible, in which case the problem is dropped. If it can be solved, and it in turn does not have an integer solution (that is a solution for which x_i is integer for all $i\in{\cal S}$ ), then it defines two new problems. These new problems each contain all of the constraints of the parent problems plus the appropriate additional one.

Branching continues in this manner until either there are no active nodes or an integer solution is found. When an integer solution is found, its objective value provides a bound for the objective of (mip). In particular, if z is the objective value of the current best integer solution, then any active problems whose parent problem has objective value $\geq z$ can be discarded (assuming that the problem is a minimization). This can be done because all problems that descend from this parent will also have objective value $\geq z$ .This technique is known as fathoming. When there are no active nodes remaining to be solved, the current integer solution is optimal in (mip). If no integer solution has been found, then (mip) is (integer) infeasible.

It is important to realize that integer programs are NP-complete. Roughly speaking, this means that the effort required to solve them grows exponentially with the size of the problem. For example, a problem with 10 binary variables can, in the worst case, generate 2¹⁰ =1024 nodes in the branch and bound tree. A problem with 20 binary variables can, in the worst case, generate 2²⁰ =1048576 nodes in the branch and bound tree. Although the algorithm is unlikely to have to generate every single possible node, the need to explore even a small fraction of the potential number of nodes for a large problem can be resource intensive.

The Integer Iteration Log

To help monitor the growth of the branch and bound tree, the LP procedure reports on the status of each problem that is solved. The report, displayed in the Integer Iteration Log, can be used to reconstruct the branch and bound tree. Each row in the report describes the results of the attempted solution of the linear program at a node in the tree. In the following discussion, a problem on a given line in the log is called the current problem. The following columns are displayed in the report:

Iter

identifies the number of the branch and bound iteration.

Problem

identifies how the current problem fits in the branch and bound tree.

Condition

reports the result of the attempted solution of the current problem. Values for Condition are

ACTIVE: The current problem was solved successfully.
INFEASIBLE: The current problem is infeasible.
FATHOMED: The current problem cannot lead to an improved integer solution and therefore it is dropped.
SINGULAR: A singular basis was encountered in attempting to solve the current problem. Solution of this relaxed problem is suspended and will be attempted later if necessary.
SUBOPTIMAL: The current problem has an integer feasible solution.

Objective

reports the objective value of the current problem.

Branched

names the variable that is branched in subtrees defined by the descendents of this problem.

Value

gives the current value of the variable named in the column labeled Branched.

Sinfeas

gives the sum of the integer infeasibilities in the optimal solution to the current problem.

Active

reports the total number of nodes currently active in the branch and bound tree.

Proximity

reports the gap between the best integer solution and the current lower (upper for maximizations) bound of all active nodes.

To reconstruct the branch and bound tree from this report, consider the interpretation of iteration j. If Iter=j and Problem=k, then the problem solved on iteration j is identical to the problem solved on iteration | k | with an additional constraint. If k >0, then the constraint is an upper bound on the variable named in the Branched column on iteration | k |. On the other hand, if k <0 then the constraint is a lower bound on that variable. The value of the bound can be obtained from the value of Value in iteration | k | as described in the previous section.

Example 3.8 in the section "Examples" shows an Integer Iteration Log in its output.

Controlling the Branch and Bound Search

There are several options you can use to control branching. This is accomplished by controlling the program's choice of the branching variable and of the next active node. In the discussion that follows, let

$f_i(k)= x^{opt}(k)_i-\lfloor x^{opt}(k)_i \rfloor$

where x^opt(k) is the optimal solution to the problem solved on iteration k.

The CANSELECT= option directs the choice of the next active node. Valid keywords for this option include LIFO, FIFO, OBJ, PROJECT, PSEUDOC, and ERROR. The following list describes the action that each of these causes when the procedure must choose for solution a problem from the list of active nodes.

LIFO

chooses the last problem added to the tree of active nodes. This search has the effect of a depth first search of the branch and bound tree.

FIFO

chooses the first node added to the tree of active nodes. This search has the effect of a breadth first search of the branch and bound tree.

OBJ

chooses the problem whose parent has the least (largest if the problem is a maximization) objective value.

PROJECT

chooses the problem with the largest (least if the problem is a maximization) projected objective value. The projected objective value is evaluated using the sum of integer infeasibilities, s(k), associated with an active node (lp(k)), defined by:

$s(k) = \Sigma_{i\in{\cal S}}{\rm min} \{f_i(k), (1-f_i(k))\}$ .

An empirical measure of the rate of increase (decrease) in the objective value is defined as

$\lambda = (z^*-z(0)) / s(0)$

where

z(k): is the optimal objective value for (lp(k)).
z^*: is the objective value of the current best integer solution.

The projected objective value for problems (lp(k+1)) and (lp(k+2)) is defined as

$z(k)+ \lambda s(k)$

PSEUDOC

chooses the problem with the largest (least if the problem is a maximization) projected pseudocost. The projected pseudocost is evaluated using the weighted sum of infeasibilities s_w(k) associated with an active problem (lp(k)), defined by

$s_w(k)=\Sigma_{i\in{\cal S}}{\rm min} \{d_i(k)f_i(k), u_i(k)(1-f_i(k))\}$

The weights u_i and d_i are initially equal to the absolute value of the ith objective coefficient and are updated at each integer iteration. They are modified by examining the empirical marginal change in the objective as additional constraints are placed on the variables in S along the path from (lp(0)) to a node associated with an integer feasible solution. In particular, if the definition of problems (lp(k+1)) and (lp(k+2)) from parent (lp(k)) involve the addition of constraints $x_i \leq \lfloor x^{opt}(k)_i\rfloor$ and $x_i \geq \lceil x^{opt}(k)_i\rceil$ , respectively, and one of them is on a path to an integer feasible solution, then either

d_i(k)=(z(k+1)-z(k))/f_i(k)

u_i(k)=(z(k+2)-z(k))/(1-f_i(k))

Note the similarity between s_w(k) and s(k). The weighted quantity s_w(k) accounts to some extent for the influence of the objective function. The projected pseudocost for problems (lp(k+1)) and (lp(k+2)) is defined as

$z_w(k) \equiv z(k)+s_w(k)$

ERROR

chooses the problem with the largest error. The error associated with problems (lp(k+1)) and (lp(k+2)) is defined as

(z^*-z_w(k))/(z^*-z(k))

The BACKTRACK= option controls the search for the next problem. This option can take the same values as the CANSELECT= option. In addition to the case outlined under the DELTAIT= option, backtracking is required as follows based on the CANSELECT= option in effect:

If CANSELECT=LIFO and there is no active node in the portion of the active tree currently under exploration with a bound better than the value of WOBJECTIVE=, then the procedure must backtrack.
If CANSELECT=FIFO, PROJECT, PSEUDOC, and ERROR and the bound corresponding to the node under consideration is not better than the value of WOBJECTIVE=, then the procedure must backtrack.

The default value is OBJ.

The VARSELECT= option directs the choice of branching variable. Valid keywords for this option include CLOSE, FAR, PRIOR, PSEUDOC, PRICE, and PENALTY. The following list describes the action that each of these causes when x^opt(k), an optimal solution of problem (lp(k)), is used to define active problems (lp(k+1)) and (lp(k+2)).

CLOSE: chooses as branching variable the variable x_i such that i minimizes
$\{{\rm min} \{f_j(k),(1-f_j(k))\}| j \in{\cal S}$ and IEPSILON $\leq f_j(k)\leq1-$ IEPSILON }
FAR: chooses as branching variable the variable x^opt(k)_i such that i maximizes
$\{{\rm min} \{f_j(k),(1-f_j(k))\}| j \in{\cal S}$ and IEPSILON $\leq f_j(k)\leq1-$ IEPSILON }
PRIOR: chooses as branching variable x_i such that $i\in{\cal S}$ ,x^opt(k)_i is nonintegral, and variable i has the minimum value in the INTEGER row in the input data set. This choice for the VARSELECT= option is recommended when you have enough insight into the model to identify those integer variables that have the most significant effect on the objective value.
PENALTY: chooses as branching variable x_i such that $i\in{\cal S}$ and a bound on the increase in the objective of (lp(k)) (penalty) resulting from adding the constraint
$x_i \leq \lfloor x^{opt}(k)_i\rfloor$ or
$x_i \geq \lceil x^{opt}(k)_i\rceil$
is maximized. The bound is calculated without pivoting using techniques of sensitivity analysis (Garfinkel and Nemhauser 1972). Because the cost of calculating the maximum penalty can be large if S is large, you may want to limit the number of variables in S for which the penalty is calculated. The penalty is calculated for PENALTYDEPTH= variables in S .
PRICE: chooses as branching variable x_i such that $i\in{\cal S}$ , x_i is nonintegral, and variable x_i has the minimum price coefficient (maximum for maximization).
PSEUDOC: chooses as branching variable the variable x_i such that i maximizes
$\{{\rm min} \{d_jf_j(k),u_j(1-f_j(k))\} | j\in {\cal S}$ and IEPSILON $\leq f_j(k)\leq1-$ IEPSILON }

The weights u_j and d_j are initially equal to the absolute value of the jth objective coefficient and are updated whenever an integer feasible solution is encountered. See the discussion on the CANSELECT= option for details on the method of updating the weights.

Customizing Search Heuristics

Often a good heuristic for searching the branch and bound tree of a problem can be found. You are tempted to continue using this heuristic when the problem data changes but the problem structure remains constant. The ability to reset procedure options interactively enables you to experiment with search techniques in an attempt to identify approaches that perform well. Then you can easily reapply these techniques to subsequent problems.

For example, the PIP branch and bound strategy (Crowder, Johnson, and Padberg 1983) describes one such heuristic. The following program uses a similar strategy. Here, the OBJ rule (choose the active node with least parent objective function in the case of a minimization problem) is used for selecting the next active node to be solved until an integer feasible solution is found. Once such a solution is found, the search procedure is changed to the LIFO rule: choose the problem most recently placed in the list of active nodes.

   proc lp canselect=obj ifeasiblepause=1;
   run;
      reset canselect=lifo ifeasiblepause=9999999;
   run;

Further Discussions on AUTO and CONTROL= options

Let us consider a minimization problem. At each integer iteration, PROC LP will select a node to solve from a pool of active nodes. The best bound strategy (CANSELECT=OBJ) will pick the node with the smallest projected objective value. This strategy improves the lower bound of the integer program and usually takes less number of integer iterations. One disadvantage is that it needs to recalculate the inverse of the basis matrix almost every integer iteration, which is relatively expensive. Another disadvantage is this strategy does not pay attention to improving the upper bound of the integer program. Thus the number of active nodes tends to grow rapidly if PROC LP cannot find an optimal integer solution soon.

LIFO strategy is very efficient and does not need to calculate the inverse of the basis matrix unless the previous node is fathomed. It is a depth first strategy so it tends to find an integer feasible solution quickly. However, this strategy will pick nodes locally and usually takes longer integer iterations than the best bound strategy.

There is another strategy that most people have ignored. Here we called it best upper bound strategy. With this strategy, when we select an active node each time, instead of picking the node with the smallest projected objective value we select the one with the largest projected objective value. This strategy is as efficient as the LIFO strategy. Moreover, it selects active nodes globally. This strategy tries to improve the upper bound of the integer program by search for new integer feasible solutions. It also fathomes active nodes quickly and keeps the total number of active nodes below current level. A disadvantage is that is may evaluate more nodes that do not have any potential in finding an optimal integer solution.

The best bound strategy has the advantage of improving the lower bound. The LIFO strategy has the advantages of efficiency and finding a local integer feasible solution. The best upper bound strategy has the advantages of keeping the size of active nodes under control and at the same time trying to identify any potential integer feasible solution globally.

In general the best bound strategy is the prefer strategy by people. In many situations, other strategies are doing better. For example, if we have found an integer optimal solution but we do not know that no solutions are better than it, we still have to enumerate all possible active nodes. Then the three strategies will take basically the same number of integer iterations after an optimal solution is found but not identified. Since LIFO and best upper bound strategies are very efficient per integer iteration, both of them will outperform the best bound strategy.

The CONTROL= option combines the above three strategies naturally and provides a simple control parameter in [0, 1] dealing with different integer programming problems and different solution situation. The AUTO option automatically sets and adjusts the CONTROL= parameter so that we do not need to know any problem structure or deciding a node selection strategy in advance.

Since the LIFO strategy is so cheap, we shall try to use it as much as possible in the combinations. We call the following process a diving process. Starting from an active node, we apply the LIFO strategy as much as we can until the current node becomes feasible or is fathomed, or exceed a preset limit. During this process, there is no inverse matrix calculation involved except for the first node. When the diving process is over, we shall apply one of the above three strategies to select the next starting node. We shall call the set of combinations a cycle.

The control parameter r controls the frequency of the three applying strategies and the depth of the diving process. It starts with a pure best bound strategy at r=0, then gradually increase the frequency of the diving process and its depth in a cycle as r increases. At r=0.5, the cycle contains a best bound strategy plus a full diving process. After r=0.5, the number of the diving processes will gradually increase in a cycle. In addition, the best upper bound strategy will join the cycle. As r continues to increase, the frequency of the best upper bound strategy will increase. At r=1, it becomes a pure best upper bound strategy.

The AUTO option will automatically adjust the value of the CONTROL= option. At the start, it sets CONTROL=0.7, which emphasizes on finding an upper bound. After an integer feasible solution is found, it sets CONTROL=0.5, which emphasizes on efficiency and lower bound improvement. When the number of active nodes grows over the default or user defined limit m, it is an indication that a better upper bound is needed. The AUTO option will start to increase the value of CONTROL= from 0.5. If the size of active nodes continues to grow, so will be the value of CONTROL= option. When the size of active nodes reaches to the default or user defined crash point, CONTROL= will be set to one. At this stage, the growth of active nodes is stopped. When the size of active nodes reduces, AUTO will decrease the value of CONTROL= option.

Saving and Restoring the List of Active Nodes

The list of active nodes can be saved in a SAS data set for use at a subsequent invocation of PROC LP. The ACTIVEOUT= option in the PROC LP statement names the data set into which the current list of active nodes is saved when the procedure terminates due to an error termination condition. Examples of such conditions are time limit exceeded, integer iterations exceeded, and phase 3 iterations exceeded. The ACTIVEIN= option in the PROC LP statement names a data set that can be used to initialize the list of active nodes. To achieve the greatest benefit when restarting PROC LP, use the PRIMALOUT= and PRIMALIN= options in conjunction with the ACTIVEOUT= and ACTIVEIN= options. See Example 3.10 in the section "Examples" for an illustration.

Chapter Contents
Previous
Next
Top