Integer Programming
Formulations of mathematical
programs often require that some of the decision variables
take only integer values. Consider the formulation
min cTx
subject to
The set of indices S identifies those variables
that must take only integer values.
When S does not contain all of the integers between 1 and
n, inclusive, problem (mip) is called a mixed-integer program.
Otherwise, it is known as an integer program.
Let xopt(mip) denote an optimal solution to
(mip) . An integer variable with bounds between 0 and 1 is
also called a binary variable.
An integer or mixed-integer problem can be
solved with PROC LP. To solve this problem, you must identify the
integer variables. You can do this with a row in the input data set
that has the keyword INTEGER for the type variable. Any variable that
has a nonmissing and nonzero value for this row is interpreted as an
integer variable. It is important to note that integer variables must
have upper bounds explicitly defined using the UPPERBD keyword. The
values in the INTEGER row not only identify those variables that must be
integral, but they also give an ordering to the integer variables that
can be used in the solution technique.
You can follow the same steps to identify binary variables. For the binary
variables, there is no need to supply any upper bounds.
Following the rules of sparse data input format, you can also identify
individual integer or binary variables.
The branch and bound approach is
used to solve integer and mixed-integer problems.
The following discussion outlines the approach and explains how
to use several options to control the procedure.
The branch and bound technique solves an integer program by
solving a sequence of linear programs.
The sequence can be represented by a tree, with
each node in the tree being identified with a linear program
that is derived from the problems on the path
leading to the root of the tree.
The root of the tree is identified with a linear program
that is identical to (mip) except that S is empty.
This relaxed version of (mip), lp(0), can be written as:
xopt(0) = min cTx
subject to
The branch and bound approach generates
linear programs along the nodes of the tree using
the following scheme.
Consider xopt(0), the optimal solution
to lp(0).
If xopt(0)i is integer for all
, then xopt(0)
is optimal in (mip).
Suppose for some , xopt(0)i is nonintegral.
In that case, define two new problems (lp(1)) and (lp(2)) ,
descendents of the parent problem (lp(0)).
The problem (lp(1)) is identical to (lp(0)) except for
the additional constraint
and the problem (lp(2)) is identical to (lp(0)) except for
the additional constraint
The notation means the smallest integer
greater than or equal to y, and the notation
means the largest integer less than or equal to y.
Note that the two new problems
do not have xopt(0) as a feasible solution, but because
the solution to (mip) must satisfy one of
the preceding constraints, xopt(mip)i must
satisfy one of the new constraints.
The two problems thus defined are
called active nodes in the branch and bound tree, and the
variable i is called the branching variable.
Next, the algorithm chooses one of the problems associated with an
active node and attempts to solve it using the dual simplex algorithm.
The problem may be infeasible, in which case the problem is dropped.
If it can be solved, and it in turn does not have an integer solution
(that is a solution for which xi is
integer for all ), then it
defines two new problems.
These new problems each contain all of the
constraints of the parent problems plus the appropriate additional one.
Branching continues in this manner until either there are no
active nodes or an integer solution is found.
When an integer solution is found, its objective value
provides a bound for the objective of (mip).
In particular, if z is the objective value of
the current best integer solution, then any active problems
whose parent problem has objective value can
be discarded (assuming that the problem is a minimization).
This can be done because all problems that descend from
this parent will also have objective value .This technique is known as fathoming.
When there are no active nodes remaining to be solved,
the current integer solution is optimal in (mip).
If no integer solution has been found,
then (mip) is (integer) infeasible.
It is important to realize that integer programs are
NP-complete.
Roughly speaking, this means that the effort required
to solve them grows exponentially with the size of the problem.
For example, a problem with 10 binary variables can, in the
worst case, generate 210 =1024 nodes in the branch and bound tree.
A problem with 20 binary variables can, in the
worst case, generate 220 =1048576 nodes in the branch and
bound tree.
Although the algorithm is unlikely to have to
generate every single possible node, the need to explore
even a small fraction of
the potential number of nodes for
a large problem can be resource intensive.
To help monitor the growth of the branch and bound tree, the LP
procedure reports on the status of each
problem that is solved.
The report, displayed in the Integer Iteration
Log, can be used to reconstruct the branch and bound tree.
Each row in
the report describes the results of the attempted solution of the linear
program at a node in the tree.
In the following discussion, a problem on
a given line in the log is called the current problem.
The following columns are displayed in the report:
- Iter
- identifies the number of the branch and bound iteration.
- Problem
- identifies how the current problem fits in the branch and
bound tree.
- Condition
- reports the result of the attempted solution of
the current problem. Values for Condition are
- ACTIVE
- The current problem was solved successfully.
- INFEASIBLE
- The current problem is infeasible.
- FATHOMED
- The current problem cannot lead to an improved
integer solution and therefore it is dropped.
- SINGULAR
- A singular basis was encountered
in attempting to solve the current problem.
Solution of this relaxed problem is suspended
and will be attempted later if necessary.
- SUBOPTIMAL
- The current problem has an integer feasible solution.
- Objective
- reports the objective value of the current problem.
- Branched
- names the variable that is branched in
subtrees defined by the descendents of this problem.
- Value
- gives the current value of the variable named in the
column labeled Branched.
- Sinfeas
- gives the sum of the integer infeasibilities in the
optimal solution to the current problem.
- Active
- reports the total number of nodes currently active in
the branch and bound tree.
- Proximity
- reports the gap between the best integer solution
and the current lower (upper for maximizations)
bound of all active nodes.
To reconstruct
the branch and bound tree from this report, consider the interpretation
of iteration j.
If Iter=j and Problem=k, then the
problem solved on iteration j is identical to the problem solved
on iteration | k | with an additional constraint.
If k >0, then
the constraint is an upper bound on the variable named in the Branched
column on iteration | k |.
On the other hand, if k <0 then the
constraint is a lower bound on that variable.
The value of the bound
can be obtained from the value of Value in iteration | k | as
described in the previous section.
Example 3.8 in the section "Examples"
shows an Integer Iteration Log in its output.
There are several options
you can use to control branching. This is accomplished by controlling
the program's choice of the branching variable and of the next active
node. In the discussion that follows, let
where xopt(k) is the optimal solution to the problem
solved on iteration k.
The CANSELECT=
option directs the choice of the next active node.
Valid keywords for this option include LIFO, FIFO, OBJ,
PROJECT, PSEUDOC, and ERROR.
The following list describes the action
that each of these causes when the procedure must choose for solution a
problem from the list of active nodes.
- LIFO
- chooses the last problem added to the tree of active nodes.
This search has the effect of a depth first search
of the branch and bound tree.
- FIFO
- chooses the first node added to the tree of active nodes.
This search has the effect of a breadth first search of the
branch and bound tree.
- OBJ
- chooses the problem whose parent has the least (largest
if the problem is a maximization) objective value.
- PROJECT
- chooses the problem with the largest (least if the
problem is a maximization) projected objective value.
The projected objective value is evaluated using
the sum of integer infeasibilities, s(k),
associated with an active node (lp(k)), defined by:
.
An empirical measure of the rate of increase (decrease)
in the objective value is defined as
where
- z(k)
- is the optimal objective value for (lp(k)).
- z*
- is the objective value of the current best
integer solution.
The projected objective value for problems (lp(k+1)) and (lp(k+2))
is defined as
- PSEUDOC
- chooses the problem with the largest
(least if the problem is a maximization) projected pseudocost.
The projected pseudocost is evaluated using the weighted sum
of infeasibilities sw(k) associated with an active problem
(lp(k)), defined by
The weights ui and di are initially equal to
the absolute value of the ith objective coefficient and are
updated at each integer iteration.
They are modified by examining the empirical marginal change
in the objective as additional constraints are placed on the variables
in S along the path from
(lp(0)) to a node associated
with an integer feasible solution.
In particular, if the definition of problems (lp(k+1)) and
(lp(k+2)) from
parent (lp(k)) involve the addition of constraints
and
, respectively,
and one of them is on a path to an integer feasible solution, then
either
di(k)=(z(k+1)-z(k))/fi(k)
or
ui(k)=(z(k+2)-z(k))/(1-fi(k))
Note the similarity between sw(k) and s(k).
The weighted quantity sw(k) accounts to some extent
for the influence of the objective function.
The projected pseudocost for problems (lp(k+1)) and
(lp(k+2)) is defined as
- ERROR
- chooses the problem with the largest error.
The error associated with problems (lp(k+1)) and (lp(k+2))
is defined as
(z*-zw(k))/(z*-z(k))
The BACKTRACK=
option controls the search for the next problem.
This option can take the same values as the CANSELECT= option.
In addition to the case outlined under the DELTAIT= option, backtracking
is required as follows based on the
CANSELECT= option in effect:
-
If CANSELECT=LIFO and there is no active node in the portion of the
active tree currently under exploration with a bound better than the value
of WOBJECTIVE=, then the procedure must backtrack.
-
If CANSELECT=FIFO, PROJECT, PSEUDOC, and ERROR and the bound corresponding
to the node under consideration is not better than the value of
WOBJECTIVE=, then the procedure must backtrack.
The default value is OBJ.
The VARSELECT=
option directs the choice of branching variable.
Valid keywords for this option include CLOSE, FAR, PRIOR,
PSEUDOC, PRICE, and PENALTY.
The following list describes the action that each of
these causes when xopt(k), an optimal solution
of problem (lp(k)), is used to define active
problems (lp(k+1)) and (lp(k+2)).
- CLOSE
- chooses as branching variable
the variable xi such that i minimizes
and
IEPSILON
IEPSILON
}
- FAR
- chooses as branching variable the variable
xopt(k)i such that i maximizes
and
IEPSILON
IEPSILON
}
- PRIOR
- chooses as branching variable xi such that ,xopt(k)i is nonintegral, and variable i
has the minimum value in the INTEGER row in the input data set.
This choice for the VARSELECT= option is recommended when you
have enough insight into the model to identify
those integer variables that have the most significant
effect on the objective value.
- PENALTY
- chooses as branching variable xi
such that and a bound on the increase in the objective of
(lp(k)) (penalty) resulting from adding the constraint
or
is maximized.
The bound is calculated without pivoting using techniques
of sensitivity analysis (Garfinkel and Nemhauser 1972).
Because the cost of calculating the maximum penalty can be
large if S is large, you may want to limit the number of
variables in S for which the penalty is calculated.
The penalty is calculated for PENALTYDEPTH= variables in S .
- PRICE
- chooses as branching variable
xi such that , xi is nonintegral,
and variable xi has the minimum price coefficient (maximum for maximization).
- PSEUDOC
- chooses as branching variable the variable
xi such that i maximizes
and
IEPSILON
IEPSILON
}
The weights uj and dj are initially equal to
the absolute value of the jth objective coefficient and are
updated whenever an integer feasible solution is encountered.
See the discussion on the CANSELECT= option for details on the
method of updating the weights.
Often a good heuristic for
searching the branch and bound tree of a problem can be found.
You are
tempted to continue using this heuristic when the problem data changes
but the problem structure remains constant.
The ability to reset procedure options interactively enables you
to experiment with search techniques
in an attempt to identify approaches that perform well.
Then you can easily reapply these techniques to subsequent problems.
For example, the PIP branch and bound strategy (Crowder,
Johnson, and Padberg 1983) describes one such heuristic.
The following program uses a similar strategy.
Here, the OBJ rule (choose the active node with
least parent objective function in the case of a minimization problem)
is used for selecting the next active
node to be solved until an integer feasible solution is found.
Once such a solution is found, the search procedure is changed
to the LIFO rule: choose the problem most recently placed in the
list of active nodes.
proc lp canselect=obj ifeasiblepause=1;
run;
reset canselect=lifo ifeasiblepause=9999999;
run;
Let us consider a minimization problem. At each integer
iteration, PROC LP will select a node to solve from
a pool of active nodes. The best bound strategy (CANSELECT=OBJ)
will pick the node with the smallest projected objective value.
This strategy improves the lower bound of the integer program
and usually takes less number of integer iterations.
One disadvantage is that it needs to recalculate the inverse of
the basis matrix almost every integer iteration,
which is relatively expensive.
Another disadvantage is this strategy does not pay attention to
improving the upper bound of the integer program.
Thus the number of active nodes tends to grow rapidly
if PROC LP cannot find an optimal integer solution soon.
LIFO strategy is very efficient and does not need to calculate
the inverse of the basis matrix unless the previous node is fathomed.
It is a depth first strategy so it tends to find an integer
feasible solution quickly. However, this strategy will pick nodes
locally and usually takes longer integer iterations
than the best bound strategy.
There is another strategy that most people have ignored.
Here we called it best upper bound strategy. With this
strategy, when we select an active node each time, instead of
picking the node with the smallest projected objective
value we select the one with the largest projected objective
value. This strategy is as efficient as the LIFO
strategy. Moreover, it selects active nodes globally.
This strategy tries to improve the upper bound of the integer
program by search for new integer
feasible solutions. It also fathomes active nodes quickly
and keeps the total number of active nodes below
current level. A disadvantage is that is may evaluate
more nodes that do not have any potential in
finding an optimal integer solution.
The best bound strategy has the advantage of improving the lower
bound. The LIFO strategy has the advantages of efficiency
and finding a local integer feasible solution. The best upper
bound strategy has the advantages of keeping the size of active
nodes under control and at the same time trying to identify any
potential integer feasible solution globally.
In general the best bound strategy is the prefer strategy by
people. In many situations, other strategies are doing better.
For example, if we have found an integer optimal solution but we do
not know that no solutions are better than it, we still have
to enumerate all possible active nodes. Then the three
strategies will take basically the same number of integer
iterations after an optimal solution is found but not identified.
Since LIFO and best upper bound strategies are very efficient
per integer iteration, both of them will outperform the
best bound strategy.
The CONTROL= option combines the above three strategies naturally
and provides a simple control parameter in [0, 1]
dealing with different
integer programming problems and different solution situation.
The AUTO option automatically sets and adjusts the CONTROL=
parameter so that we do not need to know any problem structure
or deciding a node selection strategy in advance.
Since the LIFO strategy is so cheap, we shall try to use it
as much as possible in the combinations.
We call the following process a
diving process. Starting from an active node,
we apply the LIFO strategy as much as we can until the
current node becomes feasible or is fathomed, or exceed a
preset limit. During this process, there is no inverse
matrix calculation involved except for the first node.
When the diving process is over, we shall apply one
of the above three strategies to select the next starting
node. We shall call the set of combinations a cycle.
The control parameter r controls the frequency of the three applying
strategies and the depth of the diving process. It starts with
a pure best bound strategy at r=0, then gradually increase
the frequency of the diving process and its depth in a
cycle as r increases. At r=0.5, the cycle contains a best
bound strategy plus a full diving process. After r=0.5,
the number of the diving processes will gradually increase
in a cycle. In addition, the best upper bound strategy will join the
cycle. As r continues to increase, the frequency of the best
upper bound strategy will increase. At r=1, it becomes
a pure best upper bound strategy.
The AUTO option will automatically adjust the value of
the CONTROL= option. At the start, it sets CONTROL=0.7, which
emphasizes on finding an upper bound. After an integer
feasible solution is found, it sets CONTROL=0.5, which
emphasizes on efficiency and lower bound improvement.
When the number of active nodes grows over
the default or user defined limit m, it is an indication that
a better upper bound is needed. The AUTO option will start
to increase the value of CONTROL= from 0.5. If the
size of active nodes continues to grow, so will be the value of
CONTROL= option. When the size of active nodes reaches to the
default or user defined crash point, CONTROL= will
be set to one. At this stage, the growth of active
nodes is stopped. When the size of active nodes
reduces, AUTO will decrease the value of CONTROL= option.
The list of active nodes can be saved in a SAS data set
for use at a subsequent invocation of PROC LP.
The ACTIVEOUT= option in the PROC LP statement
names the data set into which the
current list of active nodes is saved when the
procedure terminates due to an error termination condition.
Examples of such conditions are time limit exceeded,
integer iterations exceeded, and phase 3 iterations exceeded.
The ACTIVEIN= option in the PROC LP statement
names a data set that can be used
to initialize the list of active nodes.
To achieve the greatest benefit when restarting PROC LP,
use the PRIMALOUT= and PRIMALIN=
options in conjunction with the ACTIVEOUT=
and ACTIVEIN= options.
See Example 3.10 in the section "Examples" for an illustration.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.