Parallel MIP Optimizer

The CPLEX Parallel MIP Optimizer is quite robust with respect to parallelism, achieving remarkable speedups on a wide variety of models, particularly difficult ones that process a large number of nodes in the branch & cut search tree while proving optimality. The parallel MIP optimizer provides several different opportunities for applying multiple processors to the solution of a problem.

The following sections discuss details and tradeoffs associated with these options.

Platform Considerations

The parallel MIP optimizer is available on all the parallel platforms that ILOG CPLEX supports.

Memory Considerations and the Parallel MIP Optimizer

Before the parallel MIP optimizer invokes parallel processing, it makes separate, internal copies of the initial problem. The individual processors use these copies during computation, so each of them requires an amount of memory roughly equal to the size of the original model after it is presolved.

Output from the Parallel MIP Optimizer

The parallel MIP optimizer generates slightly different output from the serial MIP optimizer (described in Termination and Post-Solution Information in a MIP). The following paragraphs explain those differences.

Timing Statistics from the Parallel MIP Optimizer

As explained in earlier sections, you can control the amount of information that ILOG CPLEX displays and records in its log files.

To make ILOG CPLEX record elapsed time for the MIP optimizer, set the clock-type parameter to the kind of clock you want (CPU time or wall-clock time).
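
For example, in the Interactive Optimizer you could set the clock type before invoking the optimizer; the value 2 shown here is assumed to select wall-clock time (check the parameter reference for your release, where the Callable Library equivalent is CPX_PARAM_CLOCKTYPE):

```
set clocktype 2
```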

In the parallel MIP optimizer, these elapsed times are always wall-clock times, regardless of the clock-type parameter.

ILOG CPLEX prints a summary of timing statistics specific to the parallel MIP optimizer at the end of optimization. You can see typical timing statistics in the following sample run.

Problem 'fixnet6.mps' read.
Read time =    0.04 sec.
CPLEX> o
Tried aggregator 1 time.
MIP Presolve modified 308 coefficients.
Aggregator did 1 substitutions.
Reduced MIP has 477 rows, 877 columns, and 1754 nonzeros.
Presolve time =    0.02 sec.
Clique table members: 2
MIP emphasis: optimality
Root relaxation solution time =    0.04 sec.

        Nodes                                          Cuts/
   Node  Left     Objective  IInf  Best Integer     Best Node    ItCnt    Gap

      0     0     3192.0420    12                   3192.0420      305
      0     0     3263.9220    19                    Cuts: 36      341
      0     0     3393.0917    17                    Cuts: 24      403
      0     0     3444.9996    19                 Flowcuts: 9      439
      0     0     3479.7206    24                 Flowcuts: 6      470
      0     0     3489.7893    21                 Flowcuts: 3      482
      0     0     3500.4789    24                 Flowcuts: 4      494
      0     0     3502.0646    26                 Flowcuts: 4      499
      0     0     3526.8260    20                 Flowcuts: 2      502
      0     0     3527.0669    19                 Flowcuts: 1      504
      0     0     3527.2559    22                 Flowcuts: 1      506
      0     0     3527.6402    24                 Flowcuts: 1      508
      0     0     3529.7853    18                 Flowcuts: 1      515
*     0+    0     4116.0000     0     4116.0000   3529.7853      515   14.24%
*    40+   24     4077.0000     0     4077.0000   3808.7017     1191    6.58%
*    46    23     3983.0000     0     3983.0000   3854.2312     1198    3.23%

Sequential (before b&b):
  CPU time                  =    0.79
Parallel b&b, 4 threads:
  Real time                 =    0.48
  Critical time (total)     =    0.00
  Spin time (average)       =    0.00
                                 -------
Total (sequential+parallel) =    1.27 sec.

Cover cuts applied:  1
Flow cuts applied:  43
Gomory fractional cuts applied:  8

Integer optimal solution:  Objective =  3.9830000000e+03
Solution time =    1.65 sec.  Iterations = 1415  Nodes = 74

The summary at the end of the sample shows that 0.79 seconds were spent in CPU time (since the clock-type parameter was at its default setting) in the sequential phase, mainly in preprocessing by the presolver and in solving the initial linear-programming relaxation. The parallel part of this sample run took 0.48 seconds of real time (that is, elapsed time for that phase).

Other parts of the sample report indicate that the processors spent an average of 0.00 seconds of real time spinning (that is, waiting for work while there were too few active nodes available). The critical time, a total of 0.00 seconds of real time, is the time spent by individual processors updating global, shared information. Since only one processor can access the critical region at any instant, the amount of time spent in this region really is crucial: any other processor that tries to access it must wait, sitting idle, and this idle time is counted separately from the spin time.

Logs from the Parallel MIP Optimizer

There is also a difference in the way logging occurs in the parallel MIP optimizer. When this optimizer is called, it makes a number of copies of the problem. These copies are known as clones. The parallel MIP optimizer creates as many clones as there are threads available to it. When the optimizer exits, these clones and all their paraphernalia are discarded.

If a log file is active when the clones are created, then ILOG CPLEX creates a clone log file for each clone. The clone log files are named cloneK.log, where K is the index of the clone, ranging from 0 (zero) to the number of threads minus one. Since the clones are created at each call to the parallel MIP optimizer and discarded when it exits, the clone logs are opened at each call and closed at each exit. (The clone log files are not removed when the clones themselves are discarded.)
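
For example, a run with four threads and the default log file active would leave behind these files after the optimizer exits:

```
cplex.log
clone0.log
clone1.log
clone2.log
clone3.log
```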

The clone logs contain information normally recorded in the ordinary log file (by default, cplex.log) but inconvenient to send through the normal log channel. The information most likely to interest you consists of special messages, such as error messages, that result from calls to the LP optimizers that solve the subproblems.

Nested Parallelism

If an ILOG CPLEX parallel LP optimizer (for example, parallel simplex or parallel barrier) is available along with the parallel MIP optimizer, then you can choose among several strategies for applying parallelism to the solution of the MIP model:

If you want to make the branching parallel, then you should set the thread-limit parameter for the subproblem optimizer to 1 (one). For example:
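
A sketch of this setting in the Interactive Optimizer, assuming four threads for the branch & cut search and serial simplex on the subproblems (the exact thread-limit command names are our assumption for this release; consult the parameter reference):

```
set mip limits threads 4
set simplex limits threads 1
```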

If you want all the parallelism to occur in solutions of the subproblems, not at the branching level, then you should set the MIP thread-limit parameter to 1 (one). For example:
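
For instance, serial branching with parallel barrier on the subproblems might be set in the Interactive Optimizer as follows (command names assumed as in the parameter reference for this release):

```
set mip limits threads 1
set barrier limits threads 4
```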

If you want the strong branching variable selection to occur in parallel, you should set the MIP thread-limit parameter to 1 (one) and the strong branching thread-limit parameter to a value greater than one. For example:
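
A sketch in the Interactive Optimizer, with serial branching and parallel strong branching; the strongthreads command name here is our assumption (in the Callable Library the corresponding parameter is CPX_PARAM_STRONGTHREADLIM):

```
set mip limits threads 1
set mip limits strongthreads 2
```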

On systems that support nested parallelism, you can make both the branch & cut and the subproblem solutions work in parallel by setting the thread-limit parameters for both optimizers (MIP for the branch & cut, barrier or simplex for the subproblems) to values greater than one. For example, on a six-processor system, if the MIP thread-limit were set to 3 and the barrier thread-limit were set to 2, then three subproblems could be solved simultaneously, each using two processors.
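
On such a system, the six-processor arrangement just described could be sketched in the Interactive Optimizer as follows (command names are our assumption for this release; consult the parameter reference):

```
set mip limits threads 3
set barrier limits threads 2
```
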

Nested Parallelism Platform Considerations

Nested parallelism is not supported on DEC, HP, or SGI parallel platforms. Consequently, on these platforms, you cannot call parallel optimizers from within other parallel optimizers. In particular, you cannot call the ILOG CPLEX Parallel Simplex or ILOG CPLEX Parallel Barrier Optimizer from within the ILOG CPLEX Parallel MIP Optimizer.

For example, if you invoke the MIP optimizer with the MIP thread-limit parameter set to more than one thread, then the simplex optimizer applied to the MIP subproblems will use the serial, sequential algorithm. To use the parallel simplex optimizer or the parallel barrier optimizer on MIP subproblems, you must set the MIP thread-limit to 1 (one); that is, you must make the branch & cut search serial, not parallel.

MIP First Rule and its Exceptions

Since the parallelism available in the parallel MIP optimizer applies to a very large class of problems, and since it normally benefits them greatly, it will more often be best to use the parallel MIP optimizer rather than the parallel simplex or parallel barrier optimizer on the subproblems. We call this rule of thumb "MIP first."

One exception to this MIP-first rule occurs in problems with an extremely large aspect ratio (that is, the number of columns divided by the number of rows), where the memory requirements of the parallel MIP optimizer exceed available resources.

Another exception, again for problems with a large aspect ratio, occurs in any problem that generates relatively few active nodes during a MIP optimization.

Finally, large-aspect-ratio problems that spend a large fraction of the total solution time searching for the first incumbent may be good candidates for the parallel simplex optimizer (rather than MIP first).

