2013-01-30

Analyzing the search space


We are concerned today with extending the simple matching problem. The first extension is to allow each row to be matched to more than one column. A typical example: the rows represent instructors, the columns represent courses, and each instructor teaches more than one course.

So our data will have the form of a cost matrix representing the value of assigning a row to a column, and a load vector indicating how many columns each row should be assigned.

For instance:

K |  1   2   3   4   5
--+-------------------
3 | 10  12  13  11  14
2 | 15  13  14  12  16

In general we will have a set \(S\), a set \(T\), a cost matrix \(C_{ij}\) with \(i\in S\), \(j \in T\), and a load vector \(K_i\), \(i\in S\). The load vector must satisfy \(\sum_i K_i = |T|\) to ensure that each column is assigned.
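For concreteness, the example instance above can be written out in Python; this snippet just records the data and checks the load condition (the names C and K mirror the solver code below):

```python
# Cost matrix from the example: rows are instructors, columns are courses.
C = [[10, 12, 13, 11, 14],
     [15, 13, 14, 12, 16]]
# Load vector: row 0 must be assigned 3 columns, row 1 must be assigned 2.
K = [3, 2]

# The loads must cover every column exactly once.
assert sum(K) == len(C[0])
```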

We will contrast two formulations, one based on binary variables (the traditional Integer Programming model) and one based on integer variables using the AllDifferent constraint.

The IP-type model is based on \begin{equation} \min \sum_i \sum_j c_{ij}x_{ij} \end{equation} subject to: \begin{equation} \sum_j x_{ij} = k_i \quad \forall i \in S \end{equation} \begin{equation} \sum_i x_{ij} = 1 \quad \forall j \in T \end{equation} This is a simple extension of the previous problem, but it leads to interesting differences in behaviour. Here are the two models I want to contrast, written in Python using Google's or-tools. First the IP-type model using a real IP solver:

def realipsolve(C,K):
    m = len(C)
    n = len(C[0])
    #For IP-model with IP-solver
    from linear_solver import pywraplp
    solver = pywraplp.Solver('CoinsGridCLP',
                             pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
    x = [[solver.IntVar(0,1, 'x[%i,%i]' % (i,j)) for j in range(n)] for i in range(m)]    
    for i in range(m):
        solver.Add(solver.Sum([x[i][j] for j in range(n)]) == K[i])
    for j in range(n):
        solver.Add(solver.Sum([x[i][j] for i in range(m)]) == 1)
    value = solver.IntVar(0,sum([C[i][j] for i in range(m) for j in range(n)]),'value')
    solver.Add(value == solver.Sum([x[i][j]*C[i][j] for i in range(m) for j in range(n)]))
    solver.Minimize(value)
    solver.Solve()
    best = round(value.SolutionValue())
    solx = [(i,j,C[i][j]) for i in range(m) for j in range(n) if x[i][j].SolutionValue()>0]
    return best,solx

Now, essentially the same model, but solved using the CP solver:

def ipsolve(C,K):
    m = len(C)
    n = len(C[0])
    from constraint_solver import pywrapcp
    solver = pywrapcp.Solver('IPModel')
    x = [[solver.IntVar(0,1, 'x[%i,%i]' % (i,j)) for j in range(n)] for i in range(m)]
    for i in range(m):
        solver.Add(solver.Sum([x[i][j] for j in range(n)]) == K[i])
    for j in range(n):
        solver.Add(solver.Sum([x[i][j] for i in range(m)]) == 1)
    value = solver.IntVar(0,sum([C[i][j] for i in range(m) for j in range(n)]),'value')
    solver.Add(value == solver.Sum([x[i][j]*C[i][j] for i in range(m) for j in range(n)]))
    objective = solver.Minimize(value,1)
    x_flat = [x[i][j] for i in range(m) for j in range(n)]
    db = solver.Phase(x_flat,
                      solver.CHOOSE_MIN_SIZE_LOWEST_MAX,
                      solver.ASSIGN_MIN_VALUE)
    solution = solver.Assignment()
    solution.Add(x_flat)
    solution.AddObjective(value)
    solver.NewSearch(db, [objective])
    while solver.NextSolution():
        best = int(value.Value())
        solx = [(i,j,C[i][j]) for i in range(m) for j in range(n) if x[i][j].Value()>0]
    return best,solx

Finally, the real CP model, the one I expect to shine (eventually):

def cpsolve(C,K):
    from constraint_solver import pywrapcp
    m = len(C)
    n = len(C[0])
    solver = pywrapcp.Solver('CPModel')
    x = [[solver.IntVar(0,n-1,'x[%i,%i]' % (i,j)) for j in range(K[i])] for i in range(m)]
    value = solver.IntVar(0,sum([C[i][j] for i in range(m) for j in range(n)]),'value')
    solver.Add(value == solver.Sum([solver.Element(C[i],x[i][j]) for i in range(m) for j in range(K[i])]))
    objective = solver.Minimize(value,1)
    xflat = [x[i][j] for i in range(m) for j in range(K[i])]
    solver.Add(solver.AllDifferent(xflat))
    db = solver.Phase(xflat,
                      solver.INT_VAR_SIMPLE,
                      solver.ASSIGN_MIN_VALUE)
    solver.NewSearch(db, [objective])
    while solver.NextSolution():
        best = int(value.Value())
        solx = [(i,j,int(x[i][j].Value())) for i in range(m) for j in range(K[i])]
    return best,solx
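Before comparing timings, it is handy to have an independent check that all three models return the same optimum on small instances. Here is a brute-force sketch in plain Python (the function name brute_force is mine, not part of or-tools), feasible only for tiny n:

```python
from itertools import permutations

def brute_force(C, K):
    # Scan every ordering of the columns and charge the first K[0] of them
    # to row 0, the next K[1] to row 1, and so on.  Exhaustive, so only
    # usable when len(C[0]) is small.
    n = len(C[0])
    best = None
    for perm in permutations(range(n)):
        cost, pos = 0, 0
        for i, k in enumerate(K):
            cost += sum(C[i][j] for j in perm[pos:pos + k])
            pos += k
        if best is None or cost < best:
            best = cost
    return best

C = [[10, 12, 13, 11, 14],
     [15, 13, 14, 12, 16]]
K = [3, 2]
print(brute_force(C, K))   # → 62
```

On the instance from the beginning of the post this returns 62, which all three models should match.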

For a first experiment, I will run the three models on random matrices of size \(m \times 3m\) with increasing \(m\). The results follow:

m    n   IP model    IP model    CP model
         IP solver   CP solver   CP solver
3    9       0.01        0.01        0.13
4   12       0.01        0.02      103
5   15       0.01        0.10    54000

I expected the CP model to do much better than this. What is going on here? Let us look at the search space in more detail, starting with an example. Say \(m=2, n=4, K = [2,2]\), and suppose the search picks the first available variable and the first possible value. This leads to the following space in the case of the IP model:

[Figure: search space of the IP model]

Each node is labelled with the variable considered next in the search; the edges indicate the value taken.

In general, given \(m\) rows, \(n\) columns and load given by \(k_i\), the number of leaves is given by:

\begin{displaymath} {n \choose k_1} {{n-k_1} \choose k_2} {{n-k_1-k_2} \choose k_3} \cdots {k_m \choose k_m} = \frac{n!}{k_1! k_2! \cdots k_m!} = \frac{n!}{\prod_i k_i!} \end{displaymath}
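This count is easy to verify numerically. A small sketch (the function name ip_leaf_count is mine) that computes the formula and checks it against direct enumeration for the running example \(n=4\), \(k=[2,2]\):

```python
from math import factorial, prod
from itertools import combinations

def ip_leaf_count(n, K):
    # n! / (k_1! k_2! ... k_m!): leaves of the IP-model search tree.
    return factorial(n) // prod(factorial(k) for k in K)

# Direct enumeration for n=4, K=[2,2]: choose 2 of the 4 columns for
# the first row; the second row takes the remaining 2.
direct = sum(1 for _ in combinations(range(4), 2))

assert ip_leaf_count(4, [2, 2]) == direct == 6
```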

Let's contrast with the search space of the CP model.

[Figure: search space of the CP model]

In general, the number of leaves can be counted as: \begin{displaymath} n^{\underline{k_1}} (n-k_1)^{\underline{k_2}} (n-k_1-k_2)^{\underline{k_3}} \cdots (k_m)^{\underline{k_m}} = n! \end{displaymath} where the notation \(n^{\underline{k}}\) is the falling factorial: \begin{equation} n^{\underline{k}} = n (n-1)(n-2)\cdots (n-k+1) \end{equation}

Now the reason the CP model is so slow is apparent: there are many more leaves in its search tree. Fortunately, the solution is also apparent. If we consider the leaves, we see repetitions of a certain kind: the solution \(x_1=[1,2]\) is not, for our purposes, different from the solution \(x_1 = [2,1]\); in both cases, row one is assigned to columns 1 and 2. We need to modify the model to produce only one version of these identical solutions. Let us choose a canonical representation where the values are sorted in increasing order. To enforce this we add constraints of the form \begin{displaymath} x_{ij} < x_{i,j+1} \quad \forall i \in S \end{displaymath} In our Python code, this becomes:

# Symmetry breaking
for i in range(m):
    for j in range(K[i]-1):
        solver.Add(x[i][j] < x[i][j+1])
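The effect of the ordering constraints can also be checked by enumeration. Without them the CP model has \(n!\) leaves; with them, only the leaves whose per-row blocks are in increasing order survive, recovering \(n!/\prod_i k_i!\). A sketch for the running example:

```python
from itertools import permutations

n, K = 4, [2, 2]

leaves = 0       # all leaves of the unconstrained CP search tree
canonical = 0    # leaves surviving the x[i][j] < x[i][j+1] constraints
for perm in permutations(range(n)):
    leaves += 1
    # Split the permutation into per-row blocks and test the ordering.
    blocks, pos = [], 0
    for k in K:
        blocks.append(perm[pos:pos + k])
        pos += k
    if all(b[j] < b[j + 1] for b in blocks for j in range(len(b) - 1)):
        canonical += 1

print(leaves, canonical)   # → 24 6
```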

What is the effect of these additional constraints on the search space? They cut off a number of branches. For instance, at the first level of the tree, they cut off the whole subtree rooted at \(x_{11}=4\), since the domain of \(x_{12}\) would then be empty. The resulting search space is a tree with exactly six leaves.

Notice that the tree is not symmetric. It has the same number of leaves as the IP model, but fewer internal nodes, which explains why it will solve faster. Indeed, here is a sample of experiments:

m    n   IP model    IP model    CP model
         IP solver   CP solver   CP solver
3    9       0.01        0.01        0.00
4   12       0.01        0.02        0.01
5   15       0.01        0.42        0.10
6   18       0.01       10.74        2.06
7   21       0.01       36.75       24.43
8   24       0.01     5325.91     1465.26

Now the central core of the problem we want to solve is well modelled and solves in reasonable time, though still much more slowly than an IP solver. But we are nowhere near done. We will next investigate how to guide the search more effectively.

Date: 2013-01-30 09:43:58 EST

Author: Serge Kruk

Org version 7.8.08 with Emacs version 24


2013-01-10

A good IP model may not be a good CP model


I have been experimenting with Google's or-tools, in the form of the Python interface (http://code.google.com/p/or-tools/). Much of what I know about modelling with that solver comes from the great blog of Håkan Kjellerstrand (http://www.hakank.org/constraint_programming_blog/).

In the process of converting a timetabling model I created years ago from AMPL (http://www.ampl.com/) to Google or-tools, I had a hunch that a word-for-word translation would not be optimal. The model I was working on is fairly complex but, at its core, it hides an assignment problem:

Given a set \(S\), a set \(T\), and a cost matrix \(C_{ij}\), \(i\in S\), \(j \in T\), find the assignment of every element of \(S\) to one element of \(T\) minimizing the total cost. The paradigm is workers and tasks, where each task can be done by each worker at a certain cost (the cost could represent money, duration, quality, or any number of possible objectives).

The classical Integer Programming (IP) model for the simplest version of this problem uses a binary variable \(x_{ij}, i\in S, j\in T\), where \(x_{ij}=1\) if and only if we assign element \(i\) to element \(j\). This leads to \begin{equation} \min \sum_i \sum_j c_{ij}x_{ij} \end{equation} subject to: \begin{equation} \sum_j x_{ij} = 1 \quad \forall i \in S \end{equation} \begin{equation} \sum_i x_{ij} = 1 \quad \forall j \in T \end{equation} There is a multitude of variations, some of which I will consider later, with larger right-hand sides and inequalities, but this case illustrates perfectly what I want to highlight.
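For tiny instances, the optimum that the IP formulation above computes can be verified by brute force over all permutations; a sketch in plain Python (the function name is mine):

```python
from itertools import permutations

def assignment_brute_force(C):
    # Row i is assigned column p[i]; scan all n! one-to-one assignments.
    n = len(C)
    return min(sum(C[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

C = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]
print(assignment_brute_force(C))   # → 5 (rows take columns 1, 0, 2)
```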

The same model, in AMPL:

set S;
set T;
param cost {S, T} >= 0, default 1;
var x {S, T} >= 0, <= 1;
minimize v:
   sum {i in S, j in T} cost[i,j] * x[i,j];

subject to Row_limit {i in S}:  
   sum {j in T} x[i,j] = 1;
subject to Column_limit {j in T}:  
   sum {i in S} x[i,j] = 1;

As a first step, I wrote this in the Python wrapper to Google or-tools, calling an IP solver, COIN-OR's CBC (https://projects.coin-or.org/Cbc), to do the real work. For the sake of the comparison, let us call this the IP Model solved with a real IP solver.

def realipsolve(C):
  from linear_solver import pywraplp
  solver = pywraplp.Solver('CoinsGridCLP',
                           pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
  n = len(C)
  x=[[solver.IntVar(0,1,'x[%i,%i]'%(i,j)) for j in range(n)] for i in range(n)]
  for i in range(n):
      solver.Add(solver.Sum([x[i][j] for j in range(n)]) == 1)
  for j in range(n):
      solver.Add(solver.Sum([x[i][j] for i in range(n)]) == 1)
  v=solver.IntVar(0,sum([C[i][j] for i in range(n) for j in range(n)]),'v')
  solver.Add(v==solver.Sum([x[i][j]*C[i][j] for i in range(n) for j in range(n)]))
  solver.Minimize(v)
  solver.Solve()
  return round(v.SolutionValue())

If we keep the same model but change the solver to use Google's CP solver, the only change is in the first two lines

from constraint_solver import pywrapcp
solver = pywrapcp.Solver('IPModel')

Let us call this the IP Model solved by a CP solver (sort of a fake IP).

What is arguably wrong with this model is the large number of binary variables. We know that a CP solver uses, deep down, a backtracking algorithm with domain reduction via propagation. (I have not looked at the source of this particular CP solver (I tend to have profoundly allergic reactions to reading C++ code) but I assume it follows this approach.) In such a case, pruning is important, but reducing the number of variables is often more important, and much more easily achieved from the modeler's perspective.
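To make the backtracking-plus-pruning picture concrete, here is a toy depth-first search on the one-dimensional assignment model: it assigns a column to each row, enforces alldifferent by tracking used columns, and prunes whenever the partial cost reaches the incumbent. This is only an illustration of the idea, not how Google's solver is actually implemented.

```python
def backtrack_solve(C):
    # Toy branch and bound: x[i] = column assigned to row i.
    n = len(C)
    best = [sum(max(row) for row in C) + 1]  # incumbent upper bound

    def extend(i, used, cost):
        if cost >= best[0]:
            return                 # prune: cannot beat the incumbent
        if i == n:
            best[0] = cost         # all rows assigned: new incumbent
            return
        for j in range(n):
            if j not in used:      # 'alldifferent' via the used-column set
                extend(i + 1, used | {j}, cost + C[i][j])

    extend(0, frozenset(), 0)
    return best[0]

C = [[3, 8, 2],
     [9, 1, 4],
     [6, 5, 7]]
print(backtrack_solve(C))   # → 9
```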

In this simple problem, reducing the number of variables is trivial. Instead of two-dimensional binary variables, we can use a one-dimensional assignment variable of size \(|S|\) with domain in \(T\). With the alldifferent and the element constraints, we obtain this combination CP Model with CP solver:

def cpsolve(C):
  from constraint_solver import pywrapcp
  n = len(C)
  solver = pywrapcp.Solver('CPModel')
  x = [solver.IntVar(0,n-1,'x[%i]' % i) for i in range(n)]
  solver.Add(solver.AllDifferent(x))
  v=solver.IntVar(0,sum([C[i][j] for i in range(n) for j in range(n)]),'v')
  xv = [solver.IntVar(0,max(max(row) for row in C),'xv[%i]' % i) for i in range(n)]
  for i in range(n):
      solver.Add(xv[i] == solver.Element(C[i],x[i]))
  solver.Add(v == solver.Sum([xv[i] for i in range(n)]))
  objective = solver.Minimize(v,1)
  db = solver.Phase(x,
                    solver.INT_VAR_SIMPLE,
                    solver.ASSIGN_MIN_VALUE)
  solver.NewSearch(db, [objective])
  while solver.NextSolution():
      best = int(v.Value())
  return best

The variable xv is not essential; we could combine the element and the sum into one constraint.

The reduction in the number of variables is notable, from \(|S| \times |T|\) to \(|S|\).

Now, how does it compare in running time? I experimented with increasing sizes, with random cost matrices. Here is one such experiment:

  n   IP Model    IP Model    CP Model
      IP solver   CP solver   CP solver
 10       0.01        0.07        0.00
 11       0.01        0.32        0.01
 12       0.01        0.60        0.01
 13       0.01        1.98        0.02
 14       0.01        5.83        0.02
 15       0.02       35.16        0.06
 16       0.02       47.35        0.07
 17       0.02      133.21        0.14
 18       0.02      434.83        0.33
 19       0.03     1882.80        0.28
TOT       0.16     2542.15        0.93

Two obvious conclusions. First, comparing the last two columns, both using Google's CP solver, we notice that reducing the number of variables has, as expected, a huge impact on the running time of the solver. The literal translation of the AMPL IP model, with its binary variables, is decidedly the wrong thing to do. My original hunch was correct.

Less encouraging is the comparison between the first and last column. It seems that even with a good CP model, the CP solver is no match for an IP solver for this simple problem. We can verify this by running both side-by-side on larger instances. Here is one such run:

 n   IP Model    CP Model
     IP solver   CP Solver
10       0.01        0.01
11       0.01        0.01
12       0.01        0.01
13       0.01        0.02
14       0.01        0.04
15       0.02        0.06
16       0.02        0.07
17       0.02        0.12
18       0.02        0.32
19       0.03        0.29
20       0.03        0.65
21       0.03        1.18
22       0.04        2.59
23       0.04        3.10
24       0.04        9.90
25       0.05        9.01
26       0.04       15.39
27       0.05       73.33
28       0.05       53.48
29       0.05       57.74
30       0.06      286.55
31       0.06      276.21
32       0.06      412.36
33       0.07      537.88
34       0.07      324.70
35       0.08     2004.02
36       0.08     6390.71

Caveat: I did not try to play with the search logic, choosing variables and values more intelligently. Or, rather, I did try to play a little, but did not see a noticeable difference in runtime.

Is this surprising? Maybe, maybe not. This problem is really trivial, from an IP perspective. It is a bipartite matching of minimum cost. If the IP solver recognizes this structure, it may apply the Network Simplex method and solve gigantic instances in the blink of an eye. The problem, in a sense, is too pure, too simple, and IP solvers have had over fifty years to tune their solution techniques to such problems. Moreover, the problems I am truly interested in solving, even if they have an assignment problem at their core, have a slew of side-constraints in addition.

What to do now? I could play with the search logic. That is, choosing the next variable and the next value with more care. More interesting to me is to add more and more side-constraints to see at what point, if any, will the CP solver dominate. My hunch is that, after a number of additional constraints, the CP solver will dominate. More later …

Date: 2013-01-10 16:00:04 EST

Author: Serge Kruk

Org version 7.8.09 with Emacs version 24
