$title Closest String Problem (CSP,SEQ=306) $onText The closest-string problem (CSP) finds a center string that minimizes the Hamming distance between the center string and all other strings. The Hamming distance counts the nonmatching characters. For example, 'CATCC' and 'CTTGC' have a Hamming distance of 2. Three formulations are presented. Formulations 1 and 2 can only be used for small problems. Formulation 3 is the most intuitive formulation and works well with general purpose MIP codes. It should be noted that the root node heuristic of CPLEX performs very well. With an absolute gap of one or two, CPLEX finds solution within a few seconds for all suggested sizes. For example, the setting below will result in solutions that are either within 1.0 from he global optimum or less than 1 percent: option optCr = 0.01, optCa = 1.99; Meneses, C N, Lu, Z, Oliveira, C A S, and Pardalos, P M, Optimal Solutions for the Closest-String Problem via Integer Programming. INFORMS Journal on Computing 16, 4 (2004), 419-429. Keywords: mixed integer linear programming, closest-string problem, computational biology $offText $eolCom // Set n 'strings' m 'character sequence' a 'alphabet'; Parameter x(n,m) 'string values'; $if not set letters $set letters 26 $if not set strings $set strings 4 $if not set length $set length 6 Set n / s1*s%strings% / m / c1*c%length% / a / a1*a%letters% /; * sample data from paper Table x(n,m) c1 c2 c3 c4 c5 c6 s1 4 9 6 6 5 18 // differ s2 13 5 4 9 1 14 // median s3 12 5 14 7 20 8 // length s4 13 5 4 9 21 13 // medium; * recognize sample data $if not %strings%+%length%+%letters% == 4+6+26 x(n,m) = uniformInt(1,card(a)); if(card(n)*card(m) > 50, option limCol = 0, limRow = 0, solPrint = off, resLim = 10, optCr = 0.01, optCa = 1.999; else option resLim = 5, optCr = 0.0, optCa = 0.999; ); * Formulation P1 Variable d 'maximum difference between t and x' t(m) 'reference string' z(n,m) 'string is different'; Binary Variable z; Equation e1(n) 'lower bound for d' e2(n,m) 'lower bound on difference' e3(n,m) 'upper bound on difference'; e1(n).. sum(m, z(n,m)) =l= d; * x <> t e2(n,m).. t(m) =l= t.up(m)*z(n,m) + x(n,m)*(1 - z(n,m)); e3(n,m).. t(m) =g= t.lo(m)*z(n,m) + x(n,m)*(1 - z(n,m)); t.lo(m) = smin(n, x(n,m)); t.up(m) = smax(n, x(n,m)); Model p1A / e1, e2, e3 /; Parameter report 'summary report'; report(m,'t.lo') = t.lo(m); report(m,'t.up') = t.up(m); solve p1A min d using mip; report(m,'p1A') = t.l(m); report('objective','p1A') = p1a.objVal; report('Est Global','p1A') = ceil(p1a.objEst - 1e-6); * Formulation P2 Set ma(m,a) 'possible characters by position'; ma(m,a) = sum(n, ord(a) = x(n,m)); * display ma; Binary Variable v(m,a) 'selection of characters'; Equation e4(m) 'select only one' e5(m) 'assign character value to t'; e4(m).. sum(ma(m,a), v(ma)) =e= 1; e5(m).. t(m) =e= sum(ma(m,a), ord(a)*v(ma)); Model p2 / e1, e2, e3, e4, e5 /; solve p2 min d using mip; report(m,'p2') = t.l(m); report('objective','p2') = p2.objval; report('Est Global','p2') = ceil(p2.objEst - 1e-6); * Formulation P3 Equation e6(n) 'count matching characters'; e6(n).. card(m) - sum(ma(m,a),(x(n,m) = ord(a))*v(ma)) =l= d; Model p3 / e4, e6 /; solve p3 min d using mip; t.l(m) = sum(ma(m,a), ord(a)*v.l(ma)); report(m,'p3') = t.l(m); report('objective','p3') = p3.objVal; report('Est Global','p3') = ceil(p3.objEst - 1e-6); display report;