$title Herves (Transposable Element) Activity Calculations (HERVES,SEQ=305) $onText This program takes raw data from an Excel sheet, transforms the data and finds the optimal values for alpha and beta following a procedure outlined by Stephen Wright. An integral is computed numerically using a trapezoidal approximations. The optimization results are then written back into the Excel file. This procedure was originally coded by David O'Brochta in Mathematica and recast in a GAMS optimization framework by Wilhelmine Meeraus. The Excel file contains the data from the field and labwork described in W Meeraus, 2004. References Stephen I Wright, Quang Hien Le, Daniel J Schoen and Thomas E Bureau, Population Dynamics of an Ac-like Transposable Element in Self- and Cross-Pollinating Arabidopsis, Genetics 158: 127-1288 (July 2001). David O'Brochta, Private Communications, University of Maryland Biotechnology Institute, 2004. Wilhelmine H Meeraus, A Combined Fieldwork and Laboratory Investigation into the Behaviour of the Herves Transposable Element, London School of Hygiene & Tropical Medicine, Theses, 2004. Keywords: nonlinear programming, discontinuous derivatives, genetics, transposable element activity, trapezoidal approximation $offText $eolCom // * you can use command line --xlsxFile=xxx and --sheet=xxx which * will allow further automation * the sheet names are Aa,Ag,Am,ZU,ZAg $if not set xlsxFile $set xlsxFile 'hervesio.xlsx' // work book name $if not set sheet $set sheet 'Ag' // sheet name $set rng '%sheet%!A4' // starting range to read data $set rngout '%sheet%_Opt!A1' // starting range to store results * 1. Extract raw data from an Excel sheet into a GDX file * and load into GAMS. The result of this step are * the set of samples i and the counts cnt. Empty or all * zero rows and columns will be dropped automatically from * the Excel data. Set imax 'max sample size' / 1*50 / i(imax) 'samples'; Parameter cnt(imax) 'counts'; Alias (*,code,LabId,Cut); Parameter raw(code,LabId,Cut) 'raw observations'; Set rr(code,LabID) 'row labels including empty rows'; * empty rows will be reported $onEmbeddedCode Connect: - ExcelReader: file: %xlsxFile% symbols: - name: raw rowDimension: 2 columnDimension: 1 range: %rng% - Projection: name: raw(code,LabId,Cut) newName: rr(code,LabId) asSet: True - GAMSWriter: symbols: all duplicateRecords: first $offEmbeddedCode Set rid(code,labID) 'row identifier' cid(Cut) 'column identifier'; Parameter ct 'column totals'; option rid < raw, cid < raw, rid:0:0:1; // map domains and set print option abort$(card(rid) > card(imax)) 'size of imax is to small', imax, rid; ct(cid) = sum(rid, raw(rid,cid)); display raw, ct; Set rrdiff 'all zero rows'; rrdiff(rr) = yes - rid(rr); display rrdiff; i(imax) = ord(imax) <= card(rid); cnt(imax) = sum(cid, ct(cid) = ord(imax)); display i, cnt; * 2. Prepare model constants and set up numerical integration * data sets. Parameter diploid 'sample size' nhat bin(imax) 'binomial for diploid'; diploid = card(i); bin(i(imax)) = fact(card(i))/fact(card(i) - ord(imax))/fact(ord(imax)); * numerical integration data Set s 'integral discritization steps' / s0*s1000 / ss(s) 'excluding first element'; Parameter xx(s) 'values for steps s (0-1)' ws(s) 'weight for trapezoid approximation' deltaxx; ss(s) = ord(s) > 1; // drop values for x = 0 deltaxx = 1/(card(s) - 1); xx(s) = (ord(s) - 1)*deltaxx; ws(s) = 1; ws(s)$(ord(s) = 1) = 0.5; ws(s)$(ord(s) = card(s)) = 0.5; Parameter t0(s) 'intermediate value one' t1(imax,s) 'intermediate value two'; t0(s) = sqr(xx(s))+ 2*xx(s)*(1 - xx(s)); t1(i(imax),s) = ws(s)*power(t0(s),ord(imax))*power(1 - t0(s),card(i) - ord(imax)); * 3. Define optimization model and set relevant bounds. * Note that not all parameters are known at this point. Variable chi 'sum of CHI squares' a 'alpha values' b 'beta values' est(imax) 'estimates'; Equation defchi 'definition of CHI' defest(imax) 'definition of estimates'; defchi.. chi =e= sum(i, sqr(cnt(i) - est(i))/est(i)); defest(i).. est(i) =e= nhat*(a + b)/a/beta(a,b)*bin(i)*deltaxx * sum(s, t1(i,s)*xx(s)**(a - 1)*(1 - xx(s))**(b - 1)); Model m / all /; * set bounds and initial values a.lo = 0.01; a.up = 1.0; b.lo = 1; b.up = 10; a.l = .1; b.l = 5; est.l(i) = 1; option domLim = 1000, limCol = 0, limRow = 0, solPrint = on; Parameter rep(*,*) 'summary report'; * 4. First Optimization nhat = 4/3*sum(i(imax), ord(imax)*cnt(i))/card(i); solve m minimizing chi using dnlp; display diploid, nhat, a.l, b.l, chi.l; rep(i ,'w-obs') = cnt(i); rep(i ,'w-est') = est.l(i); rep(i ,'w-chi') = sqr(cnt(i) - est.l(i))/est.l(i); rep('total','w-chi') = sum(i, rep(i,'w-chi')); rep('total','w-obs') = sum(i, cnt(i)); rep('alpha','w-est') = a.l; rep('beta' ,'w-est') = b.l; rep('nhat' ,'w-est') = nhat; rep('SStat' ,'w-est') = m.solveStat; rep('MStat' ,'w-est') = m.modelStat; * 5. Second solve with modified counts // use a cutoff of 70% and solve again cnt(i(imax))$(ord(imax) > 0.7*card(i)) = 0; nhat = 4/3*sum(i(imax), ord(imax)*cnt(i))/card(i); display nhat; solve m min chi using dnlp; display diploid, nhat, a.l, b.l, chi.l; rep(i ,'wo-obs') = cnt(i); rep(i ,'wo-est') = est.l(i); rep(i ,'wo-chi') = sqr(cnt(i) - est.l(i))/est.l(i); rep('total','wo-obs') = sum(i, cnt(i)); rep('total','wo-chi') = sum(i, rep(i,'wo-chi')); rep('alpha','wo-est') = a.l; rep('beta' ,'wo-est') = b.l; rep('nhat' ,'wo-est') = nhat; rep('SStat' ,'wo-est') = m.solveStat; rep('MStat' ,'wo-est') = m.modelStat; display rep; * 6. Update Excel workbook with results in a new sheet embeddedCode Connect: - GAMSReader: symbols: - name: rep - ExcelWriter: file: %xlsxFile% symbols: - name: rep range: %rngout% endEmbeddedCode