$title Demonstrate the use of numpy.linalg.lstsq on the diabetes test problem using linalg ols (LeastSquares,SEQ=141) $onText Use the tool linalg.ols to solve the least squares regression problem. This model demonstrates how to get the same statistical values as with the LS solver (dropped in GAMS 34). It also demonstrates how to use the feature in a loop with subsets of the feature set p. $offText Set i 'patients' p 'features'; Parameter A(i<,p<) 'features collected from a cohort of diabetes patients' y(i) 'quantitative measure of disease progression one year after baseline'; * Data from https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html $gdxIn diabetes_data.gdx $load A y $gdxIn $if not set EXPLICITINTERCEPT $set EXPLICITINTERCEPT 1 $ifThen %EXPLICITINTERCEPT%==1 $onMulti set p / b0 Intercept /; $offMulti A(i,'b0') = 1; $endIf Scalars rss 'sum of squared errors' sigma 'Standard error' r2 'R-Squared' df 'Degrees of freedom' resvar 'Residual variance'; Parameters estimate(p) 'Estimated coefficients' se(p) 'Standard errors' tval(p) 't values' pval(p) 'p values' resid(i) 'residuals' fitted(i) 'fitted values for dependent variable' covar(p,p) 'variance-covariance matrix'; $funcLibIn stolib stodclib Functions cdfT / stolib.CDFStudentT / icdfT / stolib.iCDFstudentT /; executeTool.checkErrorLevel 'linalg.ols i p A y estimate covar=covar r2=r2 df=df rss=rss resvar=resvar se=se tval=tval fitted=fitted resid=resid sigma=sigma'; * Symbols estimate, covar, r2, df, rss, resvar, se, tval, fitted, resid, and sigma) have been loaded * implicitly by executeTool../modlib/maxcut.338. The compiler instruction in the next line supresses * errors about presumably unassigned symbols $onImplicitAssign pval(p) = 2*(1-cdfT(abs(tval(p)),df)); Set alpha / "90%", "95%", "97.5%", "99%" /; Parameter av(alpha) / "90%" .9 "95%" .95 "97.5%" .975 "99%" .99 /; parameter confint(alpha,p,*) 'confidence intervals' scalar q, t; loop(alpha, t = -icdfT((1-av(alpha))/2, df); confint(alpha, p, 'LO') = estimate(p) - t*se(p); confint(alpha, p, 'UP') = estimate(p) + t*se(p); ); execute_unload 'ls_np.gdx',confint,covar,df,estimate,fitted,pval,r2,resid,resvar,rss,se,sigma,tval; execute.checkErrorLevel 'gdxdiff ls.gdx ls_np.gdx eps=1e-4 releps=1e-4 > %system.nullFile%' * Now demonstrate dynamic use of OLS with subsets of features Set pp(p) 'subset of features' iter 'iterations' / iter1*iter20 / bestp(p) 'best subset of features' Parameter Ap(i,p) 'reduced A' bestrss / inf / beste(p) 'best estimate'; loop(iter, option clear=pp, clear=Ap; pp(p)$(uniform(0,1)<0.5) = yes; $if %EXPLICITINTERCEPT%==1 pp('b0') = yes; Ap(i,pp) = A(i,pp); executeTool.checkErrorLevel 'linalg.ols i pp Ap y estimate rss=rss'; put_utility 'log' / iter.tl:10 ': rss=' rss; if (bestrss>rss, bestp(p) = pp(p); bestrss = rss; beste(p) = estimate(p)); ); display bestrss, bestp, beste;