--- title: "Quick Tour of Package **optimCheck**" author: "Martin Lysy" date: "`r Sys.Date()`" output: html_vignette: toc: true bibliography: references.bib csl: taylor-and-francis-harvard-x.csl link-citations: true vignette: > %\VignetteIndexEntry{optimCheck: A Quick Tour} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, eval = FALSE, echo = FALSE} rmarkdown::render("optimCheck-quicktut.Rmd") # R code to render vignette ``` \newcommand{\xx}{\boldsymbol{x}} \newcommand{\hxx}{\hat{\boldsymbol{x}}} \newcommand{\txx}{\tilde{\boldsymbol{x}}} \newcommand{\bb}{\boldsymbol{b}} \renewcommand{\AA}{\boldsymbol{A}} ## Introduction The **optimCheck** package provides a set of tools to check that output of an optimization algorithm is indeed at a local mode of the objective function. The tools include both visual and numerical checks, the latter serving to automate formalized unit tests with e.g., the **R** packages [**testthat**](https://CRAN.R-project.org/package=testthat) or [**RUnit**](https://CRAN.R-project.org/package=RUnit). ## Example: Quadratic Objective Function A brief overview of the package functionality is illustrated with the following example. Let $$ Q(\xx) = \xx'\AA\xx - 2 \bb'\xx $$ denote a quadratic objective function in $\xx \in \mathbb R^d$. If $\AA_{d \times d}$ is a positive-definite matrix, then the unique minimum of $Q(\xx)$ is $\hxx = \AA^{-1}\bb$. Let us now ignore this information and try to minimize $Q(\xx)$ using **R**'s simplest built-in mode-finding routine, provided by the **R** function `stats::optim()`. In its simplest configuration, `stats::optim()` requires only the objective function and a starting value $\xx_0$ to initialize the mode-finding procedure. Let's consider a difficult setting for `stats::optim()`, with a relatively large $d = 12$ and a starting value $\xx_0$ which is far from the optimal value $\hxx$. ```{r, echo = -1} set.seed(2608) # set seed to fix output d <- 12 # dimension of optimization problem # create the objective function: Q(x) = x'Ax - 2b'x A <- crossprod(matrix(rnorm(d^2), d, d)) # positive definite matrix b <- rnorm(d) objfun <- function(x) crossprod(x, A %*% x)[1] - 2 * crossprod(b, x)[1] xhat <- solve(A, b) # analytic solution # numerical mode-finding using optim xfit <- optim(fn = objfun, # objective function par = xhat * 5, # initial value is far from the solution control = list(maxit = 1e5)) # very large max. number of iterations ``` ### Visual Checks with `optim_proj()` Like most solvers, `stats::optim()` utilizes various criteria to determine whether its algorithm has converged, which can be assess with the following command: ```{r} # any value other than 0 means optim failed to converge xfit$convergence ``` Here `stats::optim()` reports that its algorithm has converged. Now let's check this visually with **optimCheck** using *projection plots*. That is, let $\txx$ denote the potential optimum of $Q(\xx)$. Then for each $i = 1,\ldots,d$, we plot $$ Q_i(x_i) = Q(x_i, \txx_{-i}), \qquad \txx_{-i} = (\tilde x_1, \ldots, \tilde x_{i-1}, \tilde x_{i+1}, \ldots, \tilde x_d). $$ In other words, projection plot $i$ varies only $x_i$, while holding all other elements of $\xx$ fixed at the value of the potential solution $\txx$. 
All $d$ projection plots are produced at once with the **optimCheck** function `optim_proj()`:

```{r, fig.width = 10, fig.height = 6, out.width = "97%"}
require(optimCheck) # load package

# projection plots
xnames <- parse(text = paste0("x[", 1:d, "]")) # variable names
oproj <- optim_proj(fun = objfun,      # objective function
                    xsol = xfit$par,   # potential solution
                    maximize = FALSE,  # indicates that a local minimum is sought
                    xrng = .5,         # range of projection plot: x_i +/- .5*|x_i|
                    xnames = xnames)
```

In each of the projection plots, the potential solution $\tilde x_i$ is plotted in red. The `xrng` argument to `optim_proj()` specifies the plotting range. Among the various ways of doing this, perhaps the simplest is a single scalar value indicating that each plot should span $x_i \pm$ `xrng` $\cdot |x_i|$. We can see from these plots that `stats::optim()` was sometimes up to 10% away from the local mode of the projection plots.

### Quantification of Projection Plots

Projection plots are a powerful method of assessing the convergence of mode-finding routines to a local mode. While great for interactive testing, plots are not well-suited to automated unit testing as part of an **R** package development process. To this end, **optimCheck** provides a means of quantifying the result of a call to `optim_proj()`. Indeed, a call to `optim_proj()` returns an object of class `optproj` with the following elements:

```{r}
sapply(oproj, function(x) dim(as.matrix(x)))
```

As described in the function documentation, `xproj` and `yproj` are matrices whose columns contain the $x$-axis and $y$-axis coordinates of the points in each projection plot. The `summary()` method for `optproj` objects converts these to absolute and relative errors in both the potential solution and the objective function. The `print()` method conveniently displays these results:

```{r}
oproj # same print method as summary(oproj)
```

The documentation for `summary.optproj()` describes the various calculations it provides. Perhaps the most useful of these are the elementwise absolute and relative differences between the potential solution $\tilde{\xx}$ and $\hxx_\mathrm{proj}$, the vector of optimal 1D solutions in each projection plot. For convenience, these can be extracted with the `diff()` method:

```{r}
diff(oproj) # equivalent to summary(oproj)$xdiff

# here's exactly what these are
xsol <- summary(oproj)$xsol # candidate solution
xopt <- summary(oproj)$xopt # optimal solution in each projection plot
xdiff <- cbind(abs = xopt - xsol, rel = (xopt - xsol)/abs(xsol))
range(xdiff - diff(oproj))
```

Thus, a combination of the `summary()` and `diff()` methods for projection plots can be used to construct automated unit tests (a minimal sketch of such a test follows the list below). In this case, plotting itself can be disabled by passing `optim_proj()` the argument `plot = FALSE`. See the `optimCheck/tests` folder for **testthat** examples featuring:

- Logistic regression (`stats::glm()`).
- Quantile regression (`quantreg::rq()` in [**quantreg**](https://CRAN.R-project.org/package=quantreg)).
- Multivariate normal mixtures (`mclust::emEEE()` in [**mclust**](https://CRAN.R-project.org/package=mclust)).
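For instance, a **testthat** check built on `optim_proj()` might look something like the following sketch (the 1% tolerance and the use of the largest relative difference are illustrative assumptions; as the plots above show, this particular fit would not pass such a test):

```{r, eval = FALSE}
# sketch of an automated unit test based on optim_proj()
testthat::test_that("optim() finds the minimum of the quadratic objective", {
  oproj <- optim_proj(fun = objfun, xsol = xfit$par,
                      maximize = FALSE, plot = FALSE) # no plotting inside tests
  max_rel <- max(abs(diff(oproj)[, 2])) # largest relative difference (2nd column)
  testthat::expect_lt(max_rel, 0.01)    # 1% tolerance, illustrative only
})
```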
You can run the package's tests with the command

```{r eval = FALSE}
testthat::test_package("optimCheck", reporter = "progress")
```

## `optim_refit()`: A Numerical Alternative to Projection Plots

There are some situations in which numerical quantification of projection plots leaves something to be desired: generating all of the projection plots requires `N = 2 * npts * length(xsol)` evaluations of the objective function (where the default value is `npts = 100`), which can considerably slow down the process of automated unit testing.

A different test for mode-finding routines is to recalculate the optimal solution with a "very good" starting point: the current potential solution. This is the so-called "**refi**ne op**t**imization" -- or `refit` -- strategy. The `optim_refit()` function refines the optimization with a call to **R**'s built-in general-purpose optimizer, the function `stats::optim()`. In particular, it selects the default Nelder-Mead simplex method with a simplified parameter interface. As seen in the unit tests above, the `refit` checks are 2-3 times faster than their projection plot counterparts.

Consider now the example of refining the original `stats::optim()` solution to the quadratic objective function:

```{r}
orefit <- optim_refit(fun = objfun,      # objective function
                      xsol = xfit$par,   # potential solution
                      maximize = FALSE)  # indicates that a local minimum is sought

summary(orefit) # same print method as orefit
```

Thus we can see that the first and second runs of `stats::optim()` are quite different. Of course, this does not mean that the refit solution produced by `stats::optim()` is itself a local mode:

```{r, fig.width = 10, fig.height = 6, out.width = "97%"}
# projection plots with refined solution
optim_proj(xsol = orefit$xopt, fun = objfun, xrng = .5, maximize = FALSE)
```

Indeed, the default `stats::optim()` method is only accurate when initialized close to the optimal solution. Therefore, one may wish to run the refit test with a different optimizer. This can be done externally to `optim_refit()`, by passing the externally refined solution to the function via its argument `xopt`. This is illustrated below using `stats::optim()`'s gradient-based quasi-Newton method:

```{r}
# gradient of the objective function
objgrad <- function(x) 2 * drop(A %*% x - b)

# mode-finding using quasi-Newton method
xfit2 <- optim(fn = objfun,     # objective function
               gr = objgrad,    # gradient
               par = xfit$par,  # initial value (first optim fit)
               method = "BFGS")

# external refit test with optimizer of choice
orefit2 <- optim_refit(fun = objfun,
                       xsol = xfit$par,  # initial value (first optim fit)
                       xopt = xfit2$par, # refit value (2nd fit with quasi-Newton method)
                       maximize = FALSE)

# projection plot test on refit solution
optim_proj(xsol = orefit2$xopt, fun = objfun,
           xrng = .5, maximize = FALSE, plot = FALSE)
```

## Future Work: Constrained Optimization

Many constrained statistical optimization problems seek a "sparse" solution, i.e., one for which some of the elements of the optimal solution are equal to zero. In such cases, the relative difference between the potential and optimal solutions is an unreliable metric. A working proposal is to flag these "true zeros" in `optim_proj()` and `optim_refit()`, so as to add a 1 to the relative difference denominators (a hypothetical sketch of this adjustment is given below). Other suggestions on this and **optimCheck** in general are [welcome](mailto:mlysy@uwaterloo.ca).
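To make the proposal concrete, a hypothetical version of the adjusted relative difference might look as follows; nothing like this is currently implemented in the package, and the `true_zero` flag and its `1e-8` threshold are purely illustrative assumptions:

```{r, eval = FALSE}
# hypothetical sketch only -- not part of optimCheck
xsol <- summary(oproj)$xsol    # candidate solution (as in the quantification section)
xopt <- summary(oproj)$xopt    # optimal solution in each projection plot
true_zero <- abs(xsol) < 1e-8  # flag coordinates treated as "true zeros"
# add 1 to the denominator for flagged coordinates
rel_diff <- (xopt - xsol) / (abs(xsol) + as.numeric(true_zero))
```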