---
title: "LMN: Inference for Linear Models with Nuisance Parameters"
author: "Martin Lysy, Bryan Yates"
date: "`r Sys.Date()`"
pkgdown:
  as_is: true
output:
  bookdown::html_vignette2:
    toc: true
    number_sections: no
bibliography: references.bib
csl: taylor-and-francis-harvard-x.csl
link-citations: true
vignette: >
  %\VignetteIndexEntry{Getting Started with LMN}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(out.width = "\\textwidth")
suppressMessages({
  require(LMN)
  require(kableExtra)
})
cran_link <- function(pkg) paste0("[**", pkg, "**](https://CRAN.R-project.org/package=", pkg, ")")
build_cache <- FALSE
oldpar <- par()
```

\newcommand{\YY}{{\boldsymbol{Y}}}
\newcommand{\XX}{{\boldsymbol{X}}}
\newcommand{\RR}{{\boldsymbol{R}}}
\newcommand{\VV}{{\boldsymbol{V}}}
\newcommand{\ZZ}{{\boldsymbol{Z}}}
\newcommand{\SS}{{\boldsymbol{S}}}
\newcommand{\TT}{{\boldsymbol{T}}}
\newcommand{\EE}{{\boldsymbol{E}}}
\newcommand{\bbe}{{\boldsymbol{\beta}}}
\newcommand{\BBe}{{\boldsymbol{B}}}
\newcommand{\SSi}{{\boldsymbol{\Sigma}}}
\newcommand{\tth}{{\boldsymbol{\theta}}}
\newcommand{\LLa}{{\boldsymbol{\Lambda}}}
\newcommand{\OOm}{{\boldsymbol{\Omega}}}
\newcommand{\PPs}{{\boldsymbol{\Psi}}}
\newcommand{\TTh}{{\boldsymbol{\Theta}}}
\newcommand{\PPh}{{\boldsymbol{\Phi}}}
\newcommand{\mmu}{{\boldsymbol{\mu}}}
\newcommand{\eet}{{\boldsymbol{\eta}}}
\newcommand{\MN}{\textrm{MatNorm}}
\newcommand{\mniw}{\textrm{MNIW}}
\newcommand{\iwish}{\textrm{InvWish}}
\newcommand{\N}{\mathcal{N}}
\newcommand{\bz}{\boldsymbol{0}}
\renewcommand{\vec}{\textrm{vec}}
\newcommand{\ud}{\mathrm{d}}
\newcommand{\eps}{\varepsilon}
\newcommand{\a}{\alpha}
\newcommand{\g}{\gamma}
\newcommand{\s}{\sigma}
\newcommand{\l}{\lambda}
\newcommand{\dt}{\Delta t}
\newcommand{\var}{\mathrm{var}}
\newcommand{\cov}{\mathrm{cov}}
\newcommand{\iid}{\stackrel {\textrm{iid}}{\sim}}
\newcommand{\ind}{\stackrel {\textrm{ind}}{\sim}}
\newcommand{\llp}{\ell_{\textrm{prof}}}
\newcommand{\diag}{\textrm{diag}}

## Introduction

Consider a statistical model $p(\YY \mid \tth, \BBe, \SSi)$ of the form
\begin{equation}
\YY \sim \MN(\XX_\tth \BBe, \VV_\tth, \SSi),
(\#eq:lmn)
\end{equation}
where $\XX_\tth = \XX_{n \times p}(\tth)$ is the design matrix, which depends on parameters $\tth$, $\BBe_{p \times q}$ are regression coefficients, $\VV_\tth = \VV_{n \times n}(\tth)$ and $\SSi_{q \times q}$ are between-row and between-column variance matrices, and the [Matrix-Normal](https://en.wikipedia.org/wiki/Matrix_normal_distribution) distribution is defined as
$$
\ZZ_{p \times q} \sim \MN(\LLa_{p \times q}, \OOm_{p \times p}, \SSi_{q \times q}) \quad \iff \quad \vec(\ZZ) \sim \N(\vec(\LLa), \SSi \otimes \OOm),
$$
where $\vec(\ZZ)$ is the vector which stacks the columns of $\ZZ$, and $\SSi \otimes \OOm$ denotes the [Kronecker product](https://en.wikipedia.org/wiki/Kronecker_product). Model \@ref(eq:lmn) is referred to as a Linear Model with Nuisance parameters (LMN) $(\BBe, \SSi)$ for parameters of interest $\tth$. The **LMN** package provides tools to efficiently conduct Frequentist or Bayesian inference on all parameters $\TTh = (\tth, \BBe, \SSi)$ by estimating $\tth$ first, and subsequently $(\BBe, \SSi)$, as illustrated in the examples below.
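
As a quick sanity check on the Matrix-Normal definition above, the following base-R sketch draws $\ZZ$ from Cholesky factors and confirms numerically that the implied variance of $\vec(\ZZ)$ is $\SSi \otimes \OOm$. It is not part of **LMN**: the helper `rmatnorm_sketch()` and the small matrices `Omega_ex`, `Sigma_ex` and `Lambda_ex` are made up for illustration.

```r
# Hypothetical helper, not part of LMN: one draw of Z ~ MatNorm(Lambda, Omega, Sigma).
rmatnorm_sketch <- function(Lambda, Omega, Sigma) {
  p <- nrow(Lambda)
  q <- ncol(Lambda)
  E <- matrix(rnorm(p * q), p, q) # iid N(0, 1) entries
  # chol() returns upper-triangular U with U'U = M, so
  # Z = U_Omega' E U_Sigma has row variance Omega and column variance Sigma
  Lambda + crossprod(chol(Omega), E) %*% chol(Sigma)
}

# illustrative inputs (assumed values)
Omega_ex <- matrix(c(2, .5, .5, 1), 2, 2) # p x p between-row variance
Sigma_ex <- matrix(c(1, -.3, 0, -.3, 2, .4, 0, .4, 1.5), 3, 3) # q x q between-column variance
Lambda_ex <- matrix(0, 2, 3) # p x q mean

# vec(A E B) = (B' %x% A) vec(E), so var(vec(Z)) = K K' with
K <- kronecker(t(chol(Sigma_ex)), t(chol(Omega_ex)))
vecZ_var <- K %*% t(K)
all.equal(vecZ_var, kronecker(Sigma_ex, Omega_ex))

set.seed(1)
Z_ex <- rmatnorm_sketch(Lambda_ex, Omega_ex, Sigma_ex)
dim(Z_ex)
```

The same construction appears, commented out, in the simulation code of Example 1, where `mniw::rMNorm()` is used instead.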

## Example 1: Nonlinear Regression with Correlated Errors

```{r toy_sim_calc, include = FALSE}
require(LMN)
# problem dimensions
qq <- 2 # number of responses
n <- 200 # number of observations
# parameters
Beta <- matrix(c(.3, .5, .7, .2), 2, qq)
Sigma <- matrix(c(.005, -.001, -.001, .002), qq, qq)
alpha <- .4
lambda <- .1
# simulate data
xseq <- seq(0, 10, len = n) # x vector
X <- cbind(1, xseq^alpha) # covariate matrix
V <- exp(-(outer(xseq, xseq, "-")/lambda)^2) # between-row variance
Mu <- X %*% Beta # mean matrix
Y <- mniw::rMNorm(n = 1, Lambda = Mu, SigmaR = V, SigmaC = Sigma) # response matrix
## # response matrix
## Y <- matrix(rnorm(n*qq), n, qq)
## Y <- crossprod(chol(V), Y) %*% chol(Sigma) + Mu
# plot data
par(mfrow = c(1,1), mar = c(4,4,.5,.5)+.1)
plot(x = 0, type = "n", xlim = range(xseq), ylim = range(Mu, Y),
     xlab = "x", ylab = "y")
lines(x = xseq, y = Mu[,1], col = "red")
lines(x = xseq, y = Mu[,2], col = "blue")
points(x = xseq, y = Y[,1], pch = 16, col = "red")
points(x = xseq, y = Y[,2], pch = 16, col = "blue")
legend("bottomright",
       legend = c("Response 1", "Response 2", "Expected", "Observed"),
       lty = c(NA, NA, 1, NA), pch = c(22, 22, NA, 16),
       pt.bg = c("red", "blue", "black", "black"), seg.len = 1)
```

Consider a toy example of the form
\begin{equation}
y_{ij} = \beta_{0j} + \beta_{1j} x_i^\alpha + \eps_{ij},
(\#eq:toy)
\end{equation}
where $i=1,\ldots,n$, $j=1,\ldots,q$, and the $\eps_{ij}$ are multivariate normal with $E[\eps_{ij}] = 0$ and
$$
\cov(\eps_{ik}, \eps_{jm}) = \Sigma_{km} \times \exp\left\{-\frac{(x_i - x_j)^2}{\lambda^2}\right\}.
$$
Then this toy example can be written in the form of \@ref(eq:lmn) with $\tth = (\alpha, \lambda)$ and
$$
\begin{aligned}
\YY_{n\times q} & = \begin{bmatrix} y_{11} & \cdots & y_{1q} \\ \vdots & & \vdots \\ y_{n1} & \cdots & y_{nq} \end{bmatrix},
&
\VV_{n\times n}(\tth) & = \begin{bmatrix} e^{-\frac{(x_1-x_1)^2}{\lambda^2}} & \cdots & e^{-\frac{(x_1-x_n)^2}{\lambda^2}} \\ \vdots & & \vdots \\ e^{-\frac{(x_n-x_1)^2}{\lambda^2}} & \cdots & e^{-\frac{(x_n-x_n)^2}{\lambda^2}} \end{bmatrix},
&
\XX_{n\times 2}(\tth) & = \begin{bmatrix} 1 & x_1^\alpha \\ \vdots & \vdots \\ 1 & x_n^\alpha \end{bmatrix},
\\
\BBe_{2\times q} & = \begin{bmatrix} \beta_{01} & \cdots & \beta_{0q} \\ \beta_{11} & \cdots & \beta_{1q} \end{bmatrix},
&
\SSi_{q\times q} & = \begin{bmatrix} \Sigma_{11} & \cdots & \Sigma_{1q} \\ \vdots & & \vdots \\ \Sigma_{q1} & \cdots & \Sigma_{qq} \end{bmatrix}.
\end{aligned}
$$
Sample data is generated below with $n = `r n`$ and $q = `r qq`$. Note the use of `mniw::rMNorm()` from the **mniw** package to sample from the Matrix Normal.

```{r toysim, ref.label = "toy_sim_calc", fig.height = 4, fig.width = 6.5, fig.cap = paste0("Simulated data from the toy model \\@ref(eq:toy) with $n = ", n, "$ and $q = ", qq, "$.")}
```

The workhorse function in **LMN** is `lmn_suff()`, which calculates the sufficient statistics for $(\BBe, \SSi)$ for a given value of $\tth$.

```{r}
suff <- lmn_suff(Y = Y, X = X, V = V, Vtype = "full")
sapply(suff, function(x) paste0(dim(as.array(x)), collapse = "x"))
```

Namely, the elements of `suff` are:

- `Bhat`: The $p \times q$ matrix $\hat \BBe_\tth = (\XX_\tth'\VV_\tth^{-1}\XX_\tth)^{-1}\XX_\tth'\VV_\tth^{-1}\YY$.
- `T`: The $p \times p$ matrix $\TT_\tth = \XX_\tth'\VV_\tth^{-1}\XX_\tth$.
- `S`: The $q \times q$ matrix $\SS_\tth = (\YY - \XX_\tth\hat\BBe_\tth)'\VV_\tth^{-1}(\YY - \XX_\tth\hat\BBe_\tth)$.
- `ldV`: The log-determinant $\log |\VV_\tth|$.
- `n`, `p`, `q`: The dimensions of the problem.
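
The formulas in the list above can be transcribed almost literally into base R. The sketch below does so for a dense $\VV_\tth$ only; `suff_sketch()` and the `*_ex` objects are hypothetical names introduced here, and this is a reference transcription of the definitions rather than how `lmn_suff()` is implemented. Under these definitions one would expect its output to agree with `lmn_suff(Y, X, V, Vtype = "full")` up to numerical error.

```r
# Hypothetical reference implementation, not part of LMN:
# sufficient statistics computed directly from their definitions (dense V only).
suff_sketch <- function(Y, X, V) {
  ViX <- solve(V, X) # V^{-1} X
  Tmat <- crossprod(X, ViX) # X' V^{-1} X
  Bhat <- solve(Tmat, crossprod(ViX, Y)) # (X' V^{-1} X)^{-1} X' V^{-1} Y, using V symmetric
  Res <- Y - X %*% Bhat # residual matrix
  S <- crossprod(Res, solve(V, Res)) # Res' V^{-1} Res
  ldV <- as.numeric(determinant(V, logarithm = TRUE)$modulus) # log|V|
  list(Bhat = Bhat, T = Tmat, S = S, ldV = ldV,
       n = nrow(Y), p = ncol(X), q = ncol(Y))
}

# small illustrative problem (assumed values, separate from the toy data above)
set.seed(1)
n_ex <- 10
x_ex <- seq(0, 1, len = n_ex)
X_ex <- cbind(1, x_ex)
V_ex <- exp(-abs(outer(x_ex, x_ex, "-"))) # exponential correlation matrix
Y_ex <- matrix(rnorm(n_ex * 2), n_ex, 2)
suff_ex <- suff_sketch(Y = Y_ex, X = X_ex, V = V_ex)
suff_ex$S / suff_ex$n # conditional MLE of Sigma for this V
```

Solving linear systems with `solve(V, .)` rather than forming $\VV_\tth^{-1}$ explicitly is a design choice of this sketch; it still costs $O(n^3)$ operations for a dense matrix.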

In particular, the MLEs of $(\BBe, \SSi)$ given a particular value of $\tth$ are $\hat \BBe_\tth$ and $\hat \SSi_\tth = \SS_\tth/n$.

### Profile Likelihood

The profile (log)likelihood function for the LMN model corresponds to evaluating the full likelihood as a function of $\tth$ at the conditional MLE of $(\BBe, \SSi)$:
$$
\llp(\tth \mid \YY) = \ell_{\textrm{full}}(\tth, \BBe = \hat \BBe_\tth, \SSi = \hat \SSi_\tth \mid \YY).
$$
The following R code shows how to calculate the full MLE $\hat \TTh = (\hat \tth, \hat \BBe, \hat \SSi)$ from only a 2D optimization of $\llp(\alpha, \lambda \mid \YY)$. The first step is to note that for equally-spaced $x_i$, as in the simulated data above, the between-row variance matrix $\VV_\tth$ is in fact [Toeplitz](https://en.wikipedia.org/wiki/Toeplitz_matrix). It is entirely determined by its first row, and can be inverted much more efficiently than dense matrices using the R package `r cran_link("SuperGauss")`.

```{r, results = "hold"}
# check that dense matrix and Toeplitz matrix calculations are the same

# autocorrelation function, or first row of V_theta
toy_acf <- function(lambda) exp(-((xseq-xseq[1])/lambda)^2)

# check that calculation of suff is the same
all.equal(suff, lmn_suff(Y = Y, X = X, V = toy_acf(lambda), Vtype = "acf"))

# check that it's much faster to use Vtype = "acf"
system.time({
  # using dense variance matrix
  replicate(100, lmn_suff(Y = Y, X = X, V = V, Vtype = "full"))
})
system.time({
  # using Toeplitz variance matrix
  replicate(100, lmn_suff(Y = Y, X = X, V = toy_acf(lambda), Vtype = "acf"))
})
```

Next, let's find the value of the MLE $\hat \TTh$ using `stats::optim()`. For display purposes the parameters will be written as
$$
\TTh = \left(\alpha, \lambda, \beta_{01}, \beta_{02}, \beta_{11}, \beta_{12}, \sigma_1 = \Sigma_{11}^{1/2}, \sigma_2 = \Sigma_{22}^{1/2}, \rho_{12} = \frac{\Sigma_{12}}{\sigma_1\sigma_2}\right).
$$
Also, as the **SuperGauss** computations require some memory allocation, we'll declare a `SuperGauss::Toeplitz` object once and re-use it within the optimization routine.

```{r}
# pre-allocate memory for Toeplitz matrix calculations
Tz <- SuperGauss::Toeplitz$new(N = n)

# sufficient statistics for the toy model
toy_suff <- function(theta) {
  X <- cbind(1, xseq^theta[1])
  Tz$set_acf(acf = toy_acf(theta[2]))
  lmn_suff(Y = Y, X = X, V = Tz, Vtype = "acf")
}

# _negative_ profile likelihood for theta
toy_prof <- function(theta) {
  # range restriction lambda > 0:
  # optim() minimizes, so inadmissible values must return +Inf
  if(theta[2] <= 0) return(Inf)
  suff <- toy_suff(theta)
  -lmn_prof(suff = suff)
}

# MLE of theta
opt <- optim(par = c(alpha, lambda), # starting value
             fn = toy_prof) # objective function
theta_mle <- opt$par

# MLE of (Beta, Sigma)
suff <- toy_suff(theta_mle)
Beta_mle <- suff$Bhat
Sigma_mle <- suff$S/suff$n

# display:
# convert variance matrix to vector of standard deviations and correlations.
cov2sigrho <- function(Sigma) {
  sig <- sqrt(diag(Sigma))
  n <- length(sig) # dimensions of Sigma
  names(sig) <- paste0("sigma",1:n)
  # indices of upper triangular elements
  iupper <- matrix(1:(n^2),n,n)[upper.tri(Sigma, diag = FALSE)]
  rho <- cov2cor(Sigma)[iupper]
  rnames <- apply(expand.grid(1:n, 1:n), 1, paste0, collapse = "")
  names(rho) <- paste0("rho",rnames[iupper])
  c(sig,rho)
}

Theta_mle <- c(theta_mle, t(Beta_mle), cov2sigrho(Sigma_mle))
names(Theta_mle) <- c("alpha", "lambda",
                      "beta_01", "beta_02", "beta_11", "beta_12",
                      "sigma_1", "sigma_2", "rho_12")
signif(Theta_mle, 2)
```

Finally, let's calculate standard errors for each parameter using $\sqrt{\diag(\widehat{\var}(\hat \TTh))}$, where
$$
\widehat{\var}(\hat \TTh) = -\left[\frac{\partial^2 \ell(\TTh \mid \YY)}{\partial \TTh\partial\TTh'} \right]^{-1}.
$$
This can be done numerically using the function `numDeriv::hessian()` from the R package `r cran_link("numDeriv")`:

```{r}
# full _negative_ loglikelihood for the toy model
toy_nll <- function(Theta) {
  theta <- Theta[1:2]
  Beta <- rbind(Theta[3:4], Theta[5:6])
  Sigma <- diag(Theta[7:8]^2)
  Sigma[1,2] <- Sigma[2,1] <- Theta[7]*Theta[8]*Theta[9]
  # calculate loglikelihood
  suff <- toy_suff(theta)
  -lmn_loglik(Beta = Beta, Sigma = Sigma, suff = suff)
}

# uncertainty estimate:
# variance estimator
Theta_ve <- solve(numDeriv::hessian(func = toy_nll, x = Theta_mle))
Theta_se <- sqrt(diag(Theta_ve)) # standard errors

# display
tab <- rbind(true = c(alpha, lambda, t(Beta), cov2sigrho(Sigma)),
             mle = Theta_mle,
             se = Theta_se)
colnames(tab) <- paste0("$\\", gsub("([0-9]+)", "{\\1}", names(Theta_mle)), "$")
rownames(tab) <- c("True Value", "MLE", "Std. Error")
kableExtra::kable(as.data.frame(signif(tab,2)))
```

## Example 2: Generalized Cox-Ingersoll-Ross Process

```{r gcir_setup, include = FALSE}
set.seed(7) # for reproducible results
# simulate data from the gcir model
gcir_sim <- function(N, dt, Theta, x0) {
  # parameters
  gamma <- Theta[1]
  mu <- Theta[2]
  sigma <- Theta[3]
  lambda <- Theta[4]
  Rt <- rep(NA, N+1)
  Rt[1] <- x0
  for(ii in 1:N) {
    Rt[ii+1] <- rnorm(1, mean = Rt[ii] - gamma * (Rt[ii] - mu) * dt,
                      sd = sigma * Rt[ii]^lambda * sqrt(dt))
  }
  Rt
}
# true parameter values
Theta <- c(gamma = .07, mu = .01, sigma = .6, lambda = .9)
dt <- 1/12 # interobservation time (in years)
N <- 12 * 20 # number of observations (20 years)
```

In @chan.etal92, a model for the interest rate $R_t$ as a function of time is given by the stochastic differential equation (SDE)
\begin{equation}
\ud R_t = -\g(R_t - \mu) \ud t + \s R_t^\l\ud B_t,
(\#eq:chan)
\end{equation}
where $B_t$ is Brownian motion and the parameters are restricted to $\gamma, \mu, \sigma, \lambda > 0$. Suppose the data $\RR = (R_0, \ldots, R_N)$ consists of equispaced observations $R_n = R_{n\cdot \dt}$ with interobservation time $\dt$.
A commonly-used discrete-time approximation is given by
\begin{equation}
R_{n+1} \mid R_0,\ldots,R_n \sim \N\big(R_n - \g(R_n - \mu)\dt, \s^2 R_n^{2\l} \dt\big).
(\#eq:euler)
\end{equation}
Sample data from the discrete-time approximation \@ref(eq:euler) to the so-called generalized [Cox-Ingersoll-Ross](https://en.wikipedia.org/wiki/Cox%E2%80%93Ingersoll%E2%80%93Ross_model) (gCIR) model \@ref(eq:chan) is generated below, with $\dt = 1/12$ (one month), $N = `r N`$ (20 years), and true parameter values `r paste0("$\\TTh = (\\gamma, \\mu, \\sigma, \\lambda) = (", paste0(Theta, collapse = ", "), ")$")`.

```{r gcir_sim, include = FALSE}
Rt <- gcir_sim(N = N, dt = dt, Theta = Theta, x0 = Theta["mu"])
par(mar = c(4,4,.5,.5))
plot(x = 0:N*dt, y = 100*Rt, pch = 16, cex = .8,
     xlab = "Time (years)", ylab = "Interest Rate (%)")
```

```{r, gcirdata, ref.label = c("gcir_setup", "gcir_sim"), echo = -1, fig.height = 4, fig.width = 6.5, fig.cap = paste0("Simulated interest rates from discrete-time approximation \\@ref(eq:euler) to the gCIR model \\@ref(eq:chan), with $N = ", N, "$, $\\dt = 1/12$, and $\\TTh = (", paste0(Theta, collapse = ", "), ")$.")}
```

### Bayesian Inference

The likelihood for $\TTh$ is of LMN form \@ref(eq:lmn) with $\tth = \lambda$, $\BBe_{2 \times 1} = (\gamma, \gamma \mu)$, $\SSi_{1 \times 1} = \sigma^2$, and
$$
\begin{aligned}
\YY_{N\times 1} & = \begin{bmatrix} R_1 - R_0 \\ \vdots \\ R_N - R_{N-1} \end{bmatrix},
&
\XX_{N \times 2} & = \begin{bmatrix} - R_0 \dt & \dt \\ \vdots & \vdots \\ - R_{N-1} \dt & \dt \end{bmatrix},
&
\VV_{N \times N}(\lambda) & = \begin{bmatrix} R_0^{2\lambda} \dt & & 0 \\ & \ddots & \\ 0 & & R_{N-1}^{2\lambda} \dt\end{bmatrix}.
\end{aligned}
$$
Thus, we may proceed to maximum likelihood estimation via profile likelihood as above, using the `Vtype = "diag"` argument to `lmn_suff()` to optimize calculations for diagonal $\VV$.

```{r gcir_mle}
# precomputed values
Y <- matrix(diff(Rt))
X <- cbind(-Rt[1:N], 1) * dt
# since Rt^(2*lambda) is calculated as exp(2*lambda * log(Rt)),
# precompute 2*log(Rt) to speed up calculations
lR2 <- 2 * log(Rt[1:N])

# sufficient statistics for gCIR model
gcir_suff <- function(lambda) {
  lmn_suff(Y = Y, X = X,
           V = exp(lambda * lR2) * dt, Vtype = "diag")
}

# _negative_ profile likelihood for gCIR model
gcir_prof <- function(lambda) {
  if(lambda <= 0) return(Inf)
  -lmn_prof(suff = gcir_suff(lambda))
}

# MLE of Theta via profile likelihood

# profile likelihood for lambda
opt <- optimize(f = gcir_prof, interval = c(.001, 10))
lambda_mle <- opt$minimum
# conditional MLE for remaining parameters
suff <- gcir_suff(lambda_mle)
Theta_mle <- c(gamma = suff$Bhat[1,1],
               mu = suff$Bhat[2,1]/suff$Bhat[1,1],
               sigma = sqrt(suff$S[1,1]/suff$n),
               lambda = lambda_mle)
Theta_mle
```

However, we can see that the profile likelihood method produces negative estimates of $\gamma$ and $\mu$, i.e., outside of the parameter support. We could try to restrict or penalize the likelihood optimization problem to obtain an admissible MLE, but then the profile likelihood simplifications would no longer apply. Instead, consider the following Bayesian approach. First, we note that the conjugate prior for $(\BBe, \SSi)$ in the LMN model conditional on $\tth$ is the Matrix-Normal Inverse-Wishart (MNIW) distribution,
$$
\BBe, \SSi \mid \tth \sim \mniw(\LLa_\tth, \OOm_\tth, \PPs_\tth, \nu_\tth) \qquad \iff \qquad
\begin{aligned}
\BBe \mid \SSi & \sim \MN(\LLa_\tth, \OOm_\tth^{-1}, \SSi) \\
\SSi & \sim \iwish(\PPs_\tth, \nu_\tth),
\end{aligned}
$$
where $\iwish$ denotes the [Inverse-Wishart](https://en.wikipedia.org/wiki/Inverse-Wishart_distribution) distribution, and the hyperparameters $\PPh_\tth = (\LLa_\tth, \OOm_\tth, \PPs_\tth, \nu_\tth)$ can depend on $\tth$.
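
To make the two-stage structure of the MNIW distribution concrete, here is a minimal base-R sketch of a single draw. It assumes the usual parametrization in which $\SSi \sim \iwish(\PPs, \nu)$ is equivalent to $\SSi^{-1} \sim \textrm{Wishart}(\PPs^{-1}, \nu)$; the helper `rmniw_sketch()` and the hyperparameter values are hypothetical and are not the interface of **LMN** or **mniw**.

```r
# Hypothetical helper, not part of LMN or mniw:
# one draw of (B, Sigma) ~ MNIW(Lambda, Omega, Psi, nu).
# Assumption: Sigma ~ InvWish(Psi, nu)  <=>  Sigma^{-1} ~ Wishart(Psi^{-1}, nu).
rmniw_sketch <- function(Lambda, Omega, Psi, nu) {
  p <- nrow(Lambda)
  q <- ncol(Lambda)
  # step 1: Sigma ~ InvWish(Psi, nu)
  W <- stats::rWishart(1, df = nu, Sigma = solve(Psi)) # q x q x 1 array
  Sigma <- solve(matrix(W, q, q))
  # step 2: B | Sigma ~ MatNorm(Lambda, Omega^{-1}, Sigma)
  E <- matrix(rnorm(p * q), p, q) # iid N(0, 1) entries
  B <- Lambda + crossprod(chol(solve(Omega)), E) %*% chol(Sigma)
  list(B = B, Sigma = Sigma)
}

# illustrative proper prior with the gCIR dimensions p = 2, q = 1 (assumed values)
set.seed(1)
draw_ex <- rmniw_sketch(Lambda = matrix(0, 2, 1), Omega = diag(2),
                        Psi = matrix(1), nu = 5)
draw_ex
```

For many draws with different hyperparameters, a loop over this helper would be slow, which is one reason to prefer a vectorized sampler such as `mniw::rmniw()`.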
Thus, for the prior distribution $\pi(\BBe, \SSi, \tth)$ given by
\begin{equation}
\begin{aligned}
\tth & \sim \pi(\tth) \\
\BBe, \SSi \mid \tth & \sim \mniw(\LLa_\tth, \OOm_\tth, \PPs_\tth, \nu_\tth),
\end{aligned}
(\#eq:conjprior)
\end{equation}
we have the following analytical results:

1. The conjugate posterior distribution $p(\BBe, \SSi \mid \YY, \tth)$ is also $\mniw$, with closed-form expressions for the hyperparameters $\hat \PPh_\tth = (\hat \LLa_\tth, \hat \OOm_\tth, \hat \PPs_\tth, \hat \nu_\tth)$ calculated by `lmn_post()`.
2. The marginal posterior distribution $p(\tth \mid \YY)$ has a closed-form expression calculated by `lmn_marg()`.

Both closed-form expressions are provided [below](#app). Putting these results together, we can efficiently conduct Bayesian inference for LMN models by first sampling $\tth^{(1)}, \ldots, \tth^{(B)} \sim p(\tth \mid \YY)$, then conditionally sampling $(\BBe^{(b)}, \SSi^{(b)}) \ind \mniw(\hat \PPh_{\tth^{(b)}})$ for $b = 1,\ldots, B$. This is done in the R code below with the default prior
$$
\pi(\TTh) \propto |\SSi|^{-(q+1)/2},
$$
which is obtained from `lmn_prior()`:

```{r gcir_prior}
prior <- lmn_prior(p = 2, q = 1) # default prior
prior
```

First, we implement a grid-based sampler for $\lambda \sim p(\lambda \mid \RR)$:

```{r gcir_post_lambda, fig.height = 4, fig.width = 6.5}
# log of marginal posterior p(lambda | R)
gcir_marg <- function(lambda) {
  suff <- gcir_suff(lambda)
  post <- lmn_post(suff, prior)
  lmn_marg(suff = suff, prior = prior, post = post)
}

# grid sampler for lambda ~ p(lambda | R)

# estimate the effective support of lambda by taking
# mode +/- 5 / sqrt(curvature of the negative log-posterior at the mode)
lambda_mode <- optimize(f = gcir_marg,
                        interval = c(.01, 10),
                        maximum = TRUE)$maximum
lambda_quad <- -numDeriv::hessian(func = gcir_marg, x = lambda_mode)[1]
lambda_rng <- lambda_mode + c(-5,5) * 1/sqrt(lambda_quad)

# plot posterior on this range
lambda_seq <- seq(lambda_rng[1], lambda_rng[2], len = 1000)
lambda_lpdf <- sapply(lambda_seq, gcir_marg) # log-pdf
# normalized pdf
lambda_pdf <- exp(lambda_lpdf - max(lambda_lpdf))
lambda_pdf <- lambda_pdf / sum(lambda_pdf) / (lambda_seq[2]-lambda_seq[1])
par(mar = c(2,4,2,.5))
plot(lambda_seq, lambda_pdf, type = "l",
     xlab = expression(lambda), ylab = "",
     main = expression(p(lambda*" | "*bold(R))))
```

The grid appears to have captured the effective support of $p(\lambda \mid \RR)$, so we may proceed to conditional sampling. To do this efficiently we use the function `mniw::rmniw()` in the **mniw** package, which vectorizes simulations over different MNIW parameters $\hat \PPh_{\tth^{(1)}}, \ldots, \hat \PPh_{\tth^{(B)}}$.

```{r gcir_post, cache = build_cache}
npost <- 5e4 # number of posterior draws

# marginal sampling from p(lambda | R)
lambda_post <- sample(lambda_seq, size = npost, prob = lambda_pdf,
                      replace = TRUE)

# conditional sampling from p(B, Sigma | lambda, R)
BSig_post <- lapply(lambda_post, function(lambda) {
  lmn_post(gcir_suff(lambda), prior)
})
BSig_post <- list2mniw(BSig_post) # convert to vectorized mniw format
BSig_post <- mniw::rmniw(npost,
                         Lambda = BSig_post$Lambda,
                         Omega = BSig_post$Omega,
                         Psi = BSig_post$Psi,
                         nu = BSig_post$nu)

# convert to Theta = (gamma, mu, sigma, lambda)
Theta_post <- cbind(gamma = BSig_post$X[1,1,],
                    mu = BSig_post$X[2,1,]/BSig_post$X[1,1,],
                    sigma = sqrt(BSig_post$V[1,1,]),
                    lambda = lambda_post)
apply(Theta_post, 2, min)
```

We can see that the posterior sampling scheme above for $p(\TTh \mid \RR)$ did not always produce positive values for $\gamma$ and $\mu$. However, we can correct this post-hoc by making use of the following fact:

> **Rejection Sampling.** Suppose that for a given prior $\TTh \sim \pi(\TTh)$ and likelihood function $p(\RR \mid \TTh)$, we obtain a sample $\TTh^{(1)}, \ldots, \TTh^{(B)}$ from the posterior distribution $p(\TTh \mid \RR)$.
Then if we keep only the samples such that $\TTh^{(b)} \in \mathcal{S}$, this results in samples from the posterior distribution with likelihood $p(\RR \mid \TTh)$ and constrained prior distribution $\TTh \sim \pi(\TTh \mid \TTh \in \mathcal{S})$.

In other words, if we eliminate from `Theta_post` all rows for which $\gamma < 0$ or $\mu < 0$, we are left with a sample from the posterior distribution with prior
$$
\pi(\TTh) \propto 1/\sigma^2 \times \boldsymbol{\textrm{I}}\{\gamma, \mu > 0\}.
$$
Posterior parameter distributions from the corresponding rejection sampler are displayed below.

```{r gcirhist, fig.width = 6.5, fig.height = 5, fig.cap = "Posterior distribution and true value for each gCIR parameter $\\TTh = (\\gamma, \\mu, \\sigma, \\lambda)$."}
# keep only draws for which gamma, mu > 0
ikeep <- pmin(Theta_post[,1], Theta_post[,2]) > 0
mean(ikeep) # a good number of draws get discarded
Theta_post <- Theta_post[ikeep,]

# convert mu to log scale for plotting purposes
Theta_post[,"mu"] <- log10(Theta_post[,"mu"])

# posterior distributions and true parameter values
Theta_names <- c("gamma", "log[10](mu)", "sigma", "lambda")
Theta_true <- Theta
Theta_true["mu"] <- log10(Theta_true["mu"])
par(mfrow = c(2,2), mar = c(2,2,3,.5)+.5)
for(ii in 1:ncol(Theta_post)) {
  hist(Theta_post[,ii], breaks = 40, freq = FALSE,
       xlab = "", ylab = "",
       main = parse(text = paste0("p(", Theta_names[ii], "*\" | \"*bold(R))")))
  abline(v = Theta_true[ii], col = "red", lwd = 2)
  if(ii == 1) {
    legend("topright", inset = .05,
           legend = c("Posterior Distribution", "True Parameter Value"),
           lwd = c(NA, 2), pch = c(22, NA), seg.len = 1.5,
           col = c("black", "red"), bg = c("white", NA), cex = .85)
  }
}
```

## Appendix: Conjugate Posteriors for LMNs {#app}

For the regression model \@ref(eq:lmn) with conjugate prior \@ref(eq:conjprior), the posterior distribution $p(\BBe, \SSi, \tth \mid \YY)$ is given by
$$
\begin{aligned}
\tth \mid \YY & \sim p(\tth \mid \YY) \propto \pi(\tth) \frac{\Xi(\PPs_{\tth}, \nu_{\tth})}{\Xi(\hat\PPs_{\tth},\hat\nu_{\tth})} \left(\frac{|\OOm_{\tth}|(2\pi)^{-n}}{|\hat \OOm_{\tth}||\VV_{\tth}|}\right)^{q/2} \\
\BBe, \SSi \mid \tth, \YY & \sim \mniw(\hat \LLa_\tth, \hat\OOm_\tth, \hat \PPs_\tth, \hat \nu_\tth),
\end{aligned}
$$
where
$$
\begin{aligned}
\Xi(\PPs,\nu) & = \frac{|\PPs|^{\nu/2}}{\sqrt{2^{\nu q}} \Gamma_q(\tfrac \nu 2)},
&
\Gamma_q(a) & = \pi^{q(q-1)/4} \prod_{j=1}^q \Gamma[a + \tfrac 1 2(1-j)],
\\
\hat \OOm_{\tth} & = \OOm_{\tth} + \TT_{\tth},
&
\hat \LLa_{\tth} & = \hat \OOm_{\tth}^{-1}(\TT_{\tth} \hat \BBe_{\tth} + \OOm_{\tth} \LLa_{\tth}),
\\
\hat \nu_{\tth} & = \nu_{\tth} + n,
&
\hat \PPs_{\tth} & = \PPs_{\tth} + \SS_{\tth} + \hat \BBe_{\tth}' \TT_{\tth} \hat \BBe_{\tth} + \LLa_{\tth}' \OOm_{\tth} \LLa_{\tth} - \hat \LLa_{\tth}' \hat \OOm_{\tth} \hat \LLa_{\tth},
\end{aligned}
$$
and $\TT_{\tth}$, $\SS_{\tth}$ and $\hat \BBe_{\tth}$ are outputs from `lmn_suff()` defined [above](#eqsuff).

```{r teardown, include = FALSE}
par(oldpar)
```

## References