# More on Joint Bayesian Verification

### Derivations

The following are detailed derivations of some formulas in the paper Bayesian Face Revisited: A Joint Formulation.

Eq.4 Background
$x=\mu+\epsilon$ where $$\mu$$ and $$\epsilon$$ follow two independent Gaussians $$N(0,S_\mu)$$ and $$N(0,S_\epsilon)$$. The covariance matrices of $$p(x_1,x_2|H_I)$$ and $$p(x_1,x_2|H_E)$$ are given by

$\Sigma_I=\left[ \begin{matrix} S_{\mu}+S_{\epsilon} & S_{\mu} \\ S_{\mu} & S_{\mu}+S_{\epsilon} \end{matrix} \right],$

$\Sigma_E=\left[ \begin{matrix} S_{\mu}+S_{\epsilon} & 0 \\ 0 & S_{\mu}+S_{\epsilon} \end{matrix} \right].$
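As a quick sanity check of these two covariance matrices, the sketch below rebuilds them from the generative model: under $$H_I$$ both samples share one $$\mu$$, under $$H_E$$ each sample gets its own. The helper names (`block_diag`, `random_spd`, `M_I`, `M_E`) and the random test matrices are illustrative assumptions, not part of the paper.

```python
import numpy as np

def block_diag(*blocks):
    """Stack equally sized square blocks along the diagonal."""
    d = blocks[0].shape[0]
    out = np.zeros((len(blocks) * d, len(blocks) * d))
    for i, B in enumerate(blocks):
        out[i * d:(i + 1) * d, i * d:(i + 1) * d] = B
    return out

def random_spd(d, rng):
    """Random symmetric positive-definite matrix (stand-in for a covariance)."""
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

rng = np.random.default_rng(0)
d = 3
S_mu, S_eps = random_spd(d, rng), random_spd(d, rng)
I, Z = np.eye(d), np.zeros((d, d))

# H_I: hidden vector [mu; eps1; eps2], x1 = mu + eps1, x2 = mu + eps2.
M_I = np.block([[I, I, Z], [I, Z, I]])
Sigma_I = M_I @ block_diag(S_mu, S_eps, S_eps) @ M_I.T

# H_E: hidden vector [mu1; mu2; eps1; eps2], x1 = mu1 + eps1, x2 = mu2 + eps2.
M_E = np.block([[I, Z, I, Z], [Z, I, Z, I]])
Sigma_E = M_E @ block_diag(S_mu, S_mu, S_eps, S_eps) @ M_E.T

# Compare against the closed forms stated in the text.
S = S_mu + S_eps
print(np.allclose(Sigma_I, np.block([[S, S_mu], [S_mu, S]])))
print(np.allclose(Sigma_E, np.block([[S, Z], [Z, S]])))
```

The off-diagonal block $$S_\mu$$ in $$\Sigma_I$$ comes entirely from the shared identity component; it vanishes under $$H_E$$ because the two identities are independent.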

Eq.4
$r(x_1,x_2)=\log\frac{p(x_1,x_2|H_I)}{p(x_1,x_2|H_E)}=x_1^TAx_1+x_2^TAx_2-2x_1^TGx_2+const,$

where

$A=(S_\mu+S_\epsilon)^{-1}-(F+G),$

$\left[ \begin{matrix} F+G & G \\ G & F+G \end{matrix} \right] = \left[ \begin{matrix} S_{\mu}+S_{\epsilon} & S_{\mu} \\ S_{\mu} & S_{\mu}+S_{\epsilon} \end{matrix} \right]^{-1}.$

Derivation
$p(x_1,x_2|H_I)=\frac{1}{norm_I}\exp\left(-\frac{1}{2} \left[ \begin{matrix} x_1^T & x_2^T \end{matrix} \right] \Sigma_I^{-1} \left[ \begin{matrix} x_1 \\ x_2 \end{matrix} \right]\right),$

where

$\left[ \begin{matrix} x_1^T & x_2^T \end{matrix} \right] \Sigma_I^{-1} \left[ \begin{matrix} x_1 \\ x_2 \end{matrix} \right] = \left[ \begin{matrix} x_1^T & x_2^T \end{matrix} \right] \left[ \begin{matrix} F+G & G \\ G & F+G \end{matrix} \right] \left[ \begin{matrix} x_1 \\ x_2 \end{matrix} \right] = x_1^T(F+G)x_1+x_2^T(F+G)x_2+2x_1^TGx_2.$

The two cross terms combine because $$G$$ is symmetric. Similarly,

$p(x_1,x_2|H_E)=\frac{1}{norm_E}\exp\left(-\frac{1}{2} \left[ \begin{matrix} x_1^T & x_2^T \end{matrix} \right] \Sigma_E^{-1} \left[ \begin{matrix} x_1 \\ x_2 \end{matrix} \right]\right),$

where

$\left[ \begin{matrix} x_1^T & x_2^T \end{matrix} \right] \Sigma_E^{-1} \left[ \begin{matrix} x_1 \\ x_2 \end{matrix} \right] = \left[ \begin{matrix} x_1^T & x_2^T \end{matrix} \right] \left[ \begin{matrix} (S_{\mu}+S_{\epsilon})^{-1} & 0 \\ 0 & (S_{\mu}+S_{\epsilon})^{-1} \end{matrix} \right] \left[ \begin{matrix} x_1 \\ x_2 \end{matrix} \right] = x_1^T(S_{\mu}+S_{\epsilon})^{-1}x_1+x_2^T(S_{\mu}+S_{\epsilon})^{-1}x_2.$

Subtracting the two exponents, with $$const=\log(norm_E/norm_I)$$,

$\therefore \log\frac{p(x_1,x_2|H_I)}{p(x_1,x_2|H_E)}=\frac{1}{2}\left[x_1^T((S_\mu+S_\epsilon)^{-1}-(F+G))x_1+x_2^T((S_\mu+S_\epsilon)^{-1}-(F+G))x_2-2x_1^TGx_2\right]+const=\frac{1}{2}\left[x_1^TAx_1+x_2^TAx_2-2x_1^TGx_2\right]+const.$

Dropping the positive factor $$\frac{1}{2}$$, which only rescales the score and does not change any ranking of pairs, gives the form $$x_1^TAx_1+x_2^TAx_2-2x_1^TGx_2+const$$ stated in Eq.4.
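The derivation can be checked numerically. The sketch below computes $$A$$ and $$G$$ from the block inverse of $$\Sigma_I$$ and compares the quadratic score with the exact log-likelihood ratio of the two zero-mean Gaussians. All function names and the random test matrices are assumptions made for illustration; the comparison keeps the factor $$\frac{1}{2}$$ and the log-determinant constant explicit.

```python
import numpy as np

def random_spd(d, rng):
    """Random symmetric positive-definite matrix (stand-in for a covariance)."""
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

def joint_bayesian_matrices(S_mu, S_eps):
    """Return A and G as defined in Eq.4, read off the inverse of Sigma_I."""
    d = S_mu.shape[0]
    S = S_mu + S_eps
    Sigma_I_inv = np.linalg.inv(np.block([[S, S_mu], [S_mu, S]]))
    F_plus_G = Sigma_I_inv[:d, :d]   # diagonal block
    G = Sigma_I_inv[:d, d:]          # off-diagonal block
    A = np.linalg.inv(S) - F_plus_G
    return A, G

def quadratic_score(x1, x2, A, G):
    """x1^T A x1 + x2^T A x2 - 2 x1^T G x2 (Eq.4 without the constant)."""
    return x1 @ A @ x1 + x2 @ A @ x2 - 2 * x1 @ G @ x2

def log_gauss(x, Sigma):
    """Log density of N(0, Sigma) evaluated at x."""
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (x @ np.linalg.solve(Sigma, x) + logdet + x.size * np.log(2 * np.pi))

rng = np.random.default_rng(1)
d = 4
S_mu, S_eps = random_spd(d, rng), random_spd(d, rng)
S, Z = S_mu + S_eps, np.zeros((d, d))
Sigma_I = np.block([[S, S_mu], [S_mu, S]])
Sigma_E = np.block([[S, Z], [Z, S]])

A, G = joint_bayesian_matrices(S_mu, S_eps)
x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
x = np.concatenate([x1, x2])

exact = log_gauss(x, Sigma_I) - log_gauss(x, Sigma_E)
const = 0.5 * (np.linalg.slogdet(Sigma_E)[1] - np.linalg.slogdet(Sigma_I)[1])
from_eq4 = 0.5 * quadratic_score(x1, x2, A, G) + const
print(np.isclose(exact, from_eq4))
```

Since $$A$$ and $$G$$ depend only on $$S_\mu$$ and $$S_\epsilon$$, they can be computed once and reused for every pair at test time.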

Eq.8 Background
$\mathbf{h}=[\mu;\epsilon_1;\dots;\epsilon_m]$

$\mathbf{x}=[x_1;\dots;x_m]$

$\mathbf{x}=P\mathbf{h}$

$P=\left[\begin{matrix} I & I & 0 & \dots & 0 \\ I & 0 & I & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ I & 0 & 0 & \dots & I \end{matrix}\right]$

The distribution of the hidden variable $$\mathbf{h}$$ is $$N(0,\Sigma_h)$$, where $$\Sigma_h=diag(S_\mu,S_\epsilon,\dots,S_\epsilon)$$.

Eq.8
$E(\mathbf{h}|\mathbf{x})=\Sigma_hP^T\Sigma_x^{-1}\mathbf{x}$

Derivation
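The distribution of $$\mathbf{x}$$ is another Gaussian $$N(0,\Sigma_x)$$ where

$\Sigma_x=P\Sigma_hP^T=\left[\begin{matrix} S_{\mu}+S_{\epsilon} & S_{\mu} & \dots & S_{\mu} \\ S_{\mu} & S_{\mu}+S_{\epsilon} & \dots & S_{\mu} \\ \vdots & \vdots & \ddots & \vdots \\ S_{\mu} & S_{\mu} & \dots & S_{\mu}+S_{\epsilon} \end{matrix}\right]$

ref: Linear combinations of normal random variables.

Since $$\mathbf{h}$$ and $$\mathbf{x}$$ are jointly Gaussian with zero mean and $$Cov(\mathbf{h},\mathbf{x})=\Sigma_hP^T$$, the standard formula for the conditional mean of a Gaussian gives $$E(\mathbf{h}|\mathbf{x})=\Sigma_hP^T\Sigma_x^{-1}\mathbf{x}$$.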
More details can be found in this question on zhihu.
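To make Eq.8 concrete, here is a minimal NumPy sketch that builds $$P$$ and $$\Sigma_h$$ for one identity with $$m$$ samples and evaluates the posterior mean. The function names `build_P` and `posterior_mean`, the sizes, and the random covariances are illustrative assumptions. It forms the full $$md\times md$$ matrix $$\Sigma_x$$ directly, so it is meant for checking the formula rather than as an efficient implementation.

```python
import numpy as np

def random_spd(d, rng):
    """Random symmetric positive-definite matrix (stand-in for a covariance)."""
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

def build_P(m, d):
    """P maps h = [mu; eps_1; ...; eps_m] to x = [x_1; ...; x_m]."""
    P = np.zeros((m * d, (m + 1) * d))
    for i in range(m):
        P[i * d:(i + 1) * d, :d] = np.eye(d)                        # mu block
        P[i * d:(i + 1) * d, (i + 1) * d:(i + 2) * d] = np.eye(d)   # eps_i block
    return P

def posterior_mean(x, S_mu, S_eps, m):
    """E(h | x) = Sigma_h P^T Sigma_x^{-1} x (Eq.8)."""
    d = S_mu.shape[0]
    P = build_P(m, d)
    Sigma_h = np.kron(np.eye(m + 1), S_eps)   # diag(S_eps, ..., S_eps)
    Sigma_h[:d, :d] = S_mu                    # first block belongs to mu
    Sigma_x = P @ Sigma_h @ P.T
    return Sigma_h @ P.T @ np.linalg.solve(Sigma_x, x), Sigma_x

rng = np.random.default_rng(2)
m, d = 5, 3
S_mu, S_eps = random_spd(d, rng), random_spd(d, rng)
x = rng.standard_normal(m * d)

h_hat, Sigma_x = posterior_mean(x, S_mu, S_eps, m)
mu_hat, eps_hat = h_hat[:d], h_hat[d:].reshape(m, d)

# Sigma_x has S_mu everywhere plus S_eps on the diagonal blocks.
expected = np.kron(np.ones((m, m)), S_mu) + np.kron(np.eye(m), S_eps)
print(np.allclose(Sigma_x, expected))
# The posterior mean reproduces the observations: x_i = mu_hat + eps_hat_i.
print(np.allclose(mu_hat + eps_hat, x.reshape(m, d)))
```

The second check holds because $$P\,E(\mathbf{h}|\mathbf{x})=P\Sigma_hP^T\Sigma_x^{-1}\mathbf{x}=\mathbf{x}$$.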

### Discussion

Why don’t we just solve $$\mathbf{h}=P^{\dagger}\mathbf{x}$$ directly instead of rewriting it in terms of $$S_{\mu}$$ and $$S_{\epsilon}$$?

Because it is not the only solution: $$\mathbf{h}$$ stacks $$m+1$$ blocks ($$\mu$$ and $$\epsilon_1,\dots,\epsilon_m$$) while $$\mathbf{x}$$ has only $$m$$, so $$\mathbf{x}=P\mathbf{h}$$ is underdetermined and the pseudo-inverse merely picks the minimum-norm solution. The scatter matrices of Linear Discriminant Analysis (LDA) mentioned in the paper can be thought of as another solution for $$\mathbf{x}=P\mathbf{h}$$. This is somewhat analogous to biased variance estimation, because we don’t know where the true mean $$\mu$$ is.
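The non-uniqueness is easy to see numerically. In the sketch below (function names and random test data are assumptions for illustration), both the pseudo-inverse solution and the posterior mean of Eq.8 satisfy $$\mathbf{x}=P\mathbf{h}$$ exactly, yet they differ. Because $$P$$ has full row rank, $$P^{\dagger}=P^T(PP^T)^{-1}$$, so the pseudo-inverse solution coincides with Eq.8 only in the special case $$S_\mu=S_\epsilon=I$$.

```python
import numpy as np

def random_spd(d, rng):
    """Random symmetric positive-definite matrix (stand-in for a covariance)."""
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

def build_P(m, d):
    """P maps h = [mu; eps_1; ...; eps_m] to x = [x_1; ...; x_m]."""
    P = np.zeros((m * d, (m + 1) * d))
    for i in range(m):
        P[i * d:(i + 1) * d, :d] = np.eye(d)
        P[i * d:(i + 1) * d, (i + 1) * d:(i + 2) * d] = np.eye(d)
    return P

def posterior_mean(x, S_mu, S_eps, m):
    """E(h | x) = Sigma_h P^T Sigma_x^{-1} x (Eq.8)."""
    d = S_mu.shape[0]
    P = build_P(m, d)
    Sigma_h = np.kron(np.eye(m + 1), S_eps)
    Sigma_h[:d, :d] = S_mu
    return Sigma_h @ P.T @ np.linalg.solve(P @ Sigma_h @ P.T, x)

rng = np.random.default_rng(3)
m, d = 4, 3
P = build_P(m, d)
S_mu, S_eps = random_spd(d, rng), random_spd(d, rng)
x = rng.standard_normal(m * d)

h_pinv = np.linalg.pinv(P) @ x                 # minimum-norm solution
h_post = posterior_mean(x, S_mu, S_eps, m)     # uses S_mu and S_eps
h_iso = posterior_mean(x, np.eye(d), np.eye(d), m)

print(np.allclose(P @ h_pinv, x), np.allclose(P @ h_post, x))  # both solve x = P h
print(np.allclose(h_pinv, h_post))   # generally different solutions
print(np.allclose(h_pinv, h_iso))    # pinv equals Eq.8 with identity covariances
```

In other words, the pseudo-inverse silently assumes that identity and within-person variation have the same isotropic covariance, which is exactly the information $$S_\mu$$ and $$S_\epsilon$$ are supposed to carry.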

So the paper shows that estimating $$S_{\mu}$$ and $$S_{\epsilon}$$ with EM is the smarter choice.