Dimension Reduction Project
In the context of high-dimensional data, we often want to find a low-dimensional representation of the original high-dimensional data.
Factor Analysis Model
Suppose we want to use a low-dimensional vector $X_i\in\mathbb{R}^k$ to represent each original observation $Y_i^{*}\in\mathbb{R}^p$, where the original data $Y^{*}$ has not been centered yet. The representation takes the following form:
\[Y_i^{*}=\mu+\Lambda X_i+\epsilon_i\]where $X_i\sim N_k(\mathbf{0},I)$, $\epsilon_i\sim N_p(\mathbf{0},\Phi)$ with $\Phi$ diagonal, and $X_i$ is independent of $\epsilon_i$. The joint distribution is then:
\[\begin{bmatrix} X\\ Y^{*} \end{bmatrix} \sim N_{k+p}\left( \begin{bmatrix} \mathbf{0}\\ \mu \end{bmatrix},\begin{bmatrix} I&\Lambda^T\\ \Lambda&\Lambda\Lambda^T+\Phi\\ \end{bmatrix}\right)\]
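To make the model concrete, here is a minimal R sketch that simulates data from it. The choices $n=500$, $p=7$, $K=2$, the block-structured loading matrix, and the noise variances are illustrative assumptions, not part of the model:
set.seed(601)
n = 500; p = 7; K = 2
Lambda_true = matrix(c(rep(1,4), numeric(3), numeric(3), rep(1,4)), nrow = p, ncol = K)  # block loadings
Phi_true = diag(runif(p, 0.1, 0.5))                  # diagonal noise covariance
X = matrix(rnorm(n*K), nrow = n, ncol = K)           # latent factors, X_i ~ N(0, I)
eps = matrix(rnorm(n*p), nrow = n, ncol = p) %*% sqrt(Phi_true)  # eps_i ~ N(0, Phi)
Y = X %*% t(Lambda_true) + eps                       # mu = 0, so Y is already centered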
EM Algorithm to Solve for $\Lambda$ and $\Phi$
However, we cannot solve for $\Lambda$ directly, since the factors $X$ are unobserved and $\Lambda$ is not identifiable. We therefore use the EM algorithm. By Bayes' rule:
\[P(Y\mid X, \Lambda, \Phi)=\frac{P(X\mid Y, \Phi, \Lambda)P(Y\mid \Lambda, \Phi)}{P(X\mid \Lambda, \Phi)} \Rightarrow P(Y\mid \Lambda, \Phi)=\frac{P(Y\mid X, \Lambda, \Phi)P(X)}{P(X\mid Y, \Phi, \Lambda)}\]Taking logarithms makes the computation easier:
\[\begin{aligned} l(\theta)&=\ln P(Y\mid \Lambda, \Phi)\\ &=\ln P(Y\mid X, \Lambda, \Phi)+ \ln \frac{P(X)}{P(X\mid Y, \Phi, \Lambda)} \ \ \ \ \text{(take the expectation of both sides w.r.t. } P(X)\text{)}\\ &=\int \ln P(Y\mid X, \Lambda, \Phi) P(X)\, dX+\int \ln \frac{P(X)}{P(X\mid Y, \Phi, \Lambda)} P(X)\,dX\\ &=E_{X\sim P(X)}\{\ln P(Y\mid X, \Lambda, \Phi) \}+\mathbb{KL}(P(X)\,\|\,P(X\mid Y, \Phi, \Lambda)) \end{aligned}\]Since the KL divergence is always nonnegative, we have the following inequality:
\[l(\theta)\geq E_{X\sim P(X)}\{\ln P(Y\mid X, \Lambda, \Phi) \}\]
E-step
For the equality to hold, the KL divergence must be zero, which indicates that
\[\mathbb{KL}(P(X)\,\|\,P(X\mid Y, \Phi, \Lambda))=0 \Rightarrow P(X) \overset{d}{=}P(X\mid Y, \Phi, \Lambda)\]Hence, the likelihood turns into an expectation under the posterior:
\[l(\theta)=E_{X\sim P(X\mid Y,\Lambda, \Phi)}\{\ln P(Y\mid X, \Lambda, \Phi) \}\]
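For the E-step we therefore need the posterior moments of $X$ given $Y$. With the joint normal above, the standard conditional-normal formulas give $E(X_i\mid y_i)=\Lambda^T(\Lambda\Lambda^T+\Phi)^{-1}y_i$ and $\mathrm{Var}(X_i\mid y_i)=I-\Lambda^T(\Lambda\Lambda^T+\Phi)^{-1}\Lambda$ for centered $Y$. A minimal R sketch, assuming Y is the centered n x p data matrix and lambda, phi, K, n hold the current values:
Sigma_inv = solve(lambda %*% t(lambda) + phi)            # (Lambda Lambda^T + Phi)^{-1}
Ex = Y %*% Sigma_inv %*% lambda                          # rows are E(X_i | y_i)
Varx = diag(1, K) - t(lambda) %*% Sigma_inv %*% lambda   # Var(X_i | y_i), same for all i
Extx = n * Varx + t(Ex) %*% Ex                           # E(X^T X | Y), used in the M-step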
M-step
To update $\Lambda$ and $\Phi$, we maximize $l(\theta)$ and update the parameters as follows:
\[(\Lambda^{(t+1)},\Phi^{(t+1)})=\arg\max_{\Lambda,\Phi}E_{X\sim P(X\mid Y,\Lambda^{(t)}, \Phi^{(t)})}\{\ln P(Y\mid X, \Lambda, \Phi) \}\]The conditional log-density of $Y$ given $X$ (for $N$ observations) is:
\[\begin{aligned} \ln f_{Y\mid X}(y)&=-\frac{Np}{2}\ln(2\pi)-\frac{N}{2}\ln|\Phi|-\frac{1}{2}\sum_{i=1}^{N}(y_i-\Lambda x_i)^T\Phi^{-1}(y_i-\Lambda x_i)\\ &=-\frac{Np}{2}\ln(2\pi)-\frac{N}{2}\ln|\Phi|-\frac{1}{2}\text{Trace}\{\Phi^{-1}\sum_{i=1}^{N}(y_i-\Lambda x_i)(y_i-\Lambda x_i)^T\}\\ &=-\frac{Np}{2}\ln(2\pi)-\frac{N}{2}\ln|\Phi|-\frac{N}{2}\text{Trace}\{\Phi^{-1}S_y\} \ \ \ \ \text{where }S_y=\frac{1}{N}\sum_{i=1}^{N}(y_i-\Lambda x_i)(y_i-\Lambda x_i)^T\\ \Rightarrow E_{X\sim P(X\mid Y,\Lambda, \Phi)}\{\ln P(Y\mid X, \Lambda, \Phi) \}&=-\frac{Np}{2}\ln(2\pi)-\frac{N}{2}\ln|\Phi|-\frac{N}{2}\text{Trace}\{\Phi^{-1}E(S_y)\} \end{aligned}\]We need to compute $E(S_y)$; the detailed derivation is as follows:
\[\begin{aligned} E(S_y)&=\frac{1}{N}E\{\sum_{i=1}^{N}(y_i-\Lambda x_i)(y_i^T-x_i^T\Lambda^T )\}\\ &=\frac{1}{N}E\{\sum_{i=1}^{N}y_iy_i^T-y_ix_i^T\Lambda^T -\Lambda x_iy_i^T+\Lambda x_ix_i^T\Lambda^T\}\\ &=\frac{1}{N}E\{Y^TY-2\Lambda X^TY+\Lambda X^TX\Lambda^T\} \ \ \ \ \text{(the two cross terms contribute equally under the trace)}\\ &=\frac{1}{N}[Y^TY-2\Lambda E(X)^T Y+\Lambda E(X^TX)\Lambda^T] \end{aligned}\]
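Here $E(X)$ is the $N\times k$ matrix stacking the posterior means from the E-step, and the second moment follows from the standard decomposition $E(x_ix_i^T\mid y_i)=\mathrm{Var}(x_i\mid y_i)+E(x_i\mid y_i)E(x_i\mid y_i)^T$, so that
\[E(X^TX)=\sum_{i=1}^{N}E(x_ix_i^T\mid y_i)=N\,\mathrm{Var}(x_i\mid y_i)+E(X)^TE(X)\]
which is exactly the quantity Extx computed in the code below.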
Differentiating with respect to $\Lambda$ and setting the result to zero (the factors $N$ and $\frac{1}{N}$ cancel):
\[\begin{aligned} \frac{\partial}{\partial \Lambda}E_{X\sim P(X\mid Y,\Lambda, \Phi)}\{\ln P(Y\mid X, \Lambda, \Phi) \}&=-\frac{1}{2}\frac{\partial}{\partial \Lambda}\text{Trace}\{\Phi^{-1}[Y^TY-2\Lambda E(X)^T Y+\Lambda E(X^TX)\Lambda^T]\}\\ &=-\frac{1}{2}[-2\Phi^{-1}Y^TE(X)+2\Phi^{-1}\Lambda E(X^TX)]=0\\ \Rightarrow \hat{\Lambda}^{(t+1)}&=Y^TE(X)[E(X^TX)]^{-1} \end{aligned}\]Similarly, differentiating with respect to $\Phi^{-1}$ and setting the result to zero:
\[\begin{aligned} \frac{\partial}{\partial \Phi^{-1}}E_{X\sim P(X\mid Y,\Lambda, \Phi)}\{\ln P(Y\mid X, \Lambda, \Phi) \}&=\frac{N}{2}\Phi-\frac{N}{2}E(S_y)=0 \Rightarrow \hat{\Phi}^{(t+1)}=E(S_y) \end{aligned}\]Since $\Phi$ is constrained to be diagonal in the factor analysis model, in practice we keep only the diagonal entries of $E(S_y)$. Plugging in $\hat{\Lambda}^{(t+1)}$, we can calculate $E(S_y)$ as follows:
\[\begin{aligned} E(S_y)&=\frac{1}{N}E\{(Y^T-\hat{\Lambda}^{(t+1)} X^T)(Y^T-\hat{\Lambda}^{(t+1)} X^T)^T\}\\ &=\frac{1}{N}E\{(Y^T-\hat{\Lambda}^{(t+1)} X^T)(Y-X{\hat{\Lambda}^{(t+1)}}^T)\}\\ &=\frac{1}{N}E\{Y^TY-2\hat{\Lambda}^{(t+1)}X^T Y+\hat{\Lambda}^{(t+1)} X^TX{\hat{\Lambda}^{(t+1)}}^T\}\\ &=\frac{1}{N}[Y^TY-2\hat{\Lambda}^{(t+1)}E(X)^TY+\hat{\Lambda}^{(t+1)} E(X^TX){\hat{\Lambda}^{(t+1)}}^T] \end{aligned}\]Following this algorithm, the R implementation is shown below:
FA_generalized = function(Y, K = 2, sigma = 1, epsilon = 1e-3, max_iter = 200){
  # EM algorithm for the factor analysis model Y_i = mu + Lambda X_i + eps_i
  n = dim(Y)[1]
  p = dim(Y)[2]
  Y = scale(Y, center = TRUE, scale = FALSE)   # center Y to absorb mu
  # random initialization; the noise variances must be positive
  lambda = matrix(rnorm(p*K, 0, sigma), nrow = p, ncol = K)
  phi = diag(abs(rnorm(p, 0, 1)))
  loss = c()
  old = lambda %*% t(lambda)
  diff = Inf
  t = 0
  while(diff > epsilon && t < max_iter){
    t = t + 1
    # E-step: posterior moments of X given Y under the current parameters
    Sigma_inv = solve(lambda %*% t(lambda) + phi)
    Varx = diag(1, K) - t(lambda) %*% Sigma_inv %*% lambda   # Var(X_i | y_i)
    Ex = Y %*% Sigma_inv %*% lambda                          # E(X | Y), n x K
    Extx = n * Varx + t(Ex) %*% Ex                           # E(X^T X | Y)
    # M-step: closed-form updates derived above
    lambda = t(Y) %*% Ex %*% solve(Extx)
    phi = diag(diag((t(Y) %*% Y - 2 * t(Y) %*% Ex %*% t(lambda) +
                       lambda %*% Extx %*% t(lambda)) / n))  # keep only the diagonal
    # observed-data log-likelihood, recorded for monitoring
    Sigma_y = lambda %*% t(lambda) + phi
    loss = c(loss, (-n*p/2)*log(2*pi) - (n/2)*log(det(Sigma_y)) -
               0.5*sum(diag(solve(Sigma_y) %*% t(Y) %*% Y)))
    # stop when Lambda Lambda^T (which, unlike Lambda, is identifiable) stabilizes
    new = lambda %*% t(lambda)
    diff = norm(new - old, type = "F")
    old = new
  }
  return(list(lambda = lambda, phi = phi, loss = loss, iter = t))
}
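A quick usage sketch, reusing the simulated Y and Lambda_true from the earlier snippet; since $\Lambda$ is only identifiable up to rotation, we compare $\hat{\Lambda}\hat{\Lambda}^T$ with the truth rather than $\hat{\Lambda}$ itself:
fit = FA_generalized(Y, K = 2, sigma = 1)
round(fit$lambda %*% t(fit$lambda), 2)    # compare with Lambda_true %*% t(Lambda_true)
plot(fit$loss, type = "l", xlab = "iteration", ylab = "log-likelihood")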