LQR Forward pass

Notation

In this part we consider a normal distribution over x{\bold x} and u{\bold u}:

(xu)  N(μ,Σ) \\begin{pmatrix}{\\bold x}\\\\{\\bold u}\\end{pmatrix}\\ \\sim\\ {\\mathscr N}({\\bold \\mu}, {\\bold \\Sigma})

where

μ=(μxμu) {\\bold \\mu}=\\begin{pmatrix}{\\bold \\mu}\_{\\bold x}\\\\{\\bold \\mu}\_{\\bold u}\\end{pmatrix}
Σ=(Σx,xΣx,uΣu,xΣu,u) {\\bold \\Sigma}=\\begin{pmatrix}{\\bold \\Sigma}\_{{\\bold x},{\\bold x}}&{\\bold \\Sigma}\_{{\\bold x},{\\bold u}}\\\\{\\bold \\Sigma}\_{{\\bold u},{\\bold x}}&{\\bold \\Sigma}\_{{\\bold u},{\\bold u}}\\end{pmatrix}

Code

The forward recursion function takes as arguments

def forward(self, traj_distr, traj_info):

We get the number of timesteps and the dimensions of the states and the actions from the policy object:

T = traj_distr.T dimU = traj_distr.dU dimX = traj_distr.dX

We use slice syntax so that sigma[index_x, index_u] means Σx,u{\bold \Sigma}_{{\bold x},{\bold u}} etc.

index_x = slice(dimX) index_u = slice(dimX, dimX + dimU)

Get dynamics and covariances

Fm = traj_info.dynamics.Fm fv = traj_info.dynamics.fv dyn_covar = traj_info.dynamics.dyn_covar

We allocate space for μ{\bold \mu} and Σ{\bold \Sigma} and set the initial values for μ0,x{\bold \mu}_{0,{\bold x}} and Σ0,xx{\bold \Sigma}_{0,{\bold x}{\bold x}}:

sigma = np.zeros((T, dimX + dimU, dimX + dimU)) mu = np.zeros((T, dimX + dimU)) mu[0, index_x] = traj_info.x0mu sigma[0, index_x, index_x] = traj_info.x0sigma

We iterate over tt and compute:

μt,u=Ktμt,x+kt {\\bold \\mu}\_{t,{\\bold u}} = {\\bold K}\_t{\\bold \\mu}\_{t,{\\bold x}}+{\\bold k}\_t
Σt,x,u=Σt,x,xKtT {\\bold \\Sigma}\_{t,{\\bold x},{\\bold u}} = {\\bold \\Sigma}\_{t,{\\bold x},{\\bold x}}{\\bold K}\_t^T
Σt,u,x=KtΣt,x,x {\\bold \\Sigma}\_{t,{\\bold u},{\\bold x}} = {\\bold K}\_t{\\bold \\Sigma}\_{t,{\\bold x},{\\bold x}}
Σt,uu=KtΣt,x,xKtT+Σpol,t {\\bold \\Sigma}\_{t,{\\bold u}{\\bold u}} = {\\bold K}\_t{\\bold \\Sigma}\_{t,{\\bold x},{\\bold x}}{\\bold K}\_t^T+{\\bold \\Sigma\_{pol,t}}
for t in range(T): mu[t, index_u] = traj_distr.K[t, :, :].dot(mu[t, index_x]) + \ traj_distr.k[t, :] sigma[t, index_x, index_u] = \ sigma[t, index_x, index_x].dot(traj_distr.K[t, :, :].T) sigma[t, index_u, index_x] = \ traj_distr.K[t, :, :].dot(sigma[t, index_x, index_x]) sigma[t, index_u, index_u] = \ traj_distr.K[t, :, :].dot(sigma[t, index_x, index_x]).dot( traj_distr.K[t, :, :].T ) + traj_distr.pol_covar[t, :, :]

for t<Tt<T we compute:

μt+1,x=Ftμt+ft {\\bold \\mu}\_{t+1,{\\bold x}}={\\bold F}\_t{\\bold \\mu}\_t+{\\bold f}\_t
Σt+1,x,x=FtΣtFtT+Σdyn {\\bold \\Sigma}\_{t+1,{\\bold x},{\\bold x}}={\\bold F}\_t{\\bold \\Sigma}\_t{\\bold F}\_t^T+{\\bold \\Sigma}\_{dyn}
if t < T - 1: mu[t+1, index_x] = Fm[t, :, :].dot(mu[t, :]) + fv[t, :] sigma[t+1, index_x, index_x] = \ Fm[t, :, :].dot(sigma[t, :, :]).dot(Fm[t, :, :].T) + \ dyn_covar[t, :, :]

After that the loop ends and we return mu and sigma:

return mu, sigma