In this part we consider a normal distribution over the concatenated state-action vector, $p(x_t, u_t) = \mathcal{N}(\mu_t, \Sigma_t)$, where

$$\mu_t = \begin{bmatrix} \mu_t^{x} \\ \mu_t^{u} \end{bmatrix}, \qquad \Sigma_t = \begin{bmatrix} \Sigma_t^{xx} & \Sigma_t^{xu} \\ \Sigma_t^{ux} & \Sigma_t^{uu} \end{bmatrix}.$$

The forward pass computes these marginals for every timestep $t = 0, \dots, T-1$.
The forward recursion function takes two arguments:

traj_distr: the policy object containing the linear-Gaussian policy
traj_info: the object containing the dynamics

def forward(self, traj_distr, traj_info):
We read the number of timesteps and the state and action dimensions from the policy object:
T = traj_distr.T
dimU = traj_distr.dU
dimX = traj_distr.dX
We use Python slice objects so that, for example, sigma[t, index_x, index_u] addresses the cross-covariance block $\Sigma_t^{xu}$:
index_x = slice(dimX)
index_u = slice(dimX, dimX + dimU)
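
As a quick aside (a minimal sketch, not part of the original code), this is how such slice objects pick out blocks of a matrix:

import numpy as np

dimX, dimU = 2, 1
index_x = slice(dimX)               # rows/columns 0..1 (the state block)
index_u = slice(dimX, dimX + dimU)  # row/column 2 (the action block)

M = np.arange(9).reshape(3, 3)
print(M[index_x, index_u])  # the 2x1 upper-right block: [[2], [5]]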
Next we get the linearized dynamics and the dynamics noise covariance from traj_info; from their use below, Fm has shape (T, dimX, dimX + dimU), fv has shape (T, dimX), and dyn_covar has shape (T, dimX, dimX):
Fm = traj_info.dynamics.Fm
fv = traj_info.dynamics.fv
dyn_covar = traj_info.dynamics.dyn_covar
We allocate space for $\mu_t$ and $\Sigma_t$ at every timestep and initialize the state marginal at $t = 0$ from the initial-state mean and covariance:
sigma = np.zeros((T, dimX + dimU, dimX + dimU))
mu = np.zeros((T, dimX + dimU))
mu[0, index_x] = traj_info.x0mu
sigma[0, index_x, index_x] = traj_info.x0sigma
We iterate over $t = 0, \dots, T-1$. At each step we first fill in the action blocks of $\mu_t$ and $\Sigma_t$ by pushing the state marginal through the linear-Gaussian policy $u_t = K_t x_t + k_t + \epsilon_t$ with $\epsilon_t \sim \mathcal{N}(0, \Sigma_t^{\pi})$ (see the identity after this code):
for t in range(T):
mu[t, index_u] = traj_distr.K[t, :, :].dot(mu[t, index_x]) + \
traj_distr.k[t, :]
sigma[t, index_x, index_u] = \
sigma[t, index_x, index_x].dot(traj_distr.K[t, :, :].T)
sigma[t, index_u, index_x] = \
traj_distr.K[t, :, :].dot(sigma[t, index_x, index_x])
sigma[t, index_u, index_u] = \
traj_distr.K[t, :, :].dot(sigma[t, index_x, index_x]).dot(
traj_distr.K[t, :, :].T
) + traj_distr.pol_covar[t, :, :]
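
These four assignments are the standard identity for the joint distribution of a Gaussian marginal and a linear-Gaussian conditional: if $x_t \sim \mathcal{N}(\mu_t^{x}, \Sigma_t^{xx})$ and $u_t \mid x_t \sim \mathcal{N}(K_t x_t + k_t, \Sigma_t^{\pi})$, writing $\Sigma_t^{\pi}$ for pol_covar[t], then

$$\mu_t^{u} = K_t \mu_t^{x} + k_t, \qquad \Sigma_t^{xu} = \Sigma_t^{xx} K_t^{\top}, \qquad \Sigma_t^{ux} = K_t \Sigma_t^{xx}, \qquad \Sigma_t^{uu} = K_t \Sigma_t^{xx} K_t^{\top} + \Sigma_t^{\pi}.$$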
For $t < T - 1$ we then propagate the joint marginal through the linear-Gaussian dynamics $x_{t+1} = F_t \, [x_t; u_t] + f_t + w_t$ with $w_t \sim \mathcal{N}(0, \Sigma_t^{\mathrm{dyn}})$, which gives the next state marginal $\mu_{t+1}^{x} = F_t \mu_t + f_t$ and $\Sigma_{t+1}^{xx} = F_t \Sigma_t F_t^{\top} + \Sigma_t^{\mathrm{dyn}}$:
if t < T - 1:
mu[t+1, index_x] = Fm[t, :, :].dot(mu[t, :]) + fv[t, :]
sigma[t+1, index_x, index_x] = \
Fm[t, :, :].dot(sigma[t, :, :]).dot(Fm[t, :, :].T) + \
dyn_covar[t, :, :]
Once the loop finishes, we return mu and sigma:
return mu, sigma
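
To see the recursion end to end, here is a minimal self-contained sketch. The SimpleNamespace stand-ins for traj_distr and traj_info and the random dynamics are hypothetical, chosen only so the function runs:

import numpy as np
from types import SimpleNamespace

def forward(traj_distr, traj_info):
    """Compute Gaussian state-action marginals N(mu[t], sigma[t]) forward in time."""
    T, dimX, dimU = traj_distr.T, traj_distr.dX, traj_distr.dU
    index_x = slice(dimX)
    index_u = slice(dimX, dimX + dimU)
    Fm, fv = traj_info.dynamics.Fm, traj_info.dynamics.fv
    dyn_covar = traj_info.dynamics.dyn_covar
    mu = np.zeros((T, dimX + dimU))
    sigma = np.zeros((T, dimX + dimU, dimX + dimU))
    mu[0, index_x] = traj_info.x0mu
    sigma[0, index_x, index_x] = traj_info.x0sigma
    for t in range(T):
        K = traj_distr.K[t]
        # Action blocks: push the state marginal through u = K x + k + noise.
        mu[t, index_u] = K.dot(mu[t, index_x]) + traj_distr.k[t]
        sigma[t, index_x, index_u] = sigma[t, index_x, index_x].dot(K.T)
        sigma[t, index_u, index_x] = K.dot(sigma[t, index_x, index_x])
        sigma[t, index_u, index_u] = (
            K.dot(sigma[t, index_x, index_x]).dot(K.T) + traj_distr.pol_covar[t]
        )
        if t < T - 1:
            # Next state marginal under x' = Fm [x; u] + fv + noise.
            mu[t + 1, index_x] = Fm[t].dot(mu[t]) + fv[t]
            sigma[t + 1, index_x, index_x] = (
                Fm[t].dot(sigma[t]).dot(Fm[t].T) + dyn_covar[t]
            )
    return mu, sigma

# Hypothetical inputs, for illustration only.
T, dimX, dimU = 5, 3, 2
rng = np.random.default_rng(0)
traj_distr = SimpleNamespace(
    T=T, dX=dimX, dU=dimU,
    K=0.1 * rng.standard_normal((T, dimU, dimX)),
    k=np.zeros((T, dimU)),
    pol_covar=np.tile(0.01 * np.eye(dimU), (T, 1, 1)),
)
traj_info = SimpleNamespace(
    dynamics=SimpleNamespace(
        Fm=0.1 * rng.standard_normal((T, dimX, dimX + dimU)),
        fv=np.zeros((T, dimX)),
        dyn_covar=np.tile(0.01 * np.eye(dimX), (T, 1, 1)),
    ),
    x0mu=np.zeros(dimX),
    x0sigma=0.1 * np.eye(dimX),
)

mu, sigma = forward(traj_distr, traj_info)
print(mu.shape, sigma.shape)  # (5, 5) (5, 5, 5)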