Dynamics fitting
Notation
In this part we consider a normal distribution over
y=⎝⎛xux′⎠⎞
where x′ is the state that results from taking the action u in state x. We divide the parameters of the normal distribution y ∼ N(μ,Σ) as follows:
μ=(μxuμx′)
Σ=(Σxu,xuΣx′,xuΣxu,x′Σx′,x′)
Code
The fitting function takes as arguments:
X: States of the trajectory samples from the previous policy
U: Actions of the trajectory samples from the previous policy
def fit(self, X, U):
We get the number of samples, the number of timesteps and the dimensions of the states and the actions from the policy object:
N, T, dimX = X.shape
dimU = U.shape[2]
We use slice syntax so that sigma[index_xu, index_x] means Σxu,x′ etc.
index_xu = slice(dimX + dimU)
index_x = slice(dimX + dimU, dimX + dimU + dimX)
We obtain the regularization term for the covariance:
sig_reg = np.zeros((dimX + dimU + dimX, dimX + dimU + dimX))
sig_reg[index_xu, index_xu] = self._hyperparams['regularization']
We compute the weight vector and matrix, used to compute sample mean and sample covariance:
dwts = (1.0 / N) * np.ones(N)
D = np.diag((1.0 / (N - 1)) * np.ones(N))
We allocate space for F, f and Σdyn:
self.Fm = np.zeros([T, dimX, dimX + dimU])
self.fv = np.zeros([T, dimX])
self.dyn_covar = np.zeros([T, dimX, dimX])
We iterate over t and assemble
ytn=⎝⎛xtnutnxt+1n⎠⎞
where the superscript n denotes the number of the sample.
for t in range(T - 1):
Ys = np.c_[X[:, t, :], U[:, t, :], X[:, t + 1, :]]
We obtain the hyperparameters of the normal-inverse-Wishart prior NIW(μ0,Φ,m,n0)
mu0, Phi, mm, n0 = self.prior.eval(dimX, dimU, Ys)
We compute the empirical mean and empirical covariance
μemp,t=N1n=1∑Nytn
Σemp,t=N−11n=1∑N(ytn−μemp,t)(ytn−μemp,t)T
empmu = np.sum((Ys.T * dwts).T, axis=0)
diff = Ys - empmu
empsig = diff.T.dot(D).dot(diff)
empsig = 0.5 * (empsig + empsig.T)
We use the empirical mean as our estimated mean and use the normal-inverse-Wishart posterior to get the estimate for the covariance:
μt=μemp,t
Σt=N+n0Φ+(N−1)Σemp,t+N+mNm(μemp,t−μ0)(μemp,t−μ0)T
mu = empmu
sigma = (Phi + (N - 1) * empsig + (N * mm) / (N + mm) *
np.outer(empmu - mu0, empmu - mu0)) / (N + n0)
sigma = 0.5 * (sigma + sigma.T)
sigma += sig_reg
Σt can contain singularities so that its inverse contains infinities. To prevent that we add a small regularization term.
Now we condition the gaussian on x and u:
F=Σx′,xuΣxu,xu−1=(Σxu,xu−1Σxu,x′)T
f=μx−Fμxu
Σdyn=Σx′,x′−FΣxu,xuFT
Fm = np.linalg.solve(sigma[index_xu, index_xu],
sigma[index_xu, index_x]).T
fv = mu[index_x] - Fm.dot(mu[index_xu])
dyn_covar = sigma[index_x, index_x] - Fm.dot(sigma[index_xu, index_xu]).dot(Fm.T)
dyn_covar = 0.5 * (dyn_covar + dyn_covar.T)
We store F, f and Σdyn:
self.Fm[t, :, :] = Fm
self.fv[t, :] = fv
self.dyn_covar[t, :, :] = dyn_covar
After that, the loop ends and we return F, f and Σdyn:
return self.Fm, self.fv, self.dyn_covar