Continuing task broken into episodes
I want to train an RL agent on a continuing task (one with no start or end), but I can only simulate a fixed number of steps at a time. Therefore, I need to train the agent by simulating several pieces of trajectories.
Now, in the usual episodic task, I would learn the value function using the target y_t = r_t + gamma * V(s_{t+1}) and, for the last step of the episode, y_T = r_T.
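To make the episodic targets above concrete, here is a minimal sketch assuming NumPy. The function name `episodic_td_targets` and the sample numbers are hypothetical, chosen only for illustration; the sketch assumes the values V(s_{t+1}) have already been computed by whatever value function is being trained.

```python
import numpy as np

def episodic_td_targets(rewards, next_values, gamma=0.99):
    """One-step TD targets for one episode that ends in a true terminal state.

    rewards[t] is r_t and next_values[t] is V(s_{t+1}).
    The last entry of next_values is ignored, because the final
    step of the episode does not bootstrap: y_T = r_T.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_values = np.asarray(next_values, dtype=float)

    # 1 for every step that bootstraps, 0 for the last step of the episode
    bootstrap = np.ones_like(rewards)
    bootstrap[-1] = 0.0

    # y_t = r_t + gamma * V(s_{t+1}) for t < T, and y_T = r_T
    return rewards + gamma * bootstrap * next_values

# Hypothetical three-step episode
targets = episodic_td_targets([1.0, 0.0, 2.0], [0.5, 1.0, 3.0], gamma=0.9)
print(targets)
```

The only place the episode boundary enters is the `bootstrap` mask, which zeroes the V(s_{t+1}) term on the last step.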
However, in my case, there is no "last …