Average reward algorithms when reward distribution changes
Assume my RL agent has to manage load balancing across a set of servers. This problem fits the average-reward formulation well: there are no episodes, only an infinite-length (continuing) task in which we want to maximize the average throughput and minimize the average delay.
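To make the formulation concrete, here is a minimal sketch of one tabular update in the style of differential Q-learning, which keeps a running estimate of the average reward alongside the action values. The function name, the table layout `q[state][action]`, and the step sizes `alpha` and `eta` are my own assumptions for illustration, not something tied to a specific library.

```python
from typing import List


def differential_q_update(
    q: List[List[float]],  # hypothetical table: q[state][action]
    r_bar: float,          # current estimate of the average reward
    s: int,
    a: int,
    r: float,
    s_next: int,
    alpha: float = 0.1,    # assumed step size for the action values
    eta: float = 1.0,      # assumed relative step size for r_bar
) -> float:
    """Apply one differential-Q-style update in place; return the new r_bar."""
    # TD error measured relative to the average reward (no discounting).
    delta = r - r_bar + max(q[s_next]) - q[s][a]
    q[s][a] += alpha * delta
    # The average-reward estimate moves along the same TD error.
    return r_bar + eta * alpha * delta


if __name__ == "__main__":
    # Toy check: one state, one action, constant reward of 2.0.
    q = [[0.0]]
    r_bar = 0.0
    for _ in range(500):
        r_bar = differential_q_update(q, r_bar, 0, 0, 2.0, 0)
    print(r_bar)
```

Because `r_bar` is updated with a constant step size, it behaves like an exponentially weighted average and keeps following the reward level instead of settling on one long-run value.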
Now, assume that the traffic on servers is high during the day and low during the night. Therefore, the average reward that the agent can achieve will depend on the time …