Combining reward functions with different scales and meanings
What are the best practices for using a reward function that combines several types of reward with very different scales and meanings?
Take the example of a robot serving customers at a restaurant. I want it to maximize the number of dishes it serves during the day, but I also want to penalize it for making customers wait longer than a certain amount of time. Note that without the penalty the robot might decide to …
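
For concreteness, the two-part reward described above could be sketched roughly as follows. This is only an illustration of the setup, not a recommended design: the function and class names, the weights, and the 10-minute waiting threshold are all hypothetical and do not come from the question.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; choosing them is exactly the scaling problem in question.
    per_dish: float = 1.0           # reward units per dish served
    per_excess_minute: float = 0.5  # penalty units per minute waited past the threshold


def combined_reward(
    dishes_served: int,
    wait_minutes: Sequence[float],
    max_wait_minutes: float = 10.0,  # assumed acceptable waiting time
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of a serving bonus and a waiting penalty."""
    serving_term = weights.per_dish * dishes_served
    # Only the time beyond the threshold is penalized.
    excess = sum(max(0.0, w - max_wait_minutes) for w in wait_minutes)
    waiting_term = weights.per_excess_minute * excess
    return serving_term - waiting_term
```

With these assumed weights, one customer waiting 10 minutes past the threshold costs as much as 5 served dishes are worth, so the weights implicitly set an exchange rate between the two quantities.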