all AI news
Bilinear value networks. (arXiv:2204.13695v1 [cs.AI])
April 29, 2022, 1:11 a.m. | Zhang-Wei Hong, Ge Yang, Pulkit Agrawal
cs.LG updates on arXiv.org arxiv.org
The dominant framework for off-policy multi-goal reinforcement learning
involves estimating goal conditioned Q-value function. When learning to achieve
multiple goals, data efficiency is intimately connected with the generalization
of the Q-function to new goals. The de-facto paradigm is to approximate Q(s, a,
g) using monolithic neural networks. To improve the generalization of the
Q-function, we propose a bilinear decomposition that represents the Q-value via
a low-rank approximation in the form of a dot product between two vector
fields. The first …
More from arxiv.org / cs.LG updates on arXiv.org
Jobs in AI, ML, Big Data
Senior Machine Learning Engineer (MLOps)
@ Promaton | Remote, Europe
IT Commercial Data Analyst - ESO
@ National Grid | Warwick, GB, CV34 6DA
Stagiaire Data Analyst – Banque Privée - Juillet 2024
@ Rothschild & Co | Paris (Messine-29)
Operations Research Scientist I - Network Optimization Focus
@ CSX | Jacksonville, FL, United States
Machine Learning Operations Engineer
@ Intellectsoft | Baku, Baku, Azerbaijan - Remote
Data Analyst
@ Health Care Service Corporation | Richardson Texas HQ (1001 E. Lookout Drive)