June 10, 2024, 4:41 a.m. | Chengang Hu, Xiao Liu, Yansong Feng

cs.CL updates on arXiv.org

arXiv:2406.04669v1 Announce Type: new
Abstract: Most existing compositional generalization datasets are synthetically generated, resulting in a lack of natural language variation. While there have been recent attempts to introduce non-synthetic datasets for compositional generalization, they suffer from either limited data scale or a lack of diversity in the forms of combinations. To better investigate compositional generalization with more linguistic phenomena and compositional diversity, we propose the DIsh NamE Recognition (DiNeR) task and create a large realistic Chinese dataset. Given …


