Feb. 23, 2024, 5:43 a.m. | Honghao Gui, Hongbin Ye, Lin Yuan, Ningyu Zhang, Mengshu Sun, Lei Liang, Huajun Chen

cs.LG updates on arXiv.org

arXiv:2402.14710v1 Announce Type: cross
Abstract: Large Language Models (LLMs) demonstrate remarkable potential across various domains; however, they exhibit a significant performance gap in Information Extraction (IE). Note that high-quality instruction data is the vital key for enhancing the specific capabilities of LLMs, while current IE datasets tend to be small in scale, fragmented, and lack standardized schema. To this end, we introduce IEPile, a comprehensive bilingual (English and Chinese) IE instruction corpus, which contains approximately 0.32B tokens. We construct IEPile …
