Similar Data Points Identification with LLM: A Human-in-the-loop Strategy Using Summarization and Hidden State Insights | allainews.com

April 9, 2024, 4:50 a.m. | Xianlong Zeng, Fanghao Song, Ang Liu

cs.CL updates on arXiv.org arxiv.org

arXiv:2404.04281v1 Announce Type: new
Abstract: This study introduces a simple yet effective method for identifying similar data points across non-free text domains, such as tabular and image data, using Large Language Models (LLMs). Our two-step approach involves data point summarization and hidden state extraction. Initially, data is condensed via summarization using an LLM, reducing complexity and highlighting essential information in sentences. Subsequently, the summarization sentences are fed through another LLM to extract hidden states, serving as compact, feature-rich representations. This …

abstract arxiv cs.ai cs.cl data domains extraction free hidden human identification image image data insights language language models large language large language models llm llms loop simple state strategy study summarization tabular text type

More from arxiv.org / cs.CL updates on arXiv.org

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation 8 hours ago | arxiv.org

abstract arxiv asr audio +22

Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment 8 hours ago | arxiv.org

abstract accuracy arxiv continuous +17

MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria 8 hours ago | arxiv.org

arxiv cs.cl llms mllm +5

The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics 8 hours ago | arxiv.org

abstract arxiv challenges computational +18

HeLM: Highlighted Evidence augmented Language Model for Enhanced Table-to-Text Generation 8 hours ago | arxiv.org

abstract apis arxiv costs +22

Prompt have evil twins 8 hours ago | arxiv.org

abstract arxiv behavior call +9

Reconstructing Materials Tetrahedron: Challenges in Materials Information Extraction 8 hours ago | arxiv.org

abstract arxiv challenges cond-mat.mtrl-sci +16

SSHR: Leveraging Self-supervised Hierarchical Representations for Multilingual Automatic Speech Recognition 8 hours ago | arxiv.org

abstract arxiv asr attention +19

An Interactive Framework for Profiling News Media Sources 8 hours ago | arxiv.org

abstract arxiv cs.cl fake +10

Data Architect

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

View on ai-jobs.net

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

View on ai-jobs.net

Research Scientist (Computer Science)

@ Nanyang Technological University | NTU Main Campus, Singapore

View on ai-jobs.net

Intern - Sales Data Management

@ Deliveroo | Dubai, UAE (Main Office)

View on ai-jobs.net