Web: https://www.reddit.com/r/LanguageTechnology/comments/rakwq0/okapi_bm25_with_using_hierarchically_clusterized/

Dec. 6, 2021, 11:30 p.m. | /u/pauloamed


Hey, all! Hope you are doing well!

Do you know of any work that tries to do Okapi BM25 matching using hierarchically clustered words?

Relabeling all tokens of a subtree to the same value would combine similar words into the same token_id. Lower subtrees imply closer words. This would act as a query and document enrichment. And now, with robust word embeddings and clustering algorithms, this approach seems feasible.
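
To make the idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy `TREE` is hand-built (in practice it would come from clustering word embeddings), the helper names `cluster_map`, `relabel`, and `bm25_scores` are hypothetical, and the BM25 variant uses the common `log(... + 1)` IDF smoothing with default `k1` and `b` values picked arbitrarily.

```python
import math
from collections import Counter

# Hypothetical toy hierarchy: nested lists are subtrees, strings are words.
# A real tree would come from hierarchical clustering of word embeddings.
TREE = [[["car", "automobile"], ["truck"]], [["cat", "kitten"], ["dog"]]]

def cluster_map(tree, depth, prefix="c"):
    """Map each word to the id of its ancestor subtree at the cut depth."""
    mapping = {}

    def walk(node, level, label):
        if isinstance(node, str):
            mapping[node] = label
            return
        for i, child in enumerate(node):
            # Above the cut, extend the label; below it, inherit the ancestor's.
            child_label = f"{label}.{i}" if level < depth else label
            walk(child, level + 1, child_label)

    walk(tree, 0, prefix)
    return mapping

def relabel(tokens, mapping):
    """Replace known words by their cluster id; leave unknown words as-is."""
    return [mapping.get(t, t) for t in tokens]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every tokenized document against the tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in set(query):
            if t not in tf:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [["automobile", "repair"], ["kitten", "food"]]
query = ["car"]

# Plain BM25: "car" never appears literally, so nothing matches.
plain = bm25_scores(query, docs)

# Cut the tree at depth 2 so "car" and "automobile" share one token id.
mapping = cluster_map(TREE, depth=2)
clustered = bm25_scores(relabel(query, mapping),
                        [relabel(d, mapping) for d in docs])
print(plain, clustered)
```

With the cut at depth 2, the query "car" now matches the document containing "automobile" and still ignores the "kitten" one. Lowering `depth` to 1 would also merge "truck" into the same token, which shows the trade-off: a shallower cut gives more recall but blurs distinctions that BM25's IDF would otherwise reward.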

Also, this is a fairly obvious idea, so someone must have already done it. Do you know of any work on this?

Cheersss …
