Web: https://www.reddit.com/r/LanguageTechnology/comments/s7r2pj/seeking_string_readability_metric/

Jan. 19, 2022, 2:17 p.m. | /u/polandtown

Natural Language Processing reddit.com

Hello fellow enthusiasts,

I have a corpus of 150k documents, and their respective OCR outputs.

I'd like to assign a readability score to each document. Is there a metric out there for something like that?

In hindsight, I could have extracted an OCR-accuracy score along with my strings during the OCR extraction, which took almost a month of runtime. I'd like to find an alternative solution rather than re-running it. Knowledge for next time, anyway...
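One possible stand-in for an OCR-accuracy score, without re-running OCR, is a cheap heuristic computed over the extracted text itself. The sketch below is not from the post and is only one option: it assumes a word list is available (the tiny `VOCAB` set is a hypothetical placeholder for a real dictionary) and scores a document by the share of its alphabetic tokens that are known words, plus the share of non-space characters that are letters, digits, or common punctuation. Low values tend to point at garbled OCR output.

```python
import re

# Hypothetical placeholder vocabulary; swap in a real word list in practice.
VOCAB = frozenset({"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"})

# Runs of ASCII letters count as tokens; digits and symbols split tokens.
TOKEN_RE = re.compile(r"[A-Za-z]+")

# Assumed set of punctuation that normal prose is likely to contain.
COMMON_PUNCT = set(".,;:!?'\"-()")


def valid_word_ratio(text: str, vocab: frozenset = VOCAB) -> float:
    """Share of alphabetic tokens found in `vocab` (0.0 if there are none)."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in vocab)
    return hits / len(tokens)


def clean_char_ratio(text: str) -> float:
    """Share of non-whitespace characters that are alphanumeric or common punctuation."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    ok = sum(1 for c in chars if c.isalnum() or c in COMMON_PUNCT)
    return ok / len(chars)


if __name__ == "__main__":
    for doc in ["The quick brown fox.", "Th3 qu1ck br0wn f0x ~~|~"]:
        print(doc, valid_word_ratio(doc), clean_char_ratio(doc))
```

Run over all 150k OCR outputs, scores like these could be used to rank documents and hand-check the lowest-scoring ones; where to set a cut-off is a judgment call that depends on the corpus.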

I'm open to …

