Aug. 23, 2022, 7:13 a.m. | /u/Melodic_Stomach_2704

Machine Learning www.reddit.com

Recently, I've been researching how to extract tables from document images. I first tried working with PDFs directly, but data extraction libraries like Camelot give inconsistent results. I then found a deep learning model called [CascadeTabNet](https://github.com/DevashishPrasad/CascadeTabNet). Its table detection results are okay, but its cell recognition is poor. I also found [Multi-Type-TD-TSR](https://github.com/Psarpei/Multi-Type-TD-TSR) for table extraction, which uses image processing techniques to find the table grid. It performs well on structured, bordered tables, but it breaks down when cells are not properly aligned. Even if extraction is …
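To make the grid-finding idea concrete, here is a minimal sketch of one simple image-processing approach: projection profiles over a binarized image. This is not the actual Multi-Type-TD-TSR code. The function names, the `min_fill` threshold, and the tiny synthetic table are all assumptions made up for illustration. It assumes one-pixel-wide ruling lines and shows how a shifted column border can drop below the threshold and disappear.

```python
def find_grid_lines(image, min_fill=0.9):
    """Return (rows, cols) indices of ruling lines in a binary image.

    `image` is a list of rows; 1 is a dark (ink) pixel, 0 is background.
    A row or column counts as a ruling line when at least `min_fill`
    of its pixels are dark. (`min_fill` is an illustrative threshold.)
    """
    height, width = len(image), len(image[0])
    rows = [y for y in range(height) if sum(image[y]) >= min_fill * width]
    cols = [x for x in range(width)
            if sum(image[y][x] for y in range(height)) >= min_fill * height]
    return rows, cols


def cells_from_lines(rows, cols):
    """Pair up consecutive lines into (top, left, bottom, right) cell boxes."""
    return [(top, left, bottom, right)
            for top, bottom in zip(rows, rows[1:])
            for left, right in zip(cols, cols[1:])]


def make_bordered_table(height, width, row_lines, col_lines):
    """Build a synthetic binary image with full-length ruling lines."""
    return [[1 if (y in row_lines or x in col_lines) else 0
             for x in range(width)]
            for y in range(height)]


# A clean 2x2 bordered table: every ruling line spans the whole image.
table = make_bordered_table(9, 13, row_lines={0, 4, 8}, col_lines={0, 6, 12})
rows, cols = find_grid_lines(table)
cells = cells_from_lines(rows, cols)

# Same table, but the middle border is shifted one pixel in the lower half.
misaligned = [row[:] for row in table]
for y in range(5, 8):
    misaligned[y][6] = 0
    misaligned[y][7] = 1
bad_rows, bad_cols = find_grid_lines(misaligned)
bad_cells = cells_from_lines(bad_rows, bad_cols)

print(len(cells), "cells found in the aligned table")
print(len(bad_cells), "cells found in the misaligned table")
```

In this toy setup the aligned table yields four cells, while the misaligned one yields only two, because neither half of the broken border fills enough of its column to count as a line. Real pipelines typically use morphological operations or line detection rather than a raw pixel count, but the sensitivity to alignment is similar in spirit.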

