June 27, 2024, 11:38 p.m. | /u/coolcloud

Computer Vision www.reddit.com

Hey all,

We've spent a lot of time building new techniques for parsing and searching PDFs. They've led to a significant improvement in our RAG search, and I wanted to share what we've learned.

**Some examples:**

Tables: SEC docs are notoriously hard to convert from PDF to tables. We tried the top results on Google and some open-source tools, and not a single one succeeded on this table.

A couple of examples of the tools we looked at:

* ilovepdf
* Adobe
* Gonitro
* …
