April 20, 2024, 5:54 p.m. | /u/Ghulam_Nabi

Machine Learning www.reddit.com

https://preview.redd.it/q93ittqnaovc1.png?width=777&format=png&auto=webp&s=aec5c1767690bd3269ba9e601623e4d85378fd37

This image is a screenshot I captured from the PDF file. Note that the PDF text is selectable: I can select all of the text in the headings and in the tables as well. I have tried a couple of techniques:



1: Used MultiModalVectorStoreIndex with llama-index-multi-modal-llms-openai (GPT-4 API), first converting the PDF into images using OCR and then retrieving the tables from the PDF. One issue is that I need to define the number of pages from …
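
Since the text in this PDF is selectable, the tables may be readable straight from the text layer, without OCR or image conversion. Below is a minimal sketch of that idea, assuming the third-party pdfplumber library is installed. It is not one of the approaches tried above, and the helper names `extract_tables` and `table_to_rows` and the file name `report.pdf` are hypothetical.

```python
from typing import List, Optional

# pdfplumber returns each table as a list of rows; a cell is a string or None.
Table = List[List[Optional[str]]]


def extract_tables(pdf_path: str) -> List[Table]:
    """Collect every table pdfplumber detects in the PDF's text layer."""
    # Third-party dependency (assumption), imported here so that
    # table_to_rows below can be used without it.
    import pdfplumber

    tables: List[Table] = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:  # iterates all pages, so no page count is needed
            tables.extend(page.extract_tables())
    return tables


def table_to_rows(table: Table) -> List[List[str]]:
    """Turn empty cells (None) into "" and collapse whitespace inside cells."""
    return [[" ".join((cell or "").split()) for cell in row] for row in table]
```

Usage would look something like `rows = [table_to_rows(t) for t in extract_tables("report.pdf")]`. Whether the detected tables match the layout in the screenshot depends on how the PDF draws its ruling lines, so the result should be checked against the original pages.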

