July 3, 2023, 3:31 p.m. | /u/Outlandish_MurMan

Computer Vision www.reddit.com

For those struggling to use the native Pix2Struct checkpoints because of their Google Cloud dependencies, I converted the RefExp-finetuned Pix2Struct model to Hugging Face format. This might make your life a bit easier! You can find the converted model here: [https://huggingface.co/gitlost-murali/pix2struct-refexp-large](https://huggingface.co/gitlost-murali/pix2struct-refexp-large)
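In case it helps, here is a minimal sketch of loading the converted checkpoint with the `transformers` Pix2Struct classes. It assumes `transformers`, `torch`, and `Pillow` are installed. The helper name `predict_refexp` and the `max_new_tokens` default are my own choices for illustration, and passing the referring expression through the processor's `text` argument is an assumption: check the model card for the exact input format this checkpoint expects.

```python
MODEL_ID = "gitlost-murali/pix2struct-refexp-large"  # repo id taken from the link above


def predict_refexp(image_path: str, expression: str, max_new_tokens: int = 32) -> str:
    """Run the converted checkpoint on one screenshot and return the decoded text.

    `predict_refexp` is a hypothetical helper name, not part of any library.
    """
    # Heavy dependencies are imported lazily so that defining this helper is cheap.
    from PIL import Image
    from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

    # Downloads the processor and weights from the Hugging Face Hub on first use.
    processor = Pix2StructProcessor.from_pretrained(MODEL_ID)
    model = Pix2StructForConditionalGeneration.from_pretrained(MODEL_ID)

    image = Image.open(image_path).convert("RGB")

    # Assumption: the referring expression goes in through `text`. How the
    # processor uses it depends on the checkpoint's own processor config.
    inputs = processor(images=image, text=expression, return_tensors="pt")

    generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Calling something like `predict_refexp("screenshot.png", "click on the search bar")` should return whatever string the model decodes; both arguments here are placeholders. The first call downloads a large checkpoint, so expect it to take a while.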

**Background**: Pix2Struct is a pretrained image-to-text model for parsing webpages, screenshots, etc. Although the Google team converted all the other Pix2Struct checkpoints, they did not upload the ones finetuned on the RefExp dataset to Hugging Face.

Even the conversion script had …

