March 22, 2024, 2:49 p.m. | /u/sgpfc

Machine Learning www.reddit.com

Dataset: [https://github.com/declare-lab/LLM-PuzzleTest/tree/master/PuzzleVQA](https://github.com/declare-lab/LLM-PuzzleTest/tree/master/PuzzleVQA)

Paper: [https://arxiv.org/abs/2403.13315](https://arxiv.org/abs/2403.13315)

Large multimodal models extend the impressive capabilities of large language models by integrating multimodal understanding abilities. However, it is not clear how they can emulate the general intelligence and reasoning ability of humans. As recognizing patterns and abstracting concepts are key to general intelligence, we introduce PuzzleVQA, a collection of puzzles based on abstract patterns. With this dataset, we evaluate large multimodal models with abstract patterns based on fundamental concepts, including colors, numbers, sizes, and shapes. …

