Sept. 18, 2023, 4:02 p.m. | Yuichi Inoue

Towards Data Science - Medium

Developing LLM-integrated GIT vision language models.

Summary of this article:

  • Explaining GIT, a Vision Language Model developed by Microsoft.
  • Replacing GIT’s language model with large language models (LLMs) using PyTorch and Hugging Face’s Transformers.
  • Showing how to fine-tune GIT-LLM models using LoRA.
  • Testing and discussing the developed models.
  • Investigating whether the “Image Embeddings” produced by GIT’s Image Encoder correspond to specific characters in the same space as the “Text Embeddings”.
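
The last point in the summary can be illustrated with a small sketch. This is not the article's code: it assumes the image embedding and the text (token) embeddings are vectors of the same dimension, and it uses a made-up three-token vocabulary with made-up values. The helper names `cosine` and `nearest_token` are hypothetical. The idea is to look up which token embedding lies closest to a given image embedding by cosine similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_token(image_embedding, token_embeddings):
    """Return (token, similarity) for the closest text-embedding vector."""
    return max(
        ((tok, cosine(image_embedding, vec)) for tok, vec in token_embeddings.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical toy data: a real model would have thousands of tokens and
# embeddings with hundreds or thousands of dimensions.
token_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.1, 0.9, 0.0],
    "car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]  # assumed output of an image encoder

token, score = nearest_token(image_embedding, token_embeddings)
print(token, round(score, 3))
```

With real models, the same lookup would run over the language model's full embedding table, and a high similarity to one token would suggest that the image embedding "points at" that token in the shared space.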

Large language models (LLMs) are increasingly proving their value. …

artificial intelligence, data science, deep learning, large language models, vision-and-language

