[P] Classification finetuning experiments on small GPT-2 sized LLMs
April 27, 2024, 12:31 p.m. | /u/seraschka
Machine Learning www.reddit.com
| |Model|Weights|Trainable token|Trainable layers|Context length|CPU/GPU|Training time|Training acc|Validation acc|Test acc|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
|1|gpt2-small (124M)|pretrained|last|last\_block|longest train ex. (120)|V100|0.39 min|96.63%|97.99%| |
|2|gpt2-small (124M)|pretrained|first|last\_block|longest train ex. (120)|V100|0.37 min|78.46%|80.54%| |
|3|gpt2-small (124M)|pretrained|last|last\_layer|longest train ex. (120)|V100|0.33 min|78.65%|87.25%| |
|4|gpt2-small (124M)|pretrained|last|all|longest train ex. (120)|V100|0.94 min|99.62%|96.64%| |
|5|gpt2-medium (355M)|pretrained|last|last\_block|longest train ex. (120)|V100|0.91 min|87.50%|51.01%| |
|6|gpt2-large (774M)|pretrained|last|last\_block|longest train ex. (120)|V100|1.91 min|99.52%|98.66%| |
|7|gpt2-small (124M)|random|last|all|longest train ex. (120)|V100|0.93 min|100%|97.32%| |
|8|gpt2-small (124M)|pretrained|last|last\_block|context length (1024)|V100|3.24 min|83.08%|87.92%| |
1. Training the Last vs. …
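
Rows 1 and 2 of the table differ only in the "Trainable token" column (last vs. first). One likely reason the choice matters is GPT-2's causal attention, where each position attends only to itself and earlier positions. The following is a minimal sketch in plain Python, not code from the post. It assumes that "trainable token" means the token position whose output feeds the classification head, and the helper names and example sentence are hypothetical.

```python
def causal_mask(seq_len):
    """Lower-triangular mask: position i may attend to positions 0..i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def visible_tokens(mask):
    """Count how many input tokens each position can attend to."""
    return [sum(row) for row in mask]


# Hypothetical five-token input, for illustration only.
tokens = ["You", "won", "a", "free", "prize"]
mask = causal_mask(len(tokens))
counts = visible_tokens(mask)

first_position_sees = counts[0]   # the first token sees only itself
last_position_sees = counts[-1]   # the last token sees the whole sequence
print(first_position_sees, last_position_sees)  # prints: 1 5
```

Under this assumption, a classifier reading the first position has access to a single token, while one reading the last position can draw on the full input.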