April 27, 2024, 12:31 p.m. | /u/seraschka

Machine Learning www.reddit.com

I ran a few classification finetuning experiments on relatively "small" GPT models that I found interesting and wanted to share:

|#|Model|Weights|Trainable token position|Trainable layers|Context length|CPU/GPU|Training time|Training acc|Validation acc|Test acc|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
|1|gpt2-small (124M)|pretrained|last|last\_block|longest train ex. (120)|V100|0.39 min|96.63%|97.99%| |
|2|gpt2-small (124M)|pretrained|first|last\_block|longest train ex. (120)|V100|0.37 min|78.46%|80.54%| |
|3|gpt2-small (124M)|pretrained|last|last\_layer|longest train ex. (120)|V100|0.33 min|78.65%|87.25%| |
|4|gpt2-small (124M)|pretrained|last|all|longest train ex. (120)|V100|0.94 min|99.62%|96.64%| |
|5|gpt2-medium (355M)|pretrained|last|last\_block|longest train ex. (120)|V100|0.91 min|87.50%|51.01%| |
|6|gpt2-large (774M)|pretrained|last|last\_block|longest train ex. (120)|V100|1.91 min|99.52%|98.66%| |
|7|gpt2-small (124M)|random|last|all|longest train ex. (120)|V100|0.93 min|100%|97.32%| |
|8|gpt2-small (124M)|pretrained|last|last\_block|context length (1024)|V100|3.24 min|83.08%|87.92%| |
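
The "Context length" column contrasts padding every input to the longest training example (120 tokens, rows 1–7) with padding to GPT-2's full 1,024-token context (row 8). A minimal sketch of that padding choice (the helper name `pad_batch` and the batch data are illustrative, not from the post; GPT-2 conventionally reuses the `<|endoftext|>` id 50256 as the pad token):

```python
# Sketch of the two padding strategies compared in the table:
# pad to the longest training example vs. pad to the full model context.
PAD_ID = 50256  # GPT-2 has no dedicated pad token; <|endoftext|> is commonly reused

def pad_batch(token_ids_batch, target_len, pad_id=PAD_ID):
    """Right-pad (or truncate) each sequence to exactly target_len tokens."""
    return [ids[:target_len] + [pad_id] * max(0, target_len - len(ids))
            for ids in token_ids_batch]

batch = [[10, 11, 12], [20, 21]]
longest = max(len(ids) for ids in batch)   # "longest train ex." setting
print(pad_batch(batch, longest))           # → [[10, 11, 12], [20, 21, 50256]]
print(len(pad_batch(batch, 1024)[0]))      # full-context setting → 1024
```

Row 8 suggests the extra padding costs roughly 8x the training time without improving accuracy here.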

1. Training the Last vs. …
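
The "Trainable token position" and "Trainable layers" columns correspond to two knobs: which token's hidden state feeds the classification head, and which parameters are left unfrozen. A minimal PyTorch sketch of the "last token + last\_block" configuration (the `TinyTransformer` module and its sizes are stand-ins for illustration, not the post's actual GPT-2 code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyTransformer(nn.Module):
    """Toy stand-in for GPT-2: embedding -> transformer blocks -> class head."""
    def __init__(self, vocab_size=100, emb_dim=32, n_layers=2, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        block = nn.TransformerEncoderLayer(d_model=emb_dim, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=n_layers)
        # The LM head is replaced by a small classification head.
        self.head = nn.Linear(emb_dim, num_classes)

    def forward(self, x):
        h = self.blocks(self.emb(x))   # (batch, seq, emb_dim)
        return self.head(h[:, -1, :])  # logits from the LAST token's hidden state

model = TinyTransformer()

# Freeze everything, then unfreeze only the last block and the head
# (the "last_block" setting in the table).
for p in model.parameters():
    p.requires_grad = False
for p in model.blocks.layers[-1].parameters():
    p.requires_grad = True
for p in model.head.parameters():
    p.requires_grad = True

logits = model(torch.randint(0, 100, (4, 10)))
print(logits.shape)  # → torch.Size([4, 2])
```

Swapping `h[:, -1, :]` for `h[:, 0, :]` gives the "first" token variant (row 2), and skipping the freezing loop gives the "all" layers variant (rows 4 and 7).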

