Sept. 13, 2023, 3:07 a.m. | /u/ginger_turmeric

Machine Learning

I'm currently training medium-sized (1B-3B parameter) audio models, and I have several different architectures in mind. Obviously I don't want to train every full-sized model and then compare them; that's a waste of money. So I'm thinking of training smaller versions (~100M parameters) and comparing those instead.

My question: is there some sort of best practice for this? For example, is there some fraction of the full model size at which the comparison is most reliable? Thanks.

