Web: https://www.reddit.com/r/MachineLearning/comments/xfup9f/r_rwkv4_scaling_rnn_to_7b_params_and_beyond_with/

Sept. 16, 2022, 3:40 p.m. | /u/bo_peng

Machine Learning reddit.com

Hi everyone :) I have finished training RWKV-4 1.5B on the Pile (330B tokens), and it does well at zero-shot tasks compared with GPT-Neo (trained on the same corpus).

https://preview.redd.it/adxndshw12o91.png?width=1336&format=png&auto=webp&s=fbc499549e5ebbb816b2e6b1ce1bcf4a59fb61aa

RWKV-4 is an attention-free RNN, so it is faster and uses less VRAM. It also supports a GPT mode for parallelized training. Previous discussion: [https://www.reddit.com/r/MachineLearning/comments/vzr6ie/r\_rwkv3\_scaling\_rnn\_to\_15b\_and\_reach\_transformer/](https://www.reddit.com/r/MachineLearning/comments/vzr6ie/r_rwkv3_scaling_rnn_to_15b_and_reach_transformer/)
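To give a rough feel for how one model can have both an RNN mode and a parallel mode, here is a toy sketch. It is not the RWKV-4 formulation (see the repo for that); it only assumes a simple decayed running sum, and the names `rnn_mode`, `parallel_mode`, and `decay` are made up for illustration. The point is that a linear recurrence can be evaluated step by step with a small carried state, or with every position computed independently from the full sequence.

```python
def rnn_mode(xs, decay):
    """Sequential evaluation: carry a single state from one step to the next."""
    state = 0.0
    outputs = []
    for x in xs:
        # Toy recurrence (an assumption, not RWKV's actual update rule).
        state = decay * state + x
        outputs.append(state)
    return outputs


def parallel_mode(xs, decay):
    """Each output is computed directly from the inputs up to that position,
    so the positions do not depend on each other and could run in parallel."""
    return [
        sum(decay ** (t - i) * xs[i] for i in range(t + 1))
        for t in range(len(xs))
    ]


xs = [1.0, 2.0, 3.0]
sequential = rnn_mode(xs, decay=0.5)
parallel = parallel_mode(xs, decay=0.5)
print(sequential)
print(parallel)
```

Both functions return the same values. The sequential form only needs the previous state at each step, which is where the memory saving at inference time comes from, while the parallel form is the kind of computation that suits training on whole sequences at once.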

Inference / training / fine-tuning code: [https://github.com/BlinkDL/RWKV-LM](https://github.com/BlinkDL/RWKV-LM)

Model download: [https://huggingface.co/BlinkDL](https://huggingface.co/BlinkDL)

Training is fast and stable with BFloat16 and DeepSpeed ZeRO-2. The 3B and 7B runs will finish in 20 and 50 days respectively. No loss …
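For anyone who has not used this setup, a minimal sketch of a DeepSpeed config that turns on BFloat16 and ZeRO stage 2 could look like the following. The `bf16` and `zero_optimization` keys are standard DeepSpeed config options; the batch-size values are placeholders, and none of this is taken from the actual RWKV-4 training runs.

```python
import json

# Hypothetical minimal DeepSpeed config, not the settings used for RWKV-4.
ds_config = {
    "bf16": {"enabled": True},            # train in BFloat16
    "zero_optimization": {"stage": 2},    # ZeRO stage 2: shard optimizer state and gradients
    "train_micro_batch_size_per_gpu": 8,  # placeholder value
    "gradient_accumulation_steps": 1,     # placeholder value
}

# DeepSpeed accepts the config as a JSON file, so serialize it.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```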
