Web: https://www.reddit.com/r/MachineLearning/comments/sf3wjo/r_how_does_gpt2gpt3_differ_from_transformer_in/

Jan. 28, 2022, 11:28 p.m. | /u/white0clouds

Machine Learning reddit.com

I'm quite new to NLP/transformers and work mostly in computer vision, so here's a basic question.

How does the transformer architecture in GPT-2/GPT-3 differ from the original Transformer in "Attention Is All You Need" (Vaswani et al., NeurIPS 2017)? What additional techniques are incorporated there?
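To make the contrast concrete, below is a minimal sketch of the kind of block GPT-2 stacks: decoder-only (masked self-attention only, with no encoder stack or cross-attention), and with LayerNorm moved to the input of each sub-block (pre-LN) rather than applied after the residual addition as in Vaswani et al. PyTorch is assumed here, and all module names and sizes are illustrative rather than the OpenAI implementation:

```python
import math
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    """Masked multi-head self-attention, as used in a decoder-only GPT block."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused Q, K, V projection
        self.proj = nn.Linear(d_model, d_model)      # output projection

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape each to (batch, heads, time, head_dim)
        def split(t):
            return t.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(C // self.n_heads)
        # causal mask: each position may attend only to itself and earlier positions
        mask = torch.triu(torch.ones(T, T, device=x.device), diagonal=1).bool()
        att = att.masked_fill(mask, float("-inf")).softmax(dim=-1)
        out = (att @ v).transpose(1, 2).reshape(B, T, C)
        return self.proj(out)

class GPT2Block(nn.Module):
    """Pre-LN residual block, x + f(LN(x)), which is the GPT-2 ordering."""
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = CausalSelfAttention(d_model, n_heads)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(  # GPT-2 uses GELU; the original Transformer used ReLU
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))  # LayerNorm *before* the sub-layer (pre-LN)
        x = x + self.mlp(self.ln2(x))   # Vaswani et al. instead compute LN(x + f(x))
        return x

x = torch.randn(2, 16, 768)             # (batch, sequence, d_model)
print(GPT2Block()(x).shape)             # torch.Size([2, 16, 768])
```

The original Transformer wraps the residual the other way, LayerNorm(x + Sublayer(x)), uses sinusoidal rather than learned positional embeddings, and has an encoder stack with cross-attention into the decoder. GPT-2 also adds a final LayerNorm after the last block, and GPT-3 keeps essentially the GPT-2 design while alternating dense and locally banded sparse attention across layers, per the GPT-3 paper.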


Tags: gpt3, machinelearning, transformer
