[R] How do GPT-2/GPT-3 differ from the Transformer in [Vaswani et al.]?
Jan. 28, 2022, 11:28 p.m. | /u/white0clouds
Machine Learning www.reddit.com
I'm quite new to NLP/transformers, and work mostly in computer vision. Here's a basic question.
How does the Transformer architecture in GPT-2/GPT-3 differ from the original Transformer in "Attention Is All You Need" (Vaswani et al., NeurIPS 2017)? What additional techniques are incorporated there?
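From what I've read so far, GPT-2 seems to be decoder-only with pre-LayerNorm (LayerNorm before each sub-layer plus a final one), GELU instead of ReLU, and learned positional embeddings instead of the sinusoidal ones. Is a block like the minimal PyTorch sketch below roughly what changes (the sizes here are illustrative, not the real GPT-2 config), or am I missing other techniques?

```python
# Rough sketch of a GPT-2 style decoder-only block: pre-LayerNorm, causal
# self-attention, GELU MLP. Illustrative sizes only, not the actual GPT-2 config.
import torch
import torch.nn as nn

class GPT2Block(nn.Module):
    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        # Pre-LN: LayerNorm is applied *before* each sub-layer, rather than
        # after the residual add as in the original Vaswani et al. block.
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),                      # GELU instead of ReLU
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        # Decoder-only: causal mask so each position attends only to the past.
        T = x.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return x

x = torch.randn(2, 16, 768)                 # (batch, seq_len, d_model)
print(GPT2Block()(x).shape)                 # torch.Size([2, 16, 768])
```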