[D] Transformers: Polynomial gated FFN is better than SwiGLU and reduces the number of parameters while improving the model's performance
Dec. 29, 2023, 11:12 a.m. | /u/alagagbar
Machine Learning www.reddit.com
In my language modeling experiments I was using this PaLM-like SwiGLU FFN:
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFNSwiGLU(nn.Module):
    def __init__(self, d_model: int) -> None:
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_model * 4, bias=False)
        self.fc2 = nn.Linear(d_model * 2, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = self.fc1.forward(x).chunk(2, dim=-1)
        x = F.silu(x1) * x2
        x = …
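The excerpt above is cut off before the end of forward. For readers who want something they can run, here is a minimal sketch of a complete SwiGLU FFN with the same layer shapes. It assumes the missing part simply applies fc2 and returns the result, which is the usual pattern but is not confirmed by the truncated post. The class name FFNSwiGLUSketch, the helper count_params, and the value d_model = 64 are all hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FFNSwiGLUSketch(nn.Module):
    """Illustrative completion of the truncated block; not the author's exact code."""

    def __init__(self, d_model: int) -> None:
        super().__init__()
        # One fused projection produces both the gate half and the value half.
        self.fc1 = nn.Linear(d_model, d_model * 4, bias=False)
        self.fc2 = nn.Linear(d_model * 2, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the 4 * d_model projection into two halves of width 2 * d_model.
        x1, x2 = self.fc1(x).chunk(2, dim=-1)
        x = F.silu(x1) * x2  # SwiGLU gating: SiLU(gate) times value
        return self.fc2(x)   # ASSUMPTION: project back to d_model and return


def count_params(module: nn.Module) -> int:
    """Hypothetical helper: total number of parameters in a module."""
    return sum(p.numel() for p in module.parameters())


d_model = 64  # arbitrary size for the demo
ffn = FFNSwiGLUSketch(d_model)
out = ffn(torch.randn(2, 10, d_model))  # (batch, sequence, d_model)
n_params = count_params(ffn)
```

With these shapes the block holds 4 * d_model^2 weights in fc1 and 2 * d_model^2 in fc2, so 6 * d_model^2 in total. That is the baseline any claimed parameter reduction would be measured against.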