April 28, 2024, 9:59 a.m. | /u/Agitated_Space_672

Machine Learning | www.reddit.com

[Let's Think Dot by Dot: Hidden Computation in Transformer Language Models (arXiv:2404.15758)](https://arxiv.org/abs/2404.15758)

# From the abstract


We show that transformers can use meaningless filler tokens (e.g., '......') in place of a chain of thought to solve two hard algorithmic tasks they could not solve when responding without intermediate tokens. However, we find empirically that learning to use filler tokens is difficult and requires specific, dense supervision to converge.
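To make the contrast concrete, here is a minimal sketch (my illustration, not the paper's actual setup) of the two target formats the abstract compares: a chain-of-thought target, where intermediate tokens carry real reasoning, versus a filler-token target, where the intermediate tokens are uninformative dots and any useful computation must happen in the model's hidden states. The toy question and helper names are hypothetical.

```python
# Hypothetical illustration of the two prompt/target formats discussed in the
# abstract. This is not code from the paper; it only shows the string shapes.

def chain_of_thought_target(question: str, steps: list[str], answer: str) -> str:
    """Target where the intermediate tokens carry meaningful reasoning."""
    return f"{question} " + " ".join(steps) + f" answer: {answer}"

def filler_token_target(question: str, n_filler: int, answer: str) -> str:
    """Target where the intermediate tokens are meaningless fillers ('.'),
    so any useful computation must occur in the model's hidden states."""
    return f"{question} " + " ".join(["."] * n_filler) + f" answer: {answer}"

if __name__ == "__main__":
    # Toy stand-in for the paper's hard algorithmic tasks.
    q = "does any triple in [1, 2, 4, 7] sum to 12?"
    print(chain_of_thought_target(q, ["1+4+7=12", "triple found"], "yes"))
    print(filler_token_target(q, n_filler=8, answer="yes"))
```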

