April 15, 2024, 11:15 p.m. | Synced

Synced syncedreview.com

Researchers from OPPO AI Center present four optimization techniques and a novel inference engine, Transformer-Lite, for running large language models on mobile phone GPUs. The engine outperforms the CPU-based FastLLM and the GPU-based MLC-LLM, with over 10x faster prefill and 2~3x faster decoding.


The post OPPO AI’s Transformer-Lite Delivers 10x+ Prefill and 2~3x Decoding Boost on Mobile Phone GPUs first appeared on Synced.

