OPPO AI’s Transformer-Lite Delivers 10x+ Prefill and 2~3x Decoding Boost on Mobile Phone GPUs
Synced (syncedreview.com)
Researchers from the OPPO AI Center have introduced Transformer-Lite, a novel mobile inference engine built on four optimization techniques. The engine outperforms the CPU-based FastLLM and the GPU-based MLC-LLM, achieving over 10x faster prefill and 2~3x faster decoding on mobile phone GPUs.
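The two speedup figures refer to the two phases of LLM inference: prefill, where the whole prompt is processed in one batched pass while the KV cache is built, and decoding, where tokens are generated one at a time by reusing that cache. The toy sketch below is not from the paper; it only illustrates this standard phase split with a dummy stand-in for the transformer forward pass:

```python
def toy_forward(tokens, kv_cache):
    # Dummy stand-in for a transformer forward pass: records cache
    # entries for each input token (a real engine stores K/V tensors)
    # and returns a fake "next token" id.
    for t in tokens:
        kv_cache.append(t)
    return sum(kv_cache) % 50000

def generate(prompt, n_new):
    kv_cache = []
    # Prefill phase: the entire prompt is processed in a single pass,
    # so per-token cost is amortized (the >10x gain applies here).
    next_tok = toy_forward(prompt, kv_cache)
    out = [next_tok]
    # Decode phase: tokens are produced one at a time, each step
    # reusing the KV cache built so far (the 2~3x gain applies here).
    for _ in range(n_new - 1):
        next_tok = toy_forward([out[-1]], kv_cache)
        out.append(next_tok)
    return out

print(len(generate(list(range(8)), 4)))  # 4 generated tokens
```

Because decode steps are sequential and memory-bound while prefill is a single parallel pass, engines typically optimize and report the two phases separately, as Transformer-Lite's results do.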