April 15, 2024, 11:15 p.m. | Synced


Researchers from the OPPO AI Center have introduced a solution: four optimization techniques and a novel mobile inference engine dubbed Transformer-Lite. The engine outperforms the CPU-based FastLLM and the GPU-based MLC-LLM, achieving more than 10x acceleration in prefill speed and 2–3x in decoding speed.


The post OPPO AI’s Transformer-Lite Delivers 10x+ Prefill and 2~3x Decoding Boost on Mobile Phone GPUs first appeared on Synced.

