April 8, 2022, 1:11 a.m. | Nikita Balagansky, Daniil Gavrilov

cs.LG updates on arXiv.org

Currently, pre-trained models can be considered the default choice for a wide
range of NLP tasks. Despite their SoTA results, there is practical evidence
that these models may require a different number of computing layers for
different input sequences, since evaluating all layers leads to overconfidence
in wrong predictions (namely, overthinking). This problem can potentially be
solved by implementing adaptive computation time approaches, which were first
designed to improve inference speed. The recently proposed PonderNet may be a
promising solution for …
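
The abstract describes adaptive computation time for early exit, where each layer can decide to halt instead of running the full stack. Below is a minimal sketch, assuming a PonderNet-style per-layer halting probability and a greedy exit threshold; the layer sizes, pooling, and threshold are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of PonderNet-style early exit over a transformer stack.
# Dimensions, pooling, and the halting threshold are illustrative assumptions.
import torch
import torch.nn as nn


class EarlyExitEncoder(nn.Module):
    def __init__(self, hidden=128, n_layers=6, n_classes=2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        # One exit classifier and one halting ("lambda") head per layer.
        self.exit_heads = nn.ModuleList(nn.Linear(hidden, n_classes) for _ in range(n_layers))
        self.halt_heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_layers))

    def forward(self, x, threshold=0.5):
        """Return logits from the first layer whose exit probability
        exceeds `threshold` (greedy inference-time early exit)."""
        not_halted = 1.0  # probability of not having exited at earlier layers
        for layer, exit_head, halt_head in zip(self.layers, self.exit_heads, self.halt_heads):
            x = layer(x)
            pooled = x.mean(dim=1)                  # crude sequence pooling
            lam = torch.sigmoid(halt_head(pooled))  # per-layer halting probability
            p_exit = not_halted * lam.mean().item() # PonderNet: p_n = lambda_n * prod_{i<n}(1 - lambda_i)
            if p_exit > threshold:
                return exit_head(pooled)
            not_halted *= (1.0 - lam.mean().item())
        return self.exit_heads[-1](x.mean(dim=1))   # fall back to the last layer


tokens = torch.randn(2, 16, 128)  # (batch, seq_len, hidden) dummy embeddings
logits = EarlyExitEncoder()(tokens)
```

At training time PonderNet instead treats the exit layer's index as a latent variable and weights each layer's loss by its exit probability; the loop above only shows the inference-time shortcut.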

Tags: arxiv, teaching
