June 22, 2022, 3:58 p.m. | /u/LeanderKu

Machine Learning www.reddit.com

I am working on a pytorch project and I have a custom computation that I am so far unable to express as a combination pre-defined pytorch functions (because it's essentially some loops around conv2d calls where I juggle some indices in a 5-d tensor). So currently I use python-loops with some smart padding but that's not the fastest. The only way to speed this up would be, i think, to implement custom cuda kernels. While the computation is not that …

cuda machinelearning pytorch

Data Architect

@ University of Texas at Austin | Austin, TX

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

Alternant Data Engineering

@ Aspire Software | Angers, FR

Senior Software Engineer, Generative AI

@ Google | Dublin, Ireland