[R] The Manga Whisperer: Automatically Generating Transcriptions for Comics

Jan. 20, 2024, 2:44 p.m. | /u/ragavsachdeva

Machine Learning www.reddit.com

Paper: [http://arxiv.org/abs/2401.10224](http://arxiv.org/abs/2401.10224)

Github: [https://github.com/ragavsachdeva/magi](https://github.com/ragavsachdeva/magi)

Try it yourself: [https://huggingface.co/spaces/ragavsachdeva/the-manga-whisperer/](https://huggingface.co/spaces/ragavsachdeva/the-manga-whisperer/)

TLDR: Given a high-resolution manga page as input, Magi (our model) can (i) detect panels, characters, and text blocks, (ii) cluster characters (without making any assumptions about the number of ground-truth clusters), (iii) match text blocks to their speakers, (iv) perform OCR, and (v) generate a transcript of who said what and when (by sorting the panels and text boxes in reading order). See the figure below for an example. Wanted to share …
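To make step (v) concrete, here is a minimal Python sketch of how a transcript could be assembled once panels, text boxes, speaker labels, and OCR text are already available. It is not Magi's actual ordering algorithm or API: the `Box` and `Line` types, the `reading_order` and `transcript` helpers, the sample coordinates, and the simple "rows top-to-bottom, right-to-left within a row" heuristic are all assumptions made for illustration. Real manga layouts are often more irregular than this heuristic can handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def center(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

    def contains(self, point):
        x, y = point
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


@dataclass(frozen=True)
class Line:
    """One OCR'd text box with its (already predicted) speaker label."""
    box: Box
    speaker: str
    text: str


def reading_order(items, box_of=lambda item: item):
    """Order items top-to-bottom by row, right-to-left within each row.

    Assumption: an item belongs to the current row if its vertical center
    lies above the bottom edge of the first item in that row.
    """
    rows = []
    for item in sorted(items, key=lambda i: box_of(i).y1):
        if rows and box_of(item).center[1] < box_of(rows[-1][0]).y2:
            rows[-1].append(item)
        else:
            rows.append([item])
    ordered = []
    for row in rows:
        # Manga is read right-to-left, so the rightmost box comes first.
        ordered.extend(sorted(row, key=lambda i: -box_of(i).x2))
    return ordered


def transcript(panels, lines):
    """Walk panels in reading order and emit 'speaker: text' per text box."""
    out = []
    for panel in reading_order(panels):
        in_panel = [l for l in lines if panel.contains(l.box.center)]
        for line in reading_order(in_panel, box_of=lambda l: l.box):
            out.append(f"{line.speaker}: {line.text}")
    return out


# Made-up page: two panels on the top row, one wide panel below.
panels = [
    Box(0, 410, 1000, 800),   # bottom
    Box(0, 10, 490, 400),     # top left
    Box(500, 0, 1000, 400),   # top right
]
lines = [
    Line(Box(400, 500, 600, 600), "character_1", "A door..."),
    Line(Box(100, 100, 200, 200), "character_0", "Look over there!"),
    Line(Box(550, 60, 650, 160), "character_1", "No idea."),
    Line(Box(800, 50, 900, 150), "character_0", "Where are we?"),
]

result = transcript(panels, lines)
for entry in result:
    print(entry)
```

The speaker names here stand in for cluster IDs from step (ii); a text box is assigned to a panel by checking whether its center falls inside that panel, which is another simplification.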
