Bounding box regression network outputs a central area of the image
May 21, 2023, 3:40 p.m. | /u/chaizyy
Deep Learning www.reddit.com
I'm trying to implement something similar to this paper, but using CLIP:
[https://arxiv.org/pdf/2104.08541.pdf](https://arxiv.org/pdf/2104.08541.pdf)
tldr:
input: image & text describing an object
output: bounding box localizing the object in x1, y1, x2, y2 coordinates
dataset: RefCOCOg
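To make the x1, y1, x2, y2 box format concrete, here is a minimal sketch of intersection-over-union (IoU) between a predicted and a ground-truth box. The post does not say which metric or loss is used, so IoU is an assumption here, and `box_iou`, `pred`, and `target` are hypothetical names with made-up coordinates.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


# Illustrative values only, not taken from the dataset.
pred = (50.0, 50.0, 150.0, 150.0)
target = (100.0, 100.0, 200.0, 200.0)
iou = box_iou(pred, target)
```

A box that always sits in the image center can still get a nonzero IoU on many samples, so it helps to inspect per-sample predictions rather than only the average score.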
I have a basic custom visual feature extractor inspired by HRNet. I combine its output with CLIP's image and text features into a single tensor of shape [batches, 1, 1024 * 3]. Next comes a bottleneck of 3 linear layers, each with ReLU activation except for …
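The setup described above could look roughly like the PyTorch sketch below. It is not the poster's code: `BoxRegressionHead`, the hidden sizes, the 1024-dimensional width of each feature vector, the missing activation on the last linear layer, and the final sigmoid (which assumes box coordinates normalized to [0, 1]) are all assumptions made for illustration. Random tensors stand in for the real extractor and CLIP outputs.

```python
import torch
import torch.nn as nn


class BoxRegressionHead(nn.Module):
    """Hypothetical head: fused features -> (x1, y1, x2, y2)."""

    def __init__(self, feat_dim: int = 1024, hidden_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim * 3, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            # Assumption: the last layer has no ReLU.
            nn.Linear(hidden_dim // 2, 4),
        )

    def forward(self, custom_feat, clip_image_feat, clip_text_feat):
        # Each input is [batch, feat_dim]; concatenate along the feature axis.
        fused = torch.cat([custom_feat, clip_image_feat, clip_text_feat], dim=-1)
        fused = fused.unsqueeze(1)  # [batch, 1, feat_dim * 3]
        # Assumption: sigmoid maps outputs to normalized coordinates in [0, 1].
        return torch.sigmoid(self.mlp(fused)).squeeze(1)  # [batch, 4]


head = BoxRegressionHead()
# Random stand-ins for the custom extractor and CLIP features.
custom_feat, clip_image_feat, clip_text_feat = (torch.randn(2, 1024) for _ in range(3))
boxes = head(custom_feat, clip_image_feat, clip_text_feat)
```

Nothing in this head forces x1 < x2 or y1 < y2, so a loss or parameterization that handles unordered corners may be needed in practice.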