Is Cross-modal Information Retrieval Possible without Training? (arXiv:2304.11095v1 [cs.LG])
cs.LG updates on arXiv.org
Encoded representations from a pretrained deep learning model (e.g., BERT text
embeddings, or penultimate-layer CNN activations of an image) convey a rich set
of features beneficial for information retrieval. Embeddings for a particular
data modality occupy a high-dimensional space of their own, but that space can
be semantically aligned to another by a simple mapping, without training a deep
neural net. In this paper, we take a simple mapping computed from least squares
and the singular value decomposition (SVD) for …
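As a rough illustration of the kind of training-free alignment the abstract describes, the sketch below (an assumption, not the paper's actual method) fits a linear map between two paired embedding spaces two ways: an unconstrained least-squares map, and an orthogonal map obtained from the SVD (the classical Procrustes solution). Synthetic embeddings stand in for real BERT/CNN features.

```python
import numpy as np

# Hypothetical sketch: align one embedding space to another with a
# simple linear map, computed without any neural-net training.
# Rows of X and Y are paired embeddings from two modalities.

def lstsq_map(X, Y):
    """Unconstrained W minimizing ||X @ W - Y||_F via least squares."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def procrustes_map(X, Y):
    """Orthogonal W minimizing ||X @ W - Y||_F (Procrustes via SVD)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))                   # stand-in "image" embeddings
R = np.linalg.qr(rng.normal(size=(16, 16)))[0]   # ground-truth rotation
Y = X @ R                                        # stand-in "text" embeddings

W = procrustes_map(X, Y)
print(np.allclose(X @ W, Y))                     # mapped embeddings match
```

Because the target space here is an exact rotation of the source, the SVD-based map recovers it; with real cross-modal embeddings the fit would only be approximate, and retrieval would then rank targets by similarity to the mapped queries.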