Vector Embeddings by Sequence Similarity and Context for Improved Compression, Similarity Search, Clustering, Organization, and Manipulation of cDNA Libraries. (arXiv:2308.05118v1 [q-bio.GN]) | allainews.com

Aug. 11, 2023, 6:43 a.m. | Daniel H. Um, David A. Knowles, Gail E. Kaiser

cs.LG updates on arXiv.org arxiv.org

This paper demonstrates the utility of organized numerical representations of
genes in research involving flat string gene formats (i.e., FASTA/FASTQ5).
FASTA/FASTQ files have several current limitations, such as their large file
sizes, slow processing speeds for mapping and alignment, and contextual
dependencies. These challenges significantly hinder investigations and tasks
that involve finding similar sequences. The solution lies in transforming
sequences into an alternative representation that facilitates easier clustering
into similar groups compared to the raw sequences themselves. By assigning a …

alignment arxiv bio clustering compression context current embeddings files gene genes libraries limitations manipulation mapping numerical organization paper processing research search string utility vector vector embeddings

More from arxiv.org / cs.LG updates on arXiv.org

Tao: Re-Thinking DL-based Microarchitecture Simulation 23 hours ago | arxiv.org

abstract arxiv cs.ar cs.lg +12

Towards a Systems Theory of Algorithms 23 hours ago | arxiv.org

abstract algorithms arxiv code +16

Object Detection for Automated Coronary Artery Using Deep Learning 23 hours ago | arxiv.org

abstract arxiv automated cs.cv +21

On the Role of the Action Space in Robot Manipulation Learning and Sim-to-Real Transfer 23 hours ago | arxiv.org

abstract agents arxiv cs.lg +16

Computer Vision for Increased Operative Efficiency via Identification of Instruments in the Neurosurgical Operating Room: … 23 hours ago | arxiv.org

abstract artificial artificial intelligence arxiv +18

A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization 23 hours ago | arxiv.org

abstract applications arxiv case +16

nach0: Multimodal Natural and Chemical Languages Foundation Model 23 hours ago | arxiv.org

abstract arxiv biomedical creative +24

How good are Large Language Models on African Languages? 23 hours ago | arxiv.org

abstract arxiv context cs.ai +19

Using Skew to Assess the Quality of GAN-generated Image Features 23 hours ago | arxiv.org

abstract advancement adversarial arxiv +20

Data Architect

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Data ETL Engineer

@ University of Texas at Austin | Austin, TX

View on ai-jobs.net

Lead GNSS Data Scientist

@ Lurra Systems | Melbourne

View on ai-jobs.net

Senior Machine Learning Engineer (MLOps)

@ Promaton | Remote, Europe

View on ai-jobs.net

Sr. VBI Developer II

@ Atos | Texas, US, 75093

View on ai-jobs.net

Wealth Management - Data Analytics Intern/Co-op Fall 2024

@ Scotiabank | Toronto, ON, CA

View on ai-jobs.net