April 27, 2022, 1:11 a.m. | Preeti Jha, Aruna Tiwari, Neha Bharill, Milind Ratnaparkhe, Om Prakash Patel, Nilagiri Harshith, Mukkamalla Mounika, Neha Nagendra

cs.LG updates on arXiv.org

Genome sequencing projects are rapidly increasing the number of
high-dimensional protein sequence datasets. Clustering a high-dimensional
protein sequence dataset using traditional machine learning approaches poses
many challenges. Many different feature extraction methods exist and are widely
used. However, extracting features from millions of protein sequences becomes
impractical because existing methods do not scale to that size. Therefore,
there is a need for an efficient feature extraction approach that extracts
significant features. We propose two scalable feature extraction
approaches for extracting …

analysis apache apache spark arxiv bio clustering extraction feature performance performance analysis protein scalable spark
