A Novel Scalable Apache Spark Based Feature Extraction Approaches for Huge Protein Sequence and their Clustering Performance Analysis. (arXiv:2204.11835v1 [q-bio.QM])
cs.LG updates on arXiv.org
Genome sequencing projects are rapidly increasing the number of
high-dimensional protein sequence datasets. Clustering a high-dimensional
protein sequence dataset using traditional machine learning approaches poses
many challenges. Many different feature extraction methods exist and are widely
used. However, extracting features from millions of protein sequences becomes
impractical because current algorithms do not scale to that volume. Therefore,
there is a need for an efficient feature extraction approach that extracts
significant features. We have proposed two scalable feature extraction
approaches for extracting …