Protein language models trained on multiple sequence alignments learn phylogenetic relationships. (arXiv:2203.15465v2 [q-bio.BM] UPDATED)
cs.LG updates on arXiv.org
Self-supervised neural language models with attention have recently been
applied to biological sequence data, advancing structure, function and
mutational effect prediction. Some protein language models, including MSA
Transformer and AlphaFold's EvoFormer, take multiple sequence alignments (MSAs)
of evolutionarily related proteins as inputs. Simple combinations of MSA
Transformer's row attentions have led to state-of-the-art unsupervised
structural contact prediction. We demonstrate that similarly simple, and
universal, combinations of MSA Transformer's column attentions strongly
correlate with Hamming distances between sequences in MSAs. Therefore, …
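The core measurement is easy to reproduce in outline. The sketch below (illustrative only, not the paper's code) compares a simple combination of column attentions, here a plain average over layers, heads, and alignment columns, against normalized pairwise Hamming distances between the aligned sequences. The col_attn tensor, its shape, and the random values filling it are assumptions standing in for the attentions a real MSA Transformer would return.

import numpy as np
from scipy.stats import spearmanr

def hamming_matrix(msa: list[str]) -> np.ndarray:
    """Pairwise normalized Hamming distances between aligned sequences."""
    arr = np.array([list(seq) for seq in msa])  # (num_seqs, msa_length)
    # Fraction of alignment columns at which each pair of sequences differs.
    return (arr[:, None, :] != arr[None, :, :]).mean(axis=-1)

# Hypothetical toy MSA; a real one would come from an alignment tool.
msa = ["MKT-LV", "MKSALV", "MRT-LI"]

# Stand-in for MSA Transformer column attentions; the shape
# (layers, heads, msa_length, num_seqs, num_seqs) is an assumption.
rng = np.random.default_rng(0)
col_attn = rng.random((12, 12, len(msa[0]), len(msa), len(msa)))

# "Simple combination": average over layers, heads, and columns,
# leaving one (num_seqs x num_seqs) sequence-similarity matrix.
attn_mean = col_attn.mean(axis=(0, 1, 2))

# Correlate the off-diagonal entries with Hamming distances.
dist = hamming_matrix(msa)
iu = np.triu_indices(len(msa), k=1)
rho, _ = spearmanr(attn_mean[iu], dist[iu])
print(f"Spearman correlation: {rho:.3f}")

With real column attentions in place of the random tensor, the paper's finding corresponds to a strong correlation between these two matrices across MSAs.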