[R] HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
June 30, 2023, 6:13 a.m. | /u/panabeenu
Machine Learning www.reddit.com
**Paper**
[https://arxiv.org/abs/2306.15794](https://arxiv.org/abs/2306.15794)
**Blog**
[https://hazyresearch.stanford.edu/blog/2023-06-29-hyena-dna](https://hazyresearch.stanford.edu/blog/2023-06-29-hyena-dna)
**Colab**
[https://colab.research.google.com/drive/1wyVEQd4R3HYLTUOXEEQmp\_I8aNC\_aLhL?usp=sharing](https://colab.research.google.com/drive/1wyVEQd4R3HYLTUOXEEQmp_I8aNC_aLhL?usp=sharing)
**Abstract**
Genomic (DNA) sequences encode an enormous amount of information for gene regulation and protein synthesis. Similar to natural language models, researchers have proposed foundation models in genomics to learn generalizable features from unlabeled genome data that can then be fine-tuned for downstream tasks such as identifying regulatory elements. Due to the quadratic scaling of attention, previous Transformer-based genomic models have used 512 to 4k tokens as context (<0.001% of the human genome), significantly …
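The context-length figures in the abstract can be sanity-checked with a little arithmetic. The sketch below is only a rough illustration, not anything from the paper: it assumes a haploid human genome of about 3.1 billion base pairs, and the helper names and the `bp_per_token` parameter are made up here to cover both single-nucleotide tokens and k-mer tokens. It also shows why quadratic attention makes longer contexts expensive, since the number of pairwise attention scores grows with the square of the context length.

```python
# Assumption (not from the post): a haploid human genome of ~3.1 billion base pairs.
HUMAN_GENOME_BP = 3.1e9


def genome_fraction_percent(context_tokens, bp_per_token=1):
    """Percentage of the genome covered by one context window.

    bp_per_token is a hypothetical knob: 1 for single-nucleotide tokens,
    k for non-overlapping k-mer tokens.
    """
    return 100.0 * context_tokens * bp_per_token / HUMAN_GENOME_BP


def attention_score_count(context_tokens):
    """Number of pairwise scores in one full self-attention matrix (n x n)."""
    return context_tokens ** 2


for n in (512, 4096):
    print(
        f"{n} tokens: {genome_fraction_percent(n):.6f}% of the genome, "
        f"{attention_score_count(n):,} attention scores per head"
    )
```

Under these assumptions, going from 512 to 4,096 tokens multiplies the context by 8 but the attention score count by 64, while the window still covers well under 0.001% of the genome.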