June 30, 2023, 6:13 a.m. | /u/panabeenu

Machine Learning www.reddit.com

**Paper**

[https://arxiv.org/abs/2306.15794](https://arxiv.org/abs/2306.15794)

**Blog**

[https://hazyresearch.stanford.edu/blog/2023-06-29-hyena-dna](https://hazyresearch.stanford.edu/blog/2023-06-29-hyena-dna)

**Colab**

[https://colab.research.google.com/drive/1wyVEQd4R3HYLTUOXEEQmp\_I8aNC\_aLhL?usp=sharing](https://colab.research.google.com/drive/1wyVEQd4R3HYLTUOXEEQmp_I8aNC_aLhL?usp=sharing)

**Abstract**

Genomic (DNA) sequences encode an enormous amount of information for gene regulation and protein synthesis. Similar to natural language models, researchers have proposed foundation models in genomics to learn generalizable features from unlabeled genome data that can then be fine-tuned for downstream tasks such as identifying regulatory elements. Due to the quadratic scaling of attention, previous Transformer-based genomic models have used 512 to 4k tokens as context (<0.001% of the human genome), significantly …
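To put the context-length figures in perspective, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions that are not in the post: a human genome of roughly 3.1 billion base pairs, one token per nucleotide, and "4k" read as 4096 tokens. The helper names are made up for illustration.

```python
# Rough arithmetic only. Assumptions (not from the post): ~3.1e9 base pairs
# in the human genome, one token per nucleotide, and "4k" meaning 4096.
HUMAN_GENOME_BP = 3_100_000_000


def context_fraction_percent(context_tokens, genome_bp=HUMAN_GENOME_BP):
    """Share of the genome covered by one context window, in percent."""
    return 100.0 * context_tokens / genome_bp


def attention_matrix_entries(context_tokens):
    """Entries in a full self-attention score matrix (quadratic in length)."""
    return context_tokens ** 2


for n in (512, 4096):
    pct = context_fraction_percent(n)
    entries = attention_matrix_entries(n)
    print(f"{n} tokens: {pct:.6f}% of genome, {entries:,} attention entries")
```

Under these assumptions, even the 4096-token window covers well under 0.001% of the genome, and going from 512 to 4096 tokens (8x longer) multiplies the attention score matrix by 64x.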
