Genome Understanding using LLMs

Repository Link

[https://github.com/your_username/your_project_name]

Description

This project applies large language models (LLMs) to the problem of understanding genomic sequences stored in FASTA files. The primary goal is to use transformer-based architectures such as DNABERT, DNABERT-2, and the Nucleotide Transformer to analyze and interpret DNA sequences. The approach seeks to improve the prediction of genomic elements (e.g., promoters, splice sites, transcription factor binding sites) and to explore how pre-trained genomic models can support bioinformatics, personalized medicine, and biotechnology applications.
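
As a rough illustration of the input side of this pipeline, the sketch below reads FASTA records and splits a sequence into overlapping k-mers, the tokenization style used by the original DNABERT (DNABERT-2 instead relies on byte-pair encoding through its own tokenizer). It is not the project's actual preprocessing code: the function names `parse_fasta` and `kmer_tokens`, the example records, and the choice of `k` are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping."""
    records: Dict[str, List[str]] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the id.
            header = line[1:].split()
            current_id = header[0] if header else f"record_{len(records)}"
            records[current_id] = []
        elif current_id is not None:
            # Sequence line: a record may span several lines.
            records[current_id].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}


def kmer_tokens(sequence: str, k: int = 6) -> List[str]:
    """Split a sequence into overlapping k-mers with stride 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical in-memory FASTA content, used only for this example.
example = """>seq1 hypothetical example record
ACGTAC
GTTA
>seq2
ggcc
""".splitlines()

records = parse_fasta(example)
tokens = kmer_tokens(records["seq1"], k=3)
print(records)
print(tokens)
```

For real files, the same function can be fed an open file handle, for example `parse_fasta(open(path))`, since it only iterates over lines.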

Task Type

Fine-Tuning LLMs for Genome Understanding

Results Summary

Best Model: DNABERT-2
Evaluation Metric: None reported. Training on the dataset took too long, so I stopped it after a few hours.
Result: Training was not completed, but the model was able to learn the patterns in the data.

Documentation

Literature Review
Dataset Characteristics
Baseline Model
Fine Tuning LLM
Presentation
