Genome Understanding using LLMs

Repository Link

[https://github.com/your_username/your_project_name]

Description

This project applies large language models (LLMs) to the problem of understanding genomic sequences stored in FASTA files. The primary goal is to use transformer-based architectures such as DNABERT, DNABERT-2, and the Nucleotide Transformer to analyze and interpret DNA sequences. The approach seeks to improve the prediction of genomic elements (e.g., promoters, splice sites, transcription factor binding sites) and to explore the potential of pre-trained genomic models for applications in bioinformatics, personalized medicine, and biotechnology.
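As a rough illustration of how a FASTA file might be prepared for a transformer model, the sketch below parses FASTA text and splits each sequence into overlapping k-mers. It is not taken from this repository: the function names (`parse_fasta`, `kmer_tokens`), the example records, and the choice of `k=6` are assumptions made for illustration. The original DNABERT tokenizes sequences into overlapping k-mers, whereas DNABERT-2 uses byte-pair encoding, so in practice the tokenizer shipped with the chosen model would replace `kmer_tokens`.

```python
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks).upper()
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may span several lines; collect the pieces.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks).upper()


def kmer_tokens(sequence: str, k: int = 6) -> List[str]:
    """Split a sequence into overlapping k-mers with stride 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical example input, not data from this project.
example_fasta = """>seq1 hypothetical example record
ACGTAC
GTTA
>seq2
ggcc
"""

records = dict(parse_fasta(example_fasta))
tokens = kmer_tokens(records["seq1 hypothetical example record"], k=6)
print(tokens)
```

Sequences shorter than `k` produce an empty token list, so a real pipeline would need to decide whether to pad or drop such records.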

Task Type

Fine-Tuning LLMs for Genome Understanding

Results Summary

Best Model: DNABERT-2
Evaluation Metric: None reported. Training on this dataset took too long, so I stopped it after a few hours.
Result: Training could not be completed, but the model appeared to be learning patterns in the data before it was stopped.

Folders and files

0_LiteratureReview
1_DatasetCharacteristics
2_BaselineModel
3_Model_fine_tuning
4_Presentation
CoverImage
LICENSE
README.md

AammarTufail/fine_tuning_LLM_for_genome_understanding_of_covid_19
