AammarTufail/fine_tuning_LLM_for_genome_understanding_of_covid_19

Genome Understanding using LLMs

Repository Link

[https://github.com/your_username/your_project_name]

Description

This project applies large language models (LLMs) to the problem of understanding genomic sequences stored in FASTA files. The primary goal is to use transformer-based architectures such as DNABERT, DNABERT-2, and the Nucleotide Transformer to analyze and interpret DNA sequences. The approach aims to improve the prediction of genomic elements (e.g., promoters, splice sites, transcription factor binding sites) and to explore the potential of pre-trained genomic models for bioinformatics, personalized medicine, and biotechnology applications.
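To make the input side concrete, here is a minimal sketch of reading FASTA records and splitting a sequence into overlapping k-mers, the tokenization style used by the original DNABERT. It is an illustration only, not the code used in this repository: the function names `read_fasta` and `kmer_tokens`, the toy records, and the choice of `k=6` are assumptions. DNABERT-2 uses a byte-pair-encoding tokenizer instead of k-mers, so this step would be replaced by that model's own tokenizer.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream.

    Hypothetical helper: sequence lines are upper-cased and joined,
    blank lines are skipped.
    """
    header = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def kmer_tokens(sequence: str, k: int = 6, stride: int = 1) -> List[str]:
    """Split a sequence into k-mers; stride=1 gives overlapping k-mers."""
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


# Toy input (made up for illustration, not project data).
toy_fasta = ">seq1 toy record\nACGTAC\nGTTA\n>seq2\nNNAC\n"
records = list(read_fasta(io.StringIO(toy_fasta)))
tokens = kmer_tokens(records[0][1], k=6)
```

With the toy input above, `records[0]` holds the joined sequence `ACGTACGTTA` and `tokens` contains five overlapping 6-mers. Sequences shorter than `k` yield an empty list, so short records would need padding or filtering before training.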

Task Type

Fine-Tuning LLMs for Genome Understanding

Results Summary

  • Best Model: DNABERT-2
  • Evaluation Metric: Not available. Training took too long on this dataset, so I stopped it after a few hours.
  • Result: Training was not completed, but the partial run indicated that the model was learning patterns in the data.

Documentation

  1. Literature Review
  2. Dataset Characteristics
  3. Baseline Model
  4. Fine Tuning LLM
  5. Presentation

Cover Image

Project Cover Image


About

This repository is about fine-tuning large language models to understand genomes.
