- Project Lead: Phong Cao
- AI Developer: Phong Cao
- Backend: Phong Cao, Hien Hoang
- Frontend: Doanh Phung, Minh Bui
SyntheSearch: A smart research tool that finds and synthesizes the most relevant papers, saving researchers time and enhancing insight.
- Navigate to the backend directory: `cd backend`
- Set up a virtual environment: `python3 -m venv venv`
- Activate the virtual environment: `source venv/bin/activate`
- Install dependencies: `pip install -r requirement.txt`
  ⚠️ Note: on macOS, ensure `pywin32` is removed from `requirement.txt`.
- Run the server: `uvicorn main:app --reload`
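The `uvicorn main:app --reload` command expects an ASGI application object named `app` inside `backend/main.py`. The real endpoints live in the repository; the following is only a minimal, hypothetical sketch of what that entry point could look like (the `/search` route and its parameters are assumptions, not the project's actual API):

```python
# main.py - minimal FastAPI entry point (hypothetical sketch, not the project's actual code)
from fastapi import FastAPI

app = FastAPI(title="SyntheSearch API")

@app.get("/health")
def health() -> dict:
    # Simple liveness check so the frontend can verify the backend is running.
    return {"status": "ok"}

@app.get("/search")
def search(query: str, k: int = 5) -> dict:
    # In the real service this would embed the query, look up the nearest
    # papers in the vector database, and return summaries plus a synthesis.
    return {"query": query, "k": k, "results": []}
```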
- Navigate to the frontend directory: `cd frontend`
- Install dependencies: `npm i`
- Start the development server: `npm run dev`
SyntheSearch is a web application designed to streamline the research process for students and researchers by efficiently locating relevant research papers. Researchers often spend hours sifting through papers, hoping to find the studies that best match their interests. SyntheSearch aims to reduce this time by intelligently suggesting the most relevant papers and generating a synthesis to reveal how the studies interrelate, offering users an insightful overview that saves time and enhances understanding.
The inspiration for SyntheSearch came from our own experiences as students. Before HackUMass XII, one team member struggled to find research papers on machine-learning applications in cancer detection. The process of locating credible sources was exhausting and time-consuming, even with optimized library search tools. This frustration inspired us to develop a more efficient search engine that leverages Large Language Models (LLMs) and vector databases to quickly surface relevant research and summarize it.
We chose Python for the back end because of its extensive frameworks for AI development. Databricks was used to streamline our machine-learning pipeline. Here's how we approached building SyntheSearch:
- Data Collection: We started by scraping data from the CORE collection of open-access research papers.
- Embedding: Using LangChain, we used OpenAI's text-embedding-3-large model to convert paper texts into vector embeddings (see the first sketch after this list).
- Storage: We used LanceDB as our vector database, storing the embedded vectors for fast and efficient retrieval.
- Summarization and Synthesis: We employed OpenAI's GPT-4o-mini model to generate summaries, suggestions, and synthesized insights (see the second sketch after this list).
- Front-End: We built the user interface with React and a TypeScript template, providing a clean and responsive experience for users.
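A condensed sketch of the embedding and storage steps, assuming hypothetical table and column names (`papers`, `title`, `text`) rather than the project's actual schema:

```python
# embed_and_store.py - hypothetical sketch of the embedding/storage pipeline
import lancedb
from langchain_openai import OpenAIEmbeddings  # requires OPENAI_API_KEY

# Papers collected from CORE; in practice these are the full scraped texts.
papers = [
    {"title": "Deep Learning for Cancer Detection", "text": "..."},
    {"title": "Vector Databases in Information Retrieval", "text": "..."},
]

# Convert paper texts into vector embeddings with text-embedding-3-large.
embedder = OpenAIEmbeddings(model="text-embedding-3-large")
vectors = embedder.embed_documents([p["text"] for p in papers])

# Store the embeddings alongside the paper metadata in a LanceDB table.
db = lancedb.connect("./lancedb")
table = db.create_table(
    "papers",
    data=[{"vector": v, **p} for v, p in zip(vectors, papers)],
    mode="overwrite",
)
```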
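And a sketch of the retrieval and synthesis step, again with assumed names and an assumed prompt; the actual ranking and prompting logic live in the backend code:

```python
# search_and_synthesize.py - hypothetical sketch of retrieval + GPT-4o-mini synthesis
import lancedb
from langchain_openai import OpenAIEmbeddings
from openai import OpenAI

embedder = OpenAIEmbeddings(model="text-embedding-3-large")
client = OpenAI()

def synthesize(query: str, k: int = 5) -> str:
    # Embed the user's query and find the k nearest papers in LanceDB.
    table = lancedb.connect("./lancedb").open_table("papers")
    hits = table.search(embedder.embed_query(query)).limit(k).to_list()

    # Ask GPT-4o-mini to summarize the hits and explain how they interrelate.
    context = "\n\n".join(f"{h['title']}:\n{h['text']}" for h in hits)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Summarize these papers and synthesize how they relate to the query."},
            {"role": "user", "content": f"Query: {query}\n\nPapers:\n{context}"},
        ],
    )
    return response.choices[0].message.content

print(synthesize("machine learning for cancer detection"))
```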
- GitHub Workflow Issues: Frequent merge conflicts on pull requests slowed our progress.
- Communication Gaps: Miscommunication led to duplicated work and inefficiencies.
This project was an invaluable learning experience. As it was our first LLM project, we gained hands-on experience with GenAI technologies, particularly the power of vector databases. We learned the importance of clear team communication, and we now have a deeper understanding of LLMs and their capabilities in revolutionizing information retrieval.
- Python (Backend Development)
- LangChain (Embedding)
- LanceDB (Vector Database)
- OpenAI GPT Models (Summarization and Synthesis)
- React with TypeScript (Front-End Development)
- TailwindCSS (Styling)
- Vite (Tooling)
- Databricks (Machine Learning Pipeline)
Through SyntheSearch, we're excited to contribute to the efficiency of the research process, empowering researchers to focus on insights rather than information overload.