A tool that crawls documentation websites and generates a clean, well-formatted Markdown document. Built with FastAPI, with support for multiple LLM providers (DeepSeek and Groq).
- 🌐 Web Crawling: Automatically crawls documentation websites and extracts content
- 🌳 Tree Structure: Maintains the original documentation hierarchy
- ✨ Clean Output: Generates well-formatted markdown documents
- 🎯 Selective Processing: Choose which pages to include in the final document
- 🔄 Multiple LLM Support:
  - DeepSeek API integration
  - Groq API integration
- 🎨 Modern UI:
  - Dark mode by default
  - Interactive documentation tree
  - Real-time processing feedback
  - Copy and download options
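To illustrate the "Tree Structure" feature, here is a minimal sketch of how crawled page URLs could be nested by path segment so that the original hierarchy survives. The helper names `build_tree` and `render_tree` and the `example.com` URLs are assumptions made for this example; the project's own `crawler.py` may work differently.

```python
from urllib.parse import urlparse


def build_tree(urls):
    """Nest crawled page URLs by path segment (hypothetical helper)."""
    tree = {}
    for url in urls:
        # Split "/docs/guide/install" into ["docs", "guide", "install"].
        segments = [s for s in urlparse(url).path.split("/") if s]
        node = tree
        for segment in segments:
            # Reuse the existing child node or create an empty one.
            node = node.setdefault(segment, {})
    return tree


def render_tree(tree, indent=0):
    """Return the tree as indented lines, one per section or page."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * indent + name)
        lines.extend(render_tree(tree[name], indent + 1))
    return lines


# Placeholder URLs, not taken from the project.
pages = [
    "https://example.com/docs/guide/install",
    "https://example.com/docs/guide/usage",
    "https://example.com/docs/api",
]
print("\n".join(render_tree(build_tree(pages))))
```

A nested dictionary keeps the sketch short; a real interface would probably also store each page's title and full URL on the node.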
- Clone the repository:

      git clone https://github.com/kr3t3n/documentation-crawler.git
      cd documentation-crawler

- Install dependencies:

      pip install -r requirements.txt

- Run the application:

      python main.py

The application will be available at http://127.0.0.1:8000
You'll need an API key for at least one of the supported providers, DeepSeek or Groq, to use the application.
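The "at least one key" requirement can be sketched as a small validation step. This is an illustration only: `SUPPORTED_PROVIDERS` and `pick_provider` are hypothetical names, and the application itself collects the key through its web interface rather than through a function like this.

```python
# Provider identifiers are assumptions made for this sketch.
SUPPORTED_PROVIDERS = ("deepseek", "groq")


def pick_provider(keys, preferred=None):
    """Return the provider to use, given a mapping of provider name to API key."""
    # A provider counts as available only if its key is a non-blank string.
    available = [p for p in SUPPORTED_PROVIDERS if (keys.get(p) or "").strip()]
    if not available:
        raise ValueError("Provide an API key for DeepSeek or Groq.")
    if preferred in available:
        return preferred
    # Otherwise fall back to the first provider that has a key.
    return available[0]


print(pick_provider({"groq": "example-key"}))
```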
- Enter the documentation URL you want to process
- Choose your preferred API (DeepSeek or Groq) and enter the API key
- Wait for the crawler to analyze the documentation structure
- Select the pages you want to include in the final document
- Click "Generate Markdown" to process the selected pages
- Copy or download the generated markdown
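The crawl and selection steps above can be sketched with the Python standard library alone. This is a rough illustration under stated assumptions, not the project's implementation: `LinkCollector`, `same_site_links`, and `combine_selected` are hypothetical names, the page data is made up, and the LLM clean-up step is left out entirely.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_site_links(base_url, html):
    """Return absolute, de-duplicated links that stay on base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    seen, links = set(), []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href).split("#")[0]  # drop fragments
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def combine_selected(pages, selected):
    """Join the selected pages' content into one markdown string, in selection order."""
    parts = [f"# {pages[url]['title']}\n\n{pages[url]['body']}" for url in selected]
    return "\n\n".join(parts)


# Made-up input for illustration.
html = '<a href="/docs/a">A</a> <a href="b#top">B</a> <a href="https://other.com/x">X</a>'
links = same_site_links("https://example.com/docs/", html)
pages = {
    "https://example.com/docs/a": {"title": "Page A", "body": "Content of A."},
    "https://example.com/docs/b": {"title": "Page B", "body": "Content of B."},
}
print(combine_selected(pages, links[:1]))
```

Restricting links to the starting host keeps the crawl inside the documentation site, and joining only the selected URLs mirrors the page-selection step in the interface.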
documentation-crawler/
├── main.py # FastAPI application and endpoints
├── crawler.py # Documentation crawling logic
├── processor.py # Content processing and LLM integration
├── requirements.txt # Project dependencies
├── static/ # Static assets
└── templates/ # HTML templates
    └── index.html # Main application interface
Contributions are welcome! Feel free to:
- Fork the repository
- Create a feature branch
- Submit a pull request
Created by Georgi from Mangia Studios Limited.
If you find Documentation Crawler useful, consider buying me a coffee ☕
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with FastAPI
- UI powered by DaisyUI and Tailwind CSS
- Animations by Anime.js