
Alibaba-NLP/WebWalker


WebWalker: Benchmarking LLMs in Web Traversal


Jialong Wu, Wenbiao Yin, Yong Jiang, Zhenglin Wang, Zekun Xi, Runnan Fang

Linhai Zhang, Yulan He, Deyu Zhou, Pengjun Xie, Fei Huang

Tongyi Lab, Alibaba Group

👏 You are welcome to try web traversal via our ModelScope online demo or 🤗 Hugging Face online demo!

[🤖Project] [📄Paper] [🚩Citation]

Repo for WebWalker: Benchmarking LLMs in Web Traversal

📖 Quick Start

📌 Introduction

  • We construct a challenging benchmark, WebWalkerQA, which is composed of 680 queries from four real-world scenarios across over 1373 webpages.
  • To tackle the challenge of web-navigation tasks requiring long context, we propose WebWalker, which utilizes a multi-agent framework for effective memory management.
  • Extensive experiments show that WebWalkerQA is challenging and that, for information-seeking tasks, vertical exploration within a page proves beneficial.
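To make the idea of splitting traversal and memory management across agents more concrete, here is a toy sketch. It is not WebWalker's implementation: the site graph, the keyword-based critic, and the names Critic and explore are all hypothetical, and the explorer is a plain breadth-first walk standing in for an LLM agent that would choose which subpage to open next. It only illustrates one way such a split might look: one component walks the site, while another keeps just the useful snippets in memory and decides when enough has been gathered.

```python
from dataclasses import dataclass, field

# Hypothetical toy website: page id -> (page text, linked subpage ids)
SITE: dict[str, tuple[str, list[str]]] = {
    "root": ("Conference home page", ["root->calls", "root->venue"]),
    "root->calls": ("Calls page: submission deadline details", []),
    "root->venue": ("Venue page: conference address details", []),
}


@dataclass
class Critic:
    """Toy memory manager: keeps relevant snippets and decides when to stop."""
    keywords: list[str]
    memory: list[str] = field(default_factory=list)

    def update(self, text: str) -> None:
        # Keep only page text that mentions at least one query keyword.
        if any(k.lower() in text.lower() for k in self.keywords):
            self.memory.append(text)

    def done(self) -> bool:
        # Stop once every keyword is covered by some snippet in memory.
        return all(
            any(k.lower() in m.lower() for m in self.memory) for k in self.keywords
        )


def explore(
    site: dict[str, tuple[str, list[str]]],
    root: str,
    critic: Critic,
    max_steps: int = 10,
) -> tuple[list[str], list[str]]:
    """Breadth-first stand-in for an explorer agent; returns (visited, memory)."""
    frontier = [root]
    visited: list[str] = []
    while frontier and len(visited) < max_steps and not critic.done():
        page = frontier.pop(0)  # visit pages in discovery order
        text, links = site[page]
        visited.append(page)
        critic.update(text)
        new_links = [l for l in links if l not in visited and l not in frontier]
        frontier.extend(new_links)
    return visited, critic.memory
```

With Critic(["deadline", "venue"]) the walk visits the root, the calls page and the venue page, then stops because both keywords are covered; a real system would replace the keyword checks with LLM judgments.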

📚 WebWalkerQA Dataset

Each JSON item in the WebWalkerQA dataset is organized in the following format:

{
  "Question": "When is the paper submission deadline for the ACL 2025 Industry Track, and what is the venue address for the conference?",
  "Answer": "The paper submission deadline for the ACL 2025 Industry Track is March 21, 2025. The conference will be held in Brune-Kreisky-Platz 1.",
  "Root_Url": "https://2025.aclweb.org/",
  "Info": {
    "Hop": "multi-source",
    "Domain": "Conference",
    "Language": "English",
    "Difficulty_Level": "Medium",
    "Source_Website": [
      "https://2025.aclweb.org/calls/industry_track/",
      "https://2025.aclweb.org/venue/"
    ],
    "Golden_Path": ["root->call>student_research_workshop", "root->venue"]
  }
}
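As an illustration of the schema above, the sketch below parses one item into a typed Python object. The class name WebWalkerQAItem and its field names are our own, and splitting each Golden_Path entry on "->" into navigation steps is an assumption based only on the example shown.

```python
import json
from dataclasses import dataclass


@dataclass
class WebWalkerQAItem:
    """Hypothetical typed view of one WebWalkerQA JSON item."""
    question: str
    answer: str
    root_url: str
    hop: str
    domain: str
    language: str
    difficulty: str
    source_websites: list[str]
    golden_paths: list[list[str]]

    @classmethod
    def from_json(cls, raw: str) -> "WebWalkerQAItem":
        d = json.loads(raw)
        info = d["Info"]  # metadata is nested under "Info"
        return cls(
            question=d["Question"],
            answer=d["Answer"],
            root_url=d["Root_Url"],
            hop=info["Hop"],
            domain=info["Domain"],
            language=info["Language"],
            difficulty=info["Difficulty_Level"],
            source_websites=list(info["Source_Website"]),
            # Assumption: "->" separates the steps of a golden path.
            golden_paths=[p.split("->") for p in info["Golden_Path"]],
        )
```

For a Golden_Path entry such as "root->venue", golden_paths would hold ["root", "venue"].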

🤗 The WebWalkerQA Leaderboard is available on Hugging Face!

You can load the dataset via the following code:

from datasets import load_dataset
ds = load_dataset("callanwu/WebWalkerQA", split="main")

Additionally, we provide a collection of approximately 14k silver QA pairs, which have not yet been carefully human-verified. You can load the silver dataset by changing the split to "silver".
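Once loaded, items can be sliced by the metadata in Info. The helpers below (filter_items and count_by, both hypothetical) assume that each row of the loaded split behaves like a dict with the same nested layout as the JSON item shown earlier; that is an assumption about how the dataset exposes the Info field.

```python
from collections import Counter
from typing import Iterable, Optional


def filter_items(
    rows: Iterable[dict],
    difficulty: Optional[str] = None,
    hop: Optional[str] = None,
) -> list[dict]:
    """Keep rows whose Info metadata matches every filter that is set."""
    selected = []
    for row in rows:
        info = row["Info"]
        if difficulty is not None and info["Difficulty_Level"] != difficulty:
            continue
        if hop is not None and info["Hop"] != hop:
            continue
        selected.append(row)
    return selected


def count_by(rows: Iterable[dict], key: str) -> Counter:
    """Count rows per value of one Info field, e.g. "Domain"."""
    return Counter(row["Info"][key] for row in rows)
```

If the assumption above holds, filter_items(ds, difficulty="Medium", hop="multi-source") would select medium multi-source questions, and count_by(ds, "Domain") would give per-domain counts.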

💡 Performance

📊 Results on Web Agents

The performance of web agents is shown below:

📊 Results on RAG Systems


🚩 You are welcome to submit your method to the leaderboard!

🛠 Dependencies

conda create -n webwalker python=3.10
conda activate webwalker
git clone https://github.com/alibaba-nlp/WebWalker.git
cd WebWalker
pip install -e .
# Install requirements
pip install -r requirements.txt
# Run post-installation setup
crawl4ai-setup
# Verify your installation
crawl4ai-doctor

💻 Running WebWalker Demo Locally

🔑 Before running, please export the OpenAI API key or DashScope API key as an environment variable:

export OPEN_AI_API_KEY=YOUR_API_KEY
export OPEN_AI_API_BASE_URL=YOUR_API_BASE_URL

or

export DASHSCOPE_API_KEY=YOUR_API_KEY

Other API keys supported by Qwen-Agent can also be used; for details, please refer to the Qwen-Agent documentation. To configure the API key, modify lines 44-53 of src/app.py.
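The snippet below sketches one way a script could pick up the environment variables exported above. The function name resolve_llm_config and the rule that the OpenAI-compatible key takes precedence over the DashScope key are assumptions for illustration only; the actual selection logic lives in lines 44-53 of src/app.py.

```python
import os
from typing import Mapping, Optional


def resolve_llm_config(env: Optional[Mapping[str, str]] = None) -> dict[str, Optional[str]]:
    """Hypothetical helper: choose an LLM backend from environment variables.

    Assumption: OPEN_AI_API_KEY wins if both it and DASHSCOPE_API_KEY are set.
    """
    env = os.environ if env is None else env
    if env.get("OPEN_AI_API_KEY"):
        return {
            "provider": "openai",
            "api_key": env["OPEN_AI_API_KEY"],
            # Base URL is optional here; None means "use the client default".
            "base_url": env.get("OPEN_AI_API_BASE_URL"),
        }
    if env.get("DASHSCOPE_API_KEY"):
        return {"provider": "dashscope", "api_key": env["DASHSCOPE_API_KEY"]}
    raise RuntimeError(
        "Set OPEN_AI_API_KEY (and OPEN_AI_API_BASE_URL) or DASHSCOPE_API_KEY"
    )
```

Passing an explicit mapping instead of reading os.environ directly keeps the helper easy to test.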

Then, run the app.py file with Streamlit:

cd src
streamlit run app.py

Running the RAG System on WebWalkerQA

cd src
python rag_system.py --api_name [API_NAME] --output_file [OUTPUT_PATH]

Details of the environment setup can be found in the README.md in the src folder.

🔍 Evaluation

The evaluation script, which uses GPT-4 to assess the accuracy of the output answers, can be run as follows:

cd src
python evaluate.py --input_path [INPUT_PATH] --output_path [OUTPUT_PATH]
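Once answers have been judged, it can be useful to break accuracy down by the metadata in Info. This sketch assumes each judged record keeps the original Info dict and carries a boolean verdict under a field we call is_correct; that field name and the accuracy_by and overall_accuracy helpers are hypothetical and do not describe the output format of evaluate.py.

```python
from collections import defaultdict
from typing import Iterable


def accuracy_by(
    records: Iterable[dict],
    key: str = "Difficulty_Level",
    verdict_field: str = "is_correct",  # hypothetical field name
) -> dict[str, float]:
    """Per-group accuracy, grouping by one Info field (e.g. Difficulty_Level or Hop)."""
    totals: defaultdict[str, int] = defaultdict(int)
    hits: defaultdict[str, int] = defaultdict(int)
    for r in records:
        group = r["Info"][key]
        totals[group] += 1
        hits[group] += int(bool(r[verdict_field]))
    return {g: hits[g] / totals[g] for g in totals}


def overall_accuracy(records: list[dict], verdict_field: str = "is_correct") -> float:
    """Fraction of records judged correct; 0.0 for an empty list."""
    if not records:
        return 0.0
    return sum(bool(r[verdict_field]) for r in records) / len(records)
```

Grouping by "Hop" instead of "Difficulty_Level" would separate single-source from multi-source questions.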

🌻 Acknowledgement

  • This work is built on ReAct, Qwen-Agent, and LangChain. We sincerely thank their authors for their efforts.
  • We sincerely thank the contributors and maintainers of crawl4ai for their open-source tool ❤️, which helped us obtain web pages in a Markdown-like format.
  • This repo is contributed by Jialong Wu. If you have any questions, please feel free to get in touch via [email protected] or [email protected], or create an issue.

🚩 Citation

If you find this work helpful, please cite it as:

@misc{wu2025webwalker,
      title={WebWalker: Benchmarking LLMs in Web Traversal},
      author={Jialong Wu and Wenbiao Yin and Yong Jiang and Zhenglin Wang and Zekun Xi and Runnan Fang and Deyu Zhou and Pengjun Xie and Fei Huang},
      year={2025},
      eprint={2501.07572},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.07572},
}
