The paper can be found here.
The data is composed of pairs of sub-heading + article scraped from Shkuf, ha-makom, and the7eye.
The data can be found here. You can also use the CSV files attached under the data folder, as in the sketch below.
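For example, the local CSV files can be loaded with pandas. This is only a minimal sketch; the file name data/train.csv is an assumption, so adjust the path to whatever is actually in the data folder.

import pandas as pd

# Hypothetical path - adjust to the actual file name under the data folder.
train_df = pd.read_csv("data/train.csv")
print(train_df.columns)  # expected columns: summary, article
print(len(train_df))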
The data can also be downloaded directly from Hugging Face using the datasets library in Python:
pip3 install datasets
from datasets import load_dataset
hesum = load_dataset('biunlp/HeSum')
The data contains three splits: train (8000 examples), validation (1000 examples), and test (1000 examples).
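For instance, once the dataset is loaded you can confirm the split sizes:

from datasets import load_dataset

hesum = load_dataset('biunlp/HeSum')
for split in ('train', 'validation', 'test'):
    print(split, len(hesum[split]))  # expected: 8000, 1000, 1000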
Each sample contains the following fields, as shown in the example below:
summary
: Sub-heading of the article.
article
: The full article text.
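For example, an individual sample can be read as a dictionary with these two keys:

from datasets import load_dataset

hesum = load_dataset('biunlp/HeSum')
sample = hesum['train'][0]
print(sample['summary'])  # the sub-heading, used as the reference summary
print(sample['article'])  # the full article text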
There are two models fine-tuned on the HeSum dataset.
To run the model:
from datasets import load_dataset
from transformers import pipeline

# Load the HeSum test split (optional - only needed if you want to summarize dataset articles).
dataset = load_dataset("biunlp/HeSum")['test']

# Load the fine-tuned summarization model from the Hugging Face Hub.
hub_model_id = "biunlp/mT5LongHeSum-large"
summarizer = pipeline("summarization", model=hub_model_id)

# Summarize your own text.
article = "<Enter your text here>"
summarizer(article, max_length=250)
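As a rough usage sketch, the same pipeline can also be run on an article from the test split loaded above and compared against its reference summary (max_length=250 is carried over from the example and is purely illustrative):

# Summarize the first test-set article and compare it with its reference summary.
example = dataset[0]
generated = summarizer(example['article'], max_length=250)
print("Generated:", generated[0]['summary_text'])
print("Reference:", example['summary'])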