HeSum: A Novel Dataset for Abstractive Text Summarization in Hebrew

Paper

The paper can be found here.

Data

The data is composed of sub-heading + article pairs scraped from Shkuf, ha-makom, and the7eye.

The data can be found here. You can also use the CSV files in the data folder.

The data can also be downloaded directly from Hugging Face using the datasets library in Python.

pip3 install datasets

from datasets import load_dataset

hesum = load_dataset('biunlp/HeSum')

The data contains three splits: train (8,000 examples), validation (1,000 examples), and test (1,000 examples).

Each sample contains the following:

summary: Sub-heading of an article
article: The article.
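
As a rough illustration of the record layout described above, the sketch below computes simple length statistics for one sample. The helper name `describe_sample` and the toy record are hypothetical and made up for illustration; only the `summary` and `article` field names come from the dataset.

```python
def describe_sample(sample: dict) -> dict:
    """Return whitespace-token counts for one HeSum-style record."""
    article_words = len(sample["article"].split())
    summary_words = len(sample["summary"].split())
    return {
        "article_words": article_words,
        "summary_words": summary_words,
        # Fraction of the article length taken up by the summary.
        "compression": summary_words / article_words if article_words else 0.0,
    }


# Hypothetical record (not taken from the dataset); real records
# come from load_dataset('biunlp/HeSum').
toy_sample = {
    "summary": "תקציר קצר של הכתבה",
    "article": "זהו טקסט לדוגמה של כתבה ארוכה יותר מהתקציר שלה",
}

stats = describe_sample(toy_sample)
print(stats)
```

With the dataset loaded as shown earlier, the same helper could be applied to a real record, for example `describe_sample(hesum["train"][0])`.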

Models

There are two models fine-tuned on the HeSum dataset.

mT5LongHeSum-base (2.3 GB): Download.
mT5LongHeSum-large (4.6 GB): Download.

To run a model:

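from datasets import load_dataset
from transformers import pipeline

# Load the test split (only needed if you want to summarize dataset articles)
dataset = load_dataset("biunlp/HeSum")["test"]
# Load the fine-tuned large model into a summarization pipeline
hub_model_id = "biunlp/mT5LongHeSum-large"
summarizer = pipeline("summarization", model=hub_model_id)

# Summarize a single article; max_length caps the length of the generated summary
article = "<Enter your text here>"
summarizer(article, max_length=250)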
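
The snippet above summarizes a single string. A minimal sketch of looping over several articles follows, assuming the summarizer returns a list containing one dict with a `summary_text` key for each input string, which is the usual output shape of the transformers summarization pipeline. The names `summarize_all` and `fake_summarizer` are hypothetical; the stand-in only mimics that output shape so the example runs without downloading a model.

```python
from typing import Callable, Iterable, List


def summarize_all(
    summarizer: Callable, articles: Iterable[str], max_length: int = 250
) -> List[str]:
    """Run the summarizer on each article and collect the summary strings."""
    summaries = []
    for article in articles:
        # Assumed output shape for one input: [{"summary_text": "..."}]
        output = summarizer(article, max_length=max_length)
        summaries.append(output[0]["summary_text"])
    return summaries


def fake_summarizer(text: str, max_length: int = 250) -> list:
    # Hypothetical stand-in for the real pipeline. It truncates by
    # characters, whereas the real pipeline limits generated tokens.
    return [{"summary_text": text[:max_length]}]


demo = summarize_all(
    fake_summarizer, ["first article text", "second article text"], max_length=5
)
print(demo)
```

With the real pipeline and test split from the snippet above, the call would look like `summarize_all(summarizer, dataset[:5]["article"])`.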
