
tokenizer should be replaced to processing_class in Seq2SeqTrainer? #35446

Open
zzaebok opened this issue Dec 29, 2024 · 1 comment
Labels
bug, Core: Tokenization, trainer

Comments


zzaebok commented Dec 29, 2024

System Info

  • transformers version: 4.47.1
  • Platform: Linux-5.4.0-200-generic-x86_64-with-glibc2.31
  • Python version: 3.10.16
  • Huggingface_hub version: 0.27.0
  • Safetensors version: 0.4.5
  • Accelerate version: 1.2.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA GeForce RTX 2070 SUPER

Who can help?

@amyeroberts @ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

In the trainer_seq2seq.py file, self.tokenizer is still being called, which produces the deprecation warning "Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.":

def _pad_tensors_to_max_len(self, tensor, max_length):
    if self.tokenizer is not None and hasattr(self.tokenizer, "pad_token_id"):
        # If PAD token is not defined at least EOS token has to be defined
        pad_token_id = (
            self.tokenizer.pad_token_id if self.tokenizer.pad_token_id is not None else self.tokenizer.eos_token_id
        )
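
For context, the warning presumably comes from a backward-compatibility shim roughly along these lines (a minimal sketch, not the actual library source; only the warning message is taken from above):

import warnings

class Trainer:
    def __init__(self, processing_class=None):
        self.processing_class = processing_class

    @property
    def tokenizer(self):
        # Deprecated alias kept for backward compatibility: every access
        # emits the deprecation warning and forwards to processing_class.
        warnings.warn(
            "Trainer.tokenizer is now deprecated. "
            "You should use Trainer.processing_class instead.",
            FutureWarning,
        )
        return self.processing_class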

Expected behavior

I believe self.tokenizer should be replaced with self.processing_class:

def _pad_tensors_to_max_len(self, tensor, max_length):
    if self.processing_class is not None and hasattr(self.processing_class, "pad_token_id"):
        # If PAD token is not defined at least EOS token has to be defined
        pad_token_id = (
            self.processing_class.pad_token_id if self.processing_class.pad_token_id is not None else self.processing_class.eos_token_id
        )
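
To make clear that the change is purely a rename and the fallback logic itself is untouched, here is a standalone sketch of the padding helper with the processor passed in explicitly (DummyProcessor and the function name are illustrative, not library code):

import torch

class DummyProcessor:
    # Stand-in for a tokenizer/processor exposing only the attributes the
    # padding helper relies on (hypothetical example).
    pad_token_id = None
    eos_token_id = 2

def pad_tensors_to_max_len(processor, tensor, max_length):
    # Same fallback as in the snippet above: prefer pad_token_id, fall back
    # to eos_token_id when no PAD token is defined.
    if processor is not None and hasattr(processor, "pad_token_id"):
        pad_token_id = (
            processor.pad_token_id if processor.pad_token_id is not None else processor.eos_token_id
        )
    else:
        raise ValueError("A pad token is required to pad generated tensors.")
    padded = pad_token_id * torch.ones(
        (tensor.shape[0], max_length), dtype=tensor.dtype, device=tensor.device
    )
    padded[:, : tensor.shape[-1]] = tensor
    return padded

print(pad_tensors_to_max_len(DummyProcessor(), torch.tensor([[5, 6, 7]]), max_length=5))
# tensor([[5, 6, 7, 2, 2]])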

Is it okay for me to make a PR for this issue? 😄

zzaebok added the bug label Dec 29, 2024
LysandreJik (Member) commented

Thanks @zzaebok! Would you like to open a PR to fix this warning?
