Failed precondition: Python interpreter state is not initialized. #369

Open
MGJamJam opened this issue Nov 3, 2024 · 2 comments
MGJamJam commented Nov 3, 2024

Hello!
When training a model I get the following error:

INFO     2024-11-03 19:13:30,783                         FOLD 0: INFO     2024-11-03 19:13:30,684 calamari_ocr.ocr.training.trai: Training finished
INFO     2024-11-03 19:13:30,884                         FOLD 0: 2024-11-03 19:13:30.833537: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
INFO     2024-11-03 19:13:30,884                         FOLD 0: 	 [[{{node PyFunc}}]]
CRITICAL 2024-11-03 19:13:39,935             tfaip.util.logging: Uncaught exception
Traceback (most recent call last):
  File "/home/fablab/miniconda3/envs/test_gpu/bin/calamari-cross-fold-train", line 8, in <module>
    sys.exit(run())
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/site-packages/calamari_ocr/scripts/cross_fold_train.py", line 13, in run
    return main(parse_args())
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/site-packages/calamari_ocr/scripts/cross_fold_train.py", line 31, in main
    trainer.run()
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/site-packages/calamari_ocr/ocr/training/cross_fold_trainer.py", line 321, in run
    pool.map_async(train_individual_model, run_args).get()
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/site-packages/calamari_ocr/ocr/training/cross_fold_trainer.py", line 53, in train_individual_model
    verbose=run_args.get("verbose", False),
  File "/home/fablab/miniconda3/envs/test_gpu/lib/python3.7/site-packages/calamari_ocr/utils/multiprocessing.py", line 83, in run
    raise Exception("Error: Process finished with code {}".format(process.returncode))
Exception: Error: Process finished with code -9
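For what it's worth, the -9 at the end follows the general CPython convention rather than anything Calamari-specific: multiprocessing (and subprocess) report a child that was terminated by a signal as a negative return code, so -9 means the child process received SIGKILL. A minimal standalone sketch of that convention, unrelated to Calamari itself:

import signal
import subprocess

# A shell that SIGKILLs itself; CPython reports the signal as a negative return code.
proc = subprocess.run(["sh", "-c", "kill -9 $$"])
print(proc.returncode)                       # -9
print(proc.returncode == -signal.SIGKILL)    # True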

Can you help me understand what might cause this error and what its impact is? Is this error even relevant, given that it is printed after the "Training finished" message?

I am using:

  • WSL with Ubuntu 22.04.3 LTS
  • Python 3.7
  • TensorFlow 2.6.0
  • CUDA 11.2
  • cuDNN 8
  • Calamari 2.2.2

The training command I used was:

CUDA_VISIBLE_DEVICES=0 calamari-cross-fold-train \
    --train PageXML \
    --train.images "training_data_senat_reduced/*.png" \
    --temporary_dir calamari_cd_training_output_warmstart_gothic_03_11 \
    --keep_temporary_files True \
    --scenario.tensorboard_logger_history_size 50 \
    --device.gpus 0 \
    --codec.include {string.digits + string.ascii_letters} \
    --best_models_dir "calamari_cf_training_03_11" \
    --weights "calamari_models_experimental/deep3_htr-gothic/0.ckpt.json" \
              "calamari_models_experimental/deep3_htr-gothic/1.ckpt.json" \
              "calamari_models_experimental/deep3_htr-gothic/2.ckpt.json" \
              "calamari_models_experimental/deep3_htr-gothic/3.ckpt.json" \
              "calamari_models_experimental/deep3_htr-gothic/4.ckpt.json" \
    --n_augmentations=5 \
    --network deep3 \
    |& tee output_cf_03_11.txt

andbue commented Nov 4, 2024

I have the same error; it does not affect training, as it occurs in the training subprocess after training has finished. So far I have not been able to pin down the cause of the error within Calamari; it may be fixed in newer TensorFlow versions (cf. tensorflow/tensorflow#24570).
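If you want to double-check that a run still produced usable models despite the warning, one quick sanity check is to look for the fold checkpoints under the directory given to --best_models_dir. A rough sketch, assuming the checkpoints end up as *.ckpt.json files somewhere below that directory (adjust the path and glob to your actual layout):

from pathlib import Path

# Directory passed via --best_models_dir in the command above.
best_models_dir = Path("calamari_cf_training_03_11")

# Assumption: each fold's best model is saved as a *.ckpt.json below this directory.
checkpoints = sorted(best_models_dir.glob("**/*.ckpt.json"))
print(f"Found {len(checkpoints)} checkpoint(s)")
for ckpt in checkpoints:
    print("  ", ckpt)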


MGJamJam commented Nov 4, 2024

Thanks for the quick answer 😄
