Cannot connect to Huggingface to load model files #64

Open
RoenTh opened this issue Jul 9, 2024 · 0 comments

While running the train_dreambooth_lora.py script with the DeepFloyd/IF-I-XL-v1.0 model, I hit an error when the script loads the tokenizer. It reports that it cannot connect to the Hugging Face servers and that the requested file (tokenizer/config.json) is not in the local cache.

I tried to run this command:

export MODEL_NAME="DeepFloyd/IF-I-XL-v1.0"
export INSTANCE_DIR=".cache/temp"
export OUTPUT_DIR=".cache/if_dreambooth_mushroom"

accelerate launch threestudio/scripts/train_dreambooth_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a sks mushroom" \
  --resolution=64 \
  --train_batch_size=4 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --scale_lr \
  --max_train_steps=1200 \
  --checkpointing_steps=600 \
  --pre_compute_text_embeddings \
  --tokenizer_max_length=77 \
  --text_encoder_use_attention_mask
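
Before launching, it can help to confirm whether the model files are already on disk, since the error below complains that tokenizer/config.json is not in the cache. The following is a minimal stdlib-only sketch, assuming the standard Hugging Face hub cache layout (`models--<org>--<name>/snapshots/<revision>/...`) and the `HF_HUB_CACHE` / `HF_HOME` environment variables. The helper names `hub_cache_dir` and `find_cached` are hypothetical and not part of any library, and the sketch ignores other settings such as `XDG_CACHE_HOME`.

```python
import os
from pathlib import Path
from typing import List, Mapping


def hub_cache_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Guess the hub cache directory (assumption: HF_HUB_CACHE, then HF_HOME/hub,
    then ~/.cache/huggingface/hub)."""
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    hf_home = env.get("HF_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache", "huggingface"
    )
    return Path(hf_home) / "hub"


def find_cached(repo_id: str, filename: str, cache_dir: Path) -> List[Path]:
    """Return every cached copy of `filename` for `repo_id`, across all snapshots."""
    repo_dir = cache_dir / ("models--" + repo_id.replace("/", "--"))
    if not repo_dir.is_dir():
        return []
    # Snapshot entries are usually symlinks to blobs; exists() follows them.
    return sorted(p for p in repo_dir.glob(f"snapshots/*/{filename}") if p.exists())


if __name__ == "__main__":
    hits = find_cached("DeepFloyd/IF-I-XL-v1.0", "tokenizer/config.json", hub_cache_dir())
    print(hits if hits else "tokenizer/config.json is not in the local cache")
```

An empty result means the files have to be downloaded at least once with network access before any offline run can succeed.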

Error Log:

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `2`
		More than one GPU was found, enabling multi-GPU training.
		If this was unintended please pass in `--num_processes=1`.
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/opt/conda/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/opt/conda/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
07/09/2024 14:59:50 - INFO - __main__ - Distributed environment: MULTI_GPU  Backend: nccl
Num processes: 2
Process index: 0
Local process index: 0
Device: cuda:0

Mixed precision type: no

[rank0]: Traceback (most recent call last):
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 402, in cached_file
[rank0]:     resolved_file = hf_hub_download(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
[rank0]:     return _hf_hub_download_to_cache_dir(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
[rank0]:     _raise_on_head_call_error(head_call_error, force_download, local_files_only)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1817, in _raise_on_head_call_error
[rank0]:     raise LocalEntryNotFoundError(
[rank0]: huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.

[rank0]: The above exception was the direct cause of the following exception:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/DreamCraft3D/threestudio/scripts/train_dreambooth_lora.py", line 1480, in <module>
[rank0]:     main(args)
[rank0]:   File "/DreamCraft3D/threestudio/scripts/train_dreambooth_lora.py", line 801, in main
[rank0]:     tokenizer = AutoTokenizer.from_pretrained(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 846, in from_pretrained
[rank0]:     config = AutoConfig.from_pretrained(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 965, in from_pretrained
[rank0]:     config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
[rank0]:     config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
[rank0]:     resolved_config_file = cached_file(
[rank0]:   File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 445, in cached_file
[rank0]:     raise EnvironmentError(
[rank0]: OSError: We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like DeepFloyd/IF-I-XL-v1.0 is not the path to a directory containing a file named tokenizer/config.json.
[rank0]: Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
07/09/2024 14:59:50 - INFO - __main__ - Distributed environment: MULTI_GPU  Backend: nccl
Num processes: 2
Process index: 1
Local process index: 1
Device: cuda:1

Mixed precision type: no

[rank1]: Traceback (most recent call last):
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 402, in cached_file
[rank1]:     resolved_file = hf_hub_download(
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
[rank1]:     return fn(*args, **kwargs)
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
[rank1]:     return _hf_hub_download_to_cache_dir(
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
[rank1]:     _raise_on_head_call_error(head_call_error, force_download, local_files_only)
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1817, in _raise_on_head_call_error
[rank1]:     raise LocalEntryNotFoundError(
[rank1]: huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.

[rank1]: The above exception was the direct cause of the following exception:

[rank1]: Traceback (most recent call last):
[rank1]:   File "/DreamCraft3D/threestudio/scripts/train_dreambooth_lora.py", line 1480, in <module>
[rank1]:     main(args)
[rank1]:   File "/DreamCraft3D/threestudio/scripts/train_dreambooth_lora.py", line 801, in main
[rank1]:     tokenizer = AutoTokenizer.from_pretrained(
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 846, in from_pretrained
[rank1]:     config = AutoConfig.from_pretrained(
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 965, in from_pretrained
[rank1]:     config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
[rank1]:     config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
[rank1]:     resolved_config_file = cached_file(
[rank1]:   File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 445, in cached_file
[rank1]:     raise EnvironmentError(
[rank1]: OSError: We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like DeepFloyd/IF-I-XL-v1.0 is not the path to a directory containing a file named tokenizer/config.json.
[rank1]: Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
E0709 14:59:50.875000 140279330338624 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 7671) of binary: /opt/conda/bin/python3
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1088, in launch_command
    multi_gpu_launcher(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 733, in multi_gpu_launcher
    distrib_run.run(args)
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
threestudio/scripts/train_dreambooth_lora.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-07-09_14:59:50
  host      : 94e9c6295430
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 7672)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-07-09_14:59:50
  host      : 94e9c6295430
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 7671)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
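
The wording of the LocalEntryNotFoundError above ("outgoing traffic has been disabled ... set 'local_files_only' to False") suggests that offline mode was requested somewhere, rather than that a network request was attempted and failed. One possible source is an environment variable. The sketch below checks the two variables that, as far as I know, huggingface_hub reads for this (`HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE`). The helper name `offline_flags` and the exact set of truthy values are assumptions, and the check would not detect the script itself passing `local_files_only=True`.

```python
import os
from typing import List, Mapping

# Environment variables assumed to switch the Hugging Face libraries to offline mode.
OFFLINE_VARS = ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE")
# Values assumed to be treated as "true" (compared case-insensitively).
TRUTHY = {"1", "ON", "YES", "TRUE"}


def offline_flags(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of offline-mode variables that are set to a truthy value."""
    return [name for name in OFFLINE_VARS if env.get(name, "").upper() in TRUTHY]


if __name__ == "__main__":
    flags = offline_flags()
    if flags:
        print("Offline mode appears to be enabled via:", ", ".join(flags))
    else:
        print("No offline-mode environment variable is set.")
```

If one of these is set in the container, unsetting it before `accelerate launch` would be the first thing to try. Since each rank inherits the launcher's environment, the check only needs to run once in the same shell.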
