Environment images failed to build in running "swebench.harness.run_evaluation" #293

Open
haoxianghz opened this issue Jan 18, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@haoxianghz

Describe the bug

I wish to run a basic evaluation, using:

python -u -m swebench.harness.run_evaluation \
    --predictions_path gold \
    --max_workers 12 \
    --instance_ids astropy__astropy-12907 \
    --run_id validate-gold \
    --rewrite_reports False \
    --clean False \
    --cache_level instance

It fails with the following error:

    raise BuildImageError(image_name, str(e), logger) from e
swebench.harness.docker_build.BuildImageError: Error building image sweb.env.py.x86_64.428468730904ff6b4232aa:latest: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1
Check (logs/build_images/env/sweb.env.py.x86_64.428468730904ff6b4232aa__latest/build_image.log) for more information.

  0%|          | 0/1 [06:42<?, ?it/s]
1 environment images failed to build.

That is a single instance. I get similar errors when I try to run all 300 instances with --dataset_name princeton-nlp/SWE-bench_Lite.

Is it normal for these 300 instances to fail intermittently, case by case? On my host machine, pip install works without problems when I install the swebench repo. If pip install fails inside the Docker containers, how should I fix it? Do I have to handle all 300 instances one by one?

I also observe that when I rerun python -u -m swebench.harness.run_evaluation ... with --cache_level env, some instances that failed before now build successfully. But I would prefer a simple way to run the whole 300-instance evaluation without any image build errors. Any suggestions would be great!
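The manual rerun described above could be automated with a small wrapper. This is only a sketch under stated assumptions: `run_with_retries` and `EVAL_CMD` are hypothetical names (not part of SWE-bench), the sketch assumes run_evaluation exits with a non-zero status when an image fails to build (not verified here), and it assumes the failures are transient so that a later attempt can succeed. The command is the one from this report with --cache_level env.

```python
import subprocess
import time
from typing import Callable, Sequence

# The command from this report, rerun with --cache_level env.
EVAL_CMD = [
    "python", "-u", "-m", "swebench.harness.run_evaluation",
    "--predictions_path", "gold",
    "--max_workers", "12",
    "--instance_ids", "astropy__astropy-12907",
    "--run_id", "validate-gold",
    "--rewrite_reports", "False",
    "--clean", "False",
    "--cache_level", "env",
]


def run_with_retries(
    cmd: Sequence[str],
    max_attempts: int = 3,
    base_delay: float = 30.0,
    runner: Callable[[Sequence[str]], int] = lambda c: subprocess.run(c).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Rerun cmd until it exits 0; return the number of attempts used.

    Assumption: a non-zero exit status means at least one image failed to build.
    The delay doubles after each failed attempt (30 s, 60 s, ... by default).
    """
    for attempt in range(1, max_attempts + 1):
        if runner(cmd) == 0:
            return attempt
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"command still failing after {max_attempts} attempts")


# Usage (not executed here): run_with_retries(EVAL_CMD)
```

With the defaults, `run_with_retries(EVAL_CMD)` would try the evaluation up to three times, waiting 30 s and then 60 s between attempts. Retrying only helps when the cause is temporary, such as a download timeout. It would not fix a requirement that genuinely cannot be resolved.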

Steps/Code to Reproduce

python -u -m swebench.harness.run_evaluation \
    --predictions_path gold \
    --max_workers 12 \
    --instance_ids astropy__astropy-12907 \
    --run_id validate-gold \
    --rewrite_reports False \
    --clean False \
    --cache_level instance

Expected Results

All images build successfully and, for the 300 instances, ideally:

Instances resolved: 300
Instances with errors: 0

Actual Results

build_image.log for --instance_ids astropy__astropy-12907:

2025-01-18 12:48:38,493 - INFO - Downloading tomli-2.0.1-py3-none-any.whl.metadata (8.9 kB)
2025-01-18 12:49:57,932 - INFO - INFO: pip is looking at multiple versions of pytest-cov to determine which version is compatible with other requirements. This could take a while.
2025-01-18 12:49:57,936 - INFO - ERROR: Ignored the following versions that require a different python version: 2.1.0 Requires-Python >=3.10; 2.1.0rc1 Requires-Python >=3.10; 2.1.1 Requires-Python >=3.10; 2.1.2 Requires-Python >=3.10; 2.1.3 Requires-Python >=3.10; 2.2.0 Requires-Python >=3.10; 2.2.0rc1 Requires-Python >=3.10; 2.2.1 Requires-Python >=3.10
2025-01-18 12:49:57,936 - INFO - ERROR: Could not find a version that satisfies the requirement coverage>=5.2.1 (from pytest-cov) (from versions: none)
2025-01-18 12:50:05,011 - INFO - ERROR: No matching distribution found for coverage>=5.2.1
2025-01-18 12:50:05,716 - INFO - ---> Removed intermediate container 93e48816709d
2025-01-18 12:50:05,717 - ERROR - Error: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1
2025-01-18 12:50:05,717 - ERROR - docker.errors.BuildError during sweb.env.py.x86_64.428468730904ff6b4232aa:latest: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1

In another instance, build_image.log shows:

2025-01-18 10:01:34,258 - INFO - Downloading trio-0.10.0.tar.gz (402 kB)
2025-01-18 10:02:23,714 - INFO - Installing build dependencies: started
2025-01-18 10:03:37,178 - INFO - Installing build dependencies: still running...
2025-01-18 10:03:38,674 - INFO - Installing build dependencies: finished with status 'error'
2025-01-18 10:03:38,675 - INFO - ERROR: Command errored out with exit status 2:
   command: /opt/miniconda3/envs/testbed/bin/python /opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-9molz0gs/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel
       cwd: None
  Complete output (82 lines):
  Collecting setuptools>=40.8.0
    Downloading setuptools-59.6.0-py3-none-any.whl (952 kB)
  ERROR: Exception:
  Traceback (most recent call last):
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/urllib3/response.py", line 438, in _error_catcher
      yield
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/urllib3/response.py", line 519, in read
      data = self._fp.read(amt) if not fp_closed else b""
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/cachecontrol/filewrapper.py", line 62, in read
      data = self.__fp.read(amt)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/http/client.py", line 463, in read
      n = self.readinto(b)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/http/client.py", line 507, in readinto
      n = self.fp.readinto(b)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/socket.py", line 586, in readinto
      return self._sock.recv_into(b)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/ssl.py", line 1012, in recv_into
      return self.read(nbytes, buffer)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/ssl.py", line 874, in read
      return self._sslobj.read(len, buffer)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/ssl.py", line 631, in read
      v = self._sslobj.read(len, buffer)
  socket.timeout: The read operation timed out
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 173, in _main
      status = self.run(options, args)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/cli/req_command.py", line 203, in wrapper
      return func(self, options, args)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 316, in run
      reqs, check_supported_wheels=not options.target_dir
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 95, in resolve
      collected.requirements, max_rounds=try_to_avoid_resolution_too_deep
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 472, in resolve
      state = resolution.resolve(requirements, max_rounds=max_rounds)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 341, in resolve
      self._add_to_criteria(self.state.criteria, r, parent=None)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 172, in _add_to_criteria
      if not criterion.candidates:
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
      return bool(self._sequence)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 140, in __bool__
      return any(self)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 128, in <genexpr>
      return (c for c in iterator if id(c) not in self._incompatible_ids)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 32, in _iter_built
      candidate = func()
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 209, in _make_candidate_from_link
      version=version,
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 301, in __init__
      version=version,
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
      self.dist = self._prepare()
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 227, in _prepare
      dist = self._prepare_distribution()
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 306, in _prepare_distribution
      self._ireq, parallel_builds=True
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 508, in prepare_linked_requirement
      return self._prepare_linked_requirement(req, parallel_builds)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 552, in _prepare_linked_requirement
      self.download_dir, hashes
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 243, in unpack_url
      hashes=hashes,
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 102, in get_http_url
      from_path, content_type = download(link, temp_dir.path)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/network/download.py", line 145, in __call__
      for chunk in chunks:
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/cli/progress_bars.py", line 144, in iter
      for x in it:
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/network/utils.py", line 87, in response_chunks
      decode_content=False,
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/urllib3/response.py", line 576, in stream
      data = self.read(amt=amt, decode_content=decode_content)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/urllib3/response.py", line 541, in read
      raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/contextlib.py", line 99, in __exit__
      self.gen.throw(type, value, traceback)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/urllib3/response.py", line 443, in _error_catcher
      raise ReadTimeoutError(self._pool, None, "Read timed out.")
  pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.
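The two logs above fail in different ways, so it may help to sort failed builds by symptom before deciding what to rerun. The following is a rough sketch, not part of SWE-bench: `classify_build_log` and `summarize` are hypothetical helpers, the patterns are simply strings copied from the logs in this report, and the directory layout `logs/build_images/env/<image>/build_image.log` is assumed from the path printed in the error message.

```python
import re
from pathlib import Path
from typing import Dict

# Substrings taken from the two logs in this report.
TIMEOUT_PATTERNS = [r"ReadTimeoutError", r"Read timed out", r"socket\.timeout"]
NO_MATCH_PATTERNS = [
    r"No matching distribution found",
    r"Could not find a version that satisfies the requirement",
]


def classify_build_log(text: str) -> str:
    """Return a coarse label for a build_image.log: timeout, no_matching_distribution, or unknown."""
    # Timeouts are checked first because they are the clearest sign of a network problem.
    if any(re.search(p, text) for p in TIMEOUT_PATTERNS):
        return "timeout"
    if any(re.search(p, text) for p in NO_MATCH_PATTERNS):
        return "no_matching_distribution"
    return "unknown"


def summarize(log_root: str = "logs/build_images/env") -> Dict[str, str]:
    """Map each image directory under log_root (assumed layout) to its label."""
    results = {}
    for log in sorted(Path(log_root).glob("*/build_image.log")):
        results[log.parent.name] = classify_build_log(log.read_text(errors="replace"))
    return results
```

Applied to the logs in this report, the trio build would be labelled `timeout` and the astropy build `no_matching_distribution`. The labels describe only the symptom that pip printed. They do not establish the underlying cause.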

System Information

No response

@haoxianghz haoxianghz added the bug Something isn't working label Jan 18, 2025
@john-b-yang
Member

Hi @haoxianghz, this looks like some sort of connection issue rather than a SWE-bench bug.

What machine did you run this on? The latest version of SWE-bench now downloads prebuilt images from Docker Hub instead of building them locally. If the problem is truly related to building the environment images, the new SWE-bench should circumvent it.
