Environment images failed to build in running "swebench.harness.run_evaluation" #293

haoxianghz · 2025-01-18T18:10:46Z

Describe the bug

I wish to run a basic evaluation, using:

python -u -m swebench.harness.run_evaluation \
    --predictions_path gold \
    --max_workers 12 \
    --instance_ids astropy__astropy-12907 \
    --run_id validate-gold \
    --rewrite_reports False \
    --clean False \
    --cache_level instance

It fails with error:

    raise BuildImageError(image_name, str(e), logger) from e
swebench.harness.docker_build.BuildImageError: Error building image sweb.env.py.x86_64.428468730904ff6b4232aa:latest: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1
Check (logs/build_images/env/sweb.env.py.x86_64.428468730904ff6b4232aa__latest/build_image.log) for more information.

  0%|          | 0/1 [06:42<?, ?it/s]
1 environment images failed to build.

This is a single-instance run. I also get similar errors when running all 300 instances with --dataset_name princeton-nlp/SWE-bench_Lite.

Is it normal for some of these 300 instances to fail intermittently, case by case? pip install works without problems on my host machine when I install the swebench repo. If the failures only occur inside the Docker containers, how should I fix them? Do I have to handle all 300 instances case by case?

I also observe that when I rerun python -u -m swebench.harness.run_evaluation ... with --cache_level env, some instances get fixed. But I would prefer a simple way to run the whole 300-instance evaluation without any image build errors. Any suggestions would be great!
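Since rerunning seems to help, here is a minimal sketch of automating that rerun. It reuses only the flags from the command above. The helper names (build_command, build_failed, run_with_retries) are hypothetical and not part of SWE-bench, and the failure check is an assumption: it treats a non-zero exit code, or the "failed to build" text from the output shown above, as a failed attempt.

```python
import subprocess
import sys
import time


def build_command(instance_id, run_id, cache_level="env"):
    """Assemble the harness invocation quoted in this issue as an argv list."""
    return [
        sys.executable, "-u", "-m", "swebench.harness.run_evaluation",
        "--predictions_path", "gold",
        "--max_workers", "12",
        "--instance_ids", instance_id,
        "--run_id", run_id,
        "--rewrite_reports", "False",
        "--clean", "False",
        "--cache_level", cache_level,
    ]


def build_failed(result):
    """Assumption: a non-zero exit code or the 'failed to build' message means failure."""
    output = result.stdout or ""
    return result.returncode != 0 or "failed to build" in output


def default_runner(cmd):
    """Run the command and capture stdout and stderr together as text."""
    return subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )


def run_with_retries(cmd, attempts=3, delay_s=30.0,
                     runner=default_runner, sleep=time.sleep):
    """Rerun cmd until it succeeds; return the attempt number, or 0 if all failed."""
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if not build_failed(result):
            return attempt
        if attempt < attempts:
            sleep(delay_s)  # give a flaky network some time before retrying
    return 0


if __name__ == "__main__":
    cmd = build_command("astropy__astropy-12907", "validate-gold")
    print("succeeded on attempt:", run_with_retries(cmd))
```

This only papers over transient failures. If the same image fails on every attempt, the build log is still the place to look.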

Steps/Code to Reproduce

python -u -m swebench.harness.run_evaluation \
    --predictions_path gold \
    --max_workers 12 \
    --instance_ids astropy__astropy-12907 \
    --run_id validate-gold \
    --rewrite_reports False \
    --clean False \
    --cache_level instance

Expected Results

All images build successfully, and then, for the 300 instances, ideally:

Instances resolved: 300
Instances with errors: 0

Actual Results

build_image.log for --instance_ids astropy__astropy-12907:

2025-01-18 12:48:38,493 - INFO - Downloading tomli-2.0.1-py3-none-any.whl.metadata (8.9 kB)
2025-01-18 12:49:57,932 - INFO - INFO: pip is looking at multiple versions of pytest-cov to determine which version is compatible with other requirements. This could take a while.
2025-01-18 12:49:57,936 - INFO - ERROR: Ignored the following versions that require a different python version: 2.1.0 Requires-Python >=3.10; 2.1.0rc1 Requires-Python >=3.10; 2.1.1 Requires-Python >=3.10; 2.1.2 Requires-Python >=3.10; 2.1.3 Requires-Python >=3.10; 2.2.0 Requires-Python >=3.10; 2.2.0rc1 Requires-Python >=3.10; 2.2.1 Requires-Python >=3.10
2025-01-18 12:49:57,936 - INFO - ERROR: Could not find a version that satisfies the requirement coverage>=5.2.1 (from pytest-cov) (from versions: none)
2025-01-18 12:50:05,011 - INFO - ERROR: No matching distribution found for coverage>=5.2.1
2025-01-18 12:50:05,716 - INFO - ---> Removed intermediate container 93e48816709d
2025-01-18 12:50:05,717 - ERROR - Error: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1
2025-01-18 12:50:05,717 - ERROR - docker.errors.BuildError during sweb.env.py.x86_64.428468730904ff6b4232aa:latest: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1

In another instance, the build_image.log shows:

2025-01-18 10:01:34,258 - INFO - Downloading trio-0.10.0.tar.gz (402 kB)
2025-01-18 10:02:23,714 - INFO - Installing build dependencies: started
2025-01-18 10:03:37,178 - INFO - Installing build dependencies: still running...
2025-01-18 10:03:38,674 - INFO - Installing build dependencies: finished with status 'error'
2025-01-18 10:03:38,675 - INFO - ERROR: Command errored out with exit status 2:
   command: /opt/miniconda3/envs/testbed/bin/python /opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-9molz0gs/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel
       cwd: None
  Complete output (82 lines):
  Collecting setuptools>=40.8.0
    Downloading setuptools-59.6.0-py3-none-any.whl (952 kB)
  ERROR: Exception:
  Traceback (most recent call last):
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/urllib3/response.py", line 438, in _error_catcher
      yield
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/urllib3/response.py", line 519, in read
      data = self._fp.read(amt) if not fp_closed else b""
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/cachecontrol/filewrapper.py", line 62, in read
      data = self.__fp.read(amt)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/http/client.py", line 463, in read
      n = self.readinto(b)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/http/client.py", line 507, in readinto
      n = self.fp.readinto(b)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/socket.py", line 586, in readinto
      return self._sock.recv_into(b)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/ssl.py", line 1012, in recv_into
      return self.read(nbytes, buffer)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/ssl.py", line 874, in read
      return self._sslobj.read(len, buffer)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/ssl.py", line 631, in read
      v = self._sslobj.read(len, buffer)
  socket.timeout: The read operation timed out
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 173, in _main
      status = self.run(options, args)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/cli/req_command.py", line 203, in wrapper
      return func(self, options, args)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 316, in run
      reqs, check_supported_wheels=not options.target_dir
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 95, in resolve
      collected.requirements, max_rounds=try_to_avoid_resolution_too_deep
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 472, in resolve
      state = resolution.resolve(requirements, max_rounds=max_rounds)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 341, in resolve
      self._add_to_criteria(self.state.criteria, r, parent=None)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/resolvelib/resolvers.py", line 172, in _add_to_criteria
      if not criterion.candidates:
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/resolvelib/structs.py", line 151, in __bool__
      return bool(self._sequence)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 140, in __bool__
      return any(self)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 128, in <genexpr>
      return (c for c in iterator if id(c) not in self._incompatible_ids)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 32, in _iter_built
      candidate = func()
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 209, in _make_candidate_from_link
      version=version,
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 301, in __init__
      version=version,
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
      self.dist = self._prepare()
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 227, in _prepare
      dist = self._prepare_distribution()
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 306, in _prepare_distribution
      self._ireq, parallel_builds=True
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 508, in prepare_linked_requirement
      return self._prepare_linked_requirement(req, parallel_builds)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 552, in _prepare_linked_requirement
      self.download_dir, hashes
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 243, in unpack_url
      hashes=hashes,
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 102, in get_http_url
      from_path, content_type = download(link, temp_dir.path)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/network/download.py", line 145, in __call__
      for chunk in chunks:
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/cli/progress_bars.py", line 144, in iter
      for x in it:
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_internal/network/utils.py", line 87, in response_chunks
      decode_content=False,
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/urllib3/response.py", line 576, in stream
      data = self.read(amt=amt, decode_content=decode_content)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/urllib3/response.py", line 541, in read
      raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/contextlib.py", line 99, in __exit__
      self.gen.throw(type, value, traceback)
    File "/opt/miniconda3/envs/testbed/lib/python3.6/site-packages/pip/_vendor/urllib3/response.py", line 443, in _error_catcher
      raise ReadTimeoutError(self._pool, None, "Read timed out.")
  pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.
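
The two logs above fail in different ways: the first ends with "No matching distribution found", the second with a read timeout against files.pythonhosted.org. A rough way to triage many build logs at once is to search each one for these messages. The sketch below assumes the logs live under logs/build_images/env/<image>/build_image.log, as in the path printed by the harness above. The labels and the function names classify_log and scan_build_logs are made up for illustration.

```python
import re
from pathlib import Path

# Failure signatures copied from the log excerpts in this issue.
SIGNATURES = [
    ("network_timeout",
     re.compile(r"ReadTimeoutError|socket\.timeout|Read timed out")),
    ("no_matching_distribution",
     re.compile(r"No matching distribution found"
                r"|Could not find a version that satisfies")),
]


def classify_log(text):
    """Return the labels whose pattern appears in the log text."""
    return [label for label, pattern in SIGNATURES if pattern.search(text)]


def scan_build_logs(root="logs/build_images/env"):
    """Map every build_image.log one directory below root to its labels."""
    report = {}
    for log_path in sorted(Path(root).glob("*/build_image.log")):
        text = log_path.read_text(errors="replace")
        report[str(log_path)] = classify_log(text)
    return report


if __name__ == "__main__":
    for path, labels in scan_build_logs().items():
        print(path, labels or ["unclassified"])
```

A log tagged network_timeout is a reasonable candidate for a simple rerun. A log tagged only no_matching_distribution may need a closer look, although the "(from versions: none)" in the first excerpt could also be a symptom of an index that was unreachable at the time.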

System Information

No response


john-b-yang · 2025-02-05T15:50:41Z

Hi @haoxianghz, this looks like some sort of connection issue rather than a SWE-bench bug.

What machine did you run this on? The latest version of SWE-bench now downloads pre-built images from Docker Hub instead of rebuilding them locally. If the problem is truly related to building the environment images, the new SWE-bench should circumvent this issue.

haoxianghz added the "bug" (Something isn't working) label on Jan 18, 2025

MattFisher mentioned this issue on Feb 12, 2025: "SweBench image registry?" UKGovernmentBEIS/inspect_evals#39 (Open)
