
(shortfin-sd) Multi-device program initialization fails in SPX mode #467

Closed
monorimet opened this issue Nov 10, 2024 · 2 comments

@monorimet
Contributor

With the caching allocator enabled and async allocations disabled:

~/SHARK-Platform/shortfin$ SHORTFIN_ALLOCATORS=caching SHORTFIN_AMDGPU_LOGICAL_DEVICES_PER_PHYSICAL_DEVICE=1 python -m shortfin_apps.sd.server --model_config=./python/shortfin_apps/sd/examples/sdxl_config_i8.json --device=amdgpu --fibers_per_device=4 --workers_per_device=1 --isolation="none" --flagfile=./python/shortfin_apps/sd/examples/sdxl_flags_gfx942.txt --build_preference=compile --device_ids 1 2
[2024-11-10 09:47:53.078] [info] Configure allocator amdgpu:0:0@0 = [caching]
[2024-11-10 09:47:53.491] [info] Configure allocator amdgpu:1:0@0 = [caching]
INFO:shortfin_apps.sd.components.manager:Created local system with ['amdgpu:1:0@0', 'amdgpu:0:0@0'] devices
Servicing 4 outstanding tasks
Completed BuildFile[bin](sdxl/stable_diffusion_xl_base_1_0_clip_bs1_64_fp16_amdgpu-gfx942.vmfb)
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_clip_dataset_fp16.irpa)
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_clip_bs1_64_fp16.mlir)
Servicing 1 outstanding tasks
Completed BuildEntrypoint(path='sdxl')
Servicing 4 outstanding tasks
Completed BuildFile[bin](sdxl/stable_diffusion_xl_base_1_0_punet_bs1_64_1024x1024_i8_amdgpu-gfx942.vmfb)
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_punet_bs1_64_1024x1024_i8.mlir)
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_punet_dataset_i8.irpa)
Servicing 1 outstanding tasks
Completed BuildEntrypoint(path='sdxl')
Servicing 4 outstanding tasks
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_vae_dataset_fp16.irpa)
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_vae_bs1_1024x1024_fp16.mlir)
Completed BuildFile[bin](sdxl/stable_diffusion_xl_base_1_0_vae_bs1_1024x1024_fp16_amdgpu-gfx942.vmfb)
Servicing 1 outstanding tasks
Completed BuildEntrypoint(path='sdxl')
Servicing 3 outstanding tasks
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_EulerDiscreteScheduler_bs1_1024x1024_fp16.mlir)
Completed BuildFile[bin](sdxl/stable_diffusion_xl_base_1_0_EulerDiscreteScheduler_bs1_1024x1024_fp16_amdgpu-gfx942.vmfb)
Servicing 1 outstanding tasks
Completed BuildEntrypoint(path='sdxl')
INFO:root:Loading parameter fiber 'model' from: genfiles/sdxl/stable_diffusion_xl_base_1_0_clip_dataset_fp16.irpa
INFO:root:Loading parameter fiber 'model' from: genfiles/sdxl/stable_diffusion_xl_base_1_0_punet_dataset_i8.irpa
INFO:root:Loading parameter fiber 'model' from: genfiles/sdxl/stable_diffusion_xl_base_1_0_vae_dataset_fp16.irpa
INFO:uvicorn.error:Started server process [1742761]
INFO:uvicorn.error:Waiting for application startup.
INFO:shortfin_apps.sd.components.manager:Starting system manager
INFO:root:Initializing service 'sd':
INFO:root:ServiceManager(
  INFERENCE DEVICES : 
     [Device(name='amdgpu:0:0@0', ordinal=0:0, node_affinity=0, capabilities=0x0), Device(name='amdgpu:1:0@0', ordinal=1:0, node_affinity=0, capabilities=0x0)]

  MODEL PARAMS : 
     base model : SDXL 
     output size (H,W) : [[1024, 1024]] 
     max token sequence length : 64 
     classifier free guidance : True 

  SERVICE PARAMS : 
     fibers per device : 4
     program isolation mode : ProgramIsolation.NONE

  INFERENCE MODULES : 
     clip : [ProgramModule('compiled_clip', version=0, exports=[encode_prompts$async(0rrrrrr_rr), encode_prompts(0rrrr_rr), __init(0v_v)])]
     unet : [ProgramModule('compiled_punet', version=0, exports=[main$async(0rrrrrrrr_r), main(0rrrrrr_r), __init(0v_v)])]
     vae : [ProgramModule('compiled_vae', version=0, exports=[decode$async(0rrr_r), decode(0r_r), __init(0v_v)])]
     scheduler : [ProgramModule('compiled_scheduler', version=0, exports=[run_initialize$async(0rrrr_rrrr), run_initialize(0rr_rrrr), run_scale$async(0rrrrrr_rrrr), run_scale(0rrrr_rrrr), run_step$async(0rrrrrr_r), run_step(0rrrr_r), __init(0v_v)])]

  INFERENCE PARAMETERS : 
     clip : [<_shortfin_default.lib.local.StaticProgramParameters object at 0x7f7bae8a0730>]
     unet : [<_shortfin_default.lib.local.StaticProgramParameters object at 0x7f3b5809bd70>]
     vae : [<_shortfin_default.lib.local.StaticProgramParameters object at 0x7f7bae88f870>]
)
INFO:shortfin_apps.sd.components.manager:Shutting down system manager
INFO:root:System manager command processor stopped
ERROR:uvicorn.error:Traceback (most recent call last):
  File "/home/eagarvey/SHARK-Platform/.venv/lib/python3.12/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/lib/python3.12/contextlib.py", line 204, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/eagarvey/SHARK-Platform/shortfin/python/shortfin_apps/sd/server.py", line 50, in lifespan
    service.start()
  File "/home/eagarvey/SHARK-Platform/shortfin/python/shortfin_apps/sd/components/service.py", line 136, in start
    self.inference_programs[worker_idx][component] = sf.Program(
                                                     ^^^^^^^^^^^
ValueError: iree/runtime/src/iree/hal/drivers/hip/event_semaphore.c:350: ABORTED; while calling import; while invoking native function hal.device.queue.dealloca; 
[ 0] bytecode compiled_clip.__init:52672 genfiles/sdxl/stable_diffusion_xl_base_1_0_clip_bs1_64_fp16.mlir:3:3

ERROR:uvicorn.error:Application startup failed. Exiting.

The same error occurs with the default allocator and async allocations enabled, using the following server CLI input:

SHORTFIN_AMDGPU_LOGICAL_DEVICES_PER_PHYSICAL_DEVICE=1 python -m shortfin_apps.sd.server --model_config=./python/shortfin_apps/sd/examples/sdxl_config_i8.json --device=amdgpu --fibers_per_device=4 --workers_per_device=1 --isolation="none" --flagfile=./python/shortfin_apps/sd/examples/sdxl_flags_gfx942.txt --build_preference=compile --amdgpu_async_allocations --device_ids 1 2 
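For comparison, a hypothetical single-device sanity check (same flags as above, with --device_ids reduced to one id so the multi-device initialization path is not exercised); this invocation is an illustrative assumption, not part of the original report:

# Assumed single-device control run, reusing only flags from the commands above.
SHORTFIN_AMDGPU_LOGICAL_DEVICES_PER_PHYSICAL_DEVICE=1 python -m shortfin_apps.sd.server \
  --model_config=./python/shortfin_apps/sd/examples/sdxl_config_i8.json \
  --device=amdgpu --fibers_per_device=4 --workers_per_device=1 --isolation="none" \
  --flagfile=./python/shortfin_apps/sd/examples/sdxl_flags_gfx942.txt \
  --build_preference=compile --device_ids 1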
@AWoloszyn
Contributor

Can you pull this branch:
https://github.com/AWoloszyn/iree/tree/hip-ctx
and see if it fixes this for you? It seems to work for me, but I want to make sure it's the right direction. If it looks OK, I will clean it up and get it landed upstream.
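For anyone following along, a rough sketch of how that branch might be pulled into an existing local IREE checkout; the remote name and the SHORTFIN_IREE_SOURCE_DIR rebuild step are assumptions about a typical developer setup, not instructions from this thread:

# Assumed workflow: fetch the proposed branch into a local IREE clone.
git remote add awoloszyn https://github.com/AWoloszyn/iree.git   # remote name is arbitrary
git fetch awoloszyn hip-ctx
git checkout -b hip-ctx awoloszyn/hip-ctx
# Then rebuild shortfin against that source tree; SHORTFIN_IREE_SOURCE_DIR is
# assumed here as the hook used by the shortfin developer build.
SHORTFIN_IREE_SOURCE_DIR=$PWD pip install -e ~/SHARK-Platform/shortfin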

@monorimet
Contributor Author

monorimet commented Nov 12, 2024

Resolved by iree-org/iree#19103
