Pass -foffload-lto instead of -flto for cuda/hip kernels in clangLinkerWrapper #16605

omarahmed1111 · 2025-01-13T15:12:34Z

One of the clang commands issued by the ClangLinkerWrapper tool, the one that generates a PTX kernel binary from an LLVM bitcode kernel, was passing the -flto option. That option should only be used for CPU code, not GPU kernel code. This PR fixes that by passing -foffload-lto instead for CUDA/HIP kernels.

This fixes issue 16413.
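As a rough illustration of the flag selection described above, here is a standalone sketch. It does not use the real LLVM classes: `isGpuOffloadTriple` and `appendLtoFlag` are hypothetical helpers invented for this example, the triple is matched by a plain string prefix rather than through `Triple.isNVPTX()` / `Triple.isAMDGPU()`, and the assumption that non-GPU targets keep `-flto` is taken from the PR title rather than from the visible diff.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the Triple.isNVPTX() || Triple.isAMDGPU() check
// in the PR. It only looks at the architecture prefix of the triple string.
static bool isGpuOffloadTriple(const std::string &Triple) {
  auto StartsWith = [&](const char *Prefix) {
    return Triple.rfind(Prefix, 0) == 0;
  };
  return StartsWith("nvptx") || StartsWith("amdgcn");
}

// Hypothetical helper: append the LTO flag that fits the target.
// Assumption: targets other than CUDA/HIP continue to receive -flto.
static void appendLtoFlag(const std::string &Triple,
                          std::vector<std::string> &CmdArgs) {
  CmdArgs.push_back(isGpuOffloadTriple(Triple) ? "-foffload-lto" : "-flto");
}
```

Calling `appendLtoFlag("nvptx64-nvidia-cuda", CmdArgs)` appends `-foffload-lto`, while a CPU triple such as `x86_64-unknown-linux-gnu` gets `-flto` under the assumption stated above.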

sarnex · 2025-01-14T14:57:28Z

clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp

  for (auto &Arg : Args.filtered(OPT_offload_opt_eq_minus, OPT_mllvm))
    CmdArgs.append(
        {"-Xlinker",
         Args.MakeArgString("--plugin-opt=" + StringRef(Arg->getValue()))});

+  if (Triple.isNVPTX() || Triple.isAMDGPU()) {
+    CmdArgs.push_back("-foffload-lto");


should this go upstream?

+1 on adding this change to upstream code. I am also wondering why it is not an issue in the upstream compiler and why we see it only with SYCL offloading.

Also @omarahmed1111 can you please comment on the test failure?

Thanks

The loop_extended test failure is flaky and unrelated to this PR; the failing test is disabled in #16612. I updated the PR, so it should be resolved now.

sarnex · 2025-01-27T19:04:43Z

clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp

  for (auto &Arg : Args.filtered(OPT_offload_opt_eq_minus, OPT_mllvm))
    CmdArgs.append(
        {"-Xlinker",
         Args.MakeArgString("--plugin-opt=" + StringRef(Arg->getValue()))});

+  if (Triple.isNVPTX() || Triple.isAMDGPU()) {


Is there an upstream PR for this, based on the previous feedback?

I thought we were going to do that after merging this one. I will open an upstream PR.

OK, as long as you'll make an upstream PR, LGTM to merge here first.

sarnex

lgtm assuming future upstream pr

omarahmed1111 · 2025-01-29T10:41:58Z

@intel/llvm-gatekeepers please merge, Thanks!

omarahmed1111 · 2025-01-31T16:02:03Z

@sarnex llvm upstream PR: llvm/llvm-project#125243

omarahmed1111 force-pushed the fix-NewOffloadDrivers-failed-tests branch from 5b77ca9 to d24ae0e Compare January 13, 2025 15:15

omarahmed1111 force-pushed the fix-NewOffloadDrivers-failed-tests branch from d24ae0e to 8615f88 Compare January 13, 2025 15:18

omarahmed1111 force-pushed the fix-NewOffloadDrivers-failed-tests branch from 8615f88 to 6c7fbf7 Compare January 13, 2025 16:36

omarahmed1111 force-pushed the fix-NewOffloadDrivers-failed-tests branch from 6c7fbf7 to da775da Compare January 13, 2025 16:54

omarahmed1111 force-pushed the fix-NewOffloadDrivers-failed-tests branch from da775da to 633d90b Compare January 13, 2025 17:11

Pass -offload-lto instead of -lto for cuda/hip kernels and enable failed tests

ccf221a

omarahmed1111 force-pushed the fix-NewOffloadDrivers-failed-tests branch from 633d90b to ccf221a Compare January 13, 2025 17:26

omarahmed1111 marked this pull request as ready for review January 13, 2025 21:56

omarahmed1111 requested review from a team as code owners January 13, 2025 21:56

sarnex reviewed Jan 14, 2025

Merge branch 'sycl' into fix-NewOffloadDrivers-failed-tests

a0f881c

omarahmed1111 requested a review from sarnex January 27, 2025 10:12

sarnex reviewed Jan 27, 2025

sarnex approved these changes Jan 28, 2025

mdtoguchi approved these changes Jan 28, 2025

martygrant merged commit 11a73e7 into intel:sycl Jan 29, 2025
17 checks passed

omarahmed1111 mentioned this pull request Feb 4, 2025

Return the usage of -flto and use -lto-emit-asm instead #16884

Open
