
DNS - Data-tiling prototype for targeting multi-device. #18738

Draft · wants to merge 25 commits into main from hanhan-encoding-interface-prototype-v2

Conversation

@hanhanW (Contributor) commented Oct 10, 2024

This branch demonstrates how data-tiling and heterogeneous computing work together in IREE.

Design Doc: https://hackmd.io/@hwPnnvLBTB-JGVMeh-bCEA/Sy9nvDhb1e

IR dump: https://gist.github.com/hanhanW/5029dc652aec1379102e43e702aaf15b

```mlir
// Zen4 CPU
#executable_target_embedded_elf_x86_64_with_encoding_solver = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64",
  {cpu = "znver4", cpu_features = "+mmx,+popcnt,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+avx,+avx2,+sse4a,+fma,+avx512f,+bmi,+bmi2,+aes,+pclmul,+avx512vl,+avx512bw,+avx512dq,+avx512cd,+avx512vbmi,+avx512ifma,+avx512vpopcntdq,+avx512vbmi2,+gfni,+vpclmulqdq,+avx512vnni,+avx512bitalg,+avx512bf16,+adx,+clflushopt,+clwb,+clzero,+cx16,+cx8,+f16c,+fsgsbase,+crc32,+invpcid,+rdpru,+sahf,+lzcnt,+movbe,+mwaitx,+x87,+pku,+evex512,+prfchw,+rdpid,+rdrnd,+rdseed,+sha,+shstk,+vaes,+wbnoinvd,+xsave,+xsavec,+xsaveopt,+xsaves,+fxsr",
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128",
   native_vector_size = 64 : i64,
   target_triple = "x86_64-unknown-unknown-eabi-elf",
   encoding_solver = #iree_cpu.cpu_encoding_solver<>
   }>

// VMVX with ukernels enabled.
#executable_target_vmvx_bytecode_fb = #hal.executable.target<"vmvx", "vmvx-bytecode-fb", {encoding_solver = #iree_cpu.vmvx_encoding_solver<>, ukernels = "all"}>

util.global private @device_a = #hal.device.target<"local", {ordinal = 0 : index}, [
  #executable_target_embedded_elf_x86_64_with_encoding_solver
]> : !hal.device
util.global private @device_b = #hal.device.target<"local", {ordinal = 1 : index}, [
  #executable_target_vmvx_bytecode_fb
]> : !hal.device

func.func @foo(
  %lhs: tensor<?x?xf32> {iree.abi.affinity = #hal.device.affinity<@device_a>},
  %rhs: tensor<?x?xf32> {iree.abi.affinity = #hal.device.affinity<@device_a>}) -> (tensor<?x?xf32> {iree.abi.affinity = #hal.device.affinity<@device_a>}) {

  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %M = tensor.dim %lhs, %c0 : tensor<?x?xf32>
  %K = tensor.dim %lhs, %c1 : tensor<?x?xf32>
  %N = tensor.dim %rhs, %c1 : tensor<?x?xf32>
  %cst = arith.constant 0.0 : f32
  %init = tensor.empty(%M, %N) : tensor<?x?xf32>
  %fill = linalg.fill ins(%cst : f32) outs(%init : tensor<?x?xf32>) -> tensor<?x?xf32>
  %op = linalg.matmul
      ins(%lhs, %rhs : tensor<?x?xf32>, tensor<?x?xf32>)
      outs(%fill : tensor<?x?xf32>) -> tensor<?x?xf32>
  // Execute matmul on device_a and transfer the result to device_b
  %transient_op = flow.tensor.transfer %op : tensor<?x?xf32>{%M, %N} to #hal.device.affinity<@device_b>

  // Transfer input data to device_b
  %lhsb = flow.tensor.transfer %lhs : tensor<?x?xf32>{%M, %K} to #hal.device.affinity<@device_b>
  %rhsb = flow.tensor.transfer %rhs : tensor<?x?xf32>{%K, %N} to #hal.device.affinity<@device_b>
  %initb = tensor.empty(%M, %N) : tensor<?x?xf32>
  %fillb = linalg.fill ins(%cst : f32) outs(%initb : tensor<?x?xf32>) -> tensor<?x?xf32>
  // Execute matmul on device_b and accumulate the result and the result from device_a.
  %opb = linalg.matmul
      ins(%lhsb, %rhsb : tensor<?x?xf32>, tensor<?x?xf32>)
      outs(%fillb : tensor<?x?xf32>) -> tensor<?x?xf32>
  %add = arith.addf %transient_op, %opb : tensor<?x?xf32>

  // Transfer the result from device_b -> device_a.
  %result_a = flow.tensor.transfer %add : tensor<?x?xf32>{%M, %N} to #hal.device.affinity<@device_a>

  // Return the result on device_a.
  func.return %result_a : tensor<?x?xf32>
}
```
```shell
# Compilation
iree-compile --iree-execution-model=async-external ~/matmul.mlir -o /tmp/z.vmfb --iree-global-opt-enable-early-materialization=false

# Execution
iree-run-module --module=/tmp/z.vmfb --function=foo --input=2x3xf32=1,2,3,4,5,6 --input=3x5xf32=1 --device=local-task --device=local-task

# EXEC @foo
# result[0]: hal.buffer_view
# 2x5xf32=[12 12 12 12 12][30 30 30 30 30]
```
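The result is twice the plain 2x3 × 3x5 product because the same matmul runs on both device_a and device_b and the two results are accumulated by the arith.addf. As a sanity check (not part of this PR), the expected output can be reproduced with NumPy, where the second --input is a splat that fills the whole 3x5 RHS with 1.0:

```python
import numpy as np

# lhs comes from --input=2x3xf32=1,2,3,4,5,6; rhs from the splat --input=3x5xf32=1.
lhs = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
rhs = np.ones((3, 5), dtype=np.float32)

result_a = lhs @ rhs        # matmul executed on device_a
result_b = lhs @ rhs        # the same matmul executed on device_b
print(result_a + result_b)  # accumulated result: rows of 12 and 30
```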

@ScottTodd (Member) commented:

@hanhanW do you want to keep this PR open with the "check bazel deps" title? Seems like you are pushing to this branch fairly regularly.

@hanhanW changed the title from "DNS - check bazel deps" to "DNS - Data-tiling prototype for targeting multi-device." on Nov 6, 2024
@hanhanW (Contributor, Author) commented Nov 6, 2024

> @hanhanW do you want to keep this PR open with the "check bazel deps" title? Seems like you are pushing to this branch fairly regularly.

I forgot this PR was open while I was updating my branch, so the title never got updated. :/ Sorry about that; I've updated the title now.

This is the third take. It introduces a "cloneWithConfig" interface method to solve the duplicated-config issue.

Signed-off-by: hanhanW <[email protected]>
The prototype is not the final state; ideally we should introduce another attribute interface to handle materializations.

The current implementation works only if the encoding target is the same as the execution target.

Signed-off-by: hanhanW <[email protected]>
Signed-off-by: hanhanW <[email protected]>
The boundary operations (i.e., bindings, flow.tensor.load/store, etc.) have an EncodingSolver attached, which defines the layout for the inputs/outputs.

The encodings on the operations (e.g., compute ops) capture all the original encoding fields, and we know that they will be executed on the execution device, so we should be able to insert ops that convert a tensor from layout_a to layout_b.

This adds the indexing maps back; we can revisit that later.

Signed-off-by: hanhanW <[email protected]>
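To make the idea above concrete, here is a minimal, purely illustrative Python sketch (assuming hypothetical solvers and tile sizes; none of these names are IREE APIs): each device's encoding solver resolves an abstract encoding to a concrete layout, and a relayout step is needed whenever an encoded tensor moves between devices whose resolved layouts differ.

```python
# Purely illustrative model of the layout-resolution idea; the solver names,
# tile sizes, and op names below are hypothetical, not real IREE APIs.
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    op: str            # e.g. "matmul"
    operand: str       # "lhs", "rhs", or "result"
    element_type: str  # e.g. "f32"


def cpu_solver(enc: Encoding) -> tuple[int, int]:
    # A CPU-flavored solver might pick larger, SIMD-friendly inner tiles.
    return (16, 16)


def vmvx_solver(enc: Encoding) -> tuple[int, int]:
    # A VMVX/ukernel-flavored solver might pick smaller tiles.
    return (8, 8)


def transfer(enc: Encoding, src_solver, dst_solver) -> list[str]:
    """Conceptual steps needed to move an encoded tensor between devices."""
    src, dst = src_solver(enc), dst_solver(enc)
    if src == dst:
        return ["transfer"]
    # Resolved layouts differ: undo the source layout, transfer, then re-tile
    # for the destination layout.
    return [f"unpack{src}", "transfer", f"pack{dst}"]


print(transfer(Encoding("matmul", "result", "f32"), cpu_solver, vmvx_solver))
# ['unpack(16, 16)', 'transfer', 'pack(8, 8)']
```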
Note that this should not be an issue once we land it on the main branch, because the refactoring will happen and both paths will use the same code.

Signed-off-by: hanhanW <[email protected]>