This repository has been archived by the owner on Mar 20, 2024. It is now read-only.

Commit

Use a3 for widening multiply in ch6.4
This change makes the assembly code clearer and more consistent: x10 (a0) already holds the total number of elements to process, whereas a3 is an argument register under the function calling convention and is suitable for holding the multiplier. This makes the source and purpose of the multiplier in the widening multiplication easier to understand.
Shao-Ce SUN committed Jun 15, 2023
1 parent 66fcbf4 commit b426d7e
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion v-spec.adoc
@@ -1331,14 +1331,15 @@ throughput on mixed-width operations in a single loop.
# a0 holds the total number of elements to process
# a1 holds the address of the source array
# a2 holds the address of the destination array
+# a3 holds the multiplier for the widening multiplication
loop:
vsetvli a3, a0, e16, m4, ta, ma # vtype = 16-bit integer vectors;
# also update a3 with vl (# of elements this iteration)
vle16.v v4, (a1) # Get 16b vector
slli t1, a3, 1 # Multiply # elements this iteration by 2 bytes/source element
add a1, a1, t1 # Bump pointer
-vwmul.vx v8, v4, x10 # Widening multiply into 32b in <v8--v15>
+vwmul.vx v8, v4, a3 # Widening multiply into 32b in <v8--v15>
vsetvli x0, x0, e32, m8, ta, ma # Operate on 32b values
vsrl.vi v8, v8, 3
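
As a reading aid, the visible part of the loop can be sketched as a scalar C reference. This is a minimal sketch under stated assumptions, not the spec's code: the function name `widen_mul_shift` and the chunk size `ASSUMED_VLMAX` are hypothetical, the multiplier is passed as its own parameter rather than tied to any register, and the store to the destination array is assumed because the hunk is cut off before that step. The chunk size is modeled simply as the smaller of the remaining count and `ASSUMED_VLMAX`; real hardware chooses `vl` itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VLMAX; on real hardware vsetvli decides vl. */
#define ASSUMED_VLMAX 8

/* Scalar model of the strip-mined loop: 16-bit load, widening signed
 * multiply to 32 bits, logical shift right by 3, 32-bit store (assumed). */
void widen_mul_shift(const int16_t *src, uint32_t *dst, size_t n,
                     int16_t multiplier)
{
    while (n > 0) {
        /* Models vsetvli: elements handled this iteration. */
        size_t vl = n < ASSUMED_VLMAX ? n : ASSUMED_VLMAX;

        for (size_t i = 0; i < vl; i++) {
            /* Models vwmul.vx: signed 16b x 16b -> 32b product. */
            int32_t wide = (int32_t)src[i] * (int32_t)multiplier;
            /* Models vsrl.vi: logical (zero-filling) shift right by 3. */
            dst[i] = (uint32_t)wide >> 3;
        }

        src += vl;   /* Bump source pointer (2 bytes per element). */
        dst += vl;   /* Bump destination pointer (4 bytes per element). */
        n -= vl;     /* Elements still to process. */
    }
}
```

The sketch keeps the multiplier and the per-iteration element count in separate variables, so the two values cannot overwrite each other. Because the shift is logical, a negative product yields a large unsigned result rather than a sign-extended one.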

