Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about multi-threading #2

Open
MolVlad opened this issue Jan 22, 2025 · 1 comment
Open

Question about multi-threading #2

MolVlad opened this issue Jan 22, 2025 · 1 comment

Comments

@MolVlad
Copy link

MolVlad commented Jan 22, 2025

I wonder whether it is possible to achieve further acceleration of decoding one frame by using multi-threading. Have you studied this question? Thanks in advance!

@williamyang98
Copy link
Owner

Parallelisation inside one symbol

It should be possible to parallelise this for loop inside the butterfly.

for (size_t curr_state = 0u; curr_state < Base::BranchTable::NUMSTATES; curr_state++) {

For codes with a large constraint length like Cassini (K=15, NUMSTATES = 2^(K-2) = 8192), there should be performance improvements with multithreading if the overhead is low enough. (Probably wouldn't be beneficial K is small and the NUMSTATES is small).

Parallelisation between all symbols

As for parallising the entire decoding of the symbol frame

for (size_t state = 0u; state < NUMSTATES; state++) {

Branch error metrics for each symbol is calculated from the error metrics from the previous symbol. I'm not sure how you would eliminate or work around this data dependency to make parallelisation possible.

// r = leading bit of previous state
// s = input bit
// g = parity bit corresponding to output symbol
// next_error_r_s = r^s^g
const error_t next_error_0_0 = old_metric[curr_state_0] + total_error;
const error_t next_error_1_0 = old_metric[curr_state_1] + inverted_error;
const error_t next_error_0_1 = old_metric[curr_state_0] + inverted_error;
const error_t next_error_1_1 = old_metric[curr_state_1] + total_error;
// TODO: When adding our error metrics we may cause an overflow to happen
// if the renormalisation step was not performed in time
// Our intrinsics implementations use saturated arithmetic to prevent an overflow
// Perhaps it is possible to use something like GCC's builtin saturated add here?
// Select the previous state r with a lower error for an input bit s
const decision_bits_t decision_0 = next_error_0_0 > next_error_1_0;
const decision_bits_t decision_1 = next_error_0_1 > next_error_1_1;
// Update metrics
new_metric[next_state_0] = decision_0 ? next_error_1_0 : next_error_0_0;
new_metric[next_state_1] = decision_1 ? next_error_1_1 : next_error_0_1;
// Store the leading bit for the previous state for the next states (X|0) and (X|1)
const decision_bits_t bits = decision_0 | (decision_1 << 1);
const size_t curr_pack_index = next_state_0 / Base::Decisions::TOTAL_BITS_PER_BLOCK;
const size_t curr_pack_bit = next_state_0 % Base::Decisions::TOTAL_BITS_PER_BLOCK;
decision[curr_pack_index] |= (bits << curr_pack_bit);

You could calculate the branch errors for each symbol in parallel

for (size_t i = 0; i < Base::R; i++) {
const soft_t sym = symbols[i];
const soft_t expected_sym = base.m_branch_table[i][curr_state];
const soft_t error = expected_sym - sym;
const error_t abs_error = error_t(get_abs(error));
total_error += abs_error;
}
assert(total_error <= base.m_config.soft_decision_max_error);
// We only store half the states in the branch table, but here we expand it out to explore the other unstored half
// Both state 0 and state 1 when shifted give the same next state (for the same input bit)
// Most significant bit for state 1 is shifted out when considering the next state
// since it is shifted out of the (K-1) bits considered for the next state
const size_t curr_state_0 = curr_state; // Goes up to K-2 bits
const size_t curr_state_1 = curr_state + Base::Metrics::NUMSTATES/2; // Adds (K-1)th bit
const size_t next_state_0 = (curr_state << 1) | 0;
const size_t next_state_1 = (curr_state << 1) | 1;

and then do some kind of reduction (add error, find min error, repeat) to get the final decison bits (this part would not be parallel).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants