Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce new Chunk iteration APIs #8553

Merged
merged 7 commits into from
Jan 6, 2025
Merged

Introduce new Chunk iteration APIs #8553

merged 7 commits into from
Jan 6, 2025

Conversation

teh-cmc
Copy link
Member

@teh-cmc teh-cmc commented Dec 20, 2024

  • Way more ergonomic
  • Same performance
  • Less code
  • Support for structs!
  • Basis for new upcoming codegen'd deserialization APIs (...at some point)

@teh-cmc teh-cmc added enhancement New feature or request 🏹 arrow concerning arrow 🧑‍💻 dev experience developer experience (excluding CI) 🔍 re_query affects re_query itself include in changelog labels Dec 20, 2024
Copy link

github-actions bot commented Dec 20, 2024

Web viewer built successfully. If applicable, you should also test it:

  • I have tested the web viewer
Result Commit Link Manifest
c77bd2a https://rerun.io/viewer/pr/8553 +nightly +main

Note: This comment is updated whenever you push a commit.

@teh-cmc teh-cmc marked this pull request as ready for review December 20, 2024 11:00
@Wumpf Wumpf self-requested a review January 3, 2025 12:15
Copy link
Member

@Wumpf Wumpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noice

Comment on lines +871 to +921
let Some(inner_list_array) = array.as_any().downcast_ref::<Arrow2ListArray<i32>>() else {
if cfg!(debug_assertions) {
panic!("downcast failed for {component_name}, data discarded");
} else {
re_log::error_once!("downcast failed for {component_name}, data discarded");
}
return Either::Left(std::iter::empty());
};

let inner_offsets = inner_list_array.offsets();
let inner_lengths = inner_list_array.offsets().lengths().collect_vec();

let Some(fixed_size_list_array) = inner_list_array
.values()
.as_any()
.downcast_ref::<Arrow2FixedSizeListArray>()
else {
if cfg!(debug_assertions) {
panic!("downcast failed for {component_name}, data discarded");
} else {
re_log::error_once!("downcast failed for {component_name}, data discarded");
}
return Either::Left(std::iter::empty());
};

let Some(values) = fixed_size_list_array
.values()
.as_any()
.downcast_ref::<Arrow2PrimitiveArray<T>>()
else {
if cfg!(debug_assertions) {
panic!("downcast failed for {component_name}, data discarded");
} else {
re_log::error_once!("downcast failed for {component_name}, data discarded");
}
return Either::Left(std::iter::empty());
};

let size = fixed_size_list_array.size();
let values = values.values();

// NOTE: No need for validity checks here, `iter_offsets` already takes care of that.
Either::Right(component_offsets.map(move |(idx, len)| {
let inner_offsets = &inner_offsets.as_slice()[idx..idx + len];
let inner_lengths = &inner_lengths.as_slice()[idx..idx + len];
izip!(inner_offsets, inner_lengths)
.map(|(&idx, &len)| {
let idx = idx as usize;
bytemuck::cast_slice(&values[idx * size..idx * size + len * size])
})
.collect_vec()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything except collect_vec is not dependent on N -> we likely can save a lot of template instantiations by putting this into a second level utility method that only depends on T but not on N

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I'm ready to add even more layers and complex return types to all of this (we need the intermediary values too...) for something I don't really have a way to measure right now :/ Also this is private, no the number of instantiations is fixed, at least.

@teh-cmc teh-cmc merged commit 12f8e06 into main Jan 6, 2025
27 of 29 checks passed
@teh-cmc teh-cmc deleted the cmc/new_chunk_iter branch January 6, 2025 08:39
teh-cmc added a commit that referenced this pull request Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏹 arrow concerning arrow 🧑‍💻 dev experience developer experience (excluding CI) enhancement New feature or request include in changelog 🔍 re_query affects re_query itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants