Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the gather_index in DataAssembly #875

Open
YingtianDt opened this issue May 27, 2024 · 3 comments
Open

Remove the gather_index in DataAssembly #875

YingtianDt opened this issue May 27, 2024 · 3 comments

Comments

@YingtianDt
Copy link
Contributor

YingtianDt commented May 27, 2024

Hi, I suggest removing gather_index in any DataAssembly types due to the following reasons:

  1. it always create bugs when one dimension only has one coordinate. This creates big confusion for new users;
  2. it has no practical use;
  3. occasionally, multi-index creates more trouble than the normal index: e.g., when we need to add more coordinates.

Though this issue is in brainio repo, I am raising it here since it is more relevant in context.

@mike-ferguson
Copy link
Member

Hi @YingtianDt - thanks for the suggestion and for opening an issue! I will pass this forward to the rest of the team, and we appreciate the contribution.

@mschrimpf
Copy link
Member

related: brain-score/brainio#42

@YingtianDt
Copy link
Contributor Author

YingtianDt commented Jun 4, 2024

Hi, I am moving the discussion of this to this PR, since me and Martin seem to agree on rather removing gather_index but fixing it (potentially with this PR).

With this said, please still be aware of the downsides of using Multi-index. E.g.,

Take a concrete example here:

a = NeuroidAssembly(
    [[0, 1, 2], [3, 4, 5]],
    dims = ('presentation', 'neuroid'),
    coords = {
        'stimulus_id': ('presentation', ['a', 'b']),
        'stimulus_type': ('presentation', ['m', 'n']),
        'neuroid_id': ('neuroid', ['x', 'y', 'z']),
        'neuroid_layer': ('neuroid', ['k', 'f', 'c']),
    }
)
  1. when adding new coordinates, we have to always use gather_index again to maintain the multi-index. e.g.,
a = a.assign_coords(neuroid_weight = ('neuroid', [0.1, 0.2, 0.3]))
a = NeuroidAssembly(a)
  1. when comparing two multi-indices, the ordering of the coordinates within each index matters. e.g.,
b = NeuroidAssembly(
    [[0, 1, 2], [3, 4, 5]],
    dims = ('presentation', 'neuroid'),
    coords = {
        'stimulus_type': ('presentation', ['m', 'n']),
        'stimulus_id': ('presentation', ['a', 'b']),
        'neuroid_id': ('neuroid', ['x', 'y', 'z']),
        'neuroid_layer': ('neuroid', ['k', 'f', 'c']),
    }
) # here the ordering of coordinates of presentation is reversed

print(a.presentation == a.presentation)  # this will work
print(a.presentation == b.presentation)  # this gives empty comparision like the following

# array([], dtype=bool)
# Coordinates:
#   * presentation          (presentation) MultiIndex
#   - presentation_level_0  (presentation) object 
#   - presentation_level_1  (presentation) object

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants