Error: In function [: "j" must not have duplicated values #183

Open
alasaa opened this issue Dec 25, 2023 · 9 comments

Comments

@alasaa

alasaa commented Dec 25, 2023

When I use DoubletFinder with Seurat v5 + BPCells, I get the following error:

sweep.res.list_obj <- paramSweep(obj, PCs = 1:40, sct = FALSE)
[1] "Creating artificial doublets for pN = 5%"
Error: In function [: "j" must not have duplicated values

  1. └─DoubletFinder::paramSweep(obj, PCs = 1:40, sct = FALSE)
  2.   └─base::lapply(...)
  3.     └─DoubletFinder (local) FUN(X[[i]], ...)
  4.       ├─data[, real.cells1]
  5.       └─data[, real.cells1]
  6.         └─BPCells:::selection_index(j, ncol(x), colnames(x))

packageVersion('DoubletFinder')
[1] ‘2.0.4’
packageVersion('Seurat')
[1] ‘5.0.1’
packageVersion('BPCells')
[1] ‘0.1.0’

Previous posts have raised similar questions, but ours seems to be different?
#161

Although the authors said "DoubletFinder should work with Seurat V5 now", I still cannot use DoubletFinder with Seurat v5 + BPCells.
Is the problem on my side? Could someone help me, please?

@veeramv

veeramv commented Jan 11, 2024

Reinstall DoubletFinder. It has been updated (>2.0.4) and works well now :)
remotes::install_github("chris-mcginnis-ucsf/DoubletFinder")

good luck :)

@ggruenhagen3

I'm facing the same issue. I'm using DoubletFinder v2.0.4, BPCells v0.1.0, Seurat v5.0.0, and R 4.3.1. From the error it seems that the issue is that DoubletFinder is trying to subset by a list of cells that contains a single cell multiple times, but BPCells does not allow indexing that way. Seems like this issue might require attention from the DoubletFinder and/or BPCells team.
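
To check this diagnosis on your own data, it may help to test the index vector for repeats before it reaches the BPCells matrix. The snippet below is only a sketch: the cell names are made-up stand-ins for whatever `real.cells1` holds inside `paramSweep`, and it assumes, as the error message suggests, that the vector can contain the same cell more than once.

```r
# Stand-in for the cell names passed to data[, real.cells1]; values are hypothetical.
real.cells1 <- c("cell_A", "cell_B", "cell_A", "cell_C")

# anyDuplicated() returns the index of the first repeated element, or 0 if none.
first_dup <- anyDuplicated(real.cells1)
has_dups <- first_dup > 0

# Names that occur more than once in the index vector.
repeated <- unique(real.cells1[duplicated(real.cells1)])

print(has_dups)
print(repeated)
```

If `has_dups` is `TRUE`, the subset is of the kind that BPCells rejects in the traceback above.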

As a temporary solution, I found that converting your assays to the old versions allows DoubletFinder to work.

obj[["RNA"]] = as(obj[["RNA"]], "Assay")
obj[["SCT"]] = as(obj[["SCT"]], "Assay")
sweep.list = paramSweep(obj, PCs = 1:30, sct = TRUE)

@nitinmahajan20

This worked for me as well, but I had to use the v3 function, paramSweep_v3:

obj[["RNA"]] = as(obj[["RNA"]], "Assay")
obj[["SCT"]] = as(obj[["SCT"]], "Assay")
sweep.list = paramSweep_v3(obj, PCs = 1:30, sct = TRUE)

@Pentayouth

The new DoubletFinder only works for Seurat v5 objects that were not preprocessed with BPCells.

@simang5c

simang5c commented Oct 24, 2024

Hi,

I've encountered the same issue you're facing. The problem arises due to the type of data you're working with. Specifically:

  1. In paramsweep.R, line 18, you are trying to load the counts matrix using the following line:
data <- seu@assays$RNA$counts[, real.cells]

However, the object seu@assays$RNA$counts is stored as a BPCells data type. This is an optimized data structure used to handle large single-cell RNA-seq datasets efficiently. Because of this, the matrix is not fully loaded into memory when you access it like this. Instead, it's handled in chunks or "iteratively," meaning you're not dealing with a typical matrix that you can manipulate directly.

  2. To correctly load the matrix and perform operations on it, you need to convert it into a format that can be handled in memory, such as a sparse matrix. You can do this using the following command:
library(Matrix)
data <- as(seu[["RNA"]]$counts[, real.cells], "dgCMatrix")

This will convert the BPCells matrix into a sparse matrix (dgCMatrix), which is memory-efficient and allows you to perform standard matrix operations.
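
If the same code has to handle both BPCells-backed and ordinary in-memory objects, the conversion can be wrapped in a small guard. This is a sketch under assumptions: `as_in_memory` is a hypothetical helper name, and it assumes BPCells-backed layers inherit from the `IterableMatrix` class. The demo uses a plain sparse matrix, so only the pass-through branch is exercised here.

```r
library(Matrix)

# Hypothetical helper, not part of DoubletFinder or BPCells.
# Assumption: BPCells-backed matrices inherit from "IterableMatrix".
as_in_memory <- function(counts) {
  if (inherits(counts, "IterableMatrix")) {
    # Materialize the on-disk matrix as an in-memory sparse matrix.
    as(counts, "dgCMatrix")
  } else {
    # Already in memory; return unchanged.
    counts
  }
}

# Stand-in in-memory matrix with made-up values.
m <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(3, 4), dims = c(2, 2))
same <- identical(as_in_memory(m), m)
print(same)
```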

I hope it helps.

Cheers :)

@chris-mcginnis-ucsf
Owner

@haibol2016 thoughts?

@haibol2016
Collaborator

@alasaa, @chris-mcginnis-ucsf , @simang5c

I think @simang5c's solution works. I personally haven't tried BPCells yet, which seems very helpful for scRNA-seq data with millions of cells. I will try it later. After my update, the package has been tested, and I am sure it works smoothly with Seurat v3 and v5.

Haibo

@ggruenhagen3

I think BPCells could still work as long as no cell is retrieved more than once in a single subset, which would be ideal for keeping its speed-ups.
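
One way this idea might be sketched: request each distinct column once, convert that smaller result to an in-memory sparse matrix, and re-expand the repeats there. `subset_cols_once` is a hypothetical helper, not part of DoubletFinder or BPCells, and it has not been tested against a BPCells matrix. The demo uses a plain `dgCMatrix` with made-up values as a stand-in.

```r
library(Matrix)

# Hypothetical helper: subset columns without ever asking the source
# matrix for the same column twice.
subset_cols_once <- function(mat, cols) {
  uniq <- unique(cols)
  # Only distinct columns are requested from the (possibly on-disk) matrix.
  small <- as(mat[, uniq, drop = FALSE], "dgCMatrix")
  # Duplicates are re-created in memory, where repeated indices are allowed.
  small[, match(cols, uniq), drop = FALSE]
}

# Stand-in counts matrix: 2 genes x 3 cells, values are made up.
counts <- sparseMatrix(
  i = c(1, 2, 1), j = c(1, 2, 3), x = c(5, 7, 9),
  dims = c(2, 3),
  dimnames = list(c("g1", "g2"), c("c1", "c2", "c3"))
)

real.cells1 <- c("c1", "c3", "c1")  # "c1" is requested twice
data <- subset_cols_once(counts, real.cells1)
print(dim(data))
print(colnames(data))
```

Only the distinct columns are materialized, so memory use scales with the number of unique cells requested rather than with the full matrix.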

@yxsee

yxsee commented Feb 13, 2025

Just to chime in, this error affects both paramSweep and doubletFinder. I found that data <- as(counts[, real.cells], "dgCMatrix") only works if real.cells contains all cells. Otherwise, the entire matrix has to be loaded into memory by converting it to dgCMatrix format before subsetting:
data <- as(counts, "dgCMatrix")[, real.cells]
This works if the matrix is not too big, but it can use a lot of memory for large datasets.
