refactor: enhance readability and performance of graph.adjacency.sparse #1667
Conversation
Let's understand the pmax() issue a bit better, silence the messages in the tests for Matrix < 1.6.2, and merge. Good job!
R/adjacency.R (outdated)
adjmatrix <- pmax(adjmatrix, Matrix::t(adjmatrix))
adjmatrix <- Matrix::tril(adjmatrix)
@mmaechler: We tried the code below, but it appears to be even slower. Are pmax() and pmin() efficient for sparse matrices?
Suggested change:
- adjmatrix <- pmax(adjmatrix, Matrix::t(adjmatrix))
- adjmatrix <- Matrix::tril(adjmatrix)
+ adjmatrix <- pmax(Matrix::tril(adjmatrix), Matrix::t(Matrix::triu(adjmatrix)))
For context, we're transforming a directed graph into an undirected one, combining edge attributes by taking the maximum over the attributes of the original directed edges:
A[i, j] <- max(A[i, j], A[j, i])
The result is a sparse matrix; we're unsure where the overhead comes from:
set.seed(20250130)
n <- 3
m <- Matrix::sparseMatrix(i = rep(1:n, each = n), j = rep(1:n, n), x = runif(n * n))
pmax(m, Matrix::t(m))
#> 3 x 3 sparse Matrix of class "dgCMatrix"
#>
#> [1,] 0.3174528 0.6438151 0.7530661
#> [2,] 0.6438151 0.3822910 0.4231476
#> [3,] 0.7530661 0.4231476 0.2236633
Created on 2025-01-30 with reprex v2.1.1
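A quick follow-up check (a sketch, not from the original comment): the tiny reprex above has a fully dense pattern, so it cannot show whether pmax() keeps a genuinely sparse pattern sparse. On a low-density matrix, the class and the non-zero count of the result are easy to inspect; note this still says nothing about dense intermediates inside pmax() itself.
set.seed(20250130)
# Low-density matrix with non-negative values, so the pattern of
# pmax(m2, t(m2)) is simply the union of the patterns of m2 and t(m2).
m2 <- Matrix::rsparsematrix(1000, 1000, density = 0.01, rand.x = runif)
res <- pmax(m2, Matrix::t(m2))
class(res)           # expected: a sparse class such as "dgCMatrix"
Matrix::nnzero(res)  # expected: roughly 2 * Matrix::nnzero(m2), far below 1000^2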
For more context, we currently use adjmat <- pmax(adjmat, t(adjmat)) to solve the problem, and we are wondering about the memory footprint. It looks quite high, so we suspect there might be a dense conversion happening somewhere within pmax().
bench::mark(
check = FALSE,
old = igraph:::graph.adjacency.sparse(A,mode = "max"),
new = graph.adjacency.sparse.new(A,mode = "max")
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 old 586ms 586ms 1.71 198MB 13.7
#> 2 new 783ms 783ms 1.28 645MB 3.83
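For scale, a back-of-the-envelope number (assuming A here is roughly the 5000 x 5000 matrix used in the later benchmarks): a fully dense double matrix of that size takes about 190 MB, which is in the same ballpark as the allocations above, so a dense intermediate somewhere would be consistent with these figures.
# Size of a dense 5000 x 5000 double matrix, in MiB (assumed dimensions)
5000 * 5000 * 8 / 2^20
#> [1] 190.7349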
To provide some more context: I got some mixed results when running more benchmarks with bench::mark(), so I switched to microbenchmark. There seems to be high variability in the runtime of both functions, which explains the mixed results I got. Not sure what to make of this yet.
set.seed(5461)
f1 <- function(adjmatrix) {
adjmatrix <- pmax(adjmatrix, Matrix::t(adjmatrix))
Matrix::tril(adjmatrix)
}
f2 <- function(adjmatrix){
pmax(Matrix::tril(adjmatrix), Matrix::t(Matrix::triu(adjmatrix)))
}
A <- igraph::sample_gnp(5000,0.03,TRUE) |> igraph::as_adjacency_matrix()
A[A==1] <- sample(1:15,sum(A==1),replace = TRUE)
mb <- microbenchmark::microbenchmark(
f1(A),
f2(A)
)
#> Warning in microbenchmark::microbenchmark(f1(A), f2(A)): less accurate
#> nanosecond times to avoid potential integer overflows
mb
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> f1(A) 170.0002 220.5387 224.3328 226.1104 231.3204 274.3551 100
#> f2(A) 117.4338 170.5237 191.4667 176.8457 213.3561 317.0131 100
ggplot2::autoplot(mb)
Created on 2025-02-05 with reprex v2.1.1
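One way to probe the variability (a suggestion, not run in the thread): bench::mark() records how many iterations triggered a garbage collection, so comparing n_gc against n_itr would show whether the spread tracks GC rather than the two implementations.
# Hypothetical follow-up: does garbage collection explain the spread?
b <- bench::mark(f1(A), f2(A), check = FALSE, min_iterations = 25)
b[, c("expression", "median", "mem_alloc", "n_itr", "n_gc")]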
Managed to strip away everything from the pmax() function that is not needed here, and then we have a clearer winner.
set.seed(5461)
f1 <- function(adjmatrix) {
adjmatrix <- pmax(adjmatrix, Matrix::t(adjmatrix))
Matrix::tril(adjmatrix)
}
f2 <- function(adjmatrix){
pmax(Matrix::tril(adjmatrix), Matrix::t(Matrix::triu(adjmatrix)))
}
pmax_AtA <- function(A,B) {
change <- A < B
A[change] <- B[change]
A
}
f3 <- function(adjmatrix){
adjmatrix <- pmax_AtA(adjmatrix, Matrix::t(adjmatrix))
Matrix::tril(adjmatrix)
}
f4 <- function(adjmatrix){
pmax_AtA(Matrix::tril(adjmatrix), Matrix::t(Matrix::triu(adjmatrix)))
}
A <- igraph::sample_gnp(5000,0.03,TRUE) |> igraph::as_adjacency_matrix()
A[A==1] <- sample(1:15,sum(A==1),replace = TRUE)
mb <- microbenchmark::microbenchmark(
f1(A),
f2(A),
f3(A),
f4(A)
)
#> Warning in microbenchmark::microbenchmark(f1(A), f2(A), f3(A), f4(A)): less
#> accurate nanosecond times to avoid potential integer overflows
mb
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> f1(A) 170.06214 220.47717 223.22230 224.72250 228.85255 274.9699 100
#> f2(A) 120.94934 171.71962 189.97005 176.42294 213.91762 333.1371 100
#> f3(A) 98.58450 103.73449 117.82620 107.91879 136.19962 181.9903 100
#> f4(A) 47.06476 51.58171 63.10618 55.08414 58.32074 124.1383 100
ggplot2::autoplot(mb)
Created on 2025-02-05 with reprex v2.1.1
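As a sanity check (mine, not part of the benchmark above), the stripped-down pmax_AtA variants should produce exactly the same lower-triangular matrix as the original pmax-based code:
# All variants should agree; a non-zero count here would indicate a bug.
Matrix::nnzero(f1(A) - f3(A))  # expected: 0
Matrix::nnzero(f1(A) - f4(A))  # expected: 0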
Looks good, please merge when ready.
Force-pushed from 9c253c4 to c7b79ea
This PR is a large refactoring of graph.adjacency.sparse(). The goal was to make the function more readable and remove some performance bottlenecks. The only cases with slightly worse performance are mode = "min" and mode = "max" (however, there are some fluctuations I do not fully understand). The new implementation is a simple one-liner. Probably needs some discussion.
Created on 2025-01-28 with reprex v2.1.1
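For readers who only skim the conversation: the kind of one-liner discussed in the review thread for mode = "max" looks roughly like the line below (assembled from the comments above; the merged code may differ in its details).
# Symmetrise by taking the elementwise maximum of each entry and its
# transpose, keeping only the lower triangle (sketch from the review
# discussion; not necessarily the exact merged code).
adjmatrix <- pmax(Matrix::tril(adjmatrix), Matrix::t(Matrix::triu(adjmatrix)))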