Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: enhance readability and performance of graph.adjacency.sparse #1667

Merged
merged 5 commits into from
Feb 6, 2025

Conversation

schochastics
Copy link
Contributor

@schochastics schochastics commented Jan 28, 2025

This PR is a large refactoring of graph.adjacency.sparse(). The goal was to make the function more readable and remove some performance bottlenecks.

The only cases that has a slightly worse performance in some cases are mode = "min" and mode = "max" (However, there are some fluctuations I do not fully understand). The new implementation is a simple one-liner. Probably needs some discussions.

library(igraph)
packageVersion("igraph")
#> [1] '2.1.3'

A <- sample_gnp(5000,0.02, directed = FALSE) |> as_adjacency_matrix()

bench::mark(
  check = FALSE,
  old = igraph:::graph.adjacency.sparse(A,mode = "undirected"),
  new = graph.adjacency.sparse.new(A,mode = "undirected")
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 old        515.15ms  515.2ms      1.94    32.6MB     25.2
#> 2 new          8.69ms   10.1ms     92.7       20MB     39.4

A[A==1] <- sample(1:10,sum(A==1),replace = TRUE)

bench::mark(
  check = FALSE,
  old = igraph:::graph.adjacency.sparse(A,mode = "max"),
  new = graph.adjacency.sparse.new(A,mode = "max")
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 old           586ms    586ms      1.71     198MB    13.7 
#> 2 new           783ms    783ms      1.28     645MB     3.83

bench::mark(
  check = FALSE,
  old = igraph:::graph.adjacency.sparse(A,mode = "max", weighted = TRUE),
  new = graph.adjacency.sparse.new(A,mode = "max", weighted = TRUE)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 old          43.7ms   54.4ms     18.4      150MB     9.20
#> 2 new         128.9ms  184.3ms      5.40     473MB    14.4

A <- sample_gnp(5000,0.02, directed = TRUE) |> as_adjacency_matrix()

bench::mark(
  check = FALSE,
  old = igraph:::graph.adjacency.sparse(A,mode = "directed"),
  new = graph.adjacency.sparse.new(A,mode = "directed")
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 old           1.09s    1.09s     0.918    59.2MB    23.9 
#> 2 new         18.58ms  19.66ms    47.3      24.8MB     5.91

Created on 2025-01-28 with reprex v2.1.1

Copy link
Contributor

aviator-app bot commented Jan 28, 2025

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was merged manually (without Aviator). Merging manually can negatively impact the performance of the queue. Consider using Aviator next time.


See the real-time status of this PR on the Aviator webapp.
Use the Aviator Chrome Extension to see the status of your PR within GitHub.

Copy link
Contributor

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's understand the pmax() issue a bit better, silence the messages in the tests for Matrix < 1.6.2, and merge. Good job!

R/adjacency.R Outdated
Comment on lines 418 to 419
adjmatrix <- pmax(adjmatrix, Matrix::t(adjmatrix))
adjmatrix <- Matrix::tril(adjmatrix)
Copy link
Contributor

@krlmlr krlmlr Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmaechler: We tried the code below, but it appears to be even slower. Are pmax() and pmin() efficient for sparse matrices?

Suggested change
adjmatrix <- pmax(adjmatrix, Matrix::t(adjmatrix))
adjmatrix <- Matrix::tril(adjmatrix)
adjmatrix <- pmax(Matrix::tril(adjmatrix), Matrix::t(Matrix::triu(adjmatrix)))

For context, we're transforming a directed to an undirected graph, combining edge attributes from the attributes of the original directed edges:

A[i, j] <- max(A[i, j], A[j, i])

The result is a sparse matrix, we're unsure where the overhead comes from:

set.seed(20250130)
n <- 3
m <- Matrix::sparseMatrix(i = rep(1:n, each = n), j = rep(1:n, n), x = runif(n * n))
pmax(m, Matrix::t(m))
#> 3 x 3 sparse Matrix of class "dgCMatrix"
#>                                   
#> [1,] 0.3174528 0.6438151 0.7530661
#> [2,] 0.6438151 0.3822910 0.4231476
#> [3,] 0.7530661 0.4231476 0.2236633

Created on 2025-01-30 with reprex v2.1.1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more context, we use adjmat=pmax(adjmax,t(adjmax) to solve the problem atm and we are wondering about the memory footprint. It looks quite high and thus we thought there might be a dense conversion happening somewhere within pmax().

bench::mark(
  check = FALSE,
  old = igraph:::graph.adjacency.sparse(A,mode = "max"),
  new = graph.adjacency.sparse.new(A,mode = "max")
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 old           586ms    586ms      1.71     198MB    13.7 
#> 2 new           783ms    783ms      1.28     645MB     3.83

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To provide some more context: I got some mixed results when I was running more benchmarks using bench::mark() so I switched to microbenchmark. There seems to be a high variability in the runtime of both functions, which explains the mixed results I got. Not sure what to make of this yet.

set.seed(5461)
f1 <- function(adjmatrix) {
  adjmatrix <- pmax(adjmatrix, Matrix::t(adjmatrix))
  Matrix::tril(adjmatrix)
}

f2 <- function(adjmatrix){
  pmax(Matrix::tril(adjmatrix), Matrix::t(Matrix::triu(adjmatrix)))
}

A <- igraph::sample_gnp(5000,0.03,TRUE) |> igraph::as_adjacency_matrix()
A[A==1] <- sample(1:15,sum(A==1),replace = TRUE)


mb <- microbenchmark::microbenchmark(
  f1(A),
  f2(A)
)
#> Warning in microbenchmark::microbenchmark(f1(A), f2(A)): less accurate
#> nanosecond times to avoid potential integer overflows
mb
#> Unit: milliseconds
#>   expr      min       lq     mean   median       uq      max neval
#>  f1(A) 170.0002 220.5387 224.3328 226.1104 231.3204 274.3551   100
#>  f2(A) 117.4338 170.5237 191.4667 176.8457 213.3561 317.0131   100

ggplot2::autoplot(mb)

Created on 2025-02-05 with reprex v2.1.1

Standard output and standard error

-- nothing to show --

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Managed to strip away everything from the pmax() function that is not needed here and then we have a clearer winner.

set.seed(5461)
f1 <- function(adjmatrix) {
  adjmatrix <- pmax(adjmatrix, Matrix::t(adjmatrix))
  Matrix::tril(adjmatrix)
}

f2 <- function(adjmatrix){
  pmax(Matrix::tril(adjmatrix), Matrix::t(Matrix::triu(adjmatrix)))
}

pmax_AtA <- function(A,B) {
  change <- A < B  
  A[change] <- B[change]
  A
}

f3 <- function(adjmatrix){
  adjmatrix <- pmax_AtA(adjmatrix, Matrix::t(adjmatrix))
  Matrix::tril(adjmatrix)
}

f4 <- function(adjmatrix){
  pmax_AtA(Matrix::tril(adjmatrix), Matrix::t(Matrix::triu(adjmatrix)))
}


A <- igraph::sample_gnp(5000,0.03,TRUE) |> igraph::as_adjacency_matrix()
A[A==1] <- sample(1:15,sum(A==1),replace = TRUE)


mb <- microbenchmark::microbenchmark(
  f1(A),
  f2(A),
  f3(A),
  f4(A)
)
#> Warning in microbenchmark::microbenchmark(f1(A), f2(A), f3(A), f4(A)): less
#> accurate nanosecond times to avoid potential integer overflows
mb
#> Unit: milliseconds
#>   expr       min        lq      mean    median        uq      max neval
#>  f1(A) 170.06214 220.47717 223.22230 224.72250 228.85255 274.9699   100
#>  f2(A) 120.94934 171.71962 189.97005 176.42294 213.91762 333.1371   100
#>  f3(A)  98.58450 103.73449 117.82620 107.91879 136.19962 181.9903   100
#>  f4(A)  47.06476  51.58171  63.10618  55.08414  58.32074 124.1383   100

ggplot2::autoplot(mb)

Created on 2025-02-05 with reprex v2.1.1

Copy link
Contributor

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, please merge when good.

R/adjacency.R Outdated Show resolved Hide resolved
R/adjacency.R Outdated Show resolved Hide resolved
@schochastics schochastics enabled auto-merge (squash) February 6, 2025 13:06
@schochastics schochastics merged commit 178bb74 into igraph:main Feb 6, 2025
22 checks passed
schochastics added a commit to schochastics/rigraph that referenced this pull request Feb 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants