refactor: enhance readability and performance of graph.adjacency.sparse #1667

schochastics · 2025-01-28T09:34:32Z

This PR is a large refactoring of graph.adjacency.sparse(). The goal was to make the function more readable and remove some performance bottlenecks.

The only cases that has a slightly worse performance in some cases are mode = "min" and mode = "max" (However, there are some fluctuations I do not fully understand). The new implementation is a simple one-liner. Probably needs some discussions.

library(igraph)
packageVersion("igraph")
#> [1] '2.1.3'

A <- sample_gnp(5000,0.02, directed = FALSE) |> as_adjacency_matrix()

bench::mark(
  check = FALSE,
  old = igraph:::graph.adjacency.sparse(A,mode = "undirected"),
  new = graph.adjacency.sparse.new(A,mode = "undirected")
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 old        515.15ms  515.2ms      1.94    32.6MB     25.2
#> 2 new          8.69ms   10.1ms     92.7       20MB     39.4

A[A==1] <- sample(1:10,sum(A==1),replace = TRUE)

bench::mark(
  check = FALSE,
  old = igraph:::graph.adjacency.sparse(A,mode = "max"),
  new = graph.adjacency.sparse.new(A,mode = "max")
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 old           586ms    586ms      1.71     198MB    13.7 
#> 2 new           783ms    783ms      1.28     645MB     3.83

bench::mark(
  check = FALSE,
  old = igraph:::graph.adjacency.sparse(A,mode = "max", weighted = TRUE),
  new = graph.adjacency.sparse.new(A,mode = "max", weighted = TRUE)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 old          43.7ms   54.4ms     18.4      150MB     9.20
#> 2 new         128.9ms  184.3ms      5.40     473MB    14.4

A <- sample_gnp(5000,0.02, directed = TRUE) |> as_adjacency_matrix()

bench::mark(
  check = FALSE,
  old = igraph:::graph.adjacency.sparse(A,mode = "directed"),
  new = graph.adjacency.sparse.new(A,mode = "directed")
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 old           1.09s    1.09s     0.918    59.2MB    23.9 
#> 2 new         18.58ms  19.66ms    47.3      24.8MB     5.91

^{Created on 2025-01-28 with reprex v2.1.1}

aviator-app · 2025-01-28T09:34:35Z

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was merged manually (without Aviator). Merging manually can negatively impact the performance of the queue. Consider using Aviator next time.

See the real-time status of this PR on the Aviator webapp.

Use the Aviator Chrome Extension to see the status of your PR within GitHub.

krlmlr

Let's understand the pmax() issue a bit better, silence the messages in the tests for Matrix < 1.6.2, and merge. Good job!

krlmlr · 2025-01-30T10:37:12Z

R/adjacency.R

+    adjmatrix <- pmax(adjmatrix, Matrix::t(adjmatrix))
+    adjmatrix <- Matrix::tril(adjmatrix)


@mmaechler: We tried the code below, but it appears to be even slower. Are pmax() and pmin() efficient for sparse matrices?

Suggested change

adjmatrix <- pmax(adjmatrix, Matrix::t(adjmatrix))

adjmatrix <- Matrix::tril(adjmatrix)

adjmatrix <- pmax(Matrix::tril(adjmatrix), Matrix::t(Matrix::triu(adjmatrix)))

For context, we're transforming a directed to an undirected graph, combining edge attributes from the attributes of the original directed edges:

A[i, j] <- max(A[i, j], A[j, i])

The result is a sparse matrix, we're unsure where the overhead comes from:

set.seed(20250130) n <- 3 m <- Matrix::sparseMatrix(i = rep(1:n, each = n), j = rep(1:n, n), x = runif(n * n)) pmax(m, Matrix::t(m)) #> 3 x 3 sparse Matrix of class "dgCMatrix" #> #> [1,] 0.3174528 0.6438151 0.7530661 #> [2,] 0.6438151 0.3822910 0.4231476 #> [3,] 0.7530661 0.4231476 0.2236633

^{Created on 2025-01-30 with reprex v2.1.1}

For more context, we use adjmat=pmax(adjmax,t(adjmax) to solve the problem atm and we are wondering about the memory footprint. It looks quite high and thus we thought there might be a dense conversion happening somewhere within pmax().

bench::mark( check = FALSE, old = igraph:::graph.adjacency.sparse(A,mode = "max"), new = graph.adjacency.sparse.new(A,mode = "max") ) #> Warning: Some expressions had a GC in every iteration; so filtering is #> disabled. #> # A tibble: 2 × 6 #> expression min median `itr/sec` mem_alloc `gc/sec` #> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> #> 1 old 586ms 586ms 1.71 198MB 13.7 #> 2 new 783ms 783ms 1.28 645MB 3.83

To provide some more context: I got some mixed results when I was running more benchmarks using bench::mark() so I switched to microbenchmark. There seems to be a high variability in the runtime of both functions, which explains the mixed results I got. Not sure what to make of this yet.

set.seed(5461) f1 <- function(adjmatrix) { adjmatrix <- pmax(adjmatrix, Matrix::t(adjmatrix)) Matrix::tril(adjmatrix) } f2 <- function(adjmatrix){ pmax(Matrix::tril(adjmatrix), Matrix::t(Matrix::triu(adjmatrix))) } A <- igraph::sample_gnp(5000,0.03,TRUE) |> igraph::as_adjacency_matrix() A[A==1] <- sample(1:15,sum(A==1),replace = TRUE) mb <- microbenchmark::microbenchmark( f1(A), f2(A) ) #> Warning in microbenchmark::microbenchmark(f1(A), f2(A)): less accurate #> nanosecond times to avoid potential integer overflows mb #> Unit: milliseconds #> expr min lq mean median uq max neval #> f1(A) 170.0002 220.5387 224.3328 226.1104 231.3204 274.3551 100 #> f2(A) 117.4338 170.5237 191.4667 176.8457 213.3561 317.0131 100 ggplot2::autoplot(mb)

^{Created on 2025-02-05 with reprex v2.1.1}

Standard output and standard error

-- nothing to show --

Managed to strip away everything from the pmax() function that is not needed here and then we have a clearer winner.

set.seed(5461) f1 <- function(adjmatrix) { adjmatrix <- pmax(adjmatrix, Matrix::t(adjmatrix)) Matrix::tril(adjmatrix) } f2 <- function(adjmatrix){ pmax(Matrix::tril(adjmatrix), Matrix::t(Matrix::triu(adjmatrix))) } pmax_AtA <- function(A,B) { change <- A < B A[change] <- B[change] A } f3 <- function(adjmatrix){ adjmatrix <- pmax_AtA(adjmatrix, Matrix::t(adjmatrix)) Matrix::tril(adjmatrix) } f4 <- function(adjmatrix){ pmax_AtA(Matrix::tril(adjmatrix), Matrix::t(Matrix::triu(adjmatrix))) } A <- igraph::sample_gnp(5000,0.03,TRUE) |> igraph::as_adjacency_matrix() A[A==1] <- sample(1:15,sum(A==1),replace = TRUE) mb <- microbenchmark::microbenchmark( f1(A), f2(A), f3(A), f4(A) ) #> Warning in microbenchmark::microbenchmark(f1(A), f2(A), f3(A), f4(A)): less #> accurate nanosecond times to avoid potential integer overflows mb #> Unit: milliseconds #> expr min lq mean median uq max neval #> f1(A) 170.06214 220.47717 223.22230 224.72250 228.85255 274.9699 100 #> f2(A) 120.94934 171.71962 189.97005 176.42294 213.91762 333.1371 100 #> f3(A) 98.58450 103.73449 117.82620 107.91879 136.19962 181.9903 100 #> f4(A) 47.06476 51.58171 63.10618 55.08414 58.32074 124.1383 100 ggplot2::autoplot(mb)

^{Created on 2025-02-05 with reprex v2.1.1}

krlmlr

Looks good, please merge when good.

R/adjacency.R

…se (igraph#1667)

krlmlr reviewed Jan 30, 2025

View reviewed changes

krlmlr reviewed Feb 6, 2025

View reviewed changes

R/adjacency.R Outdated Show resolved Hide resolved

R/adjacency.R Outdated Show resolved Hide resolved

schochastics enabled auto-merge (squash) February 6, 2025 13:06

schochastics added 4 commits February 6, 2025 15:42

refactor graph.adjacency.sparse

1981e1a

enforce minimum matrix version in test

1a9e5da

added stripped down versions of pmax and pmin

b5a63ed

fixed max and min mode and added missing comma for non loop case

c7b79ea

schochastics force-pushed the graph_adjmat_sparse branch from 9c253c4 to c7b79ea Compare February 6, 2025 14:42

fixed test

c421d57

schochastics merged commit 178bb74 into igraph:main Feb 6, 2025
22 checks passed

schochastics added a commit to schochastics/rigraph that referenced this pull request Feb 8, 2025

refactor: enhance readability and performance of graph.adjacency.spar…

ce6d8b5

…se (igraph#1667)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: enhance readability and performance of graph.adjacency.sparse #1667

refactor: enhance readability and performance of graph.adjacency.sparse #1667

schochastics commented Jan 28, 2025 •

edited

Loading

aviator-app bot commented Jan 28, 2025 •

edited

Loading

krlmlr left a comment

krlmlr Jan 30, 2025 •

edited

Loading

schochastics Jan 30, 2025

schochastics Feb 5, 2025

schochastics Feb 5, 2025

krlmlr left a comment

		adjmatrix <- pmax(adjmatrix, Matrix::t(adjmatrix))
		adjmatrix <- Matrix::tril(adjmatrix)

	adjmatrix <- pmax(adjmatrix, Matrix::t(adjmatrix))
	adjmatrix <- Matrix::tril(adjmatrix)
	adjmatrix <- pmax(Matrix::tril(adjmatrix), Matrix::t(Matrix::triu(adjmatrix)))

refactor: enhance readability and performance of graph.adjacency.sparse #1667

refactor: enhance readability and performance of graph.adjacency.sparse #1667

Conversation

schochastics commented Jan 28, 2025 • edited Loading

aviator-app bot commented Jan 28, 2025 • edited Loading

Current Aviator status

krlmlr left a comment

Choose a reason for hiding this comment

krlmlr Jan 30, 2025 • edited Loading

Choose a reason for hiding this comment

schochastics Jan 30, 2025

Choose a reason for hiding this comment

schochastics Feb 5, 2025

Choose a reason for hiding this comment

schochastics Feb 5, 2025

Choose a reason for hiding this comment

krlmlr left a comment

Choose a reason for hiding this comment

schochastics commented Jan 28, 2025 •

edited

Loading

aviator-app bot commented Jan 28, 2025 •

edited

Loading

krlmlr Jan 30, 2025 •

edited

Loading