R: graph.data.frame converts factors to character #34
From @elbamos on January 6, 2015 5:25 I'm writing to join in this request. In the first place, as a matter of R, it shouldn't silently convert a variable to or from a factor, because a factor's definition (its levels) carries information that's important in, e.g., regression. Similarly, factors are the natural data type for some graph-relevant data, like community membership. Setting vertex colors should also work through factor vertex attributes; if the graph is going to be visualized with ggplot2, ggvis or the like, there's a whole framework for factor aesthetics. This seems like a super-easy thing to fix/add/change. If I just do this, will you take the pull request? And if so, how would you prefer it implemented? I'm thinking of a graph-level "stringsAsFactors" preference set at graph creation.
There are several problems with factors. One is that you cannot write them to standard file formats. I mean, you can, but the fact that they are factors is lost. (There are no factors in GraphML, GML, etc.) Another one is that you cannot even easily create a factor attribute in igraph currently:
g <- make_ring(10)
V(g)$foo <- factor(letters[1:10])
V(g)$foo
#> [1] 1 2 3 4 5 6 7 8 9 10
g <- set_vertex_attr(g, "bar", value = factor(letters[1:10]))
g
#> IGRAPH U--- 10 10 -- Ring graph
#> + attr: name (g/c), mutual (g/l), circular (g/l), foo (v/n), bar
#> | (v/n)
V(g)$bar
#> [1] 1 2 3 4 5 6 7 8 9 10
So at least this needs to be changed, but there are a lot of potential hiccups. In general, vertex/edge attributes that are not atomic built-in classes are not handled well in igraph. igraph does not use ggplot2 for graph drawing, so I don't really see how factors would help with graph drawing. Also, why are factors natural for community membership? Maybe if you name your communities. Otherwise, simple consecutive integers are just as natural, and making them factors is just an unnecessary complication, IMHO.
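Until factors are first-class attributes, one workaround for both problems raised here (integer codes showing up instead of labels, and file formats with no notion of factors) is to store plain character labels and keep the levels alongside them. A minimal sketch, assuming a standard igraph installation; the graph attribute name foo_levels is made up for illustration:

```r
library(igraph)

g <- make_ring(10)
foo <- factor(letters[1:10])

# Store plain character labels, which igraph's attribute handling supports,
# and keep the factor levels separately as a graph attribute (name is hypothetical)
V(g)$foo <- as.character(foo)
g <- set_graph_attr(g, "foo_levels", levels(foo))

# Rebuild the factor when the attribute is read back, with the levels made explicit
foo_back <- factor(V(g)$foo, levels = graph_attr(g, "foo_levels"))
identical(foo_back, foo)  # TRUE
```

Whether a vector-valued graph attribute such as foo_levels survives a particular file format is a separate question; if it does not, the levels can be kept outside the graph.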
From @elbamos on January 6, 2015 5:54 Well, one function of the igraph package is plotting. Another is generating certain statistics. A third, though, is that it's a data structure with a very convenient, well-thought-out syntax for creating, editing and manipulating graph data. igraph doesn't use ggplot2 for plotting, but igraph objects can be fed into plotting systems other than igraph's built-in one. This is what GGally::ggnet does and what I've tried to do with ggnetwork. Why are factors natural for community membership? Because community membership is categorical data. More practically, consider this workflow:
vinfo <- data.frame(bunch of data about nodes including dat1 and factor2)
graph <- graph.data.frame(edges, vertices = vinfo)
V(graph)$astat <- igraph::a_stat_function(graph)
V(graph)$comm <- igraph::a_community_membership_function(graph)
graph %>% get.data.frame("vertices") %>% glm(dat1 ~ astat + comm + factor2, data = .)
or even
graph %>% get.data.frame("vertices") %>% glm(dat1 ~ astat + comm, data = .)
Without factors, that will obviously produce gobbledygook, because the community IDs in comm are treated as a single numeric predictor. This is a simple, contrived example, but when doing a lot of analysis of how network structure relates to other variables, being able to store factors in igraph would really simplify the workflow.
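The effect on the model can be seen without igraph at all. In this base-R sketch the data are random toy values, and the column names only mirror the hypothetical workflow above:

```r
set.seed(1)
# Toy vertex table: comm holds community IDs as plain numbers
vdf <- data.frame(
  dat1 = rnorm(9),
  astat = runif(9),
  comm = rep(1:3, each = 3)
)

# Numeric IDs: glm fits one slope for comm, as if community 3 were "more" than community 1
fit_num <- glm(dat1 ~ astat + comm, data = vdf)

# Same formula with comm as a factor: one contrast per non-baseline community
vdf$comm <- factor(vdf$comm)
fit_fac <- glm(dat1 ~ astat + comm, data = vdf)

names(coef(fit_num))  # "(Intercept)" "astat" "comm"
names(coef(fit_fac))  # "(Intercept)" "astat" "comm2" "comm3"
```

The formula is identical in both fits; only the column type decides whether community is modelled as a category.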
From @elbamos on January 6, 2015 5:56 I'm not sure I caught exactly what you meant about the implementation issues. I see where file formats are an issue, but that's not really a solvable one, and it doesn't seem like a show-stopper to me. As for the other issues, I understood from the Stack Overflow discussion that igraph was simply checking variables and converting all the factors to characters. So the project seemed to be going through the code, picking all of that out, and then flyspecking whatever broke. Is it a lot more than I was thinking?
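The "check variables and convert factors to characters" step can be pictured as a small data-frame transformation. The sketch below is not igraph's code: prepare_attr_columns() and its keep_factors argument are hypothetical, modelled on the graph-level "stringsAsFactors" preference proposed earlier in the thread.

```r
# Hypothetical helper, not igraph's actual code: find factor columns and
# convert them to character, unless a stringsAsFactors-style flag says otherwise
prepare_attr_columns <- function(df, keep_factors = FALSE) {
  if (!keep_factors) {
    is_fac <- vapply(df, is.factor, logical(1))
    df[is_fac] <- lapply(df[is_fac], as.character)
  }
  df
}

actors <- data.frame(
  name = c("Alice", "Bob", "Cecil"),
  gender = factor(c("F", "M", "F"))
)
is.factor(prepare_attr_columns(actors)$gender)                       # FALSE
is.factor(prepare_attr_columns(actors, keep_factors = TRUE)$gender)  # TRUE
```

Skipping the conversion is the easy part; the open questions raised in this thread concern what the attribute setters and file writers do with the factor afterwards.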
These are some good points. What I meant by the code above is that if factors are first-class data types in igraph, then there should be ways to create them. Other than Another potential error that comes to mind immediately is the As for representing community membership as factors, that is probably OK, because it is represented by In general I am a bit ambivalent about factors. They are definitely a good idea, but the way they are implemented in R, you can get some surprising behaviour out of them. E.g. the way In summary, I don't mind trying to
From @elbamos on January 6, 2015 6:24 I agree with you on all counts. It's easiest to just not let names be factors, I think. That is a special case, as you say. I also agree that R can sometimes be surprising about them. But once one gets used to them and their purpose, that funny variable type is really invaluable. Thank you for your attention to this.
I saw that you closed this. Does that mean you're dropping it? Is there any way I can help?
As you can see, it is open. I just moved the R package into a separate repo.
Is there any work on this? It is especially pertinent for ggraph, in terms of allowing people to order scales as they would normally do in ggplot2...
A reprex from the original Stack Overflow example:
library("igraph")
#>
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#>
#> decompose, spectrum
#> The following object is masked from 'package:base':
#>
#> union
actors <- data.frame(
name = c("Alice", "Bob", "Cecil", "David", "Esmeralda"),
age = c(48, 33, 45, 34, 21),
gender = factor(c("F", "M", "F", "M", "F"))
)
relations <- data.frame(
from = c(
"Bob", "Cecil", "Cecil", "David",
"David", "Esmeralda"
),
to = c("Alice", "Bob", "Alice", "Alice", "Bob", "Alice"),
same.dept = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
friendship = c(4, 5, 5, 2, 1, 1), advice = c(4, 5, 5, 4, 2, 3)
)
g <- graph_from_data_frame(relations, directed = TRUE, vertices = actors)
g_actors <- as_data_frame(g, what = "vertices")
# Compare type of gender (before and after)
is.factor(actors$gender)
#> [1] TRUE
is.factor(g_actors$gender)
#> [1] FALSE
Created on 2024-02-26 with reprex v2.1.0
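Until graph_from_data_frame() keeps factors itself, a user-side workaround is to re-apply the factor columns after the graph is built. This sketch assumes, as the 2025-01-21 reprex in this thread suggests, that a recent igraph keeps a factor passed to set_vertex_attr() for a new attribute; graph_from_data_frame_keep_factors() is a hypothetical name, and older igraph versions may still store the integer codes.

```r
library(igraph)

# Hypothetical wrapper, not an igraph function: build the graph as usual,
# then re-apply any factor columns of `vertices` as vertex attributes
graph_from_data_frame_keep_factors <- function(d, directed = TRUE, vertices) {
  g <- graph_from_data_frame(d, directed = directed, vertices = vertices)
  # Vertices are created in the row order of `vertices`; column 1 holds the names
  for (col in names(vertices)[-1]) {
    if (is.factor(vertices[[col]])) {
      # Drop the character copy first, so the factor is set as a new attribute
      g <- delete_vertex_attr(g, col)
      g <- set_vertex_attr(g, col, value = vertices[[col]])
    }
  }
  g
}

actors <- data.frame(
  name = c("Alice", "Bob", "Cecil"),
  gender = factor(c("F", "M", "F"))
)
relations <- data.frame(from = c("Bob", "Cecil"), to = c("Alice", "Alice"))

g <- graph_from_data_frame_keep_factors(relations, vertices = actors)
is.factor(V(g)$gender)
```

Edge columns could be handled the same way with set_edge_attr(), though the thread only shows the vertex case.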
Old implementation by @thomasp85: #193.
While
library("igraph")
#>
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#>
#> decompose, spectrum
#> The following object is masked from 'package:base':
#>
#> union
actors <- data.frame(
name = c("Alice", "Bob", "Cecil", "David", "Esmeralda"),
age = c(48, 33, 45, 34, 21),
gender = factor(c("F", "M", "F", "M", "F"))
)
relations <- data.frame(
from = c(
"Bob", "Cecil", "Cecil", "David",
"David", "Esmeralda"
),
to = c("Alice", "Bob", "Alice", "Alice", "Bob", "Alice"),
same.dept = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
friendship = c(4, 5, 5, 2, 1, 1), advice = c(4, 5, 5, 4, 2, 3)
)
g <- graph_from_data_frame(relations, directed = TRUE, vertices = actors)
g_actors <- as_data_frame(g, what = "vertices")
# Compare type of gender (before and after)
is.factor(actors$gender)
#> [1] TRUE
is.factor(V(g)$gender)
#> [1] FALSE
is.factor(g_actors$gender)
#> [1] FALSE
g <- set_vertex_attr(g,"test_set",value=factor(LETTERS[1:5]))
V(g)$test_V <- factor(letters[1:5])
is.factor(V(g)$test_set)
#> [1] TRUE
is.factor(V(g)$test_V)
#> [1] TRUE
Created on 2025-01-21 with reprex v2.1.1
To make Lines 201 to 203 in fe3b5b8
Lines 222 to 224 in fe3b5b8
Am I missing something?
Worth trying with revdeps?
I would try to bundle some things together:
and see what happens in the revdeps.
From @gaborcsardi on July 26, 2014 3:42
Add an option to keep factors as factors. See
http://stackoverflow.com/questions/24965840/igraph-graph-data-frame-silently-converts-factors-to-character-vectors
Copied from original issue: igraph/igraph#665