
R: graph.data.frame converts factors to character #34

Open
gaborcsardi opened this issue Jan 14, 2015 · 14 comments

@gaborcsardi
Contributor

From @gaborcsardi on July 26, 2014 3:42

Add an option to keep factors as factors. See
http://stackoverflow.com/questions/24965840/igraph-graph-data-frame-silently-converts-factors-to-character-vectors

Copied from original issue: igraph/igraph#665

@gaborcsardi
Contributor Author

From @elbamos on January 6, 2015 5:25

I'm writing to join in this request... In the first place, as a matter of R, it shouldn't be altering a variable type to or from factor silently, because the factor data definition contains information that's important in, e.g., regression. Similarly, factors are the natural data type for some graph-relevant data, like community membership.

Setting vertex colors should also be through factors in vertex attributes; if the graph is going to be visualized with ggplot2 or ggvis or the like, there's a whole framework for factor aesthetics.

This seems like a super-easy thing to fix/add/change. If I just do this, will you take the pull request? And if so, how would you prefer it implemented? I'm thinking it's a graph-level "stringsAsFactors" preference set at graph creation.

@gaborcsardi
Contributor Author

There are several problems with factors. One is that you cannot write them to standard file formats. I mean, you can, but the fact that they are factors is lost. (There are no factors in GraphML, GML, etc.)

Another one is that you cannot even easily create a factor attribute in igraph currently:

g <- make_ring(10)
V(g)$foo <- factor(letters[1:10])
V(g)$foo
#>  [1]  1  2  3  4  5  6  7  8  9 10

g <- set_vertex_attr(g, "bar", value = factor(letters[1:10]))
g
#> IGRAPH U--- 10 10 -- Ring graph
#> + attr: name (g/c), mutual (g/l), circular (g/l), foo (v/n), bar
#> | (v/n)
V(g)$bar
#>  [1]  1  2  3  4  5  6  7  8  9 10

So at least this needs to be changed, but there are a lot of potential hiccups. In general, vertex/edge attributes that are not atomic builtin classes are not handled well in igraph.

igraph does not use ggplot for graph drawing, so I don't really see how factors would help with graph drawing. Also, why are factors natural for community membership? Maybe if you name your communities. Otherwise, simple consecutive integers are just as natural, and making them factors is just an unnecessary complication, imho.

@gaborcsardi
Contributor Author

From @elbamos on January 6, 2015 5:54

Well, one function of the igraph package is plotting. Another is generating certain statistics. A third, though, is that it's a data structure with a very convenient, well-thought-out syntax for creating, editing, and manipulating graph data.

igraph doesn't use ggplot for plotting. igraph objects, though, can be fed into plotting systems other than igraph's built-in plotting. This is what GGally::ggnet does, and what I've tried to do with ggnetwork.

Why are factors natural for community membership? Well, because community membership is categorical data. More practically, consider this workflow:

vinfo <- data.frame(...)  # data about the nodes, including dat1 and factor2
graph <- graph.data.frame(edges, vertices = vinfo)
V(graph)$astat <- igraph::a_stat_function(graph)
V(graph)$comm <- igraph::a_community_membership_function(graph)
graph %>% get.data.frame("vertices") %>% glm(dat1 ~ astat + comm + factor2, data = .)
or even
graph %>% get.data.frame("vertices") %>% glm(dat1 ~ astat + comm, data = .)

Without factors, comm and factor2 enter the model as plain numbers or strings rather than categories, so that obviously produces gobbledygook. This is a simple, contrived example. When doing a lot of analysis of how network structure relates to other variables, being able to store factors in igraph would really simplify the workflow.
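To make the regression point concrete in base R only: the sketch below uses made-up community labels (comm_int and comm_fac are illustrative, not igraph output) and compares how model.matrix(), which glm() uses to build its design matrix, encodes an integer column versus a factor column.

```r
# Made-up community labels for six vertices (illustrative only).
comm_int <- c(1L, 2L, 3L, 1L, 2L, 3L)
comm_fac <- factor(comm_int)

# Integer labels enter the model as a single numeric slope,
# i.e. as if community 3 were "more" than community 1.
mm_int <- model.matrix(~ comm_int)

# Factor labels enter as one dummy column per non-reference level
# (default treatment contrasts), i.e. as categories.
mm_fac <- model.matrix(~ comm_fac)

colnames(mm_int)  # "(Intercept)" "comm_int"
colnames(mm_fac)  # "(Intercept)" "comm_fac2" "comm_fac3"
```

With integer labels the fitted model estimates one coefficient for an arbitrary ordering of communities; with a factor it estimates a separate effect per community.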

@gaborcsardi
Contributor Author

From @elbamos on January 6, 2015 5:56

I'm not sure I caught exactly what you meant about the implementation issues. I see where file formats are an issue, but that's not really a solvable one, and it doesn't seem like a show-stopper to me. As for the other issues, I understood from the Stack Overflow discussion that igraph was simply checking variables and converting all the factors to characters. So the project seemed to be going through the code, picking all that out, and then flyspecking whatever broke.

Is it a lot more than I was thinking?

@gaborcsardi
Contributor Author

These are some good points.

What I meant by the code above is that if factors are first-class data types in igraph, then there should be ways to create them other than graph.data.frame, which is just a special case. set.*.attribute should support factors.

Another potential problem that comes to mind immediately is the name vertex attribute, which is treated specially, and I am not sure everything works if it is a factor. Probably not.

As for representing community membership as factors, that is probably OK, because it is represented by 1:k anyway, and factor levels would match their internal representation.
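A quick base R check of the point about 1:k labels, using a made-up membership vector: when every label from 1 to k occurs, factor() sorts the levels numerically, so the factor's internal integer codes equal the original labels. The second part shows a caveat that follows from the same rule.

```r
# Made-up membership vector with labels 1..3 (illustrative only).
memb <- c(2L, 1L, 3L, 1L, 2L)
memb_f <- factor(memb)

levels(memb_f)                       # "1" "2" "3"
identical(as.integer(memb_f), memb)  # TRUE: codes match the labels

# Caveat: if some label in 1..k never occurs (e.g. after subsetting),
# the codes are renumbered and no longer match the labels.
gap_f <- factor(c(1L, 3L, 3L))
as.integer(gap_f)                    # 1 2 2, not 1 3 3
```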

In general I am a bit ambivalent about factors. They are definitely a good idea, but the way they are implemented in R, you can get some surprising behaviour out of them. E.g. the way data.frame converts strings to factors is just wrong.

In summary, I don't mind trying to

  • change graph.data.frame to keep factors, and
  • change set.vertex.attribute and set.edge.attribute so that factors are actually kept.

@gaborcsardi
Contributor Author

From @elbamos on January 6, 2015 6:24

I agree with you on all counts. It's easiest to just not let names be factors, I think. That is a special case, as you say. I also agree that R can sometimes be surprising about them. But once one gets used to them and their purpose, that funny variable type is really invaluable.

Thank you for your attention to this.

@elbamos

elbamos commented Jan 17, 2015

I saw that you closed this... does that mean you're dropping it? Is there any way I can help?

@gaborcsardi
Contributor Author

As you can see, it is open. I just moved the R package into a separate repo.

@thomasp85

Is there any work on this? It is especially pertinent for ggraph, in terms of allowing people to order scales as they would normally do in ggplot2...

@krlmlr krlmlr added this to the triage milestone Feb 20, 2024
@maelle
Contributor

maelle commented Feb 26, 2024

reprex from the original Stack Overflow example.

library("igraph")
#> 
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#> 
#>     decompose, spectrum
#> The following object is masked from 'package:base':
#> 
#>     union
actors <- data.frame(
  name = c("Alice", "Bob", "Cecil", "David", "Esmeralda"),
  age = c(48, 33, 45, 34, 21),
  gender = factor(c("F", "M", "F", "M", "F"))
)
relations <- data.frame(
  from = c(
    "Bob", "Cecil", "Cecil", "David",
    "David", "Esmeralda"
  ),
  to = c("Alice", "Bob", "Alice", "Alice", "Bob", "Alice"),
  same.dept = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
  friendship = c(4, 5, 5, 2, 1, 1), advice = c(4, 5, 5, 4, 2, 3)
)
g <- graph_from_data_frame(relations, directed = TRUE, vertices = actors)
g_actors <- as_data_frame(g, what = "vertices")

# Compare type of gender (before and after)
is.factor(actors$gender)
#> [1] TRUE
is.factor(g_actors$gender)
#> [1] FALSE

Created on 2024-02-26 with reprex v2.1.0

@krlmlr
Contributor

krlmlr commented Apr 9, 2024

Old implementation by @thomasp85: #193.

@schochastics
Contributor

While graph_from_data_frame() does remove factors, set_vertex_attr() supports them:

library("igraph")
#> 
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#> 
#>     decompose, spectrum
#> The following object is masked from 'package:base':
#> 
#>     union
actors <- data.frame(
  name = c("Alice", "Bob", "Cecil", "David", "Esmeralda"),
  age = c(48, 33, 45, 34, 21),
  gender = factor(c("F", "M", "F", "M", "F"))
)
relations <- data.frame(
  from = c(
    "Bob", "Cecil", "Cecil", "David",
    "David", "Esmeralda"
  ),
  to = c("Alice", "Bob", "Alice", "Alice", "Bob", "Alice"),
  same.dept = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
  friendship = c(4, 5, 5, 2, 1, 1), advice = c(4, 5, 5, 4, 2, 3)
)
g <- graph_from_data_frame(relations, directed = TRUE, vertices = actors)
g_actors <- as_data_frame(g, what = "vertices")

# Compare type of gender (before and after)
is.factor(actors$gender)
#> [1] TRUE
is.factor(V(g)$gender)
#> [1] FALSE
is.factor(g_actors$gender)
#> [1] FALSE

g <- set_vertex_attr(g, "test_set", value = factor(LETTERS[1:5]))
V(g)$test_V <- factor(letters[1:5])
is.factor(V(g)$test_set)
#> [1] TRUE
is.factor(V(g)$test_V)
#> [1] TRUE

Created on 2025-01-21 with reprex v2.1.1

To make graph_from_data_frame() accept factors, it seems that only these lines need to be removed:

rigraph/R/data_frame.R, lines 201 to 203 (commit fe3b5b8):

if (inherits(newval, "factor")) {
  newval <- as.character(newval)
}

rigraph/R/data_frame.R, lines 222 to 224 (commit fe3b5b8):

if (inherits(newval, "factor")) {
  newval <- as.character(newval)
}

Am I missing something?
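Until graph_from_data_frame() keeps factors, a possible user-side workaround, building on the observation above that set_vertex_attr() does keep factors, is to copy the factor columns back after the graph is built. The helper restore_vertex_factors() below is hypothetical (not part of igraph), and it assumes vertex i of the graph corresponds to row i of the vertices data frame, which is how graph_from_data_frame() appears to add vertices; verify this against your igraph version.

```r
library(igraph)

# Hypothetical helper, not part of igraph: re-attach factor columns of
# `vertices` that graph_from_data_frame() stored as character vectors.
# Assumes vertex i in `g` corresponds to row i of `vertices`.
restore_vertex_factors <- function(g, vertices) {
  for (col in names(vertices)[-1]) {  # column 1 holds the vertex names
    if (is.factor(vertices[[col]])) {
      # Drop the character copy first, then set the factor as a new
      # attribute, mirroring the set_vertex_attr() case in the reprex above.
      g <- delete_vertex_attr(g, col)
      g <- set_vertex_attr(g, col, value = vertices[[col]])
    }
  }
  g
}

actors <- data.frame(
  name = c("Alice", "Bob", "Cecil"),
  gender = factor(c("F", "M", "F"))
)
relations <- data.frame(from = c("Bob", "Cecil"), to = c("Alice", "Bob"))

g <- graph_from_data_frame(relations, directed = TRUE, vertices = actors)
g <- restore_vertex_factors(g, actors)
is.factor(V(g)$gender)
```

This only covers vertex attributes; edge columns would need the same treatment with the edge-attribute functions, and it is no substitute for removing the conversion in graph_from_data_frame() itself.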

@maelle
Contributor

maelle commented Feb 13, 2025

Worth trying with revdeps?

@schochastics
Contributor

I would try to bundle some things together:

  • Strings/NAs in Matrices
  • factors in graph_from_data_frame

and see what happens in the revdeps


6 participants