External pointer issue with xml2 and furrr #502
-
I have the “external pointer” issue with xml2 and furrr and wanted to ask if there is a solution or workaround.
I extract charge data by department and convert it to XML to upload to a Case Management System. The process takes about an hour or longer, so I thought I'd use future/furrr to take advantage of the multiple cores on my Mac, but I ran into the non-exportable reference (external pointer) issue with xml2. I've tried both multisession and multicore on my Mac. Is there a better solution than having the worker processes write the data to disk and then reading it in again in the manager process (or uploading multiple files, which I would need to clear with the vendor)?
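For reference, here is a minimal sketch that reproduces the problem with plain future. furrr runs on top of future, so the same export rules apply to `future_map()` and friends. It assumes future's `future.globals.onReference` option, which makes future report non-exportable globals (such as objects holding external pointers) as an error up front instead of silently sending a broken copy to the worker. The tiny `<body></body>` document is only a placeholder for the real data.

```r
library(future)
library(xml2)

plan(multisession, workers = 2L)

# Make future fail early when a global contains an external pointer
# (the default, "ignore", lets the broken object through to the worker).
old_opts <- options(future.globals.onReference = "error")

# 'xml' wraps external pointers into libxml2 memory owned by this R session,
# so it cannot be serialized and sent to another R process.
xml <- read_xml("<body></body>")

res <- tryCatch(
  value(future(xml_length(xml))),
  error = function(e) e
)
print(conditionMessage(res))

# Restore the previous option value and shut down the workers
options(old_opts)
plan(sequential)
```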
-
I don't think so. AFAIU, xml2 objects can only be used in the R session where they were created, which means they cannot be passed on to parallel R workers. It's not clear to me how you create these objects in the first place, but I assume from your description that you grab them from an online resource rather than from an existing file. So, similarly to your idea of writing to file and re-reading in the parallel worker, you can write the `xml_document` object to a "raw" object in R, then export that raw object to the worker and parse it back into an `xml_document` there.

If you create the following two utility functions:

```r
xml_to_raw <- function(xml) {
  stopifnot(inherits(xml, "xml_document"))
  con <- rawConnection(raw(0L), open = "wb")
  on.exit(close(con))
  xml2::write_xml(xml, file = con)
  rawConnectionValue(con)
}

raw_to_xml <- function(raw) {
  con <- rawConnection(raw, open = "rb")
  on.exit(close(con))
  xml2::read_xml(con)
}
```

you'll see that you can use them to go back and forth:

```r
library(xml2)
xml <- read_xml("<body></body>")
print(xml)
## {xml_document}
## <body>

xml_raw <- xml_to_raw(xml)
str(xml_raw)
## raw [1:47] 3c 3f 78 6d ...

xml_copy <- raw_to_xml(xml_raw)
print(xml_copy)
## {xml_document}
## <body>

all.equal(xml_copy, xml)
## [1] TRUE
```

So, to use this with futures:

```r
library(future)
plan(multisession, workers = 2L)
library(xml2)

xml <- read_xml("<body></body>")
xml_raw <- xml_to_raw(xml)    ## 'xml' is not exportable ...

f <- future({
  xml <- raw_to_xml(xml_raw)  ## ... but 'xml_raw' is
  print(xml)
  length(xml)
})
v <- value(f)                 ## relays the print(xml) output
## {xml_document}
## <body>
print(v)
## [1] 2
```

Hope this helps. FWIW, the above …
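In your case the XML is built in the workers and needed back in the manager, so the same trick can be used in the other direction: each worker returns the raw bytes and the main session re-parses them. The following is a minimal sketch with furrr's `future_map()`, assuming the extract can be split into one data frame per department. `charges`, `by_dept` and `build_dept_xml()` are hypothetical stand-ins for your real data and XML-building code.

```r
library(future)
library(furrr)
library(xml2)

plan(multisession, workers = 2L)

# Same two helpers as above, repeated so this block runs on its own
xml_to_raw <- function(xml) {
  stopifnot(inherits(xml, "xml_document"))
  con <- rawConnection(raw(0L), open = "wb")
  on.exit(close(con))
  xml2::write_xml(xml, file = con)
  rawConnectionValue(con)
}
raw_to_xml <- function(raw) {
  con <- rawConnection(raw, open = "rb")
  on.exit(close(con))
  xml2::read_xml(con)
}

# Hypothetical charge data standing in for the real per-department extract
charges <- data.frame(
  dept   = c("A", "A", "B"),
  amount = c(10, 20, 5),
  stringsAsFactors = FALSE
)
by_dept <- split(charges, charges$dept)

# Hypothetical builder: turns one department's rows into an xml_document
build_dept_xml <- function(df) {
  items <- paste0(sprintf('<charge amount="%s"/>', df$amount), collapse = "")
  xml2::read_xml(sprintf('<department name="%s">%s</department>', df$dept[1], items))
}

# Each worker builds its document but returns only the raw bytes, which,
# unlike the xml_document itself, can be sent back to this session
raws <- future_map(by_dept, function(df) xml_to_raw(build_dept_xml(df)))

# Re-create live xml_document objects in the main session
docs <- lapply(raws, raw_to_xml)
print(docs$A)

plan(sequential)
```

If the upload only needs the XML text or a file, you may be able to skip re-parsing and use `rawToChar()` or `writeBin()` on each raw element directly.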