✨ Add first version of data_package()
Create a datapackage.json based on CSV files without additional metadata.
ThierryO committed Dec 13, 2024
1 parent 631bf80 commit 2a12557
Showing 14 changed files with 132 additions and 0 deletions.
2 changes: 2 additions & 0 deletions DESCRIPTION
@@ -46,6 +46,7 @@ Imports:
yaml
Suggests:
ggplot2,
jsonlite,
knitr,
microbenchmark,
rmarkdown,
@@ -60,6 +61,7 @@ Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
Collate:
'clean_data_path.R'
'data_package.R'
'datahash.R'
'display_metadata.R'
'git2rdata_package.R'
1 change: 1 addition & 0 deletions NAMESPACE
@@ -51,6 +51,7 @@ S3method(write_vc,character)
S3method(write_vc,default)
S3method(write_vc,git_repository)
export(commit)
export(data_package)
export(display_metadata)
export(is_git2rdata)
export(is_git2rmeta)
87 changes: 87 additions & 0 deletions R/data_package.R
@@ -0,0 +1,87 @@
#' Create a Data Package for a directory of CSV files
#'
#' @description
#' Create a `datapackage.json` file for a directory of CSV files.
#' The function looks for all `.csv` files in the directory and its
#' subdirectories and keeps only those that are non-optimized git2rdata objects.
#' It then writes a `datapackage.json` file with the metadata of each of these
#' CSV files.
#'
#' @param path the directory in which to create the `datapackage.json` file.
#' @return the path to the `datapackage.json` file.
#' @family storage
#' @export
#' @importFrom assertthat assert_that is.string noNA
data_package <- function(path = ".") {
  assert_that(
    is.string(path), noNA(path), requireNamespace("jsonlite", quietly = TRUE)
  )
  stopifnot("`path` is not a directory" = file_test("-d", path))

  # list all CSV files below `path`; the escaped dot matches only a literal
  # `.csv` extension
  data_files <- list.files(path, pattern = "\\.csv$", recursive = TRUE)
  # keep only the CSV files that are git2rdata objects
  relevant <- vapply(
    data_files, FUN = is_git2rdata, FUN.VALUE = logical(1), root = path
  )
  stopifnot(
    "no non-optimized git2rdata objects found at `path`" = any(relevant)
  )
  data_files <- data_files[relevant]

  # one Data Resource per CSV file, serialised as a JSON array
  list(
    resources = vapply(
      data_files, path = path, FUN = data_resource,
      FUN.VALUE = vector(mode = "list", length = 1)
    ) |>
      unname()
  ) |>
    jsonlite::toJSON(pretty = TRUE, auto_unbox = TRUE) |>
    writeLines(file.path(path, "datapackage.json"))
  return(file.path(path, "datapackage.json"))
}

#' @importFrom assertthat assert_that is.string noNA
#' @importFrom yaml read_yaml
data_resource <- function(file, path = ".") {
  assert_that(
    is.string(file), is.string(path), noNA(file), noNA(path)
  )
  stopifnot("`path` is not a directory" = file_test("-d", path))

  # read the `.yml` metadata that belongs to the CSV file
  clean_data_path(root = path, file = file)[2] |>
    read_yaml() -> metadata
  list(
    name = file, path = file, encoding = "utf-8",
    format = "csv", mediatype = "text/csv",
    hash = paste0("sha1:", metadata[["..generic"]][["data_hash"]]),
    schema = list(
      # drop the leading `..generic` element; the other names are variables
      fields = vapply(
        names(metadata)[-1], metadata = metadata, FUN = field_schema,
        FUN.VALUE = vector(mode = "list", length = 1)
      ) |>
        unname(),
      # a named list becomes a JSON object {"value": ..., "label": ...};
      # a named character vector would be serialised as an unnamed array
      missingValues = list(
        list(value = metadata[["..generic"]][["NA string"]], label = "missing")
      )
    )
  ) |>
    list()
}

# translate the git2rdata class of variable `x` into a Table Schema field
field_schema <- function(x, metadata) {
  list(switch(
    metadata[[x]]$class,
    "character" = list(name = x, type = "string"),
    "Date" = list(name = x, type = "date"),
    "logical" = list(
      name = x, type = "boolean", trueValues = c("TRUE", "true"),
      falseValues = c("FALSE", "false")
    ),
    # I() keeps a factor with a single level as a JSON array under auto_unbox
    "factor" = list(
      name = x, type = "string", categories = I(metadata[[x]][["labels"]]),
      categoriesOrdered = metadata[[x]][["ordered"]]
    ),
    "integer" = list(name = x, type = "integer"),
    "numeric" = list(name = x, type = "number"),
    "POSIXct" = list(name = x, type = "datetime"),
    stop("field_schema() can't handle ", metadata[[x]]$class)
  ))
}
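`field_schema()` reads each variable's class from the git2rdata metadata and maps it onto a Table Schema type. As a hedged illustration of that same class-to-type table, the base-R sketch below derives the fields straight from the columns of a data.frame. `table_schema_fields()` is a hypothetical helper, not part of git2rdata. It tests `is.factor()` first because an ordered factor has class `c("ordered", "factor")`, so a lookup on the first class alone would miss it.

```r
# Hypothetical helper (not part of git2rdata): map each column of a
# data.frame to a Table Schema field, mirroring the table in field_schema().
table_schema_fields <- function(df) {
  stopifnot(is.data.frame(df))
  lapply(names(df), function(x) {
    col <- df[[x]]
    # factors (ordered or not) become strings with a list of categories
    if (is.factor(col)) {
      return(list(
        name = x, type = "string", categories = levels(col),
        categoriesOrdered = is.ordered(col)
      ))
    }
    # date-time classes are checked before the plain atomic types
    type <- if (inherits(col, "POSIXct")) {
      "datetime"
    } else if (inherits(col, "Date")) {
      "date"
    } else if (is.logical(col)) {
      "boolean"
    } else if (is.integer(col)) {
      "integer"
    } else if (is.numeric(col)) {
      "number"
    } else if (is.character(col)) {
      "string"
    } else {
      stop("no Table Schema type for column ", x)
    }
    list(name = x, type = type)
  })
}

df <- data.frame(
  id = 1:2, value = c(0.5, 1.5), ok = c(TRUE, NA),
  day = as.Date(c("2024-12-13", "2024-12-14")),
  grade = factor(c("low", "high"), levels = c("low", "high"), ordered = TRUE),
  note = c("a", "b")
)
fields <- table_schema_fields(df)
# types: integer, number, boolean, date, string, string
vapply(fields, function(f) f$type, character(1))
```

Working from the metadata, as `field_schema()` does, has the advantage that the schema describes exactly what git2rdata wrote to disk rather than what happens to be in memory.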
32 changes: 32 additions & 0 deletions man/data_package.Rd
1 change: 1 addition & 0 deletions man/display_metadata.Rd
1 change: 1 addition & 0 deletions man/list_data.Rd
1 change: 1 addition & 0 deletions man/prune_meta.Rd
1 change: 1 addition & 0 deletions man/read_vc.Rd
1 change: 1 addition & 0 deletions man/relabel.Rd
1 change: 1 addition & 0 deletions man/rename_variable.Rd
1 change: 1 addition & 0 deletions man/rm_data.Rd
1 change: 1 addition & 0 deletions man/update_metadata.Rd
1 change: 1 addition & 0 deletions man/verify_vc.Rd
1 change: 1 addition & 0 deletions man/write_vc.Rd

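The file-discovery step of `data_package()` lists the `.csv` files below `path` recursively and keeps only those that are git2rdata objects. The base-R sketch below approximates that step under a stated simplification: the hypothetical `find_described_csv()` only checks that a `.yml` file with the same base name exists, whereas `is_git2rdata()` also validates the metadata itself. It also shows why the dot in the `list.files()` pattern needs escaping.

```r
# Hypothetical, simplified stand-in for the discovery step of data_package():
# a CSV file counts as "described" when a sibling `.yml` file exists.
find_described_csv <- function(path = ".") {
  stopifnot(dir.exists(path))
  # "\\.csv$" matches a literal ".csv" extension; ".csv$" would also match
  # names such as "results_csv" because an unescaped "." matches any character
  csv <- list.files(path, pattern = "\\.csv$", recursive = TRUE)
  has_meta <- file.exists(file.path(path, sub("\\.csv$", ".yml", csv)))
  csv[has_meta]
}

root <- tempfile("dp")
dir.create(file.path(root, "sub"), recursive = TRUE)
invisible(file.create(file.path(
  root, c("a.csv", "a.yml", "b.csv", "results_csv", "sub/c.csv", "sub/c.yml")
)))
found <- find_described_csv(root)
found
```

Here `b.csv` is dropped because it has no metadata, and `results_csv` is never listed because the escaped pattern requires a literal `.csv` extension.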