Fix a latex error (#31)
* Trying to fix a latex error
* Fix URLs
* Accepted by CRAN
ELToulemonde authored Oct 25, 2017
1 parent a825001 commit dbe52ce
Showing 18 changed files with 79 additions and 65 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: dataPreparation
Title: Automated Data Preparation
-Version: 0.3
+Version: 0.3.2
Authors@R: person("Emmanuel-Lin", "Toulemonde", email = "[email protected]", role = c("aut", "cre"))
Description: Do most of the painful data preparation for a data science project with a minimum amount of code; Take advantages of data.table efficiency and use some algorithmic trick in order to perform data preparation in a time and RAM efficient way.
Depends:
8 changes: 8 additions & 0 deletions NEWS.md
@@ -1,3 +1,11 @@
+V 0.3.2
+========
+- Change URLs to meet CRAN requirement
+
+v 0.3.1
+=======
+- Fix bug in Latex documentation
+
v 0.3
=====
- New features:
16 changes: 8 additions & 8 deletions R/aggregate.R
@@ -8,15 +8,15 @@
#' @param ... Optional argument: \code{functions}: aggregation functions for numeric columns
#' (vector of function, optional, if not set we use: c(mean, min, max, sd))
#' @details
-#' Perform aggregation depending on column type:\cr
+#' Perform aggregation depending on column type:
 #' \itemize{
-#' \item If column is numeric \code{functions} are performed on the column. So 1 numeric column
-#' give length(functions) new columns,
-#' \item If column is character or factor and have less than \code{thresh} different values,
-#' frequency count of values is performed,
-#' \item If column is character or factor with more than \code{thresh} different values, number
-#' of different values for each \code{key} is performed,
-#' \item If column is logical, count of number and rate of positive is performed.
+#' \item If column is numeric \code{functions} are performed on the column. So 1 numeric column
+#' give length(functions) new columns,
+#' \item If column is character or factor and have less than \code{thresh} different values,
+#' frequency count of values is performed,
+#' \item If column is character or factor with more than \code{thresh} different values, number
+#' of different values for each \code{key} is performed,
+#' \item If column is logical, count of number and rate of positive is performed.
#' }
#' Be careful using functions argument, given functions should be an aggregation function,
#' meaning that for multiple values it should only return one value.
2 changes: 1 addition & 1 deletion R/dataSet.R
@@ -5,7 +5,7 @@
###################################################################################################
#' Adult with some ugly columns added
#'
-#' For examples and tutorials, messy_adult has been built using UCI \code{adult}.\cr
+#' For examples and tutorials, messy_adult has been built using UCI \code{adult}.
#'
#' We added 9 really ugly columns to the data set:
#'
2 changes: 1 addition & 1 deletion R/discretization.R
@@ -9,7 +9,7 @@
#' @param verbose Should the algorithm talk? (Logical, default to TRUE)
#' @return A list where each element name is a column name of data set and each element contains
#' bins to discretize this column.
-#' @details \cr
+#' @details
#' Using equal freq first bin will start at -Inf and last bin will end at +Inf.
#' @examples
#' # Load data
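The `@details` note above says that with equal-frequency binning the first bin starts at -Inf and the last ends at +Inf. A minimal base-R sketch of that idea follows. It is an illustration only, not the package's implementation: `equal_freq_bins` and its `n_bins` argument are hypothetical names, and the use of `quantile()` to pick cut points is an assumption.

```r
# Hypothetical helper (not part of dataPreparation): compute equal-frequency
# bin edges and open the outer edges so that any future value falls in a bin.
equal_freq_bins <- function(x, n_bins = 4L) {
  probs <- seq(0, 1, length.out = n_bins + 1)
  # Quantiles give edges holding roughly the same number of observations;
  # unique() drops duplicated edges that appear when x has many ties.
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE, names = FALSE))
  # Degenerate case (constant column): fall back to a single open bin.
  if (length(breaks) < 2) {
    return(c(-Inf, Inf))
  }
  breaks[1] <- -Inf
  breaks[length(breaks)] <- Inf
  breaks
}

x <- 1:100
bins <- equal_freq_bins(x, n_bins = 4L)
binned <- cut(x, breaks = bins)
```

Opening the outer edges means a test set with values outside the training range can still be discretized with the same bins.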
2 changes: 1 addition & 1 deletion R/generateFromFactor.R
@@ -1,6 +1,6 @@
#' Recode factor
#'
-#' Recode factors into 3 new columns: \cr
+#' Recode factors into 3 new columns:
#' \itemize{
#' \item was the value not NA, "NA", "",
#' \item how often this value occures,
20 changes: 11 additions & 9 deletions R/prepareSet.R
@@ -3,7 +3,7 @@
###################################################################################################
#' Preparation pipeline
#'
-#' Full pipeline for preparing your dataSet set \cr
+#' Full pipeline for preparing your dataSet set.
#' @param dataSet Matrix, data.frame or data.table
#' @param finalForm "data.table" or "numerical_matrix" (default to data.table)
#' @param verbose Should the algorithm talk? (logical, default to TRUE)
@@ -26,14 +26,16 @@
#' \code{\link{generateFactorFromDate}}) (character, default to "yearmonth")
#' }
#' @return A data.table or a numerical matrix (according to \code{finalForm}). \cr
-#' It will perform the following steps: \cr
-#' - Correct set: unfactor factor with many values, id dates and numeric that are hiden in character \cr
-#' - Transform set: compute differences between every date, transform dates into factors, generate
-#' features from character..., if \code{key} is provided, will perform aggregate according to this \code{key} \cr
-#' - Filter set: filter constant, in double or bijection variables. If `digits` is provided,
-#' will round numeric \cr
-#' - Handle NA: will perform \code{\link{fastHandleNa}}) \cr
-#' - Shape set: will put the result in asked shape (\code{finalForm}) with acceptable columns format.
+#' It will perform the following steps:
+#' \itemize{
+#' \item Correct set: unfactor factor with many values, id dates and numeric that are hiden in character
+#' \item Transform set: compute differences between every date, transform dates into factors, generate
+#' features from character..., if \code{key} is provided, will perform aggregate according to this \code{key}
+#' \item Filter set: filter constant, in double or bijection variables. If `digits` is provided,
+#' will round numeric
+#' \item Handle NA: will perform \code{\link{fastHandleNa}})
+#' \item Shape set: will put the result in asked shape (\code{finalForm}) with acceptable columns format.
+#' }
#' @examples
#' # Load ugly set
#' \dontrun{
19 changes: 10 additions & 9 deletions R/shapeSet.R
@@ -1,14 +1,15 @@
#' Final preparation before ML algorithm
#'
-#' Prepare a data.table by: \cr
-#' - transforming numeric variables into factors whenever they take less than \code{thresh} unique
-#' variables \cr
-#' - transforming characters using \code{\link{generateFromCharacter}} \cr
-#' - transforming logical into binary integers \cr
-#' - dropping constant columns \cr
-#' - Sending the data.table to \code{\link{setAsNumericMatrix}} (when \code{finalForm == "numerical_matrix"}) will then allow
-#' you to get a numerical matrix usable by most Machine Learning Algorithms.
-#'
+#' Prepare a data.table by:
+#' \itemize{
+#' \item transforming numeric variables into factors whenever they take less than \code{thresh} unique
+#' variables
+#' \item transforming characters using \code{\link{generateFromCharacter}}
+#' \item transforming logical into binary integers
+#' \item dropping constant columns
+#' \item Sending the data.table to \code{\link{setAsNumericMatrix}} (when \code{finalForm == "numerical_matrix"})
+#' will then allow you to get a numerical matrix usable by most Machine Learning Algorithms.
+#' }
#' @param dataSet Matrix, data.frame or data.table
#' @param finalForm "data.table" or "numerical_matrix" (default to data.table)
#' @param thresh Threshold such that a numerical column is transformed into
6 changes: 3 additions & 3 deletions R/whichFunctions.R
@@ -84,7 +84,7 @@ whichAreConstant <- function(dataSet, keep_cols = NULL, verbose = TRUE){
#' first 10 lines of both columns. If they are not equal then the columns aren't identical, else
#' it compares lines 11 to 100; then 101 to 1000... So this function is fast with dataSet set
#' with a large number of lines and a lot of columns that aren't equals. \cr
-#' If \code{verbose} is TRUE, the column logged will be the one returned. \cr
+#' If \code{verbose} is TRUE, the column logged will be the one returned.
#' @examples
#' # First let's build a matrix with 3 columns and a lot of lines, with 1's everywhere
#' M <- matrix(1, nrow = 1e6, ncol = 3)
@@ -172,7 +172,7 @@ whichAreBijection <- function(dataSet, keep_cols = NULL, verbose = TRUE){

## Initialization

-## Computation # to-do dé-gorifier
+## Computation # to-do clean it
bijection_cols <- bi_col_test(dataSet, keep_cols, verbose = verbose,
test_function = "fastIsBijection", function_name = function_name, test_log = " is a bijection of ")

@@ -245,7 +245,7 @@ whichAreIncluded <- function(dataSet, keep_cols = NULL, verbose = TRUE){
pb <- initPB(function_name, names(dataSet))
}
nbr_various_val <- sapply(dataSet, uniqueN)
-## Computation # to-do dé-gorifier
+## Computation # to-do clean it
while (length(I) > 0){
i <- I[1]

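The documentation touched in the first hunk of R/whichFunctions.R describes comparing two columns in growing chunks: the first 10 lines, then lines 11 to 100, then 101 to 1000, and so on, stopping at the first difference. A minimal base-R sketch of that scheme follows. It is an illustration under stated assumptions, not the package's code: `is_identical_chunked` and `first_chunk` are hypothetical names, and the factor-of-10 growth is taken from the documented description.

```r
# Hypothetical helper (not part of dataPreparation): compare two vectors chunk
# by chunk, with the chunk end growing by a factor of 10 at each step, so that
# columns that differ early are rejected after reading only a few values.
is_identical_chunked <- function(x, y, first_chunk = 10L) {
  n <- length(x)
  if (length(y) != n) return(FALSE)
  if (n == 0L) return(TRUE)
  start <- 1
  end <- min(first_chunk, n)
  repeat {
    idx <- start:end
    # Stop at the first chunk that differs.
    if (!identical(x[idx], y[idx])) return(FALSE)
    # Every chunk matched up to the last element.
    if (end >= n) return(TRUE)
    start <- end + 1
    end <- min(end * 10, n)
  }
}

# Same setup as the documented example: a large matrix of 1's,
# with one value changed in the second column.
M <- matrix(1, nrow = 1e6, ncol = 3)
M[1, 2] <- 2
differs <- is_identical_chunked(M[, 1], M[, 2])
same <- is_identical_chunked(M[, 1], M[, 3])
```

Columns that are truly identical are still read in full, so the gain comes from data sets where most column pairs differ within the first few rows.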
16 changes: 8 additions & 8 deletions man/aggregateByKey.Rd

1 change: 0 additions & 1 deletion man/build_bins.Rd

2 changes: 1 addition & 1 deletion man/generateFromFactor.Rd

2 changes: 1 addition & 1 deletion man/messy_adult.Rd

20 changes: 11 additions & 9 deletions man/prepareSet.Rd

18 changes: 10 additions & 8 deletions man/shapeSet.Rd

2 changes: 1 addition & 1 deletion man/whichAreInDouble.Rd

2 changes: 1 addition & 1 deletion vignettes/dataPreparation.Rmd
@@ -359,7 +359,7 @@ description(agg_adult, level = 0)


# Conclusion
-We presented some of the functions of *dataPreparation* package. There are a few more available, plus they have some parameters to make their use easier. So if you liked it, please go check the package documentation (by installing it or on [CRAN](https://cran.r-project.org/web/packages/dataPreparation/dataPreparation.pdf))
+We presented some of the functions of *dataPreparation* package. There are a few more available, plus they have some parameters to make their use easier. So if you liked it, please go check the package documentation (by installing it or on [CRAN](https://CRAN.R-project.org/package=dataPreparation/dataPreparation.pdf))


We hope that this package is helpful, that it helped you prepare your data in a faster way.
4 changes: 2 additions & 2 deletions vignettes/train_test_prep.Rmd
@@ -34,7 +34,7 @@ In this tutorial the following points are going to be viewed:
- Applying the same preparation to a testing set,
- Controling that train and test sets have the same shape.

-Using [dataPreparation](https://cran.r-project.org/web/packages/dataPreparation/index.html) package, those sets will be
+Using [dataPreparation](https://CRAN.R-project.org/package=dataPreparation/index.html) package, those sets will be

- fast (since dataPreparation is based on data.table framework and uses some computational tricks)
- easy (since those functions are packaged and handle most of the situations)
@@ -229,7 +229,7 @@ No warning have been raised it's all is ok.


# Conclusion
-We presented some of the functions of *dataPreparation* package. There are a few more available, plus they have some parameters to make their use easier. So if you liked it, please go check the package documentation (by installing it or on [CRAN](https://cran.r-project.org/web/packages/dataPreparation/dataPreparation.pdf))
+We presented some of the functions of *dataPreparation* package. There are a few more available, plus they have some parameters to make their use easier. So if you liked it, please go check the package documentation (by installing it or on [CRAN]( https://CRAN.R-project.org/package=dataPreparation/dataPreparation.pdf))


We hope that this package is helpful, that it helped you prepare your data in a faster way.