-
Notifications
You must be signed in to change notification settings - Fork 10
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
59de7ab
commit ad45172
Showing
2 changed files
with
116 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
V 0.3.5 | ||
======= | ||
- New features: | ||
- New features in existing functions: | ||
- *findAndTransFormDates* now as an *ambiguities* parameter, IGNORE to work as before, WARN to check for ambiguities and print them, SOLVE to try to solve ambiguities on more lines. | ||
- *one_hot_encoder* now uses a *build_encoding* functions to be able to build same encoding on train and on test. | ||
- *aggregateByKey* is now way faster on numerics. But it changed the way it gets input functions. | ||
- *fastScale* now as a *way* parameter which allow you to either scale or unscale. Unscaling numeric values can be very usefull for most post-model analysis. | ||
- *setColAsDate* now accept multiple formats in a single call. | ||
- New functions: | ||
- *build_encoding* build a list of encoding to be used by *one_hot_encoder*, it also has a parameter *min_frequency* to control that rare values doesn't result in new columns. | ||
- Previously private function *identifyDates* is now exported. To be able to perform same transformation on train and on test. | ||
- Adding *dataPrepNews* function to open NEWS file (inspired from rfNews() of randomForest package) | ||
|
||
- Bug fixes: | ||
- *findAndTransFormDates*: bug fixed: user formats weren't used. | ||
- *identifyDates*: some formats where tested but would never work. They have been removed. | ||
|
||
- Refactoring: | ||
- Unit test partly reviewed to be more readable and more efficient. Unit test time as been divided by 3. | ||
- Improving input control for more robust functions | ||
|
||
WARNING: | ||
- *one_hot_encoder* now requires you to run *build_encoding* first. | ||
- *aggregateByKey* now require functions to be passed by character name | ||
|
||
This version is making (as much as possible) transformation reproducible on train and test set. This is to prepare future pipeline feature. | ||
|
||
V 0.3.4 | ||
======== | ||
- Improvement of function | ||
- *whichAreBijection*: It is 2 to 15 time faster than previous version. | ||
- *whichAreIncluded*: It is a bit faster. | ||
- Bug fixes: | ||
- *generateFactorFromDate*: default value was missing. Fixed. | ||
- New features: | ||
- New features in existing functions: | ||
- *fastFilterVariables* has a new parameter (level) to choose which types of filtering to perform | ||
|
||
WARNING: | ||
- *whichAreIncluded*: in case of bijection (col1 is a bijection of col2), they are both included in the other, but the choice of the one to drop might have changed in this version. | ||
|
||
V 0.3.3 | ||
======== | ||
- New features: | ||
- New features in existing functions: | ||
- *findAndTransFormDates* now recognize date character even if there are multiple separator in date (ex: "2016, Jan-26"). | ||
- *findAndTransFormDates* now recognize date character even if there are leading and tailing white spaces. | ||
|
||
WARNING: | ||
- *date3* column in *messy_adult* data set has changed in order to illustrate the recognition of date character even if there are leading and/or trailing white spaces. | ||
- *date4* column in *messy_adult* data set has changed in order to illustrate the recognition of date character even if there are multiple separator. | ||
|
||
V 0.3.2 | ||
======== | ||
- Change URLs to meet CRAN requirement | ||
|
||
v 0.3.1 | ||
======= | ||
- Fix bug in Latex documentation | ||
|
||
v 0.3 | ||
===== | ||
- New features: | ||
- New features in existing functions: | ||
- *findAndTransFormDates* now recognize date character even if "0" are not present in month or day part and month as lower strings. | ||
- *findAndTransFormDates* and *setColAsDate* now work with *factors*. | ||
- New functions: | ||
- *fastDiscretization*: to perform equal freq or equal width discretization on a data set using *data.table* power. | ||
- *fastScale*: to perform scaling on a data set using *data.table* power. | ||
- *one_hot_encoder*: to perform one_hot encoding on a data set using *data.table* power. | ||
- New documentation: | ||
- A new vignette to illustrate how to build a correct *train* and *test* set unising data preparation | ||
- Minor changes in log (in particular regarding progress bars and typos) | ||
- Due to dependencies issues with *tcltk*, we stop using it and start using *progress* | ||
- Refactoring: | ||
- Private function *real_cols* take more importance to control that columns have the correct types and handling "auto" value. | ||
- Making code faster: some functions are up to **30% faster** | ||
- Review unit testing to be faster | ||
- Unit test evolution to be more readable | ||
|
||
WARNING: | ||
- *date1* column in *messy_adult* data set has changed in order to illustrate the recognition of date character even if "0" are not present in month or day part. | ||
|
||
|
||
v 0.2 | ||
===== | ||
- Improving unit testing and code coverage | ||
- Improving documentation | ||
- Solving minor bug in date conversion and in which functions | ||
- New features: | ||
- New functions: | ||
- *unFactor* to unfactor columns, when reading wasn't performed in expected way. | ||
- *sameShape* to make ure that train and test set have exactly the same shape. | ||
- generate new columns from existing columns (generate functions) | ||
- generate factor from dates: *generateFactorFromDate* | ||
- diffDates becomes *generateDateDiffs* (for better name understanding). | ||
- generate numerics and booleans from character of fators (using *generateFromFactor* and *generateFromCharacter*) | ||
|
||
- *setColAsFactor* a function to make multiple columns as factor and controling number of unique elements | ||
|
||
- New features in existing functions: | ||
- which functions: add *keep_cols* argument to make sure that they are not dropped | ||
- fastFilterVariables: *verbose* can be T/F or 0, 1, 2 in order to control level of verbosity | ||
- *findAndTransFormDates* and *setColAsDates* now recognize and accept timestamp. | ||
|
||
WARNING: | ||
- If you were using *diffDates*, it is now called *generateDateDiffs* | ||
- *date2* column in *messy_adult* data set have changed in order to illustrate new timestamp features | ||
- *setColAsFactorOrLogical* doesn't exist anymore: it as been splitted between *setColAsFactor* and *generateFromCat* | ||
- Considering all those changes: *shapeSet* and *prepareSet* don't give the same result anymore. | ||
|
||
|
||
v 0.1: release on CRAN July 2017 | ||
================================ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters