Release CRAN release v0.3.5 · ELToulemonde/dataPreparation

New features:

New features in existing functions:
- findAndTransformDates now has an ambiguities parameter: IGNORE to work as before, WARN to check for ambiguities and print them, SOLVE to try to solve ambiguities by checking more lines.
- one_hot_encoder now uses the build_encoding function, so that the same encoding can be built on train and on test.
- aggregateByKey is now much faster on numerics, but the way it takes input functions has changed.
- fastScale now has a way parameter which allows you to either scale or unscale. Unscaling numeric values can be very useful for most post-model analysis.
- setColAsDate now accepts multiple formats in a single call.
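
The scale/unscale round trip mentioned for fastScale can be illustrated with a minimal base R sketch. It deliberately does not call fastScale and is not the package's implementation; the helper names scale_values and unscale_values are hypothetical and only show why keeping the scaling parameters lets you map model outputs back to the original units.

```r
# Base R only: illustrates the idea, not the fastScale implementation.
x <- c(10, 20, 30, 40)

# Scaling parameters, learned once (for example on the training set).
x_mean <- mean(x)
x_sd <- sd(x)

# Hypothetical helpers: scale to zero mean and unit standard deviation, and back.
scale_values <- function(v) (v - x_mean) / x_sd
unscale_values <- function(v) v * x_sd + x_mean

scaled <- scale_values(x)
restored <- unscale_values(scaled)

# The round trip recovers the original values (up to floating point error).
stopifnot(isTRUE(all.equal(restored, x)))
```

Because the mean and standard deviation are stored rather than recomputed, the same parameters can be reused on a test set or on predictions.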
New functions:
- build_encoding builds a list of encodings to be used by one_hot_encoder. It also has a min_frequency parameter to ensure that rare values don't result in new columns.
- The previously private function identifyDates is now exported, so that the same transformation can be performed on train and on test.
- Added the dataPrepNews function to open the NEWS file (inspired by rfNews() from the randomForest package).
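
As a rough sketch of how build_encoding and one_hot_encoder might be combined on train and test: the argument names cols, encoding and verbose below are assumptions about the function signatures and should be checked against the package documentation; the toy data is made up for illustration.

```r
library(data.table)
library(dataPreparation)

# Toy data, made up for this example.
train <- data.table(color = c("red", "blue", "red", "green"))
test <- data.table(color = c("blue", "red"))

# Learn the encoding on the training set only.
# Assumption: build_encoding accepts cols and verbose arguments.
encoding <- build_encoding(train, cols = "color", verbose = FALSE)

# Apply the same encoding to both sets.
# Assumption: one_hot_encoder accepts the encoding through an encoding argument.
train_encoded <- one_hot_encoder(train, encoding = encoding, verbose = FALSE)
test_encoded <- one_hot_encoder(test, encoding = encoding, verbose = FALSE)
```

Since the encoding is built once and reused, the test set should end up with the same columns as the train set even though "green" never appears in it.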

Bug fixes:

findAndTransformDates: bug fixed: user formats weren't used.
identifyDates: some formats were tested but would never work. They have been removed.

Refactoring:

Unit tests partly reviewed to be more readable and more efficient. Unit test time has been divided by 3.
Improved input control for more robust functions.

WARNING:

This version makes transformations reproducible (as much as possible) on train and test sets. This prepares for a future pipeline feature.
