Commit

Add TargetEncoder to documentation
ClaudioSalvatoreArcidiacono committed Dec 10, 2024
1 parent a10eb01 commit 0623334
Showing 5 changed files with 9 additions and 8 deletions.
1 change: 1 addition & 0 deletions docs/API/encoding/TargetEncoder.md
@@ -0,0 +1 @@
+ ::: sklearo.encoding.TargetEncoder
1 change: 1 addition & 0 deletions docs/API/utils/infer_type_of_target.md
@@ -0,0 +1 @@
+ ::: sklearo.utils.infer_type_of_target
1 change: 0 additions & 1 deletion docs/API/utils/type_of_target.md

This file was deleted.

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -41,6 +41,7 @@ plugins:
show_root_full_path: true
show_symbol_type_heading: true
show_symbol_type_toc: true
+ inherited_members: true

markdown_extensions:
- pymdownx.highlight:
13 changes: 6 additions & 7 deletions sklearo/encoding/target.py
@@ -23,6 +23,7 @@ class TargetEncoder(BaseTargetEncoder):
Args:
columns (str, list[str], list[nw.typing.DTypes]): List of columns to encode.
- If a list of strings is passed, it is treated as a list of column names to encode.
- If a single string is passed instead, it is treated as a regular expression pattern to
match column names.
@@ -32,19 +33,22 @@ class TargetEncoder(BaseTargetEncoder):
unseen (str): Strategy to handle categories that appear during the `transform` step but were
never encountered in the `fit` step.
- If `'raise'`, an error is raised when unseen categories are found.
- If `'ignore'`, the unseen categories are encoded with the fill_value_unseen.
fill_value_unseen (int, float, None | Literal["mean"]): Fill value to use for unseen
categories. Defaults to `"mean"`, which will use the mean of the target variable.
missing_values (str): Strategy to handle missing values.
- If `'encode'`, missing values are initially replaced with a specified fill value and
the mean is computed as if it were a regular category.
- If `'ignore'`, missing values are left as is.
- If `'raise'`, an error is raised when missing values are found.
type_of_target (str): Type of the target variable.
- If `'auto'`, the type is inferred from the target variable.
- If `'binary'`, the target variable is binary.
- If `'multiclass'`, the target variable is multiclass.
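The `unseen` and `fill_value_unseen` options described in this docstring can be illustrated with a small pure-Python sketch. This is not sklearo's implementation: the helper names `fit_target_means` and `transform` and the sample data are hypothetical, and it assumes that `"mean"` refers to the overall mean of the target.

```python
from statistics import mean


def fit_target_means(categories, targets):
    """Return (per-category mean of the target, overall mean of the target)."""
    grouped = {}
    for category, target in zip(categories, targets):
        grouped.setdefault(category, []).append(target)
    per_category = {category: mean(values) for category, values in grouped.items()}
    return per_category, mean(targets)


def transform(categories, per_category, overall_mean,
              unseen="raise", fill_value_unseen="mean"):
    """Encode categories, handling unseen ones as the docstring describes."""
    # Assumption: "mean" stands for the overall target mean seen during fit.
    fill = overall_mean if fill_value_unseen == "mean" else fill_value_unseen
    encoded = []
    for category in categories:
        if category in per_category:
            encoded.append(per_category[category])
        elif unseen == "raise":
            raise ValueError(f"Unseen category: {category!r}")
        else:  # unseen == "ignore": fall back to the fill value
            encoded.append(fill)
    return encoded


# Hypothetical training data: category "c" never appears during fit.
per_category, overall = fit_target_means(
    ["a", "a", "b", "b"], [1.0, 0.0, 1.0, 1.0]
)
encoded = transform(["a", "b", "c"], per_category, overall, unseen="ignore")
```

With `unseen="raise"`, the same call would raise a `ValueError` for `"c"` instead of filling it.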
@@ -104,12 +108,7 @@ def _calculate_target_statistic(
         self, x_y: IntoFrameT, target_col: str, column: str
     ) -> dict:
         mean_target_all_categories = (
-            x_y.group_by(column).agg(nw.col(target_col).mean()).rows(named=True)
+            x_y.group_by(column).agg(nw.col(target_col).mean()).rows()
         )
-        mean_target = {}
-        for mean_target_per_category in mean_target_all_categories:
-            mean_target[mean_target_per_category[column]] = mean_target_per_category[
-                target_col
-            ]
-
+        mean_target = dict(mean_target_all_categories)
         return mean_target
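
The refactor in this hunk relies on `dict()` accepting an iterable of 2-tuples. Below is a minimal sketch of that equivalence in plain Python, with made-up rows standing in for what `.rows(named=True)` and `.rows()` return. The column names `city` and `target` and the values are hypothetical, and the sketch assumes each tuple holds exactly the group key followed by the aggregated mean.

```python
# Hypothetical aggregated result, one row per category.
rows_named = [
    {"city": "Rome", "target": 0.25},
    {"city": "Milan", "target": 0.75},
]
# The same rows as plain tuples: (group key, mean of the target).
rows_tuples = [(row["city"], row["target"]) for row in rows_named]

# Before: build the mapping by looping over named rows.
mean_target_loop = {}
for row in rows_named:
    mean_target_loop[row["city"]] = row["target"]

# After: dict() consumes the (key, value) tuples directly.
mean_target_dict = dict(rows_tuples)

assert mean_target_loop == mean_target_dict
```

The shorter form only works while the aggregated frame has exactly two columns in key, value order. A third column would make `dict()` raise a `ValueError`.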
