diff --git a/2023/homework/Jun_Fan/homework_credit_scoring.ipynb b/2023/homework/Jun_Fan/homework_credit_scoring.ipynb
new file mode 100644
index 00000000..879aeb2b
--- /dev/null
+++ b/2023/homework/Jun_Fan/homework_credit_scoring.ipynb
@@ -0,0 +1,942 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Leveling Up Together: Credit Scoring Exercise"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
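+ "## Assignment instructions\n",
+ "\n",
+ "- Answering:\n",
+ "  - **Keep every step** of your work when answering; do not give only the final answer\n",
+ "  - Make a habit of commenting your code\n",
+ "\n",
+ "- Solution hints:\n",
+ "  - To help everyone understand the problems accurately and learn from the practice, this document provides solution hints\n",
+ "  - The hints are **for reference only**; original approaches are encouraged\n",
+ "  - To encourage independent thinking, the hints are set in **white** text; when needed, drag the mouse from after the colon to reveal them\n",
+ "\n",
+ "- Data used:\n",
+ "  - After loading the dataset, first **inspect and understand the basic properties of the data**; later questions will not repeat this reminder"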
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Machine learning for credit scoring\n",
+ "\n",
+ "\n",
+ "Banks play a crucial role in market economies. They decide who can get finance and on what terms and can make or break investment decisions. For markets and society to function, individuals and companies need access to credit. \n",
+ "\n",
+ "Credit scoring algorithms, which make a guess at the probability of default, are the method banks use to determine whether or not a loan should be granted. This competition requires participants to improve on the state of the art in credit scoring, by predicting the probability that somebody will experience financial distress in the next two years. [Dataset](https://www.kaggle.com/c/GiveMeSomeCredit)\n",
+ "\n",
+ "Attribute Information:\n",
+ "\n",
+ "|Variable Name\t|\tDescription\t|\tType|\n",
+ "|----|----|----|\n",
+ "|SeriousDlqin2yrs\t|\tPerson experienced 90 days past due delinquency or worse \t|\tY/N|\n",
+ "|RevolvingUtilizationOfUnsecuredLines\t|\tTotal balance on credit divided by the sum of credit limits\t|\tpercentage|\n",
+ "|age\t|\tAge of borrower in years\t|\tinteger|\n",
+ "|NumberOfTime30-59DaysPastDueNotWorse\t|\tNumber of times borrower has been 30-59 days past due |\tinteger|\n",
+ "|DebtRatio | Monthly debt payments, alimony, and living costs divided by monthly gross income | percentage|\n",
+ "|MonthlyIncome\t|\tMonthly income\t|\treal|\n",
+ "|NumberOfOpenCreditLinesAndLoans\t|\tNumber of Open loans |\tinteger|\n",
+ "|NumberOfTimes90DaysLate\t|\tNumber of times borrower has been 90 days or more past due.\t|\tinteger|\n",
+ "|NumberRealEstateLoansOrLines\t|\tNumber of mortgage and real estate loans\t|\tinteger|\n",
+ "|NumberOfTime60-89DaysPastDueNotWorse\t|\tNumber of times borrower has been 60-89 days past due |integer|\n",
+ "|NumberOfDependents\t|\tNumber of dependents in family\t|\tinteger|\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----------\n",
+ "## Read the data into Pandas "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 1 0.766127 45.0 \n",
+ "1 0 0.957151 40.0 \n",
+ "2 0 0.658180 38.0 \n",
+ "3 0 0.233810 30.0 \n",
+ "4 0 0.907239 49.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 "
+ ]
+ },
+ "execution_count": 1,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import warnings\n",
+ "warnings.filterwarnings(\"ignore\")\n",
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "pd.set_option('display.max_columns', 500)\n",
+ "import zipfile\n",
+ "with zipfile.ZipFile('KaggleCredit2.csv.zip', 'r') as z:\n",
+ " f = z.open('KaggleCredit2.csv')\n",
+ " data = pd.read_csv(f, index_col=0)\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(112915, 11)"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "------------\n",
+ "## Drop na"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "SeriousDlqin2yrs 0\n",
+ "RevolvingUtilizationOfUnsecuredLines 0\n",
+ "age 4267\n",
+ "NumberOfTime30-59DaysPastDueNotWorse 0\n",
+ "DebtRatio 0\n",
+ "MonthlyIncome 0\n",
+ "NumberOfOpenCreditLinesAndLoans 0\n",
+ "NumberOfTimes90DaysLate 0\n",
+ "NumberRealEstateLoansOrLines 0\n",
+ "NumberOfTime60-89DaysPastDueNotWorse 0\n",
+ "NumberOfDependents 4267\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.isnull().sum(axis=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(108648, 11)"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.dropna(inplace=True)\n",
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---------\n",
+ "## Create X and y"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.06742876076872101"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "y.mean()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Exercise 1: Split the data into training and test sets\n",
+ "- Hint: from sklearn.model_selection import train_test_split"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(81486, 10)"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "# Hold out 25% of the rows as the test set; an integer seed keeps the split reproducible\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "x_train,x_test,y_train,y_test=train_test_split(X, y, \n",
+ "                                               test_size=.25, \n",
+ "                                               shuffle=True, \n",
+ "                                               random_state=1234)\n",
+ "x_train.shape\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----\n",
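+ "## Exercise 2: Classify with sklearn algorithms such as logistic regression, decision trees, SVM, and KNN\n",
+ "Consult the sklearn API to learn what each model parameter means, and try different settings"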
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Logistic regression\n",
+ "- Hint: from sklearn import linear_model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[-0.02491961, -0.32868909, 1.20804458, 0.42701328, -0.12762538,\n",
+ " -0.07764521, 0.2785668 , -0.43958502, -1.22289771, 0.13003486]])"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "import sklearn.linear_model\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "# Instantiate a standardization scaler.\n",
+ "iden = StandardScaler()\n",
+ "# Fit the scaler on the training set only\n",
+ "iden.fit(x_train)\n",
+ "# Produce the standardized arrays\n",
+ "X_train_std = iden.transform(x_train)\n",
+ "X_test_std = iden.transform(x_test)\n",
+ "\n",
+ "# Fit the model\n",
+ "# linear_model.LogisticRegression() is a class; each algorithm below starts by instantiating its class.\n",
+ "# dual: whether to solve the dual or the primal formulation; takes a bool.\n",
+ "# C: inverse of the regularization strength. The larger C is, the less the penalty term weighs in the loss, so overfitting becomes more likely and underfitting less likely.\n",
+ "# fit_intercept: whether to add a constant (intercept) term\n",
+ "# random_state: seed for the random number generator; an int or a numpy.random.RandomState instance\n",
+ "LG=sklearn.linear_model.LogisticRegression(C=10.0, \n",
+ "                                           max_iter=10,\n",
+ "                                           random_state=0)\n",
+ "LG.fit(X_train_std,y_train)\n",
+ "# Predict on the standardized features, matching the data the model was fitted on\n",
+ "y_LG_pred=LG.predict(X_test_std)\n",
+ "y_LG_pred2=LG.predict(X_train_std)\n",
+ "\n",
+ "LG.coef_"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Decision Tree\n",
+ "- Hint: from sklearn.tree import DecisionTreeClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "## your code here\n",
+ "# Import the decision tree classifier\n",
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "# criterion: the function used to measure split quality ('gini' or 'entropy', plus 'log_loss' in newer scikit-learn releases).\n",
+ "# max_depth: maximum depth of the tree.\n",
+ "tree = DecisionTreeClassifier(criterion='gini',\n",
+ "                              max_depth=5,\n",
+ "                              random_state=0)\n",
+ "tree.fit(x_train, y_train)\n",
+ "\n",
+ "y_tree_pred=tree.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### Random Forest\n",
+ "- Hint: from sklearn.ensemble import RandomForestClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "## your code here\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "\n",
+ "# n_estimators: number of trees in the forest.\n",
+ "# n_jobs: number of jobs to run in parallel during training.\n",
+ "forest = RandomForestClassifier(criterion='entropy', \n",
+ "                                n_estimators=10,\n",
+ "                                random_state=1,\n",
+ "                                n_jobs=2)\n",
+ "forest.fit(x_train, y_train)\n",
+ "\n",
+ "y_froest_pred=forest.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### SVM\n",
+ "- Hint: from sklearn.svm import SVC"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "(81486, 10)\n"
+ ]
+ }
+ ],
+ "source": [
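+ "from sklearn.svm import SVC\n",
+ "# The model trains slowly here; since this is an exercise, only the first 10,000 training rows are used, and the accuracy is still acceptable.\n",
+ "# kernel: the key to SVM hyperplane separation is mapping the data into additional dimensions; kernel specifies how this is done. A linear kernel is used below.\n",
+ "\n",
+ "svm = SVC(kernel='linear',\n",
+ "          random_state=0, \n",
+ "          C=10.0)\n",
+ "print(X_train_std.shape)\n",
+ "svm.fit(X_train_std[:10_000], y_train.iloc[:10_000])\n",
+ "# Predict on the standardized test features, matching the training data\n",
+ "y_svm_pred=svm.predict(X_test_std)"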
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### KNN\n",
+ "- Hint: from sklearn.neighbors import KNeighborsClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "## your code here\n",
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "# n_neighbors: number of neighbors that take part in the vote\n",
+ "# p: power of the Minkowski distance. 1: Manhattan distance; 2: Euclidean distance; other values: general Minkowski distance\n",
+ "knn = KNeighborsClassifier(n_neighbors=5,\n",
+ "                           p=3,\n",
+ "                           metric='minkowski')\n",
+ "knn.fit(X_train_std, y_train)\n",
+ "# Predict on the standardized test features, matching the training data\n",
+ "y_knn_pred=knn.predict(X_test_std)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "## Exercise 3: Predict on the test set and compute the accuracy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Logistic regression\n",
+ "- Hint: y_pred_LR = clf_LR.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "train_accuracy: 0.93\n",
+ "test_Accuracy: 0.93\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "print(f'train_accuracy: {accuracy_score(y_train, y_LG_pred2):.2f}')\n",
+ "print(f'test_Accuracy: {accuracy_score(y_test, y_LG_pred):.2f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Decision Tree\n",
+ "- Hint: y_pred_tree = tree.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "test_Accuracy: 0.93\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "print(f'test_Accuracy: {accuracy_score(y_test, y_tree_pred):.2f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Random Forest\n",
+ "- Hint: y_pred_forest = forest.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "test_Accuracy: 0.93\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "print(f'test_Accuracy: {accuracy_score(y_test,y_froest_pred):.2f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### SVM\n",
+ "- Hint: y_pred_SVC = clf_svc.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "test_Accuracy: 0.93\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "print(f'test_Accuracy: {accuracy_score(y_test,y_svm_pred):.2f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### KNN\n",
+ "- Hint: y_pred_KNN = neigh.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "test_Accuracy: 0.93\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "print(f'test_Accuracy: {accuracy_score(y_test, y_knn_pred):.2f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Exercise 4: Read the official sklearn documentation on evaluation metrics for classification, and evaluate this example"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Learning resources on the confusion matrix**\n",
+ "\n",
+ "- Blog: \n",
+ "http://blog.csdn.net/vesper305/article/details/44927047 \n",
+ "- WiKi: \n",
+ "http://en.wikipedia.org/wiki/Confusion_matrix \n",
+ "- sklearn doc: \n",
+ "http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "confusion_matrix: [[25238 1]\n",
+ " [ 1923 0]]\n",
+ "confusion_matrix: [[25239 0]\n",
+ " [ 1923 0]]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "import matplotlib.pyplot as plt\n",
+ "# Import the confusion matrix utilities\n",
+ "from sklearn.metrics import confusion_matrix,ConfusionMatrixDisplay,recall_score,precision_score\n",
+ "# Compute the confusion matrix for logistic regression\n",
+ "cm = confusion_matrix(y_test, y_LG_pred , labels=LG.classes_)\n",
+ "print(f'confusion_matrix: {cm}')\n",
+ "# Plot the confusion matrix\n",
+ "dis_LG = ConfusionMatrixDisplay(confusion_matrix=cm,\n",
+ "                                display_labels=LG.classes_)\n",
+ "dis_LG.plot()\n",
+ "\n",
+ "# Compute the confusion matrix for the SVM\n",
+ "cm2 = confusion_matrix(y_test, y_svm_pred, labels=svm.classes_)\n",
+ "print(f'confusion_matrix: {cm2}')\n",
+ "dis_SVM = ConfusionMatrixDisplay(confusion_matrix=cm2,\n",
+ "                                 display_labels=svm.classes_)\n",
+ "dis_SVM.plot()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
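+ "Answer: Both models lean toward predicting 0. Almost no 0 is classified as 1, and very few 1s are classified as 1, so the models may be overfitting. The high accuracy probably reflects the ratio of 0s to 1s in the test set, where about 93% of the labels are 0."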
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
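+ "## Exercise 5: Adjust the model's decision criterion\n",
+ "\n",
+ "Banks usually have stricter requirements, because the consequences of fraud tend to be severe, so we typically adjust the model's decision criterion. \n",
+ "\n",
+ "For example, logistic regression normally uses 0.5 as the probability decision boundary, but we can lower the threshold to make the model more “sensitive”. Try setting the threshold to 0.3 and look at the evaluation metrics again (mainly accuracy and recall).\n",
+ "\n",
+ "- Hint: many sklearn classifiers provide predict_proba, which returns the estimated probabilities; compare them with the chosen threshold to decide the final class"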
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "test_Accuracy: 0.93\n",
+ "precision_score: 0.42\n",
+ "recall_score: 0.03\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "LG2=sklearn.linear_model.LogisticRegression(C=10.0, \n",
+ "                                            max_iter=10,\n",
+ "                                            random_state=0)\n",
+ "LG2.fit(X_train_std,y_train)\n",
+ "# Predicted probability of class 1 via the sigmoid (equivalent to LG2.predict_proba(X_test_std)[:, 1])\n",
+ "y_LG_prob=1/(1+np.exp(-X_test_std @ LG2.coef_.T-LG2.intercept_))\n",
+ "# Apply the 0.3 decision threshold and flatten to a 1-D array of 0/1 labels\n",
+ "y_LG_pred3=(y_LG_prob>=.3).ravel().astype(int)\n",
+ "# Print the results\n",
+ "print(f'test_Accuracy: {accuracy_score(y_test, y_LG_pred3):.2f}')\n",
+ "print(f'precision_score: {precision_score(y_test,y_LG_pred3 ):.2f}')\n",
+ "print(f'recall_score: {recall_score(y_test, y_LG_pred3):.2f}')\n",
+ "# Plot the confusion matrix\n",
+ "cm3 = confusion_matrix(y_test, y_LG_pred3, labels=LG2.classes_)\n",
+ "dis_LG2 = ConfusionMatrixDisplay(confusion_matrix=cm3,\n",
+ "                                 display_labels=LG2.classes_)\n",
+ "dis_LG2.plot()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "python3.7(base)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.11"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/2023/homework/Jun_Fan/homework_credit_scoring_finetune_ensemble.ipynb b/2023/homework/Jun_Fan/homework_credit_scoring_finetune_ensemble.ipynb
new file mode 100644
index 00000000..627f1862
--- /dev/null
+++ b/2023/homework/Jun_Fan/homework_credit_scoring_finetune_ensemble.ipynb
@@ -0,0 +1,1135 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Leveling Up Together: Credit Scoring Exercise"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "jp-MarkdownHeadingCollapsed": true,
+ "tags": []
+ },
+ "source": [
+ "-------\n",
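+ "## >>> Instructions:\n",
+ "### 1. Answering:\n",
+ "- **Keep every step** of your work when answering; do not give only the final answer\n",
+ "- Make a habit of commenting your code\n",
+ "\n",
+ "### 2. Solution hints:\n",
+ "- To help everyone understand the problems accurately and learn from the practice, this document provides solution hints\n",
+ "- The hints are **for reference only**; original approaches are encouraged\n",
+ "- To encourage independent thinking, the hints are written as **comments**; please look for them\n",
+ "\n",
+ "### 3. Data used:\n",
+ "- The problems use several datasets; after loading each one, first **inspect and understand the basic properties of the data**; later questions will not repeat this reminder"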
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "--------\n",
+ "## Hands-on exercises"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Credit card fraud project"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Initial data loading, preview, and processing (do not modify this part; the data files involved do not need to be copied or moved)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 64,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 1 0.766127 45.0 \n",
+ "1 0 0.957151 40.0 \n",
+ "2 0 0.658180 38.0 \n",
+ "3 0 0.233810 30.0 \n",
+ "4 0 0.907239 49.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 "
+ ]
+ },
+ "execution_count": 64,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import warnings\n",
+ "warnings.filterwarnings(\"ignore\")\n",
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "pd.set_option('display.max_columns', 500)\n",
+ "import zipfile\n",
+ "with zipfile.ZipFile('KaggleCredit2.csv.zip', 'r') as z:\n",
+ " f = z.open('KaggleCredit2.csv')\n",
+ " data = pd.read_csv(f, index_col=0)\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 65,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(112915, 11)"
+ ]
+ },
+ "execution_count": 65,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Check the dimensions of the data\n",
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 66,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "SeriousDlqin2yrs 0\n",
+ "RevolvingUtilizationOfUnsecuredLines 0\n",
+ "age 4267\n",
+ "NumberOfTime30-59DaysPastDueNotWorse 0\n",
+ "DebtRatio 0\n",
+ "MonthlyIncome 0\n",
+ "NumberOfOpenCreditLinesAndLoans 0\n",
+ "NumberOfTimes90DaysLate 0\n",
+ "NumberRealEstateLoansOrLines 0\n",
+ "NumberOfTime60-89DaysPastDueNotWorse 0\n",
+ "NumberOfDependents 4267\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 66,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Count the missing values in each column\n",
+ "data.isnull().sum(axis=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 67,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Drop rows with missing values\n",
+ "data.dropna(inplace=True)\n",
+ "data.shape\n",
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 68,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.06742876076872101"
+ ]
+ },
+ "execution_count": 68,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Extract the corresponding X and y\n",
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)\n",
+ "# Check the average fraud rate (the mean of the target)\n",
+ "y.mean()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### The hands-on exercises follow"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 1. Split the data into training and test sets"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 69,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(81486, 10)"
+ ]
+ },
+ "execution_count": 69,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Hint: look up the train_test_split function\n",
+ "import matplotlib.pyplot as plt\n",
+ "import seaborn as sns\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "# Hold out 25% of the rows as the test set; an integer seed keeps the split reproducible\n",
+ "x_train,x_test,y_train,y_test=train_test_split(X, y, \n",
+ "                                               test_size=.25, \n",
+ "                                               shuffle=True, \n",
+ "                                               random_state=1234)\n",
+ "x_train.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 70,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "0 101322\n",
+ "1 7326\n",
+ "Name: SeriousDlqin2yrs, dtype: int64\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 70,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Check the distribution of positive and negative samples via the SeriousDlqin2yrs column\n",
+ "# Hint: value_counts\n",
+ "counts=data['SeriousDlqin2yrs'].value_counts()\n",
+ "print(counts)\n",
+ "\n",
+ "# Draw a bar chart of the two classes\n",
+ "# Hint: a DataFrame can call plot(kind='bar') directly\n",
+ "# Take the labels and heights from the same Series so each bar matches its class\n",
+ "sns.barplot(x=counts.index,y=counts.values)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 2. Data preprocessing: discretization"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 71,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "RevolvingUtilizationOfUnsecuredLines float64\n",
+ "age float64\n",
+ "NumberOfTime30-59DaysPastDueNotWorse float64\n",
+ "DebtRatio float64\n",
+ "MonthlyIncome float64\n",
+ "NumberOfOpenCreditLinesAndLoans float64\n",
+ "NumberOfTimes90DaysLate float64\n",
+ "NumberRealEstateLoansOrLines float64\n",
+ "NumberOfTime60-89DaysPastDueNotWorse float64\n",
+ "NumberOfDependents float64\n",
+ "age_label category\n",
+ "dtype: object"
+ ]
+ },
+ "execution_count": 71,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
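+ "# Discretize age into 3-year intervals\n",
+ "# Hint: compute the bin edges first, then discretize (bin) with pandas' cut function\n",
+ "# Plot the age distributions\n",
+ "sns.histplot(x_train['age'],bins='auto')\n",
+ "sns.histplot(x_test['age'],bins='auto')\n",
+ "# Compute the bin edges; extend past the maximum so the oldest ages fall inside the last bin\n",
+ "low=data['age'].min()\n",
+ "upp=data['age'].max()\n",
+ "block=np.arange(low,upp+3,3)\n",
+ "# Bin the ages; include_lowest keeps the minimum age in the first bin\n",
+ "x_train['age_label']=pd.cut(x_train['age'],bins=block,include_lowest=True)\n",
+ "x_test['age_label']=pd.cut(x_test['age'],bins=block,include_lowest=True)\n",
+ "x_test.dtypes"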
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 3. Data preprocessing: one-hot encoding"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 77,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "(81486, 10)\n"
+ ]
+ }
+ ],
+ "source": [
+ "# One-hot encode the binned age ranges from the previous step\n",
+ "# Hint: use pandas' get_dummies\n",
+ "# The age bins were originally merged into the data; since age itself is already a good feature, the second run leaves them out. To merge them in, uncomment the lines below and comment out the last two lines.\n",
+ "# One-hot encode the age_label column of the training set\n",
+ "age_label_vt=pd.get_dummies(x_train['age_label'])\n",
+ "# Merge into the original dataset\n",
+ "#x_train=pd.concat([x_train,age_label_vt],axis=1)\n",
+ "#x_train=x_train.drop(['age_label','age'],axis=1)\n",
+ "print(x_train.shape)\n",
+ "# Apply the same steps to the test set\n",
+ "age_label_vt_test=pd.get_dummies(x_test['age_label'])\n",
+ "#x_test=pd.concat([x_test,age_label_vt_test],axis=1)\n",
+ "#x_test=x_test.drop(['age_label','age'],axis=1)\n",
+ "x_train=x_train.drop(['age_label'],axis=1)\n",
+ "x_test=x_test.drop(['age_label'],axis=1)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "#### 4. Data preprocessing: feature scaling"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 78,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(81486, 9)"
+ ]
+ },
+ "execution_count": 78,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Scale the continuous features\n",
+ "# Hint: a scaler such as StandardScaler can be used\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "sc=StandardScaler()\n",
+ "# Fit the scaler on the training set only, then apply it to both sets\n",
+ "sc.fit(x_train)\n",
+ "x_train_std=sc.transform(x_train)\n",
+ "x_test_std=sc.transform(x_test)\n",
+ "x_train_std.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 5. Build a logistic regression model, print the coefficients, and analyze feature importance "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 79,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[-0.01389386 1.11481892 0.39271202 -0.18026142 -0.10258655 0.22288693\n",
+ " -0.39130612 -1.20356629 0.14438634]]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 79,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
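+ "# Hint: build the model with fit; afterwards the coef_ attribute is available\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "# Build the logistic regression model\n",
+ "LG=LogisticRegression(C=10.0, \n",
+ "                      max_iter=10,\n",
+ "                      random_state=0)\n",
+ "LG.fit(x_train_std,y_train)\n",
+ "# Predict on the standardized test features, matching the training data\n",
+ "y_LG_pred=LG.predict(x_test_std)\n",
+ "\n",
+ "print(LG.coef_)\n",
+ "# Plot the absolute coefficient values as a rough measure of feature importance\n",
+ "sns.barplot(x=np.arange(len(LG.coef_[0])),y=np.abs(LG.coef_[0]))"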
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 6. Tune hyperparameters with grid-search cross-validation\n",
+ "Tune the penalty and C parameters, with candidates \"l1\" and \"l2\" for penalty and [1,10,100,500] for C"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 80,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Hint: prepare the grid dictionary as specified above, then tune with GridSearchCV\n",
+ "from sklearn.model_selection import GridSearchCV\n",
+ "# Prepare the grid dictionary\n",
+ "param_grid = {'C':[1,10,100,500],\n",
+ "              'penalty':['l1','l2']}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 81,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "best cross-validation accuracy: 0.9343445506974841\n",
+ "test set score: 0.9305279434504087\n",
+ "best parameters: {'C': 10, 'penalty': 'l2'}\n"
+ ]
+ }
+ ],
+ "source": [
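+ "# Print the best hyperparameters\n",
+ "# Print the best model's scores\n",
+ "\n",
+ "# Build the grid search; liblinear accepts both the l1 and l2 penalties,\n",
+ "# whereas the default lbfgs solver does not support penalty='l1'\n",
+ "grid = GridSearchCV(LogisticRegression(solver='liblinear'), param_grid=param_grid, cv=5)\n",
+ "grid.fit(x_train_std,y_train)\n",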
+ "print(\"best cross-validation accuracy:\", grid.best_score_)\n",
+ "print(\"test set score: \", grid.score(x_test_std, y_test))\n",
+ "print(\"best parameters: \", grid.best_params_)\n"
+ ]
+ },
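+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The cell below is a minimal sketch, assuming synthetic stand-in data, of the same `penalty`/`C` grid with two changes that are suggestions rather than part of the exercise: `solver='liblinear'`, which accepts both `l1` and `l2`, and `scoring='roc_auc'`, which is less dominated by the majority class than accuracy. Placing the scaler inside the pipeline means it is refit on the training part of every fold."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.model_selection import GridSearchCV\n",
+ "from sklearn.pipeline import Pipeline\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "\n",
+ "# Synthetic, imbalanced stand-in data (an assumption for illustration)\n",
+ "X_demo, y_demo = make_classification(n_samples=2000, n_features=6, n_informative=3,\n",
+ " weights=[0.93], random_state=0)\n",
+ "\n",
+ "# Scaler and classifier in one pipeline, so scaling is learned per training fold\n",
+ "pipe_demo = Pipeline([('scaler', StandardScaler()),\n",
+ " ('lr', LogisticRegression(solver='liblinear'))])\n",
+ "grid_demo_params = {'lr__penalty': ['l1', 'l2'], 'lr__C': [1, 10, 100, 500]}\n",
+ "\n",
+ "# Rank candidates by cross-validated ROC AUC instead of accuracy\n",
+ "grid_demo = GridSearchCV(pipe_demo, param_grid=grid_demo_params, scoring='roc_auc', cv=5)\n",
+ "grid_demo.fit(X_demo, y_demo)\n",
+ "print('best parameters:', grid_demo.best_params_)\n",
+ "print('best cross-validation AUC:', grid_demo.best_score_)"
+ ]
+ },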
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 7. Predict on the test set and compute precision, recall, AUC, the confusion matrix, F1, and other test metrics"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 82,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "confusion_matrix: [[25238 1]\n",
+ " [ 1923 0]]\n",
+ "test_Accuracy: 0.93\n",
+ "precision_score: 0.00\n",
+ "recall_score: 0.00\n",
+ "f1_value_score: 0.00\n",
+ "roc_auc_score: 0.50\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 82,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Hint: use predict to get predictions on the test set\n",
+ "# Hint: the metrics are in sklearn.metrics: accuracy_score, recall_score, auc, confusion_matrix, f1_score\n",
+ "\n",
+ "from sklearn.metrics import confusion_matrix,ConfusionMatrixDisplay,recall_score,precision_score,accuracy_score,f1_score,roc_auc_score\n",
+ "# Refit the model with the hyperparameters found above\n",
+ "LG2=LogisticRegression(penalty='l2',\n",
+ " C=10.0, \n",
+ " max_iter=1000,\n",
+ " random_state=0)\n",
+ "LG2.fit(x_train_std,y_train)\n",
+ "# Predict on the standardized test features (the model was fit on x_train_std)\n",
+ "y_LG_pred=LG2.predict(x_test_std)\n",
+ "# AUC needs scores rather than hard labels: use the positive-class probability\n",
+ "y_LG_proba=LG2.predict_proba(x_test_std)[:, 1]\n",
+ "# Confusion matrix\n",
+ "cm = confusion_matrix(y_test, y_LG_pred , labels=LG2.classes_)\n",
+ "# Print the evaluation metrics\n",
+ "print(f'confusion_matrix: {cm}')\n",
+ "print(f'test_Accuracy: {accuracy_score(y_test, y_LG_pred):.2f}')\n",
+ "print(f'precision_score: {precision_score(y_test,y_LG_pred):.2f}')\n",
+ "print(f'recall_score: {recall_score(y_test, y_LG_pred):.2f}')\n",
+ "print(f'f1_value_score: {f1_score(y_test, y_LG_pred):.2f}')\n",
+ "print(f'roc_auc_score: {roc_auc_score(y_test, y_LG_proba):.2f}')\n",
+ "# Plot the confusion matrix\n",
+ "dis_LG = ConfusionMatrixDisplay(confusion_matrix=cm,\n",
+ " display_labels=LG2.classes_)\n",
+ "dis_LG.plot()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
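+ "#### 8. Further optimization\n",
+ "Banks usually apply stricter criteria, because the consequences of fraud tend to be severe, so we generally adjust the model's decision standard. \n",
+ "\n",
+ "For example, logistic regression normally uses a probability decision boundary of 0.5, but we can lower the threshold to raise the model's \"sensitivity\". \n",
+ "Try setting the threshold to 0.3 and look at the confusion matrix and the other metrics again."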
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 83,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "threshold=0.1_test_Accuracy: 0.93\n",
+ "threshold=0.1_confusion_matrix: [[25238 1]\n",
+ " [ 1923 0]]\n",
+ "threshold=0.2_test_Accuracy: 0.93\n",
+ "threshold=0.2_confusion_matrix: [[25238 1]\n",
+ " [ 1923 0]]\n",
+ "threshold=0.3_test_Accuracy: 0.93\n",
+ "threshold=0.3_confusion_matrix: [[25238 1]\n",
+ " [ 1923 0]]\n",
+ "threshold=0.4_test_Accuracy: 0.93\n",
+ "threshold=0.4_confusion_matrix: [[25238 1]\n",
+ " [ 1923 0]]\n",
+ "threshold=0.5_test_Accuracy: 0.93\n",
+ "threshold=0.5_confusion_matrix: [[25238 1]\n",
+ " [ 1923 0]]\n",
+ "threshold=0.6_test_Accuracy: 0.93\n",
+ "threshold=0.6_confusion_matrix: [[25238 1]\n",
+ " [ 1923 0]]\n",
+ "threshold=0.7_test_Accuracy: 0.93\n",
+ "threshold=0.7_confusion_matrix: [[25238 1]\n",
+ " [ 1923 0]]\n",
+ "threshold=0.8_test_Accuracy: 0.93\n",
+ "threshold=0.8_confusion_matrix: [[25238 1]\n",
+ " [ 1923 0]]\n",
+ "threshold=0.9_test_Accuracy: 0.93\n",
+ "threshold=0.9_confusion_matrix: [[25238 1]\n",
+ " [ 1923 0]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Hint: thresholds = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]\n",
+ "# Compare the predict_proba output with each threshold to get labels, then evaluate the metrics\n",
+ "\n",
+ "thresholds = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]\n",
+ "# Positive-class probability (column 1), computed once on the standardized test features\n",
+ "y_LG_proba=LG2.predict_proba(x_test_std)[:, 1]\n",
+ "for th in thresholds:\n",
+ " # Label a sample positive when its probability reaches the threshold\n",
+ " y_LG_pred=(y_LG_proba>=th).astype(int)\n",
+ " cm = confusion_matrix(y_test, y_LG_pred, labels=LG2.classes_)\n",
+ " print(f'threshold={th}_test_Accuracy: {accuracy_score(y_test, y_LG_pred):.2f}')\n",
+ " print(f'threshold={th}_confusion_matrix: {cm}')"
+ ]
+ },
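+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Lowering the threshold typically trades precision for recall, which accuracy alone does not show on imbalanced data. The cell below is a minimal sketch, assuming synthetic imbalanced data, that thresholds the positive-class probability from `predict_proba` and prints precision and recall for a few thresholds."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.metrics import precision_score, recall_score\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "# Synthetic, imbalanced stand-in data (an assumption for illustration)\n",
+ "X_demo, y_demo = make_classification(n_samples=4000, n_features=6, n_informative=3,\n",
+ " weights=[0.93], random_state=0)\n",
+ "X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3,\n",
+ " stratify=y_demo, random_state=0)\n",
+ "clf_demo = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)\n",
+ "\n",
+ "# Column 1 of predict_proba is the probability of the positive class\n",
+ "proba_pos = clf_demo.predict_proba(X_te)[:, 1]\n",
+ "for th in [0.1, 0.3, 0.5]:\n",
+ " pred = (proba_pos >= th).astype(int)\n",
+ " # zero_division=0 avoids a warning when no sample is predicted positive\n",
+ " p = precision_score(y_te, pred, zero_division=0)\n",
+ " r = recall_score(y_te, pred)\n",
+ " print(f'threshold={th}: precision={p:.2f}, recall={r:.2f}')"
+ ]
+ },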
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 9. Rank the features by importance, use feature selection to filter them, then refit the model and check its accuracy and other metrics."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 84,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# You can rank by the absolute value of the logistic regression coefficients, or by a tree model's feature importances\n",
+ "# Feature selection can use RFE or SelectFromModel\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "from sklearn.feature_selection import SelectFromModel\n",
+ "forest = RandomForestClassifier(criterion='entropy', \n",
+ " n_estimators=10,\n",
+ " random_state=1,\n",
+ " n_jobs=2)\n",
+ "forest.fit(x_train, y_train)\n",
+ "importances = forest.feature_importances_\n",
+ "indices = np.argsort(importances)[::-1]\n",
+ "x_label=[]\n",
+ "for i in range(len(indices)):\n",
+ " x_label.append(str(indices[i]))\n",
+ "\n",
+ "sns.barplot(x=x_label,y=np.sort(importances)[::-1])\n",
+ "selector = SelectFromModel(estimator=forest,prefit=True)\n",
+ "#print(selector.get_support())\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 85,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "confusion_matrix: [[24989 250]\n",
+ " [ 1810 113]]\n",
+ "test_Accuracy: 0.92\n",
+ "precision_score: 0.31\n",
+ "recall_score: 0.06\n",
+ "f1_value_score: 0.10\n",
+ "roc_auc_score: 0.52\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 85,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "x_train2=x_train.loc[:, selector.get_support()]\n",
+ "x_test2=x_test.loc[:, selector.get_support()]\n",
+ "sc=StandardScaler()\n",
+ "sc.fit(x_train2)\n",
+ "x_train_std2=sc.transform(x_train2)\n",
+ "x_test_std2=sc.transform(x_test2)\n",
+ "forest = RandomForestClassifier(criterion='entropy', \n",
+ " n_estimators=10,\n",
+ " random_state=1,\n",
+ " n_jobs=2)\n",
+ "forest.fit(x_train2, y_train)\n",
+ "y_forest_pred=forest.predict(x_test2)\n",
+ "# Confusion matrix\n",
+ "cm = confusion_matrix(y_test, y_forest_pred , labels=forest.classes_)\n",
+ "\n",
+ "print(f'confusion_matrix: {cm}')\n",
+ "print(f'test_Accuracy: {accuracy_score(y_test, y_forest_pred):.2f}')\n",
+ "print(f'precision_score: {precision_score(y_test,y_forest_pred):.2f}')\n",
+ "print(f'recall_score: {recall_score(y_test, y_forest_pred):.2f}')\n",
+ "print(f'f1_value_score: {f1_score(y_test, y_forest_pred):.2f}')\n",
+ "print(f'roc_auc_score: {roc_auc_score(y_test, y_forest_pred):.2f}')\n",
+ "# Plot the confusion matrix\n",
+ "dis_LG = ConfusionMatrixDisplay(confusion_matrix=cm,\n",
+ " display_labels=forest.classes_)\n",
+ "dis_LG.plot()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 10. Try other model algorithms\n",
+ "Classify with sklearn algorithms such as RandomForestClassifier, SVM, and KNN, and apply the hyperparameter tuning procedure above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 86,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "best cross-validation accuracy: 0.9365780687761559\n",
+ "test set score: 0.9324791988807893\n",
+ "best parameters: {'RandomForest__max_depth': 7, 'RandomForest__n_estimators': 8}\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 86,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Random forest\n",
+ "# None of the models below use feature selection\n",
+ "from sklearn.pipeline import Pipeline\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "from sklearn.preprocessing import MinMaxScaler\n",
+ "\n",
+ "# Combine the scaling step and the random forest in a pipeline\n",
+ "pipe = Pipeline([(\"scaler\", MinMaxScaler()), (\"RandomForest\", RandomForestClassifier())])\n",
+ "# Hyperparameters and candidate values for the grid search (kept small to save time)\n",
+ "param_grid = {'RandomForest__n_estimators': [7,8,9],'RandomForest__max_depth':[5,6,7]}\n",
+ "# Grid search\n",
+ "grid2 = GridSearchCV(pipe, param_grid=param_grid, cv=5)\n",
+ "grid2.fit(x_train, y_train)\n",
+ "# Predict on the test set\n",
+ "y_forest_pred=grid2.predict(x_test)\n",
+ "# Confusion matrix\n",
+ "cm2 = confusion_matrix(y_test, y_forest_pred )\n",
+ "dis_forest = ConfusionMatrixDisplay(confusion_matrix=cm2)\n",
+ "# Print the scores and the best parameters\n",
+ "print(\"best cross-validation accuracy:\", grid2.best_score_)\n",
+ "print(\"test set score: \", grid2.score(x_test, y_test))\n",
+ "print(\"best parameters: \", grid2.best_params_)\n",
+ "dis_forest.plot()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 88,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "best cross-validation accuracy: 0.9317\n",
+ "test set score: 0.9306383918709963\n",
+ "best parameters: {'svm__C': 100, 'svm__kernel': 'rbf'}\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 88,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "\n",
+ "# Support vector machine\n",
+ "from sklearn.svm import SVC\n",
+ "# Combine the scaling step and the SVM in a pipeline\n",
+ "pipe = Pipeline([(\"scaler\", MinMaxScaler()), (\"svm\", SVC())])\n",
+ "# Hyperparameters and candidate values for the grid search\n",
+ "param_grid = {'svm__kernel': ['linear','rbf'],'svm__C':[0.1,1,10,100]}\n",
+ "# Grid search (fit on the first 10000 training samples only)\n",
+ "grid_svm = GridSearchCV(pipe, param_grid=param_grid, cv=5)\n",
+ "grid_svm.fit(x_train[:10000], y_train[:10000])\n",
+ "# Predict on the test set\n",
+ "y_svm_pred=grid_svm.predict(x_test)\n",
+ "# Confusion matrix\n",
+ "cm_svm = confusion_matrix(y_test, y_svm_pred)\n",
+ "dis_svm = ConfusionMatrixDisplay(confusion_matrix=cm_svm)\n",
+ "# Print the scores and the best parameters\n",
+ "print(\"best cross-validation accuracy:\", grid_svm.best_score_)\n",
+ "print(\"test set score: \", grid_svm.score(x_test, y_test))\n",
+ "print(\"best parameters: \", grid_svm.best_params_)\n",
+ "dis_svm.plot()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 89,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "best cross-validation accuracy: 0.9305\n",
+ "test set score: 0.9290184817023783\n",
+ "best parameters: {'knn__n_neighbors': 8, 'knn__p': 1}\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 89,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# K-nearest neighbors\n",
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "# Combine the scaling step and KNN in a pipeline\n",
+ "pipe = Pipeline([(\"scaler\", MinMaxScaler()), (\"knn\", KNeighborsClassifier())])\n",
+ "# Hyperparameters and candidate values for the grid search\n",
+ "param_grid = {'knn__n_neighbors': [5,6,7,8],'knn__p':[1,2,3]}\n",
+ "# Grid search (fit on the first 10000 training samples only)\n",
+ "grid_knn = GridSearchCV(pipe, param_grid=param_grid, cv=5)\n",
+ "grid_knn.fit(x_train[:10000], y_train[:10000])\n",
+ "# Predict on the test set\n",
+ "y_knn_pred=grid_knn.predict(x_test)\n",
+ "# Confusion matrix (display the KNN matrix, not the SVM one)\n",
+ "cm_knn = confusion_matrix(y_test, y_knn_pred)\n",
+ "dis_knn = ConfusionMatrixDisplay(confusion_matrix=cm_knn)\n",
+ "# Print the scores and the best parameters\n",
+ "print(\"best cross-validation accuracy:\", grid_knn.best_score_)\n",
+ "print(\"test set score: \", grid_knn.score(x_test, y_test))\n",
+ "print(\"best parameters: \", grid_knn.best_params_)\n",
+ "dis_knn.plot()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "python3.7(base)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.11"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/2023/homework/Meng_Xiaofeng/homework-matplotlib.ipynb b/2023/homework/Meng_Xiaofeng/homework-matplotlib.ipynb
index a77a2a4f..02aab147 100644
--- a/2023/homework/Meng_Xiaofeng/homework-matplotlib.ipynb
+++ b/2023/homework/Meng_Xiaofeng/homework-matplotlib.ipynb
@@ -473,7 +473,7 @@
" \n",
" # flowers = ['setosa', 'versicolor', 'virginica']\n",
" # flower_color = ['g', 'r', 'k']\n",
- " for i in range(0, len(data.species)-1):\n",
+ " for i in range(0, len(data.species)):\n",
" if (data.species[i] == 'setosa'):\n",
" ax.scatter(x_data[i], y_data[i], color='g') # 绿色点代表 setosa\n",
" if (data.species[i] == 'versicolor'):\n",
diff --git a/2023/homework/PingShen/homework_credit_scoring.ipynb b/2023/homework/PingShen/homework_credit_scoring.ipynb
new file mode 100644
index 00000000..6ec26829
--- /dev/null
+++ b/2023/homework/PingShen/homework_credit_scoring.ipynb
@@ -0,0 +1,1134 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Monster-Slaying Together: Credit Scoring Exercise"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
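+ "---\n",
+ "## Assignment instructions\n",
+ "\n",
+ "- Answering:\n",
+ " - **Keep every step** of your work when answering; do not give only the final answer\n",
+ " - Make a habit of commenting your code\n",
+ "\n",
+ "- Solution hints:\n",
+ " - To help everyone understand the questions accurately and learn from the exercises, this document provides solution hints\n",
+ " - The hints are **for reference only**; original solutions are encouraged\n",
+ " - To encourage you to think for yourselves, the hints are set in **white** text; if needed, drag the mouse from after the colon to reveal them\n",
+ "\n",
+ "- Data used\n",
+ " - After loading the dataset, first **inspect it and understand its basic properties**; later questions will not repeat this reminder"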
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## machine learning for credit scoring\n",
+ "\n",
+ "\n",
+ "Banks play a crucial role in market economies. They decide who can get finance and on what terms and can make or break investment decisions. For markets and society to function, individuals and companies need access to credit. \n",
+ "\n",
+ "Credit scoring algorithms, which make a guess at the probability of default, are the method banks use to determine whether or not a loan should be granted. This competition requires participants to improve on the state of the art in credit scoring, by predicting the probability that somebody will experience financial distress in the next two years. [Dataset](https://www.kaggle.com/c/GiveMeSomeCredit)\n",
+ "\n",
+ "Attribute Information:\n",
+ "\n",
+ "|Variable Name\t|\tDescription\t|\tType|\n",
+ "|----|----|----|\n",
+ "|SeriousDlqin2yrs\t|\tPerson experienced 90 days past due delinquency or worse \t|\tY/N|\n",
+ "|RevolvingUtilizationOfUnsecuredLines\t|\tTotal balance on credit divided by the sum of credit limits\t|\tpercentage|\n",
+ "|age\t|\tAge of borrower in years\t|\tinteger|\n",
+ "|NumberOfTime30-59DaysPastDueNotWorse\t|\tNumber of times borrower has been 30-59 days past due |\tinteger|\n",
+ "|DebtRatio\t|\tMonthly debt payments\t|\tpercentage|\n",
+ "|MonthlyIncome\t|\tMonthly income\t|\treal|\n",
+ "|NumberOfOpenCreditLinesAndLoans\t|\tNumber of Open loans |\tinteger|\n",
+ "|NumberOfTimes90DaysLate\t|\tNumber of times borrower has been 90 days or more past due.\t|\tinteger|\n",
+ "|NumberRealEstateLoansOrLines\t|\tNumber of mortgage and real estate loans\t|\tinteger|\n",
+ "|NumberOfTime60-89DaysPastDueNotWorse\t|\tNumber of times borrower has been 60-89 days past due |integer|\n",
+ "|NumberOfDependents\t|\tNumber of dependents in family\t|\tinteger|\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----------\n",
+ "## Read the data into Pandas "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 1 0.766127 45.0 \n",
+ "1 0 0.957151 40.0 \n",
+ "2 0 0.658180 38.0 \n",
+ "3 0 0.233810 30.0 \n",
+ "4 0 0.907239 49.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 "
+ ]
+ },
+ "execution_count": 1,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "pd.set_option('display.max_columns', 500)\n",
+ "import zipfile\n",
+ "with zipfile.ZipFile('KaggleCredit2.csv.zip', 'r') as z:\n",
+ " f = z.open('KaggleCredit2.csv')\n",
+ " data = pd.read_csv(f, index_col=0)\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(112915, 11)"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "------------\n",
+ "## Drop na"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "SeriousDlqin2yrs 0\n",
+ "RevolvingUtilizationOfUnsecuredLines 0\n",
+ "age 4267\n",
+ "NumberOfTime30-59DaysPastDueNotWorse 0\n",
+ "DebtRatio 0\n",
+ "MonthlyIncome 0\n",
+ "NumberOfOpenCreditLinesAndLoans 0\n",
+ "NumberOfTimes90DaysLate 0\n",
+ "NumberRealEstateLoansOrLines 0\n",
+ "NumberOfTime60-89DaysPastDueNotWorse 0\n",
+ "NumberOfDependents 4267\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.isnull().sum(axis=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(108648, 11)"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.dropna(inplace=True)\n",
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---------\n",
+ "## Create X and y"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 1\n",
+ "1 0\n",
+ "2 0\n",
+ "3 0\n",
+ "4 0\n",
+ " ..\n",
+ "112910 0\n",
+ "112911 0\n",
+ "112912 0\n",
+ "112913 0\n",
+ "112914 0\n",
+ "Name: SeriousDlqin2yrs, Length: 108648, dtype: int64"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "y"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.06742876076872101"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "y.mean()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Exercise 1: Split the data into training and test sets\n",
+ "- Hint: from sklearn.model_selection import train_test_split"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "((76053, 10), (32595, 10), (76053,), (32595,))"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True, random_state=0)# 70% training set, 30% test set\n",
+ "\n",
+ "# Check the shapes of the splits\n",
+ "X_train.shape, X_test.shape, y_train.shape, y_test.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----\n",
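+ "## Exercise 2: Classify with sklearn algorithms such as logistic regression, decision tree, SVM, KNN, ...\n",
+ "Look up the sklearn API to learn what the model parameters mean, and try different settings"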
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Logistic regression\n",
+ "- Hint: from sklearn import linear_model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9329344991563123"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here \n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.metrics import accuracy_score\n",
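+ "# Create the model\n",
+ "lr= LogisticRegression(C=100, random_state=1,max_iter=1000)# C: inverse regularization strength; a larger C means weaker regularization, so the coefficients are shrunk less\n",
+ "# Train the model\n",
+ "lr.fit(X_train, y_train) \n",
+ "# Evaluate the model\n",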
+ "lr.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Decision Tree\n",
+ "- Hint: from sklearn.tree import DecisionTreeClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.935020708697653"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "# Create the model\n",
+ "tree = DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)\n",
+ "# Train the model\n",
+ "tree.fit(X_train, y_train) \n",
+ "# Evaluate the model\n",
+ "tree.score(X_test, y_test)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Random Forest\n",
+ "- Hint: from sklearn.ensemble import RandomForestClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9327197422917626"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "# Create the model\n",
+ "forest = RandomForestClassifier(criterion='entropy', \n",
+ " n_estimators=10, # The number of trees in the forest.\n",
+ " random_state=1,\n",
+ " n_jobs=2)\n",
+ "# Train the model\n",
+ "forest.fit(X_train, y_train)\n",
+ "# Evaluate the model\n",
+ "forest.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### SVM\n",
+ "- Hint: from sklearn.svm import SVC"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# ## your code here\n",
+ "# from sklearn.svm import SVC\n",
+ "# # Create the model: linear kernel\n",
+ "# svm1 = SVC(kernel='linear', random_state=0, C=10.0)\n",
+ "# # Train the model\n",
+ "# svm1.fit(X_train, y_train) \n",
+ "# # Evaluate the model\n",
+ "# svm1.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9333640128854118"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.svm import SVC\n",
+ "# Create the model: kernel SVM with a Gaussian (RBF) kernel\n",
+ "svm = SVC(kernel='rbf', random_state=0, C=1.0,gamma=0.2)\n",
+ "# Train the model\n",
+ "svm.fit(X_train, y_train) \n",
+ "# Evaluate the model\n",
+ "svm.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### KNN\n",
+ "- Hint: from sklearn.neighbors import KNeighborsClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9321061512501917"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "# Create the model\n",
+ "knn = KNeighborsClassifier(n_neighbors=5, p=2, metric='minkowski')\n",
+ "# Train the model\n",
+ "knn.fit(X_train, y_train) \n",
+ "# Evaluate the model\n",
+ "knn.score(X_test, y_test)"
+ ]
+ },
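+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "SVM and KNN rely on distances between samples, so a feature on a much larger scale (for example `MonthlyIncome` next to `DebtRatio`) can dominate the result. The cell below is a minimal sketch, assuming synthetic data with one deliberately rescaled column, that compares KNN on raw features with KNN wrapped in a pipeline together with `StandardScaler`. Whether scaling helps on the real data is something to check rather than assume."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "from sklearn.pipeline import make_pipeline\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "\n",
+ "# Synthetic stand-in data (an assumption for illustration)\n",
+ "X_demo, y_demo = make_classification(n_samples=2000, n_features=6, n_informative=3,\n",
+ " random_state=0)\n",
+ "X_demo[:, 0] *= 1000 # put one feature on a much larger scale\n",
+ "X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)\n",
+ "\n",
+ "# KNN on raw features versus KNN with the scaler fit on the training split only\n",
+ "knn_raw = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)\n",
+ "knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)).fit(X_tr, y_tr)\n",
+ "print('unscaled accuracy:', knn_raw.score(X_te, y_te))\n",
+ "print('scaled accuracy:', knn_scaled.score(X_te, y_te))"
+ ]
+ },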
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "## Exercise 3: Predict on the test set and compute accuracy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Logistic regression\n",
+ "- Hint: y_pred_LR = clf_LR.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9329344991563123"
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "\n",
+ "## your code here\n",
+ "# Predict on the test set\n",
+ "lr_pred = lr.predict(X_test)\n",
+ "# Compute accuracy\n",
+ "lr.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Decision Tree\n",
+ "- Hint: y_pred_tree = tree.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.935020708697653"
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "# Predict on the test set\n",
+ "tree_pred = tree.predict(X_test)\n",
+ "# Compute accuracy\n",
+ "tree.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Random Forest\n",
+ "- Hint: y_pred_forest = forest.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9327197422917626"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "# Predict on the test set\n",
+ "forest_pred = forest.predict(X_test)\n",
+ "# Compute accuracy\n",
+ "forest.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### SVM\n",
+ "- Hint: y_pred_SVC = clf_svc.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9333640128854118"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "# Predict on the test set\n",
+ "svm_pred =svm.predict(X_test)\n",
+ "# Compute accuracy\n",
+ "svm.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### KNN\n",
+ "- Hint: y_pred_KNN = neigh.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9321061512501917"
+ ]
+ },
+ "execution_count": 34,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "# Predict on the test set\n",
+ "knn_pred = knn.predict(X_test)\n",
+ "# Compute accuracy\n",
+ "knn.score(X_test, y_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Exercise 4: Read the official sklearn documentation, learn the evaluation metrics for classification problems, and evaluate this example"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Learning links on the confusion matrix**\n",
+ "\n",
+ "- Blog: \n",
+ "http://blog.csdn.net/vesper305/article/details/44927047 \n",
+ "- WiKi: \n",
+ "http://en.wikipedia.org/wiki/Confusion_matrix \n",
+ "- sklearn doc: \n",
+ "http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "y_test.value_counts().plot(kind='bar')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'lr')"
+ ]
+ },
+ "execution_count": 31,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix\n",
+ "import matplotlib.pyplot as plt\n",
+ "cm = confusion_matrix(y_test,lr_pred,labels=lr.classes_)\n",
+ "\n",
+ "cm_display = ConfusionMatrixDisplay(cm,display_labels=lr.classes_).plot()\n",
+ "plt.title('lr')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([0, 1])"
+ ]
+ },
+ "execution_count": 32,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "lr.classes_"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Plot one confusion matrix per model, titled with the name of its prediction array\n",
+ "preds = [lr_pred, tree_pred, forest_pred, svm_pred, knn_pred]\n",
+ "titles = ['lr_pred', 'tree_pred', 'forest_pred', 'svm_pred', 'knn_pred']\n",
+ "for pred, title in zip(preds, titles):\n",
+ "    cm = confusion_matrix(y_test, pred, labels=[0, 1])  # confusion matrix\n",
+ "    ConfusionMatrixDisplay(cm, display_labels=[0, 1]).plot()  # draw the plot\n",
+ "    plt.title(title)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
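+ "## Exercise 5: Adjusting the model's decision threshold\n",
+ "\n",
+ "Banks usually have stricter requirements, because the consequences of fraud are typically severe, so we generally adjust the model's decision criterion. \n",
+ "\n",
+ "For example, in logistic regression the default probability decision boundary is 0.5, but we can set the threshold lower to raise the model's sensitivity. Try setting the threshold to 0.3 and look at the evaluation metrics again (mainly accuracy and recall).\n",
+ "\n",
+ "- Hint: many sklearn classifiers provide `predict_proba`, which returns the predicted probabilities; compare them with the chosen threshold to decide the final class label."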
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Accuracy at threshold 0.5: 0.9329344991563123\n",
+ "Accuracy at threshold 0.3: 0.933087896916705\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'threshold:0.3')"
+ ]
+ },
+ "execution_count": 41,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "\n",
+ "# from sklearn.linear_model import LogisticRegression\n",
+ "# # Create the model\n",
+ "# lr = LogisticRegression(C=100, random_state=1, max_iter=1000)  # C: inverse regularization strength; a larger C means weaker regularization\n",
+ "# # Train the model\n",
+ "# lr.fit(X_train, y_train)\n",
+ "# # Predict on the test set\n",
+ "# lr_pred = lr.predict(X_test)\n",
+ "\n",
+ "# Evaluate the model at the default threshold of 0.5\n",
+ "acc1 = lr.score(X_test, y_test)\n",
+ "print('Accuracy at threshold 0.5:', acc1)\n",
+ "\n",
+ "# Probability that each sample belongs to the positive class.\n",
+ "# predict_proba returns probabilities; decision_function returns log-odds,\n",
+ "# which cannot be compared with a probability threshold.\n",
+ "y_prob = lr.predict_proba(X_test)[:, 1]\n",
+ "# Set the threshold\n",
+ "threshold = 0.3\n",
+ "# Convert the probabilities to class labels\n",
+ "lr_pred2 = (y_prob >= threshold).astype(int)\n",
+ "# Compute the accuracy at the lowered threshold\n",
+ "acc2 = accuracy_score(y_test, lr_pred2)\n",
+ "print('Accuracy at threshold 0.3:', acc2)\n",
+ "\n",
+ "# Plot the confusion matrices\n",
+ "cm = confusion_matrix(y_test, lr_pred, labels=lr.classes_)\n",
+ "ConfusionMatrixDisplay(cm, display_labels=lr.classes_).plot()\n",
+ "plt.title('threshold:0.5')\n",
+ "\n",
+ "cm2 = confusion_matrix(y_test, lr_pred2, labels=lr.classes_)\n",
+ "ConfusionMatrixDisplay(cm2, display_labels=lr.classes_).plot()\n",
+ "plt.title('threshold:0.3')\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "tdi",
+ "language": "python",
+ "name": "tdi"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.16"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/2023/homework/PingShen/homework_credit_scoring_finetune_ensemble.ipynb b/2023/homework/PingShen/homework_credit_scoring_finetune_ensemble.ipynb
new file mode 100644
index 00000000..7861a5ed
--- /dev/null
+++ b/2023/homework/PingShen/homework_credit_scoring_finetune_ensemble.ipynb
@@ -0,0 +1,2352 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Leveling Up Together: Credit Scoring Exercise"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
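+ "-------\n",
+ "## >>> Instructions:\n",
+ "### 1. Answering the questions:\n",
+ "- **Keep every step** of your work when answering; do not give only the final answer\n",
+ "- Make a habit of commenting your code\n",
+ "\n",
+ "### 2. Solution hints:\n",
+ "- To help everyone understand the questions accurately and get the most out of the exercises, this document provides solution hints\n",
+ "- The hints are **for reference only**; original approaches are encouraged\n",
+ "- To encourage you to think for yourself, the hints are written as **comments**; be sure to read them\n",
+ "\n",
+ "### 3. Data used:\n",
+ "- The questions use several datasets; after importing each one, first **inspect the data and understand its basic properties**. Later questions will not repeat this reminder"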
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "--------\n",
+ "## Hands-on exercises"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Credit card fraud project"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ " #### Initial data import, preview, and processing (do not modify this part; the data files involved do not need to be copied or moved)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
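+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>SeriousDlqin2yrs</th>\n",
+ "      <th>RevolvingUtilizationOfUnsecuredLines</th>\n",
+ "      <th>age</th>\n",
+ "      <th>NumberOfTime30-59DaysPastDueNotWorse</th>\n",
+ "      <th>DebtRatio</th>\n",
+ "      <th>MonthlyIncome</th>\n",
+ "      <th>NumberOfOpenCreditLinesAndLoans</th>\n",
+ "      <th>NumberOfTimes90DaysLate</th>\n",
+ "      <th>NumberRealEstateLoansOrLines</th>\n",
+ "      <th>NumberOfTime60-89DaysPastDueNotWorse</th>\n",
+ "      <th>NumberOfDependents</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>0</th>\n",
+ "      <td>1</td>\n",
+ "      <td>0.766127</td>\n",
+ "      <td>45.0</td>\n",
+ "      <td>2.0</td>\n",
+ "      <td>0.802982</td>\n",
+ "      <td>9120.0</td>\n",
+ "      <td>13.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>6.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>2.0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>1</th>\n",
+ "      <td>0</td>\n",
+ "      <td>0.957151</td>\n",
+ "      <td>40.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.121876</td>\n",
+ "      <td>2600.0</td>\n",
+ "      <td>4.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>1.0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>2</th>\n",
+ "      <td>0</td>\n",
+ "      <td>0.658180</td>\n",
+ "      <td>38.0</td>\n",
+ "      <td>1.0</td>\n",
+ "      <td>0.085113</td>\n",
+ "      <td>3042.0</td>\n",
+ "      <td>2.0</td>\n",
+ "      <td>1.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>3</th>\n",
+ "      <td>0</td>\n",
+ "      <td>0.233810</td>\n",
+ "      <td>30.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.036050</td>\n",
+ "      <td>3300.0</td>\n",
+ "      <td>5.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>4</th>\n",
+ "      <td>0</td>\n",
+ "      <td>0.907239</td>\n",
+ "      <td>49.0</td>\n",
+ "      <td>1.0</td>\n",
+ "      <td>0.024926</td>\n",
+ "      <td>63588.0</td>\n",
+ "      <td>7.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>1.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"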
+ ],
+ "text/plain": [
+ " SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 1 0.766127 45.0 \n",
+ "1 0 0.957151 40.0 \n",
+ "2 0 0.658180 38.0 \n",
+ "3 0 0.233810 30.0 \n",
+ "4 0 0.907239 49.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 "
+ ]
+ },
+ "execution_count": 1,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "import seaborn as sns\n",
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "pd.set_option('display.max_columns', 500)\n",
+ "import zipfile\n",
+ "with zipfile.ZipFile('KaggleCredit2.csv.zip', 'r') as z:\n",
+ " f = z.open('KaggleCredit2.csv')\n",
+ " data = pd.read_csv(f, index_col=0)\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(112915, 11)"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Check the dimensions of the data\n",
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "SeriousDlqin2yrs 0\n",
+ "RevolvingUtilizationOfUnsecuredLines 0\n",
+ "age 4267\n",
+ "NumberOfTime30-59DaysPastDueNotWorse 0\n",
+ "DebtRatio 0\n",
+ "MonthlyIncome 0\n",
+ "NumberOfOpenCreditLinesAndLoans 0\n",
+ "NumberOfTimes90DaysLate 0\n",
+ "NumberRealEstateLoansOrLines 0\n",
+ "NumberOfTime60-89DaysPastDueNotWorse 0\n",
+ "NumberOfDependents 4267\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Count the missing values in each column\n",
+ "data.isnull().sum(axis=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/tmp/ipykernel_814457/2980780030.py:3: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access\n",
+ " data.shapey = data['SeriousDlqin2yrs']\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Drop rows with missing values\n",
+ "data.dropna(inplace=True)\n",
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.06742876076872101"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Extract the features X and the label y\n",
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)\n",
+ "# Mean delinquency rate (fraction of positive labels)\n",
+ "y.mean()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 0.766127 45.0 \n",
+ "1 0.957151 40.0 \n",
+ "2 0.658180 38.0 \n",
+ "3 0.233810 30.0 \n",
+ "4 0.907239 49.0 \n",
+ "... ... ... \n",
+ "112910 0.385742 50.0 \n",
+ "112911 0.040674 74.0 \n",
+ "112912 0.299745 44.0 \n",
+ "112913 0.000000 30.0 \n",
+ "112914 0.850283 64.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "... ... ... ... \n",
+ "112910 0.0 0.404293 3400.0 \n",
+ "112911 0.0 0.225131 2100.0 \n",
+ "112912 0.0 0.716562 5584.0 \n",
+ "112913 0.0 0.000000 5716.0 \n",
+ "112914 0.0 0.249908 8158.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "... ... ... \n",
+ "112910 7.0 0.0 \n",
+ "112911 4.0 0.0 \n",
+ "112912 4.0 0.0 \n",
+ "112913 4.0 0.0 \n",
+ "112914 8.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "... ... ... \n",
+ "112910 0.0 0.0 \n",
+ "112911 1.0 0.0 \n",
+ "112912 1.0 0.0 \n",
+ "112913 0.0 0.0 \n",
+ "112914 2.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 \n",
+ "... ... \n",
+ "112910 0.0 \n",
+ "112911 0.0 \n",
+ "112912 2.0 \n",
+ "112913 0.0 \n",
+ "112914 0.0 \n",
+ "\n",
+ "[108648 rows x 10 columns]"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "X "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 1\n",
+ "1 0\n",
+ "2 0\n",
+ "3 0\n",
+ "4 0\n",
+ " ..\n",
+ "112910 0\n",
+ "112911 0\n",
+ "112912 0\n",
+ "112913 0\n",
+ "112914 0\n",
+ "Name: SeriousDlqin2yrs, Length: 108648, dtype: int64"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "y"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Hands-on exercises"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 1. Split the data into training and test sets"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "((76053, 10), (32595, 10), (76053,), (32595,))"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.model_selection import train_test_split\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True, random_state=0)  # 70% training set, 30% test set; random_state fixes the seed of the random split\n",
+ "\n",
+ "# Check the shapes of the splits\n",
+ "X_train.shape, X_test.shape, y_train.shape, y_test.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "SeriousDlqin2yrs\n",
+ "0 101322\n",
+ "1 7326\n",
+ "Name: count, dtype: int64\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
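+ "# Inspect the distribution of positive and negative samples in the SeriousDlqin2yrs label\n",
+ "# Hint: value_counts\n",
+ "print(y.value_counts())\n",
+ "\n",
+ "# Plot a bar chart of the two classes\n",
+ "# Hint: a pandas Series or DataFrame can call plot(kind='bar') directly\n",
+ "y.value_counts().plot(kind='bar')"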
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 2. Preprocessing: discretization"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Discretize age into 3-year intervals\n",
+ "# Hint: compute the bin edges first, then use pandas cut to discretize (bin) the values"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "count 76053.000000\n",
+ "mean 51.343129\n",
+ "std 14.437048\n",
+ "min 0.000000\n",
+ "25% 41.000000\n",
+ "50% 51.000000\n",
+ "75% 62.000000\n",
+ "max 103.000000\n",
+ "Name: age, dtype: float64"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Inspect the age range to choose the bin edges\n",
+ "X_train['age'].describe()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36,\n",
+ " 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75,\n",
+ " 78, 81, 84, 87, 90, 93, 96, 99, 102, 105])"
+ ]
+ },
+ "execution_count": 12,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "bins=np.arange(0,106,3)  # bin edges every 3 years, from 0 to 105\n",
+ "bins"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
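+ "# pd.cut uses right-closed intervals by default, so an age of exactly 0 falls outside (0, 3] and becomes NaN\n",
+ "X_train.loc[:,'age_bins']=pd.cut(X_train['age'],bins)  # discretize age into intervals\n",
+ "X_test.loc[:,'age_bins']=pd.cut(X_test['age'],bins)  # discretize age into intervals"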
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "((76053, 11), (32595, 11), (76053,), (32595,))"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Check the shapes after adding the new column\n",
+ "X_train.shape, X_test.shape, y_train.shape, y_test.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " age age_bins\n",
+ "64329 27.0 (24, 27]\n",
+ "70087 83.0 (81, 84]\n",
+ "77642 72.0 (69, 72]\n",
+ "6017 56.0 (54, 57]\n",
+ "106521 83.0 (81, 84]\n",
+ "... ... ...\n",
+ "22092 48.0 (45, 48]\n",
+ "47726 28.0 (27, 30]\n",
+ "44326 67.0 (66, 69]\n",
+ "45320 29.0 (27, 30]\n",
+ "70966 58.0 (57, 60]\n",
+ "\n",
+ "[76053 rows x 2 columns]\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(X_train[['age','age_bins']])  # inspect the result of the discretization\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 3. Preprocessing: one-hot encoding"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "64329 0.116959 27.0 \n",
+ "70087 0.050666 83.0 \n",
+ "77642 0.038865 72.0 \n",
+ "6017 0.023513 56.0 \n",
+ "106521 0.039350 83.0 \n",
+ "... ... ... \n",
+ "22092 0.576336 48.0 \n",
+ "47726 1.000000 28.0 \n",
+ "44326 0.275989 67.0 \n",
+ "45320 0.979204 29.0 \n",
+ "70966 0.102984 58.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "64329 0.0 0.326024 3100.0 \n",
+ "70087 0.0 0.177182 10000.0 \n",
+ "77642 0.0 0.016600 5722.0 \n",
+ "6017 0.0 0.266260 10500.0 \n",
+ "106521 0.0 0.195179 1700.0 \n",
+ "... ... ... ... \n",
+ "22092 1.0 0.303042 10750.0 \n",
+ "47726 0.0 0.068123 2700.0 \n",
+ "44326 0.0 0.267683 4000.0 \n",
+ "45320 1.0 0.098725 4000.0 \n",
+ "70966 2.0 0.191410 10500.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "64329 7.0 0.0 \n",
+ "70087 9.0 0.0 \n",
+ "77642 7.0 0.0 \n",
+ "6017 8.0 0.0 \n",
+ "106521 3.0 0.0 \n",
+ "... ... ... \n",
+ "22092 17.0 0.0 \n",
+ "47726 2.0 0.0 \n",
+ "44326 4.0 0.0 \n",
+ "45320 5.0 0.0 \n",
+ "70966 17.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "64329 0.0 0.0 \n",
+ "70087 1.0 0.0 \n",
+ "77642 0.0 0.0 \n",
+ "6017 2.0 0.0 \n",
+ "106521 0.0 0.0 \n",
+ "... ... ... \n",
+ "22092 1.0 0.0 \n",
+ "47726 0.0 0.0 \n",
+ "44326 1.0 0.0 \n",
+ "45320 0.0 0.0 \n",
+ "70966 2.0 0.0 \n",
+ "\n",
+ " NumberOfDependents age_bins_(0, 3] age_bins_(3, 6] age_bins_(6, 9] \\\n",
+ "64329 1.0 False False False \n",
+ "70087 0.0 False False False \n",
+ "77642 0.0 False False False \n",
+ "6017 3.0 False False False \n",
+ "106521 0.0 False False False \n",
+ "... ... ... ... ... \n",
+ "22092 3.0 False False False \n",
+ "47726 0.0 False False False \n",
+ "44326 0.0 False False False \n",
+ "45320 3.0 False False False \n",
+ "70966 0.0 False False False \n",
+ "\n",
+ " age_bins_(9, 12] age_bins_(12, 15] age_bins_(15, 18] \\\n",
+ "64329 False False False \n",
+ "70087 False False False \n",
+ "77642 False False False \n",
+ "6017 False False False \n",
+ "106521 False False False \n",
+ "... ... ... ... \n",
+ "22092 False False False \n",
+ "47726 False False False \n",
+ "44326 False False False \n",
+ "45320 False False False \n",
+ "70966 False False False \n",
+ "\n",
+ " age_bins_(18, 21] age_bins_(21, 24] age_bins_(24, 27] \\\n",
+ "64329 False False True \n",
+ "70087 False False False \n",
+ "77642 False False False \n",
+ "6017 False False False \n",
+ "106521 False False False \n",
+ "... ... ... ... \n",
+ "22092 False False False \n",
+ "47726 False False False \n",
+ "44326 False False False \n",
+ "45320 False False False \n",
+ "70966 False False False \n",
+ "\n",
+ " age_bins_(27, 30] age_bins_(30, 33] age_bins_(33, 36] \\\n",
+ "64329 False False False \n",
+ "70087 False False False \n",
+ "77642 False False False \n",
+ "6017 False False False \n",
+ "106521 False False False \n",
+ "... ... ... ... \n",
+ "22092 False False False \n",
+ "47726 True False False \n",
+ "44326 False False False \n",
+ "45320 True False False \n",
+ "70966 False False False \n",
+ "\n",
+ " age_bins_(36, 39] age_bins_(39, 42] age_bins_(42, 45] \\\n",
+ "64329 False False False \n",
+ "70087 False False False \n",
+ "77642 False False False \n",
+ "6017 False False False \n",
+ "106521 False False False \n",
+ "... ... ... ... \n",
+ "22092 False False False \n",
+ "47726 False False False \n",
+ "44326 False False False \n",
+ "45320 False False False \n",
+ "70966 False False False \n",
+ "\n",
+ " age_bins_(45, 48] age_bins_(48, 51] age_bins_(51, 54] \\\n",
+ "64329 False False False \n",
+ "70087 False False False \n",
+ "77642 False False False \n",
+ "6017 False False False \n",
+ "106521 False False False \n",
+ "... ... ... ... \n",
+ "22092 True False False \n",
+ "47726 False False False \n",
+ "44326 False False False \n",
+ "45320 False False False \n",
+ "70966 False False False \n",
+ "\n",
+ " age_bins_(54, 57] age_bins_(57, 60] age_bins_(60, 63] \\\n",
+ "64329 False False False \n",
+ "70087 False False False \n",
+ "77642 False False False \n",
+ "6017 True False False \n",
+ "106521 False False False \n",
+ "... ... ... ... \n",
+ "22092 False False False \n",
+ "47726 False False False \n",
+ "44326 False False False \n",
+ "45320 False False False \n",
+ "70966 False True False \n",
+ "\n",
+ " age_bins_(63, 66] age_bins_(66, 69] age_bins_(69, 72] \\\n",
+ "64329 False False False \n",
+ "70087 False False False \n",
+ "77642 False False True \n",
+ "6017 False False False \n",
+ "106521 False False False \n",
+ "... ... ... ... \n",
+ "22092 False False False \n",
+ "47726 False False False \n",
+ "44326 False True False \n",
+ "45320 False False False \n",
+ "70966 False False False \n",
+ "\n",
+ " age_bins_(72, 75] age_bins_(75, 78] age_bins_(78, 81] \\\n",
+ "64329 False False False \n",
+ "70087 False False False \n",
+ "77642 False False False \n",
+ "6017 False False False \n",
+ "106521 False False False \n",
+ "... ... ... ... \n",
+ "22092 False False False \n",
+ "47726 False False False \n",
+ "44326 False False False \n",
+ "45320 False False False \n",
+ "70966 False False False \n",
+ "\n",
+ " age_bins_(81, 84] age_bins_(84, 87] age_bins_(87, 90] \\\n",
+ "64329 False False False \n",
+ "70087 True False False \n",
+ "77642 False False False \n",
+ "6017 False False False \n",
+ "106521 True False False \n",
+ "... ... ... ... \n",
+ "22092 False False False \n",
+ "47726 False False False \n",
+ "44326 False False False \n",
+ "45320 False False False \n",
+ "70966 False False False \n",
+ "\n",
+ " age_bins_(90, 93] age_bins_(93, 96] age_bins_(96, 99] \\\n",
+ "64329 False False False \n",
+ "70087 False False False \n",
+ "77642 False False False \n",
+ "6017 False False False \n",
+ "106521 False False False \n",
+ "... ... ... ... \n",
+ "22092 False False False \n",
+ "47726 False False False \n",
+ "44326 False False False \n",
+ "45320 False False False \n",
+ "70966 False False False \n",
+ "\n",
+ " age_bins_(99, 102] age_bins_(102, 105] \n",
+ "64329 False False \n",
+ "70087 False False \n",
+ "77642 False False \n",
+ "6017 False False \n",
+ "106521 False False \n",
+ "... ... ... \n",
+ "22092 False False \n",
+ "47726 False False \n",
+ "44326 False False \n",
+ "45320 False False \n",
+ "70966 False False \n",
+ "\n",
+ "[76053 rows x 45 columns]"
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# One-hot encode the binned age column created above\n",
+ "# Hint: use pandas get_dummies\n",
+ "\n",
+ "\n",
+ "# pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False)\n",
+ "# One-hot encoding turns each age interval into its own indicator column\n",
+ "\n",
+ "age=pd.get_dummies(X_train,columns=['age_bins'],prefix_sep='_')\n",
+ "age\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 4. Preprocessing: feature scaling"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "((76053, 10), (32595, 10))"
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
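+ "# Scale the continuous features\n",
+ "# Hint: a scaler such as StandardScaler can be used\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "std=StandardScaler()\n",
+ "\n",
+ "\n",
+ "# Re-split X so the splits contain only the original 10 numeric columns (no age_bins)\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True, random_state=0)  # 70% training set, 30% test set; random_state fixes the seed of the random split\n",
+ "\n",
+ "# Check the shapes of the splits\n",
+ "X_train.shape, X_test.shape, y_train.shape, y_test.shape\n",
+ "\n",
+ "# Standardize\n",
+ "X_train_std=std.fit_transform(X_train)\n",
+ "# NOTE: fit_transform refits the scaler on the test set; std.transform(X_test) would reuse the training statistics instead\n",
+ "X_test_std=std.fit_transform(X_test)\n",
+ "# Check the shapes of the scaled arrays\n",
+ "X_train_std.shape, X_test_std.shape"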
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "-4.295787804590142e-17\n",
+ "1.0\n",
+ "-3.0387991214897363e-17\n",
+ "0.9999999999999999\n"
+ ]
+ }
+ ],
+ "source": [
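+ "print(X_train_std.mean()) # mean of the scaled training set\n",
+ "print(X_train_std.std()) # standard deviation of the scaled training set\n",
+ "print(X_test_std.mean()) # mean of the scaled test set\n",
+ "print(X_test_std.std()) # standard deviation of the scaled test set"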
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 5. Fit a logistic regression model, print its coefficients, and analyze feature importance "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 61,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Misclassified test samples: 2156\n",
+ "Coefficients: [[-0.01428302 -0.36429906 1.72892458 0.31210449 -0.11519959 -0.09188134\n",
+ " 1.68983565 -0.1964285 -3.24882741 0.11639198]]\n",
+ "Training accuracy: 0.933\n",
+ "Test accuracy: 0.934\n",
+ "Intercept: -2.859\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Hint: call fit to train the model, then read its coef_ attribute\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "\n",
+ "# Build the model (a very large C means almost no regularization)\n",
+ "lr = LogisticRegression(C=100.0**40, random_state=0, penalty='l2') \n",
+ "# Train the model\n",
+ "lr.fit(X_train_std, y_train)\n",
+ "# Predict\n",
+ "y_train_pred = lr.predict(np.array(X_train_std)) # predictions on the training set\n",
+ "y_test_pred = lr.predict(np.array(X_test_std)) # predictions on the test set\n",
+ "\n",
+ "# Number of misclassified test samples\n",
+ "print('Misclassified test samples: %d' % (y_test != y_test_pred).sum()) \n",
+ "\n",
+ "# Coefficients\n",
+ "lr_Coef =lr.coef_\n",
+ "print(\"Coefficients:\",lr_Coef)\n",
+ "\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "# Accuracy\n",
+ "train_acc=accuracy_score(y_train, y_train_pred)\n",
+ "test_acc=accuracy_score(y_test, y_test_pred)\n",
+ "print(\"Training accuracy: %.3f\"%train_acc)\n",
+ "print(\"Test accuracy: %.3f\"%test_acc)\n",
+ "\n",
+ "# Intercept (intercept_ is a length-1 array, so take its only element before formatting)\n",
+ "lr_intercept=lr.intercept_[0]\n",
+ "print('Intercept: %.3f'%lr_intercept)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
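+ "#### 6. Tune hyperparameters with grid-search cross-validation\n",
+ "Tune the penalty and C parameters: the candidates for penalty are \"l1\" and \"l2\", and the candidates for C are [1, 10, 100, 500]"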
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 66,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
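+ "# Hint: build the parameter grid as specified above, then tune with GridSearchCV\n",
+ "# Build the parameter grid\n",
+ "param_grid = {'C': [1, 10, 100, 500],\n",
+ " 'penalty': ['l1','l2']}\n",
+ "\n",
+ "from sklearn.model_selection import GridSearchCV\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "# NOTE: the default lbfgs solver does not support the l1 penalty, so the l1 candidates fail to fit; solver='liblinear' supports both penalties\n",
+ "lr = LogisticRegression() \n",
+ "# 5-fold grid search\n",
+ "grid_search = GridSearchCV(lr,param_grid,cv=5)\n"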
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 68,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/sp/.local/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:425: FitFailedWarning: \n",
+ "20 fits failed out of a total of 40.\n",
+ "The score on these train-test partitions for these parameters will be set to nan.\n",
+ "If these failures are not expected, you can try to debug them by setting error_score='raise'.\n",
+ "\n",
+ "Below are more details about the failures:\n",
+ "--------------------------------------------------------------------------------\n",
+ "20 fits failed with the following error:\n",
+ "Traceback (most recent call last):\n",
+ " File \"/home/sp/.local/lib/python3.9/site-packages/sklearn/model_selection/_validation.py\", line 729, in _fit_and_score\n",
+ " estimator.fit(X_train, y_train, **fit_params)\n",
+ " File \"/home/sp/.local/lib/python3.9/site-packages/sklearn/base.py\", line 1152, in wrapper\n",
+ " return fit_method(estimator, *args, **kwargs)\n",
+ " File \"/home/sp/.local/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py\", line 1169, in fit\n",
+ " solver = _check_solver(self.solver, self.penalty, self.dual)\n",
+ " File \"/home/sp/.local/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py\", line 56, in _check_solver\n",
+ " raise ValueError(\n",
+ "ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.\n",
+ "\n",
+ " warnings.warn(some_fits_failed_message, FitFailedWarning)\n",
+ "/home/sp/.local/lib/python3.9/site-packages/sklearn/model_selection/_search.py:979: UserWarning: One or more of the test scores are non-finite: [ nan 0.93300724 nan 0.93307299 nan 0.93305984\n",
+ " nan 0.93305984]\n",
+ " warnings.warn(\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Best hyperparameters: {'C': 10, 'penalty': 'l2'}\n",
+ "Best model: LogisticRegression(C=10)\n"
+ ]
+ }
+ ],
+ "source": [
+ "grid_search.fit(X_train_std, y_train)\n",
+ "\n",
+ "# Print the best hyperparameters\n",
+ "print(\"Best hyperparameters: \", grid_search.best_params_)\n",
+ "\n",
+ "# Print the best model\n",
+ "print(\"Best model: \", grid_search.best_estimator_)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 7. Predict on the test set and compute evaluation metrics such as precision, recall, AUC, the confusion matrix, and F1"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 72,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "查准率: 0.9338855652707471\n",
+ "查全率: 0.046522339935513586\n",
+ "auc: 0.5218642464862948\n",
+ "f1值: 0.08570216376750106\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
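+ "# Hint: use predict to make predictions on the test set\n",
+ "# Hint: the evaluation metrics are in sklearn.metrics: accuracy_score, recall_score, auc, confusion_matrix, f1_score\n",
+ "from sklearn.metrics import precision_score, recall_score, roc_auc_score, confusion_matrix, f1_score, ConfusionMatrixDisplay\n",
+ "\n",
+ "# Refit with the best hyperparameters found by the grid search\n",
+ "lr_best = LogisticRegression(C=10, penalty='l2')\n",
+ "lr_best.fit(X_train_std, y_train)\n",
+ "lr_best_pred = lr_best.predict(X_test_std)\n",
+ "# AUC needs a continuous score, so use the predicted probability of class 1\n",
+ "lr_best_prob = lr_best.predict_proba(X_test_std)[:, 1]\n",
+ "\n",
+ "# score() returns accuracy, so precision is computed with precision_score\n",
+ "print('Accuracy: ', lr_best.score(X_test_std, y_test))\n",
+ "print('Precision: ', precision_score(y_test, lr_best_pred))\n",
+ "print('Recall: ', recall_score(y_test, lr_best_pred))\n",
+ "print('AUC: ', roc_auc_score(y_test, lr_best_prob))\n",
+ "print('F1: ', f1_score(y_test, lr_best_pred))\n",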
+ "\n",
+ "lr_best_cm = confusion_matrix(y_test,lr_best_pred)\n",
+ "\n",
+ "cm_display = ConfusionMatrixDisplay(lr_best_cm).plot()"
+ ]
+ },
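+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make the metrics in this step concrete, here is a small hand-checkable sketch. The labels and scores are made up for illustration (they are not taken from the credit data). Precision, recall, and F1 are computed from hard predictions, while AUC is computed from the continuous scores.\n",
+ "\n",
+ "```python\n",
+ "from sklearn.metrics import (confusion_matrix, f1_score, precision_score,\n",
+ "                             recall_score, roc_auc_score)\n",
+ "\n",
+ "# Hypothetical labels and predicted probabilities of class 1\n",
+ "y_true = [0, 0, 0, 1, 1, 1]\n",
+ "y_score = [0.1, 0.4, 0.6, 0.3, 0.7, 0.9]\n",
+ "# Hard predictions at the default 0.5 threshold\n",
+ "y_pred = [int(s >= 0.5) for s in y_score]\n",
+ "\n",
+ "precision = precision_score(y_true, y_pred)  # TP / (TP + FP)\n",
+ "recall = recall_score(y_true, y_pred)        # TP / (TP + FN)\n",
+ "f1 = f1_score(y_true, y_pred)\n",
+ "auc = roc_auc_score(y_true, y_score)         # uses the scores, not the labels\n",
+ "cm = confusion_matrix(y_true, y_pred)        # rows: true class, columns: predicted\n",
+ "\n",
+ "print(precision, recall, f1, auc)\n",
+ "print(cm)\n",
+ "```\n",
+ "\n",
+ "Here there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3. The AUC is 7/9 because 7 of the 9 positive-negative pairs are ranked correctly."
+ ]
+ },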
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
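+ "#### 8. Further optimization\n",
+ "Banks usually have stricter requirements, because the consequences of fraud tend to be severe, so the model's decision criterion is often adjusted.  \n",
+ "\n",
+ "For example, logistic regression normally uses a probability decision boundary of 0.5, but the threshold can be set lower to increase the model's \"sensitivity\".  \n",
+ "Try setting the threshold to 0.3, then look at the confusion matrix and the other evaluation metrics again."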
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 76,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "阈值为0.5的准确率: 0.9338855652707471\n",
+ "阈值为0.3的准确率: 0.9336401288541187\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'threshold:0.3')"
+ ]
+ },
+ "execution_count": 76,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "\n",
+ "lr_best = LogisticRegression(C=10, penalty='l2')\n",
+ "lr_best.fit(X_train_std, y_train)\n",
+ "lr_best_pred = lr_best.predict(X_test_std)\n",
+ "acc1 = lr_best.score(X_test_std, y_test)\n",
+ "print('Accuracy at threshold 0.5:', acc1)\n",
+ "\n",
+ "# Predicted probability of the positive class.\n",
+ "# decision_function returns log-odds, not probabilities, so use predict_proba.\n",
+ "y_prob = lr_best.predict_proba(X_test_std)[:, 1]\n",
+ "# Set the threshold\n",
+ "threshold = 0.3\n",
+ "# Convert probabilities to class labels\n",
+ "lr_best_pred2 = (y_prob >= threshold).astype(int)\n",
+ "# Compute accuracy\n",
+ "acc2 = accuracy_score(y_test, lr_best_pred2)\n",
+ "print('Accuracy at threshold 0.3:', acc2)\n",
+ "\n",
+ "# Plot the confusion matrices\n",
+ "cm = confusion_matrix(y_test, lr_best_pred)\n",
+ "cm_display = ConfusionMatrixDisplay(cm).plot()\n",
+ "plt.title('threshold:0.5')\n",
+ "\n",
+ "cm2 = confusion_matrix(y_test, lr_best_pred2)\n",
+ "cm_display2 = ConfusionMatrixDisplay(cm2).plot()\n",
+ "plt.title('threshold:0.3')"
+ ]
+ },
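+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The effect of lowering the threshold can be seen on a tiny made-up example. The probabilities below are hypothetical; with a fitted classifier they would come from something like `predict_proba(X)[:, 1]`. This is only a sketch of the idea, not a result on the credit data.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from sklearn.metrics import precision_score, recall_score\n",
+ "\n",
+ "# Hypothetical true labels and predicted probabilities of class 1\n",
+ "y_true = np.array([0, 0, 0, 1, 1, 1])\n",
+ "proba = np.array([0.05, 0.20, 0.35, 0.25, 0.45, 0.80])\n",
+ "\n",
+ "# Hard predictions at two different thresholds\n",
+ "pred_05 = (proba >= 0.5).astype(int)\n",
+ "pred_03 = (proba >= 0.3).astype(int)\n",
+ "\n",
+ "recall_05 = recall_score(y_true, pred_05)\n",
+ "recall_03 = recall_score(y_true, pred_03)\n",
+ "precision_05 = precision_score(y_true, pred_05)\n",
+ "precision_03 = precision_score(y_true, pred_03)\n",
+ "\n",
+ "print('recall:', recall_05, '->', recall_03)\n",
+ "print('precision:', precision_05, '->', precision_03)\n",
+ "```\n",
+ "\n",
+ "In this toy case, lowering the threshold from 0.5 to 0.3 raises recall from 1/3 to 2/3 and lowers precision from 1 to 2/3. More defaulters are caught, at the cost of more false alarms."
+ ]
+ },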
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 9. Rank the features by importance, filter them with feature selection, then rebuild the model and examine its accuracy and other evaluation metrics."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 80,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "十个特征的重要性 [[-0.01428717 -0.36429751 1.72767996 0.31211219 -0.11518815 -0.09190189\n",
+ " 1.68720119 -0.19646079 -3.24496378 0.11641158]]\n",
+ "用于特征选择的阈值; 0.7870504201386906\n",
+ "特征是否保留 [False False True False False False True False True False]\n",
+ "特征提取结果 [[-0.10690634 -0.06094714 -0.05438132]\n",
+ " [-0.10690634 -0.06094714 -0.05438132]\n",
+ " [-0.10690634 -0.06094714 -0.05438132]\n",
+ " ...\n",
+ " [-0.10690634 -0.06094714 -0.05438132]\n",
+ " [ 0.17284964 -0.06094714 -0.05438132]\n",
+ " [ 0.45260561 -0.06094714 -0.05438132]]\n"
+ ]
+ }
+ ],
+ "source": [
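+ "# Features can be ranked by the absolute value of the logistic-regression coefficients, or by the feature importances of a tree-based model\n",
+ "# Feature selection can use RFE or SelectFromModel\n",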
+ "from sklearn.feature_selection import SelectFromModel\n",
+ "\n",
+ "lr=LogisticRegression(C=10,penalty='l2')\n",
+ "# sfm=SelectFromModel(lr,prefit=True)\n",
+ "\n",
+ "# X_train_std_selected=sfm.transform(X_train_std)\n",
+ "# lr_selected = LogisticRegression(C=10,penalty='l2')\n",
+ "# lr_selected.fit(X_train_std_selected, y_train)\n",
+ "\n",
+ "\n",
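+ "# Fit the selector\n",
+ "selector = SelectFromModel(estimator=lr).fit(X_train_std, y_train)\n",
+ "# Coefficients of the fitted estimator\n",
+ "print(\"Coefficients of the ten features:\", selector.estimator_.coef_)\n",
+ "\n",
+ "# Threshold derived from the mean absolute coefficient of the estimator\n",
+ "print(\"Threshold used for feature selection:\", selector.threshold_)\n",
+ "\n",
+ "# Which features are kept; True means selected\n",
+ "print(\"Feature kept:\", selector.get_support())\n",
+ "# Training data reduced to the selected features\n",
+ "print(\"Selected features:\", selector.transform(X_train_std))\n"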
+ ]
+ },
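+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The exercise also asks for a model rebuilt on the selected features. A minimal sketch of that step follows, assuming synthetic data from `make_classification` in place of the credit data. The names `X_demo`, `y_demo`, `lr_selected`, and the other variables are illustrative only.\n",
+ "\n",
+ "```python\n",
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.feature_selection import SelectFromModel\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "# Synthetic stand-in for the standardized features and labels (assumption)\n",
+ "X_demo, y_demo = make_classification(n_samples=500, n_features=10,\n",
+ "                                     n_informative=3, random_state=0)\n",
+ "X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,\n",
+ "                                          random_state=0)\n",
+ "\n",
+ "# Keep the features whose absolute coefficient is at least the mean\n",
+ "selector = SelectFromModel(LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)\n",
+ "X_tr_sel = selector.transform(X_tr)\n",
+ "X_te_sel = selector.transform(X_te)\n",
+ "\n",
+ "# Refit on the reduced feature set and evaluate on the held-out split\n",
+ "lr_selected = LogisticRegression(max_iter=1000).fit(X_tr_sel, y_tr)\n",
+ "n_selected = X_tr_sel.shape[1]\n",
+ "acc_selected = lr_selected.score(X_te_sel, y_te)\n",
+ "\n",
+ "print(n_selected, acc_selected)\n",
+ "```\n",
+ "\n",
+ "The selector is fitted on the training split only and then applied to the test split, so no information from the test data leaks into the choice of features."
+ ]
+ },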
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 10. Try other model algorithms\n",
+ "Classify with RandomForestClassifier, SVM, KNN, and other sklearn classifiers, and repeat the hyperparameter tuning process above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 81,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "最好的超参数: {'criterion': 'log_loss', 'n_estimators': 100}\n",
+ "最好的模型: RandomForestClassifier(criterion='log_loss')\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Random forest\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "\n",
+ "# Note: for classifiers, 'entropy' and 'log_loss' denote the same criterion\n",
+ "param_grid = {'criterion': ['gini', 'entropy', 'log_loss'],\n",
+ "              'n_estimators': [10, 50, 100]}\n",
+ "model = RandomForestClassifier()\n",
+ "\n",
+ "# 5-fold grid search\n",
+ "grid_search = GridSearchCV(model, param_grid, cv=5)\n",
+ "grid_search.fit(X_train_std, y_train)\n",
+ "\n",
+ "# Print the best hyperparameters\n",
+ "print(\"Best hyperparameters: \", grid_search.best_params_)\n",
+ "\n",
+ "# Print the best model\n",
+ "print(\"Best model: \", grid_search.best_estimator_)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 82,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9356036201871453"
+ ]
+ },
+ "execution_count": 82,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "forest_best = RandomForestClassifier(criterion='log_loss', \n",
+ " n_estimators=100, # The number of trees in the forest.\n",
+ " )\n",
+ "# Train the model\n",
+ "forest_best.fit(X_train_std, y_train)\n",
+ "# Evaluate the model\n",
+ "forest_best.score(X_test_std, y_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Support vector machine\n",
+ "from sklearn.svm import SVC\n",
+ "\n",
+ "param_grid = {'kernel': ['linear', 'rbf'], 'max_iter': [100000]}\n",
+ "model = SVC()\n",
+ "\n",
+ "# 5-fold grid search\n",
+ "grid_search = GridSearchCV(model, param_grid, cv=5)\n",
+ "grid_search.fit(X_train_std, y_train)\n",
+ "\n",
+ "# Print the best hyperparameters\n",
+ "print(\"Best hyperparameters: \", grid_search.best_params_)\n",
+ "\n",
+ "# This search took too long to finish"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/sp/.local/lib/python3.9/site-packages/sklearn/svm/_base.py:297: ConvergenceWarning: Solver terminated early (max_iter=100000). Consider pre-processing your data with StandardScaler or MinMaxScaler.\n",
+ " warnings.warn(\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "0.9334560515416475"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.svm import SVC\n",
+ "SVM_best = SVC(kernel='linear',max_iter=100000)\n",
+ "# Train the model\n",
+ "SVM_best.fit(X_train_std, y_train)\n",
+ "# Evaluate the model\n",
+ "SVM_best.score(X_test_std, y_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "最好的超参数: {'n_neighbors': 7}\n"
+ ]
+ }
+ ],
+ "source": [
+ "# K-nearest neighbors\n",
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "from sklearn.model_selection import GridSearchCV\n",
+ "param_grid = {'n_neighbors': [5, 6, 7]}\n",
+ "model = KNeighborsClassifier(metric='minkowski')\n",
+ "\n",
+ "# 5-fold grid search\n",
+ "grid_search = GridSearchCV(model, param_grid, cv=5)\n",
+ "grid_search.fit(X_train_std, y_train)\n",
+ "\n",
+ "# Print the best hyperparameters\n",
+ "print(\"Best hyperparameters: \", grid_search.best_params_)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9336401288541187"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "knn_best = KNeighborsClassifier(n_neighbors=7, metric='minkowski')\n",
+ "# Train the model\n",
+ "knn_best.fit(X_train_std, y_train)\n",
+ "# Evaluate the model\n",
+ "knn_best.score(X_test_std, y_test)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "tdi",
+ "language": "python",
+ "name": "tdi"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.16"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/2023/homework/TinglanWang b/2023/homework/TinglanWang
deleted file mode 160000
index 29edfff4..00000000
--- a/2023/homework/TinglanWang
+++ /dev/null
@@ -1 +0,0 @@
-Subproject commit 29edfff435514b020a6b22966debdca091d28662
diff --git a/2023/homework/TinglanWang/homework_credit_scoring.ipynb b/2023/homework/TinglanWang/homework_credit_scoring.ipynb
new file mode 100644
index 00000000..0a65bc25
--- /dev/null
+++ b/2023/homework/TinglanWang/homework_credit_scoring.ipynb
@@ -0,0 +1,1873 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Let's Fight Monsters Together: Credit Scoring Exercise"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
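+ "---\n",
+ "## Assignment instructions\n",
+ "\n",
+ "- Answering:\n",
+ "    - **Keep every step** of your work when answering; do not give only the final answer\n",
+ "    - Make a habit of commenting your code\n",
+ "\n",
+ "- Solution hints:\n",
+ "    - To help everyone understand the questions accurately and get the most out of the exercises, this document provides solution hints\n",
+ "    - The hints are **for reference only**; original approaches are encouraged\n",
+ "    - To encourage independent thinking, the hints are set in **white** text; if needed, drag the mouse from after the colon to reveal them\n",
+ "\n",
+ "- Data used\n",
+ "    - After loading the data, first **inspect it and understand its basic properties**; later questions will not repeat this reminder"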
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## machine learning for credit scoring\n",
+ "\n",
+ "\n",
+ "Banks play a crucial role in market economies. They decide who can get finance and on what terms and can make or break investment decisions. For markets and society to function, individuals and companies need access to credit. \n",
+ "\n",
+ "Credit scoring algorithms, which make a guess at the probability of default, are the method banks use to determine whether or not a loan should be granted. This competition requires participants to improve on the state of the art in credit scoring, by predicting the probability that somebody will experience financial distress in the next two years. [Dataset](https://www.kaggle.com/c/GiveMeSomeCredit)\n",
+ "\n",
+ "Attribute Information:\n",
+ "\n",
+ "|Variable Name\t|\tDescription\t|\tType|\n",
+ "|----|----|----|\n",
+ "|SeriousDlqin2yrs\t|\tPerson experienced 90 days past due delinquency or worse \t|\tY/N|\n",
+ "|RevolvingUtilizationOfUnsecuredLines\t|\tTotal balance on credit divided by the sum of credit limits\t|\tpercentage|\n",
+ "|age\t|\tAge of borrower in years\t|\tinteger|\n",
+ "|NumberOfTime30-59DaysPastDueNotWorse\t|\tNumber of times borrower has been 30-59 days past due |\tinteger|\n",
+ "|DebtRatio\t|\tMonthly debt payments\t|\tpercentage|\n",
+ "|MonthlyIncome\t|\tMonthly income\t|\treal|\n",
+ "|NumberOfOpenCreditLinesAndLoans\t|\tNumber of Open loans |\tinteger|\n",
+ "|NumberOfTimes90DaysLate\t|\tNumber of times borrower has been 90 days or more past due.\t|\tinteger|\n",
+ "|NumberRealEstateLoansOrLines\t|\tNumber of mortgage and real estate loans\t|\tinteger|\n",
+ "|NumberOfTime60-89DaysPastDueNotWorse\t|\tNumber of times borrower has been 60-89 days past due |integer|\n",
+ "|NumberOfDependents\t|\tNumber of dependents in family\t|\tinteger|\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----------\n",
+ "## Read the data into Pandas "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "pd.set_option('display.max_columns', 500)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "DataDictionary_path = \"/home/Wangtl2022/PyProjects/Bootcamp/Data Dictionary.xls\"\n",
+ "cs_test_path = \"/home/Wangtl2022/PyProjects/Bootcamp/cs-test.csv\"\n",
+ "cs_train_path = \"/home/Wangtl2022/PyProjects/Bootcamp/cs-training.csv\"\n",
+ "se_path = \"/home/Wangtl2022/PyProjects/Bootcamp/sampleEntry.csv\"\n",
+ "\n",
+ "DataDictionary = pd.read_excel(DataDictionary_path)\n",
+ "cs_train = pd.read_csv(cs_train_path)\n",
+ "cs_test = pd.read_csv(cs_test_path)\n",
+ "se = pd.read_csv(se_path)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#!pip install xlrd"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Unnamed: 0 \\\n",
+ "0 Variable Name \n",
+ "1 SeriousDlqin2yrs \n",
+ "2 RevolvingUtilizationOfUnsecuredLines \n",
+ "3 age \n",
+ "4 NumberOfTime30-59DaysPastDueNotWorse \n",
+ "5 DebtRatio \n",
+ "6 MonthlyIncome \n",
+ "7 NumberOfOpenCreditLinesAndLoans \n",
+ "8 NumberOfTimes90DaysLate \n",
+ "9 NumberRealEstateLoansOrLines \n",
+ "10 NumberOfTime60-89DaysPastDueNotWorse \n",
+ "11 NumberOfDependents \n",
+ "\n",
+ " Unnamed: 1 Unnamed: 2 \n",
+ "0 Description Type \n",
+ "1 Person experienced 90 days past due delinquenc... Y/N \n",
+ "2 Total balance on credit cards and personal lin... percentage \n",
+ "3 Age of borrower in years integer \n",
+ "4 Number of times borrower has been 30-59 days p... integer \n",
+ "5 Monthly debt payments, alimony,living costs di... percentage \n",
+ "6 Monthly income real \n",
+ "7 Number of Open loans (installment like car loa... integer \n",
+ "8 Number of times borrower has been 90 days or m... integer \n",
+ "9 Number of mortgage and real estate loans inclu... integer \n",
+ "10 Number of times borrower has been 60-89 days p... integer \n",
+ "11 Number of dependents in family excluding thems... integer "
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "DataDictionary"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Unnamed: 0 SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines \\\n",
+ "0 1 1 0.766127 \n",
+ "1 2 0 0.957151 \n",
+ "2 3 0 0.658180 \n",
+ "3 4 0 0.233810 \n",
+ "4 5 0 0.907239 \n",
+ "... ... ... ... \n",
+ "149995 149996 0 0.040674 \n",
+ "149996 149997 0 0.299745 \n",
+ "149997 149998 0 0.246044 \n",
+ "149998 149999 0 0.000000 \n",
+ "149999 150000 0 0.850283 \n",
+ "\n",
+ " age NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 45 2 0.802982 9120.0 \n",
+ "1 40 0 0.121876 2600.0 \n",
+ "2 38 1 0.085113 3042.0 \n",
+ "3 30 0 0.036050 3300.0 \n",
+ "4 49 1 0.024926 63588.0 \n",
+ "... ... ... ... ... \n",
+ "149995 74 0 0.225131 2100.0 \n",
+ "149996 44 0 0.716562 5584.0 \n",
+ "149997 58 0 3870.000000 NaN \n",
+ "149998 30 0 0.000000 5716.0 \n",
+ "149999 64 0 0.249908 8158.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13 0 \n",
+ "1 4 0 \n",
+ "2 2 1 \n",
+ "3 5 0 \n",
+ "4 7 0 \n",
+ "... ... ... \n",
+ "149995 4 0 \n",
+ "149996 4 0 \n",
+ "149997 18 0 \n",
+ "149998 4 0 \n",
+ "149999 8 0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6 0 \n",
+ "1 0 0 \n",
+ "2 0 0 \n",
+ "3 0 0 \n",
+ "4 1 0 \n",
+ "... ... ... \n",
+ "149995 1 0 \n",
+ "149996 1 0 \n",
+ "149997 1 0 \n",
+ "149998 0 0 \n",
+ "149999 2 0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 \n",
+ "... ... \n",
+ "149995 0.0 \n",
+ "149996 2.0 \n",
+ "149997 0.0 \n",
+ "149998 0.0 \n",
+ "149999 0.0 \n",
+ "\n",
+ "[150000 rows x 12 columns]"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "cs_train"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Id Probability\n",
+ "0 1 0.080807\n",
+ "1 2 0.040719\n",
+ "2 3 0.011968\n",
+ "3 4 0.067640\n",
+ "4 5 0.108264\n",
+ "... ... ...\n",
+ "101498 101499 0.045363\n",
+ "101499 101500 0.343775\n",
+ "101500 101501 0.006970\n",
+ "101501 101502 0.121994\n",
+ "101502 101503 0.044248\n",
+ "\n",
+ "[101503 rows x 2 columns]"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "se"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Unnamed: 0 SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines \\\n",
+ "0 1 NaN 0.885519 \n",
+ "1 2 NaN 0.463295 \n",
+ "2 3 NaN 0.043275 \n",
+ "3 4 NaN 0.280308 \n",
+ "4 5 NaN 1.000000 \n",
+ "... ... ... ... \n",
+ "101498 101499 NaN 0.282653 \n",
+ "101499 101500 NaN 0.922156 \n",
+ "101500 101501 NaN 0.081596 \n",
+ "101501 101502 NaN 0.335457 \n",
+ "101502 101503 NaN 0.441842 \n",
+ "\n",
+ " age NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 43 0 0.177513 5700.0 \n",
+ "1 57 0 0.527237 9141.0 \n",
+ "2 59 0 0.687648 5083.0 \n",
+ "3 38 1 0.925961 3200.0 \n",
+ "4 27 0 0.019917 3865.0 \n",
+ "... ... ... ... ... \n",
+ "101498 24 0 0.068522 1400.0 \n",
+ "101499 36 3 0.934217 7615.0 \n",
+ "101500 70 0 836.000000 NaN \n",
+ "101501 56 0 3568.000000 NaN \n",
+ "101502 29 0 0.198918 5916.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 4 0 \n",
+ "1 15 0 \n",
+ "2 12 0 \n",
+ "3 7 0 \n",
+ "4 4 0 \n",
+ "... ... ... \n",
+ "101498 5 0 \n",
+ "101499 8 0 \n",
+ "101500 3 0 \n",
+ "101501 8 0 \n",
+ "101502 12 0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 0 0 \n",
+ "1 4 0 \n",
+ "2 1 0 \n",
+ "3 2 0 \n",
+ "4 0 0 \n",
+ "... ... ... \n",
+ "101498 0 0 \n",
+ "101499 2 0 \n",
+ "101500 0 0 \n",
+ "101501 2 1 \n",
+ "101502 0 0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 0.0 \n",
+ "1 2.0 \n",
+ "2 2.0 \n",
+ "3 0.0 \n",
+ "4 1.0 \n",
+ "... ... \n",
+ "101498 0.0 \n",
+ "101499 4.0 \n",
+ "101500 NaN \n",
+ "101501 3.0 \n",
+ "101502 0.0 \n",
+ "\n",
+ "[101503 rows x 12 columns]"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "cs_test"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "cs_train = cs_train.drop(columns=[\"Unnamed: 0\"])\n",
+ "# Drop the unused index column"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "cs_test = cs_test.drop(columns=[\"Unnamed: 0\",\"SeriousDlqin2yrs\"])\n",
+ "# Drop the unused index column and the label column"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "------------\n",
+ "## Drop na"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "SeriousDlqin2yrs 0\n",
+ "RevolvingUtilizationOfUnsecuredLines 0\n",
+ "age 0\n",
+ "NumberOfTime30-59DaysPastDueNotWorse 0\n",
+ "DebtRatio 0\n",
+ "MonthlyIncome 29731\n",
+ "NumberOfOpenCreditLinesAndLoans 0\n",
+ "NumberOfTimes90DaysLate 0\n",
+ "NumberRealEstateLoansOrLines 0\n",
+ "NumberOfTime60-89DaysPastDueNotWorse 0\n",
+ "NumberOfDependents 3924\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "cs_train.isnull().sum(axis=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "RevolvingUtilizationOfUnsecuredLines 0\n",
+ "age 0\n",
+ "NumberOfTime30-59DaysPastDueNotWorse 0\n",
+ "DebtRatio 0\n",
+ "MonthlyIncome 20103\n",
+ "NumberOfOpenCreditLinesAndLoans 0\n",
+ "NumberOfTimes90DaysLate 0\n",
+ "NumberRealEstateLoansOrLines 0\n",
+ "NumberOfTime60-89DaysPastDueNotWorse 0\n",
+ "NumberOfDependents 2626\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "cs_test.isnull().sum(axis=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "cs_train.dropna(inplace=True)\n",
+ "cs_test.dropna(inplace=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "((120269, 11), (81400, 10))"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "cs_train.shape,cs_test.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---------\n",
+ "## Create X and y"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "y =cs_train['SeriousDlqin2yrs']\n",
+ "X = cs_train.drop('SeriousDlqin2yrs', axis=1)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Exercise 1: Split the data into training and test sets\n",
+ "- Hint: from sklearn.model_selection import train_test_split"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----\n",
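+ "## Exercise 2: Classify with logistic regression, decision trees, SVM, KNN, and other sklearn classifiers\n",
+ "Consult the sklearn API to learn what the model parameters mean, and try different parameter values"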
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Logistic regression\n",
+ "- Hint: from sklearn import linear_model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from sklearn.linear_model import LogisticRegression"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/usr/local/lib64/python3.6/site-packages/sklearn/linear_model/_logistic.py:765: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
+ "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
+ "\n",
+ "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
+ " https://scikit-learn.org/stable/modules/preprocessing.html\n",
+ "Please also refer to the documentation for alternative solver options:\n",
+ " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
+ " extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "LogisticRegression(max_iter=300)"
+ ]
+ },
+ "execution_count": 28,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.metrics import accuracy_score\n",
+ "\n",
+ "\n",
+ "lr = LogisticRegression(max_iter=200)\n",
+ "# The default of 100 iterations did not converge (although accuracy was 0.93), so max_iter is raised to 200.\n",
+ "\n",
+ "lr.fit(X_train, y_train)"
+ ]
+ },
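+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The `ConvergenceWarning` above suggests scaling the data as an alternative to raising `max_iter`. A minimal sketch of that idea follows, assuming synthetic data with one inflated column to mimic a large-scale feature such as `MonthlyIncome`. The names `X_demo`, `y_demo`, and `pipe` are illustrative, and this is not the approach used in the cell above.\n",
+ "\n",
+ "```python\n",
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.pipeline import make_pipeline\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "\n",
+ "# Synthetic stand-in for the credit features (assumption)\n",
+ "X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)\n",
+ "# Inflate one column so the features sit on very different scales\n",
+ "X_demo[:, 0] *= 10000\n",
+ "\n",
+ "# Standardize first, then fit logistic regression with default settings\n",
+ "pipe = make_pipeline(StandardScaler(), LogisticRegression())\n",
+ "pipe.fit(X_demo, y_demo)\n",
+ "train_acc = pipe.score(X_demo, y_demo)\n",
+ "\n",
+ "print(train_acc)\n",
+ "```\n",
+ "\n",
+ "Wrapping the scaler and the classifier in one pipeline means the scaling statistics are learned from whatever data is passed to `fit`, which also keeps cross-validation folds from leaking into each other."
+ ]
+ },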
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Decision Tree\n",
+ "- Hint: from sklearn.tree import DecisionTreeClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "DecisionTreeClassifier(random_state=42)"
+ ]
+ },
+ "execution_count": 29,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "\n",
+ "DT = DecisionTreeClassifier(random_state=42)\n",
+ "\n",
+ "# Train the model\n",
+ "DT.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Random Forest\n",
+ "- Hint: from sklearn.ensemble import RandomForestClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "RandomForestClassifier(max_depth=1, n_estimators=1, random_state=42)"
+ ]
+ },
+ "execution_count": 30,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "\n",
+ "RF = RandomForestClassifier(n_estimators = 1, max_depth = 1, random_state = 42)\n",
+ "\n",
+ "RF.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### SVM\n",
+ "- Hint: from sklearn.svm import SVC"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/usr/local/lib64/python3.6/site-packages/sklearn/svm/_base.py:258: ConvergenceWarning: Solver terminated early (max_iter=2000). Consider pre-processing your data with StandardScaler or MinMaxScaler.\n",
+ " % self.max_iter, ConvergenceWarning)\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "SVC(kernel='linear', max_iter=2000, random_state=42)"
+ ]
+ },
+ "execution_count": 31,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.svm import SVC\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "\n",
+ "svc = SVC(kernel='linear', max_iter=2000, random_state=42)  # Convergence is too slow, so cap the number of iterations\n",
+ "\n",
+ "svc.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### KNN\n",
+ "- Hint: from sklearn.neighbors import KNeighborsClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "\n",
+ "\n",
+ "knn = KNeighborsClassifier(n_neighbors=5)\n",
+ "\n",
+ "knn.fit(X_train, y_train)\n",
+ "\n",
+ "y_pred = knn.predict(X_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "## Exercise 3: Predict on the test set and compute accuracy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Logistic regression\n",
+ "- Hint: y_pred_LR = clf_LR.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9289515257337657"
+ ]
+ },
+ "execution_count": 36,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.metrics import accuracy_score\n",
+ "\n",
+ "lr_pred = lr.predict(X_test)\n",
+ "\n",
+ "accuracy = accuracy_score(y_test, lr_pred)\n",
+ "accuracy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Decision Tree\n",
+ "- Hint: y_pred_tree = tree.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.8903716637565477"
+ ]
+ },
+ "execution_count": 37,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "dt_pred = DT.predict(X_test)\n",
+ "\n",
+ "accuracy = accuracy_score(y_test, dt_pred)\n",
+ "accuracy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Random Forest\n",
+ "- Hint: y_pred_forest = forest.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9294088301322025"
+ ]
+ },
+ "execution_count": 38,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "rf_pred = RF.predict(X_test)\n",
+ "\n",
+ "accuracy = accuracy_score(y_test, rf_pred)\n",
+ "accuracy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### SVM\n",
+ "- Hint: y_pred_SVC = clf_svc.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9244200548765278"
+ ]
+ },
+ "execution_count": 39,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "svc_pred = svc.predict(X_test)\n",
+ "\n",
+ "accuracy = accuracy_score(y_test, svc_pred)\n",
+ "accuracy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### KNN\n",
+ "- Hint: y_pred_KNN = neigh.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9278290513012388"
+ ]
+ },
+ "execution_count": 40,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "knn_pred = knn.predict(X_test)\n",
+ "\n",
+ "accuracy = accuracy_score(y_test, knn_pred)\n",
+ "accuracy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Exercise 4: Read the official sklearn documentation on evaluation metrics for classification, and evaluate this example"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Learning resources on the confusion matrix**\n",
+ "\n",
+ "- Blog: \n",
+ "http://blog.csdn.net/vesper305/article/details/44927047 \n",
+ "- WiKi: \n",
+ "http://en.wikipedia.org/wiki/Confusion_matrix \n",
+ "- sklearn doc: \n",
+ "http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 44,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[22300, 56],\n",
+ " [ 1653, 45]])"
+ ]
+ },
+ "execution_count": 44,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.metrics import confusion_matrix\n",
+ "import seaborn as sns\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "cm = confusion_matrix(y_test, lr_pred)\n",
+ "cm"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 45,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[22356, 0],\n",
+ " [ 1698, 0]])"
+ ]
+ },
+ "execution_count": 45,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "\n",
+ "cm = confusion_matrix(y_test, rf_pred)\n",
+ "cm"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[22235, 121],\n",
+ " [ 1697, 1]])"
+ ]
+ },
+ "execution_count": 46,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "\n",
+ "cm = confusion_matrix(y_test, svc_pred)\n",
+ "cm"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 47,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[20968, 1388],\n",
+ " [ 1249, 449]])"
+ ]
+ },
+ "execution_count": 47,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "cm = confusion_matrix(y_test, dt_pred)\n",
+ "cm"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 48,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[22293, 63],\n",
+ " [ 1673, 25]])"
+ ]
+ },
+ "execution_count": 48,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "cm = confusion_matrix(y_test, knn_pred)\n",
+ "cm"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
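+ "## Exercise 5: Adjust the model's decision criterion\n",
+ "\n",
+ "Banks usually have stricter requirements because the consequences of fraud tend to be severe, so we typically adjust the model's decision criterion.  \n",
+ "\n",
+ "For example, logistic regression normally uses a probability decision boundary of 0.5, but we can lower the threshold to make the model more sensitive. Try setting the threshold to 0.3, then look at the evaluation metrics again (mainly precision and recall).\n",
+ "\n",
+ "- Hint: many sklearn classifiers expose predict_proba, which returns the estimated class probabilities; compare them with the chosen threshold to decide the final class label"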
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 49,
+ "metadata": {
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([0, 0, 0, ..., 0, 0, 0])"
+ ]
+ },
+ "execution_count": 49,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "lr_pred"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 51,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "lr_pred_prob = lr.predict_proba(X_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 53,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([0.75683688, 0.24316312])"
+ ]
+ },
+ "execution_count": 53,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "lr_pred_prob"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 61,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "lr_pred_new = []\n",
+ "\n",
+ "for i in lr_pred_prob:\n",
+ " if i[0]>=0.3:\n",
+ " lr_pred_new.append(0)\n",
+ " else:\n",
+ " lr_pred_new.append(1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 62,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "lr_pred_new = np.array(lr_pred_new)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 63,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(0.44554455445544555,\n",
+ " 0.026501766784452298,\n",
+ " 0.44680851063829785,\n",
+ " 0.012367491166077738)"
+ ]
+ },
+ "execution_count": 63,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.metrics import precision_score, recall_score\n",
+ "\n",
+ "# Precision and recall of the original predictions (default 0.5 threshold)\n",
+ "pre = precision_score(y_test, lr_pred)\n",
+ "\n",
+ "recall = recall_score(y_test, lr_pred)\n",
+ "\n",
+ "# Precision and recall of the re-thresholded predictions\n",
+ "pre_03 = precision_score(y_test, lr_pred_new)\n",
+ "\n",
+ "recall_03 = recall_score(y_test, lr_pred_new)\n",
+ "pre,recall,pre_03,recall_03"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 64,
+ "metadata": {},
+ "outputs": [],
+ "source": [
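+ "# Precision is essentially unchanged, but recall drops sharply.\n",
+ "# Cause: the loop compares i[0], the class-0 probability, against 0.3, so a sample is\n",
+ "# labelled 1 only when P(class 1) > 0.7, which raises the effective threshold.\n",
+ "# To lower the threshold to 0.3, compare the class-1 probability instead: i[1] >= 0.3."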
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "ml",
+ "language": "python",
+ "name": "ml"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/2023/homework/TinglanWang/homework_credit_scoring_finetune_ensemble.ipynb b/2023/homework/TinglanWang/homework_credit_scoring_finetune_ensemble.ipynb
new file mode 100644
index 00000000..f03f0fae
--- /dev/null
+++ b/2023/homework/TinglanWang/homework_credit_scoring_finetune_ensemble.ipynb
@@ -0,0 +1,3118 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Leveling Up Together: Credit Scoring Exercise"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "-------\n",
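+ "## >>> Instructions:\n",
+ "### 1. Answering steps:\n",
+ "- When answering, **keep every step** of your work; do not give only the final answer\n",
+ "- Make a habit of commenting your code\n",
+ "\n",
+ "### 2. Solution hints:\n",
+ "- To help you understand the problems accurately and get more out of the exercises, this document provides solution hints\n",
+ "- The hints are **for reference only**; original approaches are encouraged\n",
+ "- To encourage independent thinking, the hints are provided as **comments**; look for them in the code cells\n",
+ "\n",
+ "### 3. Data used:\n",
+ "- The problems use several datasets; after importing each one, first **inspect and understand its basic properties**. Later problems will not repeat this reminder"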
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "--------\n",
+ "## Hands-on problems"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Credit card fraud project"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ " #### Initial data import, preview, and processing (do not modify this part; the data files involved do not need to be copied or moved)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 1 0.766127 45.0 \n",
+ "1 0 0.957151 40.0 \n",
+ "2 0 0.658180 38.0 \n",
+ "3 0 0.233810 30.0 \n",
+ "4 0 0.907239 49.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 "
+ ]
+ },
+ "execution_count": 1,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "pd.set_option('display.max_columns', 500)\n",
+ "import zipfile\n",
+ "with zipfile.ZipFile('KaggleCredit2.csv.zip', 'r') as z:\n",
+ " f = z.open('KaggleCredit2.csv')\n",
+ " data = pd.read_csv(f, index_col=0)\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(112915, 11)"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Check the data dimensions\n",
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "SeriousDlqin2yrs 0\n",
+ "RevolvingUtilizationOfUnsecuredLines 0\n",
+ "age 4267\n",
+ "NumberOfTime30-59DaysPastDueNotWorse 0\n",
+ "DebtRatio 0\n",
+ "MonthlyIncome 0\n",
+ "NumberOfOpenCreditLinesAndLoans 0\n",
+ "NumberOfTimes90DaysLate 0\n",
+ "NumberRealEstateLoansOrLines 0\n",
+ "NumberOfTime60-89DaysPastDueNotWorse 0\n",
+ "NumberOfDependents 4267\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Check for missing values\n",
+ "data.isnull().sum(axis=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:3: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access\n",
+ " This is separate from the ipykernel package so we can avoid doing imports until\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Drop rows with missing values\n",
+ "data.dropna(inplace=True)\n",
+ "data.shape\n",
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.06742876076872101"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Extract the corresponding X and y\n",
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)\n",
+ "# Check the average fraud rate\n",
+ "y.mean()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Hands-on problems follow"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 1. Split the data into training and test sets"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "ename": "ImportError",
+ "evalue": "matplotlib is required for plotting when the default backend \"matplotlib\" is selected.",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mImportError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m/tmp/ipykernel_86/4186740417.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0my_train\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalue_counts\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mplot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkind\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'bar'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcolor\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'blue'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'red'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+ "\u001b[0;32m/opt/conda/lib/python3.7/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 890\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 891\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__call__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 892\u001b[0;31m \u001b[0mplot_backend\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_get_plot_backend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpop\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"backend\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 893\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 894\u001b[0m x, y, kind, kwargs = self._get_call_args(\n",
+ "\u001b[0;32m/opt/conda/lib/python3.7/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36m_get_plot_backend\u001b[0;34m(backend)\u001b[0m\n\u001b[1;32m 1812\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0m_backends\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mbackend\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1813\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1814\u001b[0;31m \u001b[0mmodule\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_load_backend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mbackend\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1815\u001b[0m \u001b[0m_backends\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mbackend\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmodule\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1816\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mmodule\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/opt/conda/lib/python3.7/site-packages/pandas/plotting/_core.py\u001b[0m in \u001b[0;36m_load_backend\u001b[0;34m(backend)\u001b[0m\n\u001b[1;32m 1755\u001b[0m \u001b[0;34m\"matplotlib is required for plotting when the \"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1756\u001b[0m \u001b[0;34m'default backend \"matplotlib\" is selected.'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1757\u001b[0;31m ) from None\n\u001b[0m\u001b[1;32m 1758\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mmodule\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1759\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;31mImportError\u001b[0m: matplotlib is required for plotting when the default backend \"matplotlib\" is selected."
+ ]
+ }
+ ],
+ "source": [
+ "y_train.value_counts().plot(kind='bar', color=['blue', 'red'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# The positive and negative classes are imbalanced"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 20246\n",
+ "1 1484\n",
+ "Name: SeriousDlqin2yrs, dtype: int64"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "y_test.value_counts()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Inspect the positive/negative class distribution via the SeriousDlqin2yrs column\n",
+ "# Hint: value_counts\n",
+ "\n",
+ "\n",
+ "# Plot a bar chart of the two classes\n",
+ "# Hint: a DataFrame can call plot(kind='bar') directly\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 2. Data preprocessing: discretization"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Discretize age into 3-year intervals\n",
+ "# Hint: compute the bin edges first, then use pandas' cut function to discretize (binning/bucketing)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "40266 0.052899 80.0 \n",
+ "102291 0.314817 55.0 \n",
+ "1310 0.000000 36.0 \n",
+ "63327 0.261331 54.0 \n",
+ "48272 0.029445 58.0 \n",
+ "... ... ... \n",
+ "57097 0.287522 30.0 \n",
+ "79879 0.930403 38.0 \n",
+ "107765 0.019931 75.0 \n",
+ "898 0.087649 27.0 \n",
+ "16428 0.369675 55.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "40266 0.0 0.342892 5683.0 \n",
+ "102291 0.0 0.133092 11600.0 \n",
+ "1310 4.0 0.437850 6250.0 \n",
+ "63327 0.0 0.395710 5733.0 \n",
+ "48272 0.0 0.130216 13300.0 \n",
+ "... ... ... ... \n",
+ "57097 0.0 0.221714 6778.0 \n",
+ "79879 0.0 0.204423 3345.0 \n",
+ "107765 0.0 0.004285 10500.0 \n",
+ "898 0.0 0.009995 2200.0 \n",
+ "16428 0.0 0.045960 5939.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "40266 14.0 0.0 \n",
+ "102291 5.0 0.0 \n",
+ "1310 11.0 0.0 \n",
+ "63327 16.0 0.0 \n",
+ "48272 8.0 1.0 \n",
+ "... ... ... \n",
+ "57097 10.0 0.0 \n",
+ "79879 7.0 0.0 \n",
+ "107765 7.0 0.0 \n",
+ "898 2.0 0.0 \n",
+ "16428 3.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "40266 1.0 0.0 \n",
+ "102291 1.0 0.0 \n",
+ "1310 2.0 1.0 \n",
+ "63327 1.0 0.0 \n",
+ "48272 1.0 0.0 \n",
+ "... ... ... \n",
+ "57097 2.0 0.0 \n",
+ "79879 0.0 0.0 \n",
+ "107765 0.0 0.0 \n",
+ "898 0.0 1.0 \n",
+ "16428 0.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "40266 1.0 \n",
+ "102291 1.0 \n",
+ "1310 0.0 \n",
+ "63327 1.0 \n",
+ "48272 1.0 \n",
+ "... ... \n",
+ "57097 0.0 \n",
+ "79879 2.0 \n",
+ "107765 0.0 \n",
+ "898 1.0 \n",
+ "16428 1.0 \n",
+ "\n",
+ "[86918 rows x 10 columns]"
+ ]
+ },
+ "execution_count": 12,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "X_train"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "ename": "TypeError",
+ "evalue": "'>' not supported between instances of 'float' and 'method'",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m/tmp/ipykernel_86/2388182925.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mmax\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mX_train\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'age'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmax\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mX_test\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'age'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmax\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+ "\u001b[0;31mTypeError\u001b[0m: '>' not supported between instances of 'float' and 'method'"
+ ]
+ }
+ ],
+ "source": [
+ "max(X_train['age'].max(), X_test['age'].max())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ".max of 40266 80.0\n",
+ "102291 55.0\n",
+ "1310 36.0\n",
+ "63327 54.0\n",
+ "48272 58.0\n",
+ " ... \n",
+ "57097 30.0\n",
+ "79879 38.0\n",
+ "107765 75.0\n",
+ "898 27.0\n",
+ "16428 55.0\n",
+ "Name: age, Length: 86918, dtype: float64>"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "X_train['age'].max"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "101.0"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "X_test['age'].max()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "edges = list(range(0, int(X_test['age'].max())+4,3))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X_train['agegroup'] = pd.cut(X_train['age'], bins=edges,labels=False)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X_test['agegroup'] = pd.cut(X_test['age'], bins=edges,labels=False)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 3. Data preprocessing: one-hot encoding"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# One-hot encode the binned age groups from above\n",
+ "# Hint: use pandas' get_dummies\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "one_hot = pd.get_dummies(X_train['agegroup'], prefix='agegroup')\n",
+ "\n",
+ "X_train = pd.concat([X_train, one_hot], axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "40266 0.052899 80.0 \n",
+ "102291 0.314817 55.0 \n",
+ "1310 0.000000 36.0 \n",
+ "63327 0.261331 54.0 \n",
+ "48272 0.029445 58.0 \n",
+ "... ... ... \n",
+ "57097 0.287522 30.0 \n",
+ "79879 0.930403 38.0 \n",
+ "107765 0.019931 75.0 \n",
+ "898 0.087649 27.0 \n",
+ "16428 0.369675 55.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "40266 0.0 0.342892 5683.0 \n",
+ "102291 0.0 0.133092 11600.0 \n",
+ "1310 4.0 0.437850 6250.0 \n",
+ "63327 0.0 0.395710 5733.0 \n",
+ "48272 0.0 0.130216 13300.0 \n",
+ "... ... ... ... \n",
+ "57097 0.0 0.221714 6778.0 \n",
+ "79879 0.0 0.204423 3345.0 \n",
+ "107765 0.0 0.004285 10500.0 \n",
+ "898 0.0 0.009995 2200.0 \n",
+ "16428 0.0 0.045960 5939.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "40266 14.0 0.0 \n",
+ "102291 5.0 0.0 \n",
+ "1310 11.0 0.0 \n",
+ "63327 16.0 0.0 \n",
+ "48272 8.0 1.0 \n",
+ "... ... ... \n",
+ "57097 10.0 0.0 \n",
+ "79879 7.0 0.0 \n",
+ "107765 7.0 0.0 \n",
+ "898 2.0 0.0 \n",
+ "16428 3.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "40266 1.0 0.0 \n",
+ "102291 1.0 0.0 \n",
+ "1310 2.0 1.0 \n",
+ "63327 1.0 0.0 \n",
+ "48272 1.0 0.0 \n",
+ "... ... ... \n",
+ "57097 2.0 0.0 \n",
+ "79879 0.0 0.0 \n",
+ "107765 0.0 0.0 \n",
+ "898 0.0 1.0 \n",
+ "16428 0.0 0.0 \n",
+ "\n",
+ " NumberOfDependents agegroup agegroup_6.0 agegroup_7.0 \\\n",
+ "40266 1.0 26.0 0 0 \n",
+ "102291 1.0 18.0 0 0 \n",
+ "1310 0.0 11.0 0 0 \n",
+ "63327 1.0 17.0 0 0 \n",
+ "48272 1.0 19.0 0 0 \n",
+ "... ... ... ... ... \n",
+ "57097 0.0 9.0 0 0 \n",
+ "79879 2.0 12.0 0 0 \n",
+ "107765 0.0 24.0 0 0 \n",
+ "898 1.0 8.0 0 0 \n",
+ "16428 1.0 18.0 0 0 \n",
+ "\n",
+ " agegroup_8.0 agegroup_9.0 agegroup_10.0 agegroup_11.0 \\\n",
+ "40266 0 0 0 0 \n",
+ "102291 0 0 0 0 \n",
+ "1310 0 0 0 1 \n",
+ "63327 0 0 0 0 \n",
+ "48272 0 0 0 0 \n",
+ "... ... ... ... ... \n",
+ "57097 0 1 0 0 \n",
+ "79879 0 0 0 0 \n",
+ "107765 0 0 0 0 \n",
+ "898 1 0 0 0 \n",
+ "16428 0 0 0 0 \n",
+ "\n",
+ " agegroup_12.0 agegroup_13.0 agegroup_14.0 agegroup_15.0 \\\n",
+ "40266 0 0 0 0 \n",
+ "102291 0 0 0 0 \n",
+ "1310 0 0 0 0 \n",
+ "63327 0 0 0 0 \n",
+ "48272 0 0 0 0 \n",
+ "... ... ... ... ... \n",
+ "57097 0 0 0 0 \n",
+ "79879 1 0 0 0 \n",
+ "107765 0 0 0 0 \n",
+ "898 0 0 0 0 \n",
+ "16428 0 0 0 0 \n",
+ "\n",
+ " agegroup_16.0 agegroup_17.0 agegroup_18.0 agegroup_19.0 \\\n",
+ "40266 0 0 0 0 \n",
+ "102291 0 0 1 0 \n",
+ "1310 0 0 0 0 \n",
+ "63327 0 1 0 0 \n",
+ "48272 0 0 0 1 \n",
+ "... ... ... ... ... \n",
+ "57097 0 0 0 0 \n",
+ "79879 0 0 0 0 \n",
+ "107765 0 0 0 0 \n",
+ "898 0 0 0 0 \n",
+ "16428 0 0 1 0 \n",
+ "\n",
+ " agegroup_20.0 agegroup_21.0 agegroup_22.0 agegroup_23.0 \\\n",
+ "40266 0 0 0 0 \n",
+ "102291 0 0 0 0 \n",
+ "1310 0 0 0 0 \n",
+ "63327 0 0 0 0 \n",
+ "48272 0 0 0 0 \n",
+ "... ... ... ... ... \n",
+ "57097 0 0 0 0 \n",
+ "79879 0 0 0 0 \n",
+ "107765 0 0 0 0 \n",
+ "898 0 0 0 0 \n",
+ "16428 0 0 0 0 \n",
+ "\n",
+ " agegroup_24.0 agegroup_25.0 agegroup_26.0 agegroup_27.0 \\\n",
+ "40266 0 0 1 0 \n",
+ "102291 0 0 0 0 \n",
+ "1310 0 0 0 0 \n",
+ "63327 0 0 0 0 \n",
+ "48272 0 0 0 0 \n",
+ "... ... ... ... ... \n",
+ "57097 0 0 0 0 \n",
+ "79879 0 0 0 0 \n",
+ "107765 1 0 0 0 \n",
+ "898 0 0 0 0 \n",
+ "16428 0 0 0 0 \n",
+ "\n",
+ " agegroup_28.0 agegroup_29.0 agegroup_30.0 agegroup_31.0 \\\n",
+ "40266 0 0 0 0 \n",
+ "102291 0 0 0 0 \n",
+ "1310 0 0 0 0 \n",
+ "63327 0 0 0 0 \n",
+ "48272 0 0 0 0 \n",
+ "... ... ... ... ... \n",
+ "57097 0 0 0 0 \n",
+ "79879 0 0 0 0 \n",
+ "107765 0 0 0 0 \n",
+ "898 0 0 0 0 \n",
+ "16428 0 0 0 0 \n",
+ "\n",
+ " agegroup_32.0 agegroup_33.0 \n",
+ "40266 0 0 \n",
+ "102291 0 0 \n",
+ "1310 0 0 \n",
+ "63327 0 0 \n",
+ "48272 0 0 \n",
+ "... ... ... \n",
+ "57097 0 0 \n",
+ "79879 0 0 \n",
+ "107765 0 0 \n",
+ "898 0 0 \n",
+ "16428 0 0 \n",
+ "\n",
+ "[86918 rows x 39 columns]"
+ ]
+ },
+ "execution_count": 21,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "X_train"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "one_hot = pd.get_dummies(X_test['agegroup'], prefix='agegroup')\n",
+ "\n",
+ "X_test = pd.concat([X_test, one_hot], axis=1)\n"
+ ]
+ },
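+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Encoding the training and test frames separately can produce different dummy columns when an age bin is absent from one split. A minimal sketch of one way to keep them aligned, assuming toy frames (`train_demo` and `test_demo` are hypothetical names, not the notebook's data): encode each split, then reindex the test dummies to the training columns.\n",
+ "\n",
+ "```python\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Toy stand-ins for the notebook's frames (hypothetical data).\n",
+ "train_demo = pd.DataFrame({'agegroup': [6.0, 7.0, 8.0]})\n",
+ "test_demo = pd.DataFrame({'agegroup': [7.0, 9.0]})\n",
+ "\n",
+ "train_hot = pd.get_dummies(train_demo['agegroup'], prefix='agegroup')\n",
+ "test_hot = pd.get_dummies(test_demo['agegroup'], prefix='agegroup')\n",
+ "\n",
+ "# Keep exactly the training columns: bins unseen in training are dropped,\n",
+ "# bins missing from the test split are filled with 0.\n",
+ "test_hot = test_hot.reindex(columns=train_hot.columns, fill_value=0)\n",
+ "```\n",
+ "\n",
+ "Reindexing by name avoids relying on column position, which only works when both splits happen to contain the same bins in the same order."
+ ]
+ },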
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 4. Data preprocessing: feature scaling"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Scale the continuous-valued features\n",
+ "# Hint: a scaler such as StandardScaler can be used\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "scaler = StandardScaler()\n",
+ "\n",
+ "# Continuous features to scale\n",
+ "features_to_scale = ['RevolvingUtilizationOfUnsecuredLines', 'DebtRatio', 'MonthlyIncome']\n",
+ "\n",
+ "# Fit the scaler on the training set and scale the selected columns\n",
+ "X_train[features_to_scale] = scaler.fit_transform(X_train[features_to_scale])\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [],
+ "source": [
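+ "# Reuse the statistics learned from the training set; refitting on the test set would put the two splits on different scales\n",
+ "X_test[features_to_scale] = scaler.transform(X_test[features_to_scale])"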
+ ]
+ },
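+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A scaler is normally fitted on the training split only and then reused on the test split, so both are expressed in the same units. A minimal sketch with made-up numbers (`train_vals`, `test_vals` and `scaler_demo` are hypothetical names used only for illustration):\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "\n",
+ "# Made-up single-feature data.\n",
+ "train_vals = np.array([[1.0], [2.0], [3.0]])\n",
+ "test_vals = np.array([[2.0], [4.0]])\n",
+ "\n",
+ "scaler_demo = StandardScaler()\n",
+ "# fit_transform learns the mean and standard deviation from the training data.\n",
+ "train_scaled = scaler_demo.fit_transform(train_vals)\n",
+ "# transform applies those same training statistics to the test data.\n",
+ "test_scaled = scaler_demo.transform(test_vals)\n",
+ "```\n",
+ "\n",
+ "Here the test value 2.0 maps to 0 because it equals the training mean, whereas refitting on the test split would centre it on the test mean instead."
+ ]
+ },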
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X_train.drop([\"agegroup\",\"age\"], axis=1, inplace=True)\n",
+ "X_test.drop([\"agegroup\",\"age\"], axis=1, inplace=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(86918, 37)"
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "X_train.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(21730, 37)"
+ ]
+ },
+ "execution_count": 27,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "X_test.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 5. Fit a logistic regression model, output its coefficients, and analyze feature importance."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 124,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/usr/local/lib64/python3.6/site-packages/sklearn/linear_model/_logistic.py:765: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
+ "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
+ "\n",
+ "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
+ " https://scikit-learn.org/stable/modules/preprocessing.html\n",
+ "Please also refer to the documentation for alternative solver options:\n",
+ " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
+ " extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "LogisticRegression()"
+ ]
+ },
+ "execution_count": 124,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Hint: fit the model, then read the coef_ attribute\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "lr = LogisticRegression()\n",
+ "\n",
+ "lr.fit(X_train, y_train)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 126,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([-0.0156334 , 0.467622 , 0.30798023, -0.07341862, -0.02302149,\n",
+ " 0.44146934, -0.19549518, -0.87026371, 0.09492283, -0.04851426,\n",
+ " 0.51266316, 0.69370554, 0.51137 , 0.39410818, 0.40821275,\n",
+ " 0.26042991, 0.28325902, 0.12604619, 0.21490908, 0.13932627,\n",
+ " 0.16228096, -0.03894009, -0.13400387, -0.24467298, -0.60754337,\n",
+ " -0.82244628, -0.69520512, -0.77884549, -0.794883 , -0.76060993,\n",
+ " -0.42927367, -0.4495571 , -0.33327799, -0.09117529, -0.05859908,\n",
+ " 0.00995178, -0.00305238])"
+ ]
+ },
+ "execution_count": 126,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Extract the coefficient of each feature\n",
+ "coefficients = lr.coef_\n",
+ "coefficients[0]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 127,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " feature coef\n",
+ "11 agegroup_8.0 0.693706\n",
+ "10 agegroup_7.0 0.512663\n",
+ "12 agegroup_9.0 0.511370\n",
+ "1 NumberOfTime30-59DaysPastDueNotWorse 0.467622\n",
+ "5 NumberOfTimes90DaysLate 0.441469\n",
+ "14 agegroup_11.0 0.408213\n",
+ "13 agegroup_10.0 0.394108\n",
+ "2 DebtRatio 0.307980\n",
+ "16 agegroup_13.0 0.283259\n",
+ "15 agegroup_12.0 0.260430\n",
+ "18 agegroup_15.0 0.214909\n",
+ "20 agegroup_17.0 0.162281\n",
+ "19 agegroup_16.0 0.139326\n",
+ "17 agegroup_14.0 0.126046\n",
+ "8 NumberOfDependents 0.094923\n",
+ "35 agegroup_32.0 0.009952\n",
+ "36 agegroup_33.0 -0.003052\n",
+ "0 RevolvingUtilizationOfUnsecuredLines -0.015633\n",
+ "4 NumberOfOpenCreditLinesAndLoans -0.023021\n",
+ "21 agegroup_18.0 -0.038940\n",
+ "9 agegroup_6.0 -0.048514\n",
+ "34 agegroup_31.0 -0.058599\n",
+ "3 MonthlyIncome -0.073419\n",
+ "33 agegroup_30.0 -0.091175\n",
+ "22 agegroup_19.0 -0.134004\n",
+ "6 NumberRealEstateLoansOrLines -0.195495\n",
+ "23 agegroup_20.0 -0.244673\n",
+ "32 agegroup_29.0 -0.333278\n",
+ "30 agegroup_27.0 -0.429274\n",
+ "31 agegroup_28.0 -0.449557\n",
+ "24 agegroup_21.0 -0.607543\n",
+ "26 agegroup_23.0 -0.695205\n",
+ "29 agegroup_26.0 -0.760610\n",
+ "27 agegroup_24.0 -0.778845\n",
+ "28 agegroup_25.0 -0.794883\n",
+ "25 agegroup_22.0 -0.822446\n",
+ "7 NumberOfTime60-89DaysPastDueNotWorse -0.870264"
+ ]
+ },
+ "execution_count": 127,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "feature_names = X_train.columns\n",
+ "\n",
+ "data = list(zip(feature_names, coefficients[0]))\n",
+ "\n",
+ "fea_importance = pd.DataFrame(data, columns=['feature', 'coef'])\n",
+ "\n",
+ "# Sort the features by coefficient, largest first\n",
+ "fea_importance.sort_values(by='coef', ascending=False)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 6. Tune hyperparameters with grid search and cross-validation\n",
+ "Tune the penalty and C parameters, with candidates \"l1\" and \"l2\" for penalty and [1,10,100,500] for C."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 133,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/usr/local/lib64/python3.6/site-packages/sklearn/linear_model/_logistic.py:765: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
+ "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
+ "\n",
+ "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
+ " https://scikit-learn.org/stable/modules/preprocessing.html\n",
+ "Please also refer to the documentation for alternative solver options:\n",
+ " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
+ " extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "GridSearchCV(estimator=LogisticRegression(max_iter=1000, solver='liblinear'),\n",
+ " param_grid={'C': [1, 10, 100, 500], 'penalty': ['l1', 'l2']})"
+ ]
+ },
+ "execution_count": 133,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Hint: build the parameter grid as described above, then tune with GridSearchCV\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "from sklearn.model_selection import GridSearchCV\n",
+ "lr = LogisticRegression()\n",
+ "\n",
+ "lr.fit(X_train, y_train)\n",
+ "\n",
+ "# The default fit above did not converge, so raise max_iter.\n",
+ "# lbfgs raises: ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.\n",
+ "# liblinear supports both l1 and l2, so switch to that solver.\n",
+ "model = LogisticRegression(solver='liblinear', max_iter=1000)\n",
+ "\n",
+ "\n",
+ "# Hyperparameter combinations to try\n",
+ "param = {'C': [1,10,100,500], 'penalty': ['l1', 'l2']}\n",
+ "\n",
+ "# Create the GridSearchCV object\n",
+ "gsc_lr = GridSearchCV(estimator=model, param_grid=param)\n",
+ "\n",
+ "# Fit on the training set\n",
+ "gsc_lr.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 137,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Retrieve the best hyperparameters\n",
+ "# Retrieve the best model\n",
+ "\n",
+ "best_params = gsc_lr.best_params_\n",
+ "\n",
+ "best_model = gsc_lr.best_estimator_\n"
+ ]
+ },
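+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "`GridSearchCV` ranks candidates with the estimator's default score (accuracy for classifiers) unless `scoring` is given. On imbalanced labels a ranking metric can be more informative. A minimal sketch on synthetic data from `make_classification` (an assumption for illustration, not the notebook's dataset; `grid_demo` is a hypothetical name):\n",
+ "\n",
+ "```python\n",
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.model_selection import GridSearchCV\n",
+ "\n",
+ "# Synthetic, imbalanced binary problem (about 10% positives).\n",
+ "X_demo, y_demo = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)\n",
+ "\n",
+ "grid_demo = GridSearchCV(\n",
+ "    estimator=LogisticRegression(solver='liblinear', max_iter=1000),\n",
+ "    param_grid={'C': [1, 10], 'penalty': ['l1', 'l2']},\n",
+ "    scoring='roc_auc',  # rank candidates by ROC AUC instead of accuracy\n",
+ "    cv=3,\n",
+ ")\n",
+ "grid_demo.fit(X_demo, y_demo)\n",
+ "```\n",
+ "\n",
+ "After fitting, `grid_demo.best_params_` and `grid_demo.best_score_` hold the winning combination and its mean cross-validated AUC."
+ ]
+ },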
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 7. Predict on the test set and compute precision / recall / AUC / confusion matrix / F1 and other test metrics"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Hint: use predict to make predictions on the test set\n",
+ "# Hint: the metrics live in sklearn.metrics: accuracy_score, recall_score, auc, confusion_matrix, f1_score\n",
+ "from sklearn.metrics import accuracy_score,recall_score,auc,roc_curve,confusion_matrix,f1_score\n",
+ "\n",
+ "\n",
+ "# Predict on the test set\n",
+ "y_pred = best_model.predict(X_test)\n",
+ "\n",
+ "# Evaluate the model; results get their own names so the sklearn functions are not shadowed\n",
+ "accuracy = accuracy_score(y_test, y_pred)\n",
+ "recall = recall_score(y_test, y_pred)\n",
+ "\n",
+ "conf_matrix = confusion_matrix(y_test, y_pred)\n",
+ "f1 = f1_score(y_test, y_pred)\n",
+ "\n",
+ "\n",
+ "# y_pred holds hard 0/1 labels, so this ROC curve has a single operating point\n",
+ "fpr, tpr, thresholds = roc_curve(y_test, y_pred)\n",
+ "\n",
+ "# Area under the curve (AUC)\n",
+ "roc_auc = auc(fpr, tpr)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 147,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(0.933087896916705,\n",
+ " 0.04514824797843666,\n",
+ " array([[20209, 37],\n",
+ " [ 1417, 67]]),\n",
+ " 0.08438287153652393,\n",
+ " 0.5216603632463556)"
+ ]
+ },
+ "execution_count": 147,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "accuracy,recall,conf_matrix,f1,roc_auc"
+ ]
+ },
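+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "ROC AUC is usually computed from continuous scores such as the positive-class probability; feeding it hard 0/1 labels collapses the curve to one operating point. A minimal sketch with made-up values (all names ending in `_demo` are hypothetical):\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from sklearn.metrics import roc_auc_score\n",
+ "\n",
+ "y_true_demo = np.array([0, 0, 1, 1])\n",
+ "scores_demo = np.array([0.1, 0.3, 0.2, 0.4])  # made-up positive-class probabilities\n",
+ "labels_demo = (scores_demo >= 0.5).astype(int)  # every score is below 0.5, so all labels are 0\n",
+ "\n",
+ "auc_from_scores = roc_auc_score(y_true_demo, scores_demo)\n",
+ "auc_from_labels = roc_auc_score(y_true_demo, labels_demo)\n",
+ "```\n",
+ "\n",
+ "The scores rank three of the four positive/negative pairs correctly, while the constant hard labels carry no ranking information at all."
+ ]
+ },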
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
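+ "#### 8. Further optimization\n",
+ "Banks usually have stricter requirements, because the consequences of fraud tend to be severe, so the decision criterion of the model is often adjusted.  \n",
+ "\n",
+ "For example, logistic regression normally uses a probability decision boundary of 0.5, but the threshold can be set lower to raise the model's sensitivity.  \n",
+ "Try a threshold of 0.3 and look at the confusion matrix and the other evaluation metrics again."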
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def cal_eval(y_test,y_pred):\n",
+ "    accuracy = accuracy_score(y_test, y_pred)\n",
+ "    recall = recall_score(y_test, y_pred)\n",
+ "    co_matrix = confusion_matrix(y_test, y_pred)\n",
+ "    f1 = f1_score(y_test, y_pred)\n",
+ "\n",
+ "    fpr, tpr, thresholds = roc_curve(y_test, y_pred)\n",
+ "\n",
+ "    # Area under the curve (AUC)\n",
+ "    roc_auc = auc(fpr, tpr)\n",
+ "\n",
+ "    print(f\"acc:{accuracy},recall:{recall},f1:{f1},roc:{roc_auc}\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 178,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Hint: thresholds = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]\n",
+ "# Compare the predict_proba output with the threshold to get labels, then evaluate the metrics\n",
+ "lr_pred_new = []\n",
+ "lr_pred_prob = best_model.predict_proba(X_test)\n",
+ "\n",
+ "# i[0] is P(class 0): predicting 0 when it is >= 0.3 means predicting 1 only when P(class 1) > 0.7\n",
+ "for i in lr_pred_prob:\n",
+ "    if i[0]>=0.3:\n",
+ "        lr_pred_new.append(0)\n",
+ "    else:\n",
+ "        lr_pred_new.append(1)\n",
+ "\n",
+ "lr_pred_new = np.array(lr_pred_new)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 183,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "acc:0.9321675103543489,recall:0.02021563342318059,f1:0.03911342894393741,roc:0.5096138919857185\n"
+ ]
+ }
+ ],
+ "source": [
+ "from sklearn.metrics import accuracy_score,recall_score,auc,roc_curve,confusion_matrix,f1_score\n",
+ "cal_eval(y_test,lr_pred_new)  # cutoff 0.3 on P(class 0), i.e. 0.7 on P(class 1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 184,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "acc:0.8534744592728947,recall:0.4359838274932615,f1:0.28896828941491737,roc:0.6600298471655777\n"
+ ]
+ }
+ ],
+ "source": [
+ "lr_pred_new = []\n",
+ "lr_pred_prob = best_model.predict_proba(X_test)\n",
+ "\n",
+ "for i in lr_pred_prob:\n",
+ "    if i[0]>=0.9:\n",
+ "        lr_pred_new.append(0)\n",
+ "    else:\n",
+ "        lr_pred_new.append(1)\n",
+ "\n",
+ "lr_pred_new = np.array(lr_pred_new)\n",
+ "cal_eval(y_test,lr_pred_new)  # cutoff 0.9 on P(class 0), i.e. 0.1 on P(class 1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
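+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "`predict_proba` returns one column per class in the order of `classes_`, so for 0/1 labels column 1 is the probability of the positive class. A minimal sketch of sweeping a decision threshold on that column, using synthetic data from `make_classification` (an assumption for illustration, not the notebook's dataset; `clf_demo` and `recalls` are hypothetical names):\n",
+ "\n",
+ "```python\n",
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.metrics import recall_score\n",
+ "\n",
+ "# Synthetic, imbalanced binary problem (about 10% positives).\n",
+ "X_demo, y_demo = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)\n",
+ "clf_demo = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)\n",
+ "\n",
+ "# Column 1 holds P(class 1), the positive class.\n",
+ "proba_pos = clf_demo.predict_proba(X_demo)[:, 1]\n",
+ "\n",
+ "recalls = {}\n",
+ "for t in [0.1, 0.3, 0.5, 0.7, 0.9]:\n",
+ "    # Predict positive whenever P(class 1) reaches the threshold.\n",
+ "    y_hat = (proba_pos >= t).astype(int)\n",
+ "    recalls[t] = recall_score(y_demo, y_hat)\n",
+ "```\n",
+ "\n",
+ "Lowering the threshold can only add positive predictions, so recall never decreases as the threshold goes down, typically at the cost of more false positives."
+ ]
+ },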
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 9. Rank the features by importance, filter them via feature selection, then rebuild the model and observe accuracy and the other evaluation metrics."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 185,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Features can be ranked by the absolute value of the logistic regression coefficients, or by tree-based feature importance\n",
+ "# Feature selection can use RFE or SelectFromModel\n",
+ "\n",
+ "coef_abs = np.abs(best_model.coef_[0])\n",
+ "\n",
+ "# Indices sorted by descending absolute coefficient\n",
+ "sorted_indices = np.argsort(coef_abs)[::-1]\n",
+ "\n",
+ "# Feature names and coefficients in sorted order\n",
+ "sorted_features = X_train.columns[sorted_indices]\n",
+ "sorted_coefficients = best_model.coef_[0][sorted_indices]\n",
+ "\n",
+ "# Build a DataFrame to display the sorted result\n",
+ "sorted_df = pd.DataFrame({'Feature': sorted_features, 'Coefficient': sorted_coefficients})\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 191,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Feature Coefficient\n",
+ "0 agegroup_29.0 -0.941463\n",
+ "1 NumberOfTime60-89DaysPastDueNotWorse -0.861895\n",
+ "2 agegroup_28.0 -0.835204\n",
+ "3 agegroup_26.0 -0.800776\n",
+ "4 agegroup_8.0 0.755337\n",
+ "5 agegroup_31.0 -0.733585\n",
+ "6 agegroup_25.0 -0.707014\n",
+ "7 agegroup_22.0 -0.631208\n",
+ "8 agegroup_24.0 -0.627627\n",
+ "9 agegroup_9.0 0.620013\n",
+ "10 agegroup_7.0 0.578749\n",
+ "11 agegroup_23.0 -0.512713\n",
+ "12 agegroup_27.0 -0.504184\n",
+ "13 agegroup_11.0 0.503420\n",
+ "14 agegroup_10.0 0.492589"
+ ]
+ },
+ "execution_count": 191,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "sorted_df.head(15)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 193,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "new_feature = sorted_features[0:15]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 203,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "GridSearchCV(estimator=LogisticRegression(max_iter=1000, solver='liblinear'),\n",
+ " param_grid={'C': [1, 10, 100, 500], 'penalty': ['l1', 'l2']})"
+ ]
+ },
+ "execution_count": 203,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "\n",
+ "# Use only the top 15 features\n",
+ "\n",
+ "# liblinear supports both l1 and l2 penalties (lbfgs raises:\n",
+ "# ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.)\n",
+ "# max_iter is raised because the default fit did not converge.\n",
+ "model = LogisticRegression(solver='liblinear', max_iter=1000)\n",
+ "\n",
+ "\n",
+ "# Hyperparameter combinations to try\n",
+ "param = {'C': [1,10,100,500], 'penalty': ['l1', 'l2']}\n",
+ "\n",
+ "# Create the GridSearchCV object\n",
+ "gsc_lr = GridSearchCV(estimator=model, param_grid=param)\n",
+ "\n",
+ "# Fit on the training set\n",
+ "gsc_lr.fit(X_train[new_feature], y_train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 204,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "best_params = gsc_lr.best_params_\n",
+ "\n",
+ "best_model = gsc_lr.best_estimator_"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 205,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "({'C': 1, 'penalty': 'l1'},\n",
+ " LogisticRegression(C=1, max_iter=1000, penalty='l1', solver='liblinear'))"
+ ]
+ },
+ "execution_count": 205,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "best_params,best_model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 206,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "y_pred = best_model.predict(X_test[new_feature])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 207,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X_test = X_test.rename(columns=dict(zip(X_test.columns, X_train.columns)))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 208,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "y_pred = best_model.predict(X_test[new_feature])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 209,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([0, 0, 0, ..., 0, 0, 0])"
+ ]
+ },
+ "execution_count": 209,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "y_pred"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 210,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "acc:0.9320294523699953,recall:0.012129380053908356,f1:0.02379378717779246,roc:0.5057930314277248\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Results with only the top 15 features\n",
+ "cal_eval(y_test,y_pred)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
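+ "# With all features: (0.933087896916705, 0.04514824797843666, 0.08438287153652393, 0.5216603632463556) for accuracy, recall, F1, AUC.\n",
+ "# With only the top 15 features, every one of these metrics is lower (0.9320, 0.0121, 0.0238, 0.5058)."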
+ ]
+ },
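+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The hint for this step mentions `SelectFromModel` as an alternative to sorting coefficients by hand. A minimal sketch on synthetic data (an assumption for illustration, not the notebook's dataset; `selector` and `X_selected` are hypothetical names), keeping the five features with the largest absolute coefficients:\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.feature_selection import SelectFromModel\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "\n",
+ "X_demo, y_demo = make_classification(\n",
+ "    n_samples=300, n_features=10, n_informative=3, random_state=0\n",
+ ")\n",
+ "\n",
+ "# threshold=-np.inf disables the importance cutoff, so max_features alone\n",
+ "# decides how many features are kept (ranked by absolute coefficient).\n",
+ "selector = SelectFromModel(\n",
+ "    LogisticRegression(solver='liblinear', max_iter=1000),\n",
+ "    threshold=-np.inf,\n",
+ "    max_features=5,\n",
+ ")\n",
+ "X_selected = selector.fit_transform(X_demo, y_demo)\n",
+ "```\n",
+ "\n",
+ "`selector.get_support()` returns a boolean mask over the original columns, which can be used to pick the same columns from a test set."
+ ]
+ },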
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 10. Trying other models\n",
+ "Classify with other sklearn classifiers such as RandomForestClassifier/SVM/KNN, and repeat the hyperparameter tuning procedure above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Random forest\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "# Support vector machine\n",
+ "from sklearn.svm import SVC\n",
+ "# K-nearest neighbors\n",
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Random forest"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 211,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.ensemble import RandomForestClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 222,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "GridSearchCV(estimator=RandomForestClassifier(random_state=42),\n",
+ " param_grid={'max_depth': [1, 2, 3], 'n_estimators': [50, 100]})"
+ ]
+ },
+ "execution_count": 222,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Hyperparameter combinations to try; tree depth is one of the main parameters of tree models.\n",
+ "param = {'max_depth':[1,2,3],'n_estimators':[50,100]}\n",
+ "\n",
+ "\n",
+ "rf = RandomForestClassifier(random_state = 42)\n",
+ "# Create the GridSearchCV object\n",
+ "gsc_rf = GridSearchCV(estimator=rf, param_grid=param)\n",
+ "\n",
+ "# Fit on the training set\n",
+ "gsc_rf.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 225,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "acc:0.9317073170731708,recall:0.0,f1:0.0,roc:0.5\n"
+ ]
+ }
+ ],
+ "source": [
+ "best_params = gsc_rf.best_params_\n",
+ "\n",
+ "best_model = gsc_rf.best_estimator_\n",
+ "\n",
+ "y_pred = best_model.predict(X_test)\n",
+ "\n",
+ "cal_eval(y_test,y_pred)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 226,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "feature_importance = best_model.feature_importances_"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 227,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([0.12, 0.12, 0.06, 0.04, 0.06, 0.22, 0.04, 0.12, 0. , 0. , 0. ,\n",
+ " 0.02, 0.04, 0.02, 0.04, 0. , 0. , 0. , 0. , 0. , 0. , 0. ,\n",
+ " 0. , 0. , 0.02, 0.04, 0.04, 0. , 0. , 0. , 0. , 0. , 0. ,\n",
+ " 0. , 0. , 0. , 0. ])"
+ ]
+ },
+ "execution_count": 227,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "feature_importance"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 229,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "feature_importance_df = pd.DataFrame({'Feature': X_train.columns, 'Importance': feature_importance})"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 234,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "new_feature = feature_importance_df.sort_values(by='Importance', ascending=False)[\"Feature\"][0:10]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Using a subset of features"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 235,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "GridSearchCV(estimator=RandomForestClassifier(random_state=42),\n",
+ " param_grid={'max_depth': [1, 2, 3], 'n_estimators': [50, 100]})"
+ ]
+ },
+ "execution_count": 235,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Define the hyperparameter grid to try; tree depth is one of the main parameters of tree models.\n",
+ "param = {'max_depth':[1,2,3],'n_estimators':[50,100]}\n",
+ "\n",
+ "\n",
+ "rf = RandomForestClassifier(random_state = 42)\n",
+ "# Create the GridSearchCV object\n",
+ "gsc_rf = GridSearchCV(estimator=rf, param_grid=param)\n",
+ "\n",
+ "# Fit the model on the training set\n",
+ "gsc_rf.fit(X_train[new_feature], y_train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 237,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "acc:0.9331339162448228,recall:0.0431266846361186,f1:0.08096141682479444,roc:0.5207483665203709\n"
+ ]
+ }
+ ],
+ "source": [
+ "best_params = gsc_rf.best_params_\n",
+ "\n",
+ "best_model = gsc_rf.best_estimator_\n",
+ "\n",
+ "y_pred = best_model.predict(X_test[new_feature])\n",
+ "\n",
+ "cal_eval(y_test,y_pred)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 238,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "## All metrics improved compared with the random forest trained on all features"
+ ]
+ },
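+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The random forest trained on all features reports `recall` 0.0 and `roc` 0.5, which is consistent with a model that predicts only the majority class. The cell below is a minimal sketch of one common remedy, under assumptions that are not part of the original homework: synthetic data from `make_classification` stands in for the credit data, `class_weight='balanced'` and `scoring='roc_auc'` are added to the search, and AUC is computed from `predict_proba` scores rather than hard labels. All `*_demo` names are hypothetical."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Sketch only: synthetic, imbalanced data stands in for the credit data.\n",
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "from sklearn.metrics import recall_score, roc_auc_score\n",
+ "from sklearn.model_selection import GridSearchCV, train_test_split\n",
+ "\n",
+ "X_demo, y_demo = make_classification(n_samples=2000, n_features=10, weights=[0.93, 0.07], random_state=42)\n",
+ "X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(X_demo, y_demo, stratify=y_demo, random_state=42)\n",
+ "\n",
+ "# class_weight='balanced' reweights the rare positive class;\n",
+ "# scoring='roc_auc' makes the search rank candidates by AUC instead of accuracy.\n",
+ "rf_demo = RandomForestClassifier(class_weight='balanced', random_state=42)\n",
+ "param_demo = {'max_depth': [2, 3], 'n_estimators': [50]}\n",
+ "gsc_demo = GridSearchCV(estimator=rf_demo, param_grid=param_demo, scoring='roc_auc', cv=3)\n",
+ "gsc_demo.fit(X_tr_demo, y_tr_demo)\n",
+ "\n",
+ "# AUC needs scores (probabilities), not hard 0/1 labels.\n",
+ "proba_demo = gsc_demo.best_estimator_.predict_proba(X_te_demo)[:, 1]\n",
+ "print('auc:', roc_auc_score(y_te_demo, proba_demo))\n",
+ "print('recall:', recall_score(y_te_demo, gsc_demo.predict(X_te_demo)))"
+ ]
+ },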
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## SVM"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.svm import SVC\n",
+ "from sklearn.model_selection import GridSearchCV"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/usr/local/lib64/python3.6/site-packages/sklearn/svm/_base.py:258: ConvergenceWarning: Solver terminated early (max_iter=100). Consider pre-processing your data with StandardScaler or MinMaxScaler.\n",
+ " % self.max_iter, ConvergenceWarning)\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "GridSearchCV(estimator=SVC(max_iter=100, random_state=42),\n",
+ " param_grid={'C': [0.1, 1, 10], 'kernel': ['rbf', 'linear']})"
+ ]
+ },
+ "execution_count": 38,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
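+ "# Define the hyperparameter grid to try; the main SVM parameters are the regularization coefficient C and the kernel.\n",
+ "param = {'C':[0.1, 1, 10],'kernel':['rbf','linear']}\n",
+ "\n",
+ "svc = SVC(random_state=42,max_iter=100)  # the default number of iterations is too slow, so cap it at 100\n",
+ " \n",
+ "# Create the GridSearchCV object\n",
+ "gsc_svm = GridSearchCV(estimator=svc, param_grid=param)\n",
+ "\n",
+ "# Fit the model on the training set\n",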
+ "gsc_svm.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "acc:0.9320294523699953,recall:0.012803234501347708,f1:0.025082508250825083,roc:0.5061052624151509\n"
+ ]
+ }
+ ],
+ "source": [
+ "from sklearn.metrics import accuracy_score,recall_score,auc,roc_curve,confusion_matrix,f1_score\n",
+ "\n",
+ "best_params = gsc_svm.best_params_\n",
+ "\n",
+ "best_model = gsc_svm.best_estimator_\n",
+ "\n",
+ "y_pred = best_model.predict(X_test)\n",
+ "\n",
+ "cal_eval(y_test,y_pred)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'C': 10, 'kernel': 'linear'}"
+ ]
+ },
+ "execution_count": 39,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "best_params"
+ ]
+ },
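+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The `ConvergenceWarning` in the SVM output recommends pre-processing the data with `StandardScaler` or `MinMaxScaler`. The cell below is a minimal sketch of that suggestion, assuming synthetic data in place of the notebook's `X_train` and `y_train`; all `*_demo` names are hypothetical. Placing the scaler inside a `Pipeline` means it is refit on the training part of each cross-validation fold, so the validation part does not leak into the scaling statistics."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Sketch only: synthetic data stands in for X_train / y_train.\n",
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.model_selection import GridSearchCV\n",
+ "from sklearn.pipeline import Pipeline\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "from sklearn.svm import SVC\n",
+ "\n",
+ "X_svm_demo, y_svm_demo = make_classification(n_samples=500, n_features=10, random_state=42)\n",
+ "\n",
+ "# Scale first, then fit the SVM on the scaled features.\n",
+ "pipe_demo = Pipeline([('scale', StandardScaler()), ('svc', SVC(random_state=42))])\n",
+ "# Pipeline parameters are addressed as <step name>__<parameter>.\n",
+ "param_svm_demo = {'svc__C': [0.1, 1, 10], 'svc__kernel': ['rbf', 'linear']}\n",
+ "gsc_svm_demo = GridSearchCV(estimator=pipe_demo, param_grid=param_svm_demo, cv=3)\n",
+ "gsc_svm_demo.fit(X_svm_demo, y_svm_demo)\n",
+ "print(gsc_svm_demo.best_params_)"
+ ]
+ },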
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## KNN"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.neighbors import KNeighborsClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "CPU times: user 12min 37s, sys: 20min 54s, total: 33min 32s\n",
+ "Wall time: 3min 9s\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "GridSearchCV(estimator=KNeighborsClassifier(n_jobs=25),\n",
+ " param_grid={'n_neighbors': [1, 3]})"
+ ]
+ },
+ "execution_count": 32,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "%%time\n",
+ "# Define the hyperparameter grid to try; the main KNN parameter is the value of K.\n",
+ "param = {'n_neighbors':[1,3]}\n",
+ "\n",
+ "knn = KNeighborsClassifier(n_jobs=25,leaf_size=30)\n",
+ " \n",
+ "# Create the GridSearchCV object\n",
+ "gsc_knn = GridSearchCV(estimator=knn, param_grid=param)\n",
+ "\n",
+ "# Fit the model on the training set\n",
+ "gsc_knn.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# KNN is extremely slow to compute"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X_test = X_test.rename(columns=dict(zip(X_test.columns, X_train.columns)))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "acc:0.9267832489645651,recall:0.15296495956873316,f1:0.22200488997555015,roc:0.5682339368623079\n"
+ ]
+ }
+ ],
+ "source": [
+ "from sklearn.metrics import accuracy_score,recall_score,auc,roc_curve,confusion_matrix,f1_score\n",
+ "\n",
+ "best_params = gsc_knn.best_params_\n",
+ "\n",
+ "best_model = gsc_knn.best_estimator_\n",
+ "\n",
+ "y_pred = best_model.predict(X_test)\n",
+ "\n",
+ "cal_eval(y_test,y_pred)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.11"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/2023/homework/TinglanWang/homework_matplotlib.ipynb b/2023/homework/TinglanWang/homework_matplotlib.ipynb
new file mode 100644
index 00000000..00caa1dc
--- /dev/null
+++ b/2023/homework/TinglanWang/homework_matplotlib.ipynb
@@ -0,0 +1,2987 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "b8753dd8-7a83-4584-9360-954a9512007f",
+ "metadata": {},
+ "source": [
+ "# Data Visualization Homework Problems"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "0de668c2",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from IPython.display import display, HTML\n",
+ "display(HTML(''))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "bda9f437-af21-4bc4-beb9-59034544e317",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import seaborn as sns\n",
+ "import numpy as np"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "67d732a4-44e5-4313-bf75-0b1b1d036aee",
+ "metadata": {},
+ "source": [
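+ "## Exercise 1: Analysis of airline passenger changes (2 questions)\n",
+ "\n",
+ "1. Analyze how the annual passenger total changes over time (hint: line chart)\n",
+ "2. Analyze how passenger volume is distributed across the 12 months of the year (hint: bar chart)"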
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "fbecfad8-61bd-483d-a1cc-e6cb6b69188a",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " year month passengers\n",
+ "0 1949 Jan 112\n",
+ "1 1949 Feb 118\n",
+ "2 1949 Mar 132\n",
+ "3 1949 Apr 129\n",
+ "4 1949 May 121"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data = sns.load_dataset(\"flights\")\n",
+ "data.head()\n",
+ "# year, month, number of passengers"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "60d8e4c9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "result_eachyear = data.groupby('year')['passengers'].sum().reset_index()\n",
+ "\n",
+ "result_eachmonth = data.groupby('month')['passengers'].sum().reset_index()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "id": "35fcea0c",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(18, 7))\n",
+ "\n",
+ "#year\n",
+ "axes[0].plot(result_eachyear['year'], result_eachyear['passengers'])\n",
+ "\n",
+ "axes[0].set_xticks(result_eachyear['year'])\n",
+ "\n",
+ "axes[0].set_xlabel('year')\n",
+ "axes[0].set_ylabel('number')\n",
+ "axes[0].set_title('the number of passengers in each year')\n",
+ "\n",
+ "# month\n",
+ "axes[1].bar(np.array(result_eachmonth.index.tolist())+1, result_eachmonth['passengers'])\n",
+ "axes[1].set_xticks(np.array(result_eachmonth.index.tolist())+1)\n",
+ "\n",
+ "axes[1].set_title('the number of passengers in each month')\n",
+ "axes[1].set_xlabel('month')\n",
+ "axes[1].set_ylabel('number')\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "341a4d22-025f-486a-8e82-954fae22b177",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5e30df57-a5ac-41da-b1a1-b42b1baae88b",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "66bbde41-e2c6-4bb6-bc80-5c5f3dd75bf6",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
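+ "## Exercise 2: Analysis of iris flower dimensions (3 questions)\n",
+ "\n",
+ "1. The size relationship between sepals and petals (hint: scatter plot)\n",
+ "2. The size relationship between sepals and petals for different iris species (hint: box plot or violin plot)\n",
+ "3. The distribution of sepal and petal sizes for different iris species (hexbin plot or kernel density estimate)"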
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "id": "b71328f2-0ae3-463a-a557-43506a54a8a4",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " sepal_length sepal_width petal_length petal_width species\n",
+ "0 5.1 3.5 1.4 0.2 setosa\n",
+ "1 4.9 3.0 1.4 0.2 setosa\n",
+ "2 4.7 3.2 1.3 0.2 setosa\n",
+ "3 4.6 3.1 1.5 0.2 setosa\n",
+ "4 5.0 3.6 1.4 0.2 setosa"
+ ]
+ },
+ "execution_count": 39,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data = sns.load_dataset(\"iris\")\n",
+ "data.head()\n",
+ "# sepal length, sepal width, petal length, petal width, species"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 53,
+ "id": "1d89c7f7",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "sepal_length 5.5\n",
+ "sepal_width 2.6\n",
+ "petal_length 4.4\n",
+ "petal_width 1.2\n",
+ "species versicolor\n",
+ "Name: 90, dtype: object"
+ ]
+ },
+ "execution_count": 53,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.iloc[90]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 47,
+ "id": "27a094ec",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(15, 15))\n",
+ "\n",
+ "def subgraph(x,y,x_name,y_name,axes,data):\n",
+ " axes[x][y].scatter(data[x_name],data[y_name])\n",
+ " axes[x][y].set_xlabel(x_name[0]+x_name[5]+x_name[6])\n",
+ " axes[x][y].set_ylabel(y_name[0]+y_name[5]+y_name[6])\n",
+ " axes[x][y].set_title(x_name[0]+x_name[5]+x_name[6]+'VS.'+y_name[0]+y_name[5]+y_name[6])\n",
+ " \n",
+ "subgraph(0,0,'sepal_length','petal_length',axes,data)\n",
+ "subgraph(0,1,'sepal_length','petal_width',axes,data)\n",
+ "subgraph(1,0,'sepal_width','petal_length',axes,data)\n",
+ "subgraph(1,1,'sepal_width','petal_width',axes,data)\n",
+ "\n",
+ "\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d5b91055",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 56,
+ "id": "50010249",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "color_dict = {'setosa':'blue','virginica':'black','versicolor':'red'}\n",
+ "\n",
+ "data['colors'] = data['species'].map(color_dict)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 57,
+ "id": "563f3aa7",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(15, 15))\n",
+ "\n",
+ "def subgraph(x,y,x_name,y_name,axes,data):\n",
+ " axes[x][y].scatter(data[x_name],data[y_name],c=data['colors'])\n",
+ " axes[x][y].set_xlabel(x_name[0]+x_name[5]+x_name[6])\n",
+ " axes[x][y].set_ylabel(y_name[0]+y_name[5]+y_name[6])\n",
+ " axes[x][y].set_title(x_name[0]+x_name[5]+x_name[6]+'VS.'+y_name[0]+y_name[5]+y_name[6])\n",
+ " \n",
+ "subgraph(0,0,'sepal_length','petal_length',axes,data)\n",
+ "subgraph(0,1,'sepal_length','petal_width',axes,data)\n",
+ "subgraph(1,0,'sepal_width','petal_length',axes,data)\n",
+ "subgraph(1,1,'sepal_width','petal_width',axes,data)\n",
+ "\n",
+ "\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "d27db8fe-9406-4897-adc2-32a0a83f359a",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "e58d7341-e8bc-40c6-9938-0e77bff459be",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f39794e8",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 59,
+ "id": "732f047d",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 5.1\n",
+ "1 4.9\n",
+ "2 4.7\n",
+ "3 4.6\n",
+ "4 5.0\n",
+ "5 5.4\n",
+ "6 4.6\n",
+ "7 5.0\n",
+ "8 4.4\n",
+ "9 4.9\n",
+ "10 5.4\n",
+ "11 4.8\n",
+ "12 4.8\n",
+ "13 4.3\n",
+ "14 5.8\n",
+ "15 5.7\n",
+ "16 5.4\n",
+ "17 5.1\n",
+ "18 5.7\n",
+ "19 5.1\n",
+ "20 5.4\n",
+ "21 5.1\n",
+ "22 4.6\n",
+ "23 5.1\n",
+ "24 4.8\n",
+ "25 5.0\n",
+ "26 5.0\n",
+ "27 5.2\n",
+ "28 5.2\n",
+ "29 4.7\n",
+ "30 4.8\n",
+ "31 5.4\n",
+ "32 5.2\n",
+ "33 5.5\n",
+ "34 4.9\n",
+ "35 5.0\n",
+ "36 5.5\n",
+ "37 4.9\n",
+ "38 4.4\n",
+ "39 5.1\n",
+ "40 5.0\n",
+ "41 4.5\n",
+ "42 4.4\n",
+ "43 5.0\n",
+ "44 5.1\n",
+ "45 4.8\n",
+ "46 5.1\n",
+ "47 4.6\n",
+ "48 5.3\n",
+ "49 5.0\n",
+ "Name: sepal_length, dtype: float64"
+ ]
+ },
+ "execution_count": 59,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data[data['species'] == 'setosa']['sepal_length']"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 80,
+ "id": "3b5f2f49",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(6, 18))\n",
+ "\n",
+ "def subgraph(x,specie,axes,data):\n",
+ "\n",
+ " axes[x].boxplot([data[data['species'] == specie][data.columns[i]] for i in range(4)],\n",
+ " labels=data.columns[0:4],\n",
+ " patch_artist=True,\n",
+ " boxprops=dict(color='blue')\n",
+ " )\n",
+ " axes[x].set_xlabel(\"Feature of \"+specie+\" flowers\")\n",
+ " axes[x].set_ylabel(\"Values\")\n",
+ " axes[x].set_title(\"the box with 4 features for the \"+specie + \" flowers\")\n",
+ "\n",
+ "for idx,i in enumerate(list(set(data['species']))):\n",
+ " subgraph(idx,i,axes,data) \n",
+ "\n",
+ "plt.show()"
+ ]
+ },
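+ {
+ "cell_type": "markdown",
+ "id": "sketch-violin-note",
+ "metadata": {},
+ "source": [
+ "The exercise hint also mentions violin plots. The cell below is a minimal sketch of that alternative, assuming seaborn's `violinplot` on a long-format copy of the iris data; the names `iris_demo` and `long_demo` and the cell ids are hypothetical and not part of the original homework. Melting the four measurement columns into one `feature` column lets a single axes compare all features, with `hue` separating the species."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "sketch-violin-code",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Sketch only: violin-plot alternative to the box plots above.\n",
+ "import matplotlib.pyplot as plt\n",
+ "import seaborn as sns\n",
+ "\n",
+ "# Load a fresh copy so the extra 'colors' column added earlier is not included.\n",
+ "iris_demo = sns.load_dataset('iris')\n",
+ "# Reshape to long format: one row per (species, feature, value).\n",
+ "long_demo = iris_demo.melt(id_vars='species', var_name='feature', value_name='value')\n",
+ "\n",
+ "fig, ax = plt.subplots(figsize=(10, 5))\n",
+ "sns.violinplot(data=long_demo, x='feature', y='value', hue='species', ax=ax)\n",
+ "ax.set_title('iris features by species')\n",
+ "plt.show()"
+ ]
+ },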
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "acdd257b",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "dffa76b0-43ad-4ef3-a783-7a5e99f7e389",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "43eaf74c-c25a-4841-a524-1474c1993aeb",
+ "metadata": {},
+ "source": [
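+ "## Exercise 3: Analysis of restaurant tipping (7 questions)\n",
+ "\n",
+ "1. The relationship between tip and total bill (hint: scatter plot + regression analysis)\n",
+ "2. Male versus female customers: who is more generous (hint: box plot or violin plot)\n",
+ "3. Whether smoking affects the tip amount (hint: box plot or violin plot)\n",
+ "4. Weekdays versus weekends: when do customers tip more generously (hint: box plot or violin plot)\n",
+ "5. Lunch versus dinner: at which meal are customers more willing to tip (hint: box plot or violin plot)\n",
+ "6. Whether party size affects generosity (hint: box plot or violin plot)\n",
+ "7. The effect of the combination of sex and smoking on generosity (hint: statistical bar chart)"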
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "899479bc-3d1b-4144-ac14-52d113ad26b5",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " total_bill tip sex smoker day time size\n",
+ "0 16.99 1.01 Female No Sun Dinner 2\n",
+ "1 10.34 1.66 Male No Sun Dinner 3\n",
+ "2 21.01 3.50 Male No Sun Dinner 3\n",
+ "3 23.68 3.31 Male No Sun Dinner 2\n",
+ "4 24.59 3.61 Female No Sun Dinner 4"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data = sns.load_dataset(\"tips\")\n",
+ "data.head()\n",
+ "# total bill, tip, sex, smoker or not, day of week, meal time, party size"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "190abd86",
+ "metadata": {},
+ "outputs": [],
+ "source": [
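+ "def scatterplot(x_data, y_data, x_label, y_label, title, ax = None):\n",
+ "\n",
+ "    # Create a plotting object unless an axes was passed in\n",
+ "    if ax:\n",
+ "        pass\n",
+ "    else:\n",
+ "        fig, ax = plt.subplots()\n",
+ "        # plt.subplots() returns two objects, fig and ax, short for figure and axes:\n",
+ "        # one Figure plus a single Axes here (or an array of Axes when several subplots are requested).\n",
+ "        # A fig can contain several ax, and each ax is responsible for the plot at its own position,\n",
+ "        # so all drawing is done on an ax, while settings for the whole canvas are made on the fig.\n",
+ "    \n",
+ "    # Hide the top and right axis lines (spines)\n",
+ "    ax.spines['top'].set_color('none')\n",
+ "    ax.spines['right'].set_color('none')\n",
+ "    # Set the data x_data and y_data, the point size s, the point color, and the transparency alpha\n",
+ "    ax.scatter(x_data, y_data, s = 10, color = '#539caf', alpha = 0.75)\n",
+ "\n",
+ "    # Add the title and axis labels\n",
+ "    ax.set_title(title)\n",
+ "    ax.set_xlabel(x_label)\n",
+ "    ax.set_ylabel(y_label)"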
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "2a4a0624",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjMAAAHFCAYAAAAHcXhbAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAA9hAAAPYQGoP6dpAABNQUlEQVR4nO3de3wU9b0//tfM7JVkNyEESGIChIIkAQoqomB/lRble9B6qd9Wa22raFvrhWK151i13lpb1FOv9VSr9ag9Vq39FZTTU45YK2gFhCgomqBowiUlgCEkuxv2OvP5/rHskg257G5md2Y2r+fjsQ/d28x7Pln2897PVRJCCBARERFZlGx0AERERETDwWSGiIiILI3JDBEREVkakxkiIiKyNCYzREREZGlMZoiIiMjSmMwQERGRpTGZISIiIktjMkNERESWxmSGqMCsX78ed9xxB7q6urI+xnPPPYcHH3xwWHEsWLAACxYsyPg9M2bMSOu1kiThjjvuSN5fu3YtJEnC2rVrk4/dcccdkCQpoxhyYbDy7HsdRJQ5JjNEBWb9+vW48847DU9mcm3Dhg347ne/a3QYaRmsPK10HURmZTM6ACKibJx66qlGh6CLQrkOIiOxZYaogNxxxx3413/9VwBAbW0tJElK6XrRNA333nsv6urq4HQ6MW7cOHznO99BW1tb8hgLFizA//zP/2DXrl3J9/fuqrnzzjtxyimnoKysDF6vFyeeeCKefPJJ6Lln7ZtvvolTTz0Vbrcbxx13HG699VaoqpryGr26Zx588EFIkoRPPvnkmOduvPFGOBwOdHR0AAC2bNmCr3zlKxg3bhycTieqqqpw9tlnp5RfX0OVZ9/rePrppyFJEl599VUsWbIEZWVlKCoqwjnnnIOWlpZhXy9RIWIyQ1RAvvvd72Lp0qUAgBUrVmDDhg3YsGEDTjzxRADAVVddhRtvvBFnnnkmVq1ahZ///Of43//9X8yfPz9ZYf/mN7/BaaedhoqKiuT7N2zYkDzHzp07ceWVV+LFF1/EihUrcMEFF2Dp0qX4+c9/rss17Nu3D9/4xjdwySWX4OWXX8bXvvY13HXXXVi2bJkux+/rW9/6FhwOB55++umUx1VVxbPPPotzzjkH5eXl6OnpwZlnnon9+/fjP/7jP/Dqq6/iwQcfxIQJE+D3+wc8/lDlOZArrrgCsiwnu6g2bdqEBQsWDKv7kKhgCSIqKP/+7/8uAIjW1taUx5ubmwUAcfXVV6c8/vbbbwsA4uabb04+dvbZZ4uJEycOeS5VVUU0GhU/+9nPxJgxY4SmacnnTj/9dHH66adnFPvpp58uAIiXX3455fHvfe97QpZlsWvXruRjAMTtt9+evP/6668LAOL1119PPnb77beLdL7mLrjgAlFdXS1UVU0+9te//lUAEP/93/8thBCisbFRABAvvfRSRtckxODl2fc6nnrqKQFAfPWrX0153VtvvSUAiLvuuivj8xMVOrbMEI0Qr7/+OgDgsssuS3l87ty5qK+vx2uvvZbWcf7+97/jjDPOQElJCRRFgd1ux2233YaDBw/iwIEDw47T4/Hg3HPPTXnsm9/8JjRNwxtvvDHs4/dnyZIlaGtrw9/+9rfkY0899RQqKiqwePFiAMCUKVMwevRo3HjjjXjsscfQ1NSUk1gSLrnkkpT78+fPx8SJE5N/RyI6iskM0Qhx8OBBAEBlZeUxz1VVVSWfH8ymTZuwaNEiAMATTzyBt956C5s3b8Ytt9wCAAgGg8OOc/z48cc8VlFRAQBpxZiNxYsXo7KyEk899RQA4NChQ1i1ahW+853vQFEUAEBJSQnWrVuH2bNn4+abb8b06dNRVVWF22+/HdFoVPeYEtfc97FclQGRlXE2E9EIMWbMGABAe3s7qqurU57bu3cvysvLhzzGCy+8ALvdjr/85S9wuVzJx1966SXd4ty/f/8xj+3btw/A0WvQm6Io+Pa3v42HH34YXV1d
eO655xAOh7FkyZKU182cORMvvPAChBB4//338fTTT+NnP/sZ3G43fvKTn+gaU+Ka+z42ZcoUXc9DVAjYMkNUYJxOJ4BjW0m+/OUvAwCeffbZlMc3b96M5uZmLFy4MOUY/bWySJIEm82WbK1InOe//uu/dIvf7/dj1apVKY8999xzkGUZX/ziF3U7T19LlixBKBTC888/j6effhrz5s1DXV1dv6+VJAmzZs3CAw88gNLSUrz77ruDHnug8hzMH/7wh5T769evx65duzJeiJBoJGDLDFGBmTlzJgDgoYcewqWXXgq73Y5p06Zh2rRp+P73v49f//rXkGUZixcvxs6dO3HrrbeipqYGP/rRj1KOsWLFCjz66KM46aSTIMsy5syZg7PPPhv3338/vvnNb+L73/8+Dh48iF/96lfJBEoPY8aMwVVXXYXdu3fj+OOPx1//+lc88cQTuOqqqzBhwgTdztNXXV0d5s2bh+XLl2PPnj14/PHHU57/y1/+gt/85jc4//zzMXnyZAghsGLFCnR1deHMM88c9NgDledgGhsb8d3vfhdf//rXsWfPHtxyyy047rjjcPXVVw/7WokKjtEjkIlIfzfddJOoqqoSsiynzPBRVVXcc8894vjjjxd2u12Ul5eLb33rW2LPnj0p7+/s7BRf+9rXRGlpqZAkKWVG0H/+53+KadOmCafTKSZPniyWL18unnzyyWNmUGU7m2n69Oli7dq1Ys6cOcLpdIrKykpx8803i2g0mvJa6DibKeHxxx8XAITb7Rbd3d0pz23fvl1cfPHF4nOf+5xwu92ipKREzJ07Vzz99NNDHnew8ux7HYnZTGvWrBHf/va3RWlpqXC73eKss84SO3bsSPtaiEYSSQgdV7oiIqJhefrpp7FkyRJs3rx5yNYbIorjmBkiIiKyNI6ZIaKcU1V10O0OJElKGVRMRJQJdjMRUc4tWLAA69atG/D5iRMnYufOnfkLiIgKCpMZIsq5jz76aND9i5xOZ3IWFhFRppjMEBERkaVxADARERFZWsEnM0II+Hy+QQcfEhERkXUVfDLj9/tRUlIyaH89ERERWVfBJzNERERU2JjMEBERkaUxmSEiIiJLYzJDRERElsZkhoiIiCyNyQwRERFZGpMZIiIisjQmM0RERGRpTGaIiIjI0pjMEBERkaUxmSEiIiJLYzJDRERElmZoMvPGG2/gnHPOQVVVFSRJwksvvZTyvBACd9xxB6qqquB2u7FgwQJ8+OGHxgRLREREpmRoMtPT04NZs2bhkUce6ff5e++9F/fffz8eeeQRbN68GRUVFTjzzDO5AzYREZEJtHX5sbXtANq6jK2XJSGEMDSCIyRJwsqVK3H++ecDiLfKVFVV4brrrsONN94IAAiHwxg/fjzuueceXHnllWkd1+fzoaSkBN3d3fB6vbkKn4iIaERZ09yK1U0tCMVUuGwKFjdMxqL6WkNiMe2YmdbWVuzbtw+LFi1KPuZ0OnH66adj/fr1A74vHA7D5/Ol3IiIiEg/bV1+rG5qgQBQXuyGALC6qcWwFhrTJjP79u0DAIwfPz7l8fHjxyef68/y5ctRUlKSvNXU1OQ0TiIiopGmIxBEKKbC43JAliR4XA6EYio6AkFD4jFtMpMgSVLKfSHEMY/1dtNNN6G7uzt527NnT65DJCIiGlHKi91w2RT4QxFoQsAfisBlU1Be7DYkHtMmMxUVFQBwTCvMgQMHjmmt6c3pdMLr9abciIiISD/VpR4sbpgMCfFWGgnAWdMno7rUY0g8NkPOmoba2lpUVFTg1VdfxQknnAAAiEQiWLduHe655x6DoyMiIhrZFtXXoqGyHB2BIMqL3YYlMoDByUwgEMAnn3ySvN/a2oqtW7eirKwMEyZMwHXXXYdf/vKXmDp1KqZOnYpf/vKXGDVqFL75zW8aGDUREREB8RYaI5OYBEOTmcbGRnzpS19K3r/++usBAJdeeimefvpp/Nu//RuCwSCuvvpqHDp0CKec
cgrWrFkDj8f4giMiIiJzMM06M7nCdWaIiIgKm2kHABMRERGlg8kMERERWRqTGSIiIrI0JjNERERkaUxmiIiIyNKYzBAREZGlMZkhIiIiS2MyQ0RERJbGZIaIiIgsjckMERERWRqTGSIiIrI0JjNERERkaUxmiIiIyNKYzBAREZGlMZkhIiIiS2MyQ0RERJbGZIaIiIgsjckMERERWRqTGSIiIrI0JjNERERkaUxmiIiIyNKYzBAREZGlMZkhIiIiS2MyQ0RERJbGZIaIiIgsjckMERERWRqTGSIiIrI0m9EBEBERUe60dfnREQiivNiN6lKP0eHkBJMZIiKiArWmuRWrm1oQiqlw2RQsbpiMRfW1RoelO3YzERERFaC2Lj9WN7VAACgvdkMAWN3UgrYuv9Gh6Y7JDBERUQHqCAQRiqnwuByQJQkelwOhmIqOQNDo0HTHZIaIiKgAlRe74bIp8Ici0ISAPxSBy6agvNhtdGi6YzJDRERUgKpLPVjcMBkS4q00EoCzpk8uyEHAHABMRERUoBbV16KhspyzmYiIiMi6qks9BZvEJLCbiYiIiCyNyQwRERFZGpMZIiIisjQmM0RERGRpTGaIiIjI0pjMEBERkaUxmSEiIiJLYzJDRERElsZkhoiIiCyNyQwRERFZGpMZIiIisjQmM0RERGRpTGaIiIjI0pjMEBERkaUxmSEiIiJLYzJDRERElsZkhoiIiCyNyQwRERFZGpMZIiIisjQmM0RERGRpTGaIiIjI0pjMEBERkaUxmSEiIiJLYzJDRERElsZkhoiIiCyNyQwRERFZGpMZIiIisjQmM0RERGRppk5mYrEYfvrTn6K2thZutxuTJ0/Gz372M2iaZnRoREREZBI2owMYzD333IPHHnsMzzzzDKZPn47GxkYsWbIEJSUlWLZsmdHhERERkQmYOpnZsGEDzjvvPJx99tkAgEmTJuH5559HY2OjwZERERGRWZi6m+kLX/gCXnvtNXz88ccAgPfeew//+Mc/cNZZZw34nnA4DJ/Pl3IjIiKiwmXqlpkbb7wR3d3dqKurg6IoUFUVv/jFL3DxxRcP+J7ly5fjzjvvzGOUREREZCRTt8z88Y9/xLPPPovnnnsO7777Lp555hn86le/wjPPPDPge2666SZ0d3cnb3v27MljxERERJRvkhBCGB3EQGpqavCTn/wE11xzTfKxu+66C88++yy2b9+e1jF8Ph9KSkrQ3d0Nr9ebq1CJiIjIIKZumTl8+DBkOTVERVE4NZuIiIiSTD1m5pxzzsEvfvELTJgwAdOnT8eWLVtw//334/LLLzc6NCIiIjIJU3cz+f1+3HrrrVi5ciUOHDiAqqoqXHzxxbjtttvgcDjSOga7mYiIiAqbqZMZPTCZISIiKmymHjNDRERENBQmM0RERGRpTGaIiIjI0pjMEBERkaUxmSEiIiJLYzJDRERElmbqRfOIiGjkaevyoyMQRHmxG9WlHqPDIQtgMkNERKaxprkVq5taEIqpcNkULG6YjEX1tUaHRSbHbiYiIjKFti4/Vje1QAAoL3ZDAFjd1IK2Lr/RoZHJMZkhIiJT6AgEEYqp8LgckCUJHpcDoZiKjkDQ6NDI5JjMEBGRKZQXu+GyKfCHItCEgD8UgcumoLzYbXRoZHJMZoiIyBSqSz1Y3DAZEuKtNBKAs6ZP5iBgGhIHABMRkWksqq9FQ2U5ZzNRRpjMEBGRqVSXepjEUEbYzURERESWxmSGiIiILI3JDBEREVkakxkiIiKyNCYzREREZGlMZoiIiMjSmMwQERGRpTGZISIiIktjMkNERESWxmSGiIiILI3JDBEREVkakxkiIiKyNCYzREREZGlMZoiIiMjSmMwQERGRpTGZISIiIktjMkNERESWxmSGiIiILI3JDBEREVkakxkiIiKyNCYzREREZGlMZoiIiMjSmMwQERGRpTGZISIiIkuzGR0AERER5UZblx8dgSDKi92oLvUYHU7OMJkhIiow
I6UCo8GtaW7F6qYWhGIqXDYFixsmY1F9rdFh5QSTGSKiAjKSKjAaWFuXH6ubWiAAlBe74Q9FsLqpBQ2V5QWZ4HLMDBFRgehbgQkAq5ta0NblNzo0yrOOQBChmAqPywFZkuBxORCKqegIBI0OLSeYzBARFYiRVoHRwMqL3XDZFPhDEWhCwB+KwGVTUF7sNjq0nGAyQ0RUIEZaBUYDqy71YHHDZEiIJ7kSgLOmTy7ILiaAY2aIiApGogJb3dSCjkAQLptS0BUYDW5RfS0aKstHxGBwSQghjA4il3w+H0pKStDd3Q2v12t0OEREOcfZTDTSsGWGiNJm9UrS6vGnq7rUU9DXR9QXkxkiSovVp/xaPX4iGhgHABPRkKw+5dfq8RPR4JjMENGQrD7l1+rxE9HgmMwQ0ZCsPuXX6vET0eCYzBDRkKy+ZoXV4yeiwXFqNhGlzeqzgawePxH1j8kMERERWRq7mYiIiMjSmMwQERGRpTGZISIiIktjMkNERESWxmSGiIiILI3JDBEREVkakxkiIiKyNCYzREREZGlMZoiIiMjSmMwQERGRpZk+mfnnP/+Jb33rWxgzZgxGjRqF2bNn45133jE6LCIiIjIJm9EBDObQoUM47bTT8KUvfQmrV6/GuHHj8Omnn6K0tNTo0IiIdMeNMImyY+pk5p577kFNTQ2eeuqp5GOTJk0yLiAiohxZ09yK1U0tCMVUuGwKFjdMxqL6WqPDIrIEU3czrVq1CnPmzMHXv/51jBs3DieccAKeeOKJQd8TDofh8/lSbkREZtbW5cfqphYIAOXFbggAq5ta0NblNzo0IkswdTLT0tKCRx99FFOnTsUrr7yCH/zgB/jhD3+I3//+9wO+Z/ny5SgpKUneampq8hgxEVlZW5cfW9sO5D2J6AgEEYqp8LgckCUJHpcDoZiKjkAwr3EQWZUkhBBGBzEQh8OBOXPmYP369cnHfvjDH2Lz5s3YsGFDv+8Jh8MIh8PJ+z6fDzU1Neju7obX6815zERkTUZ287R1+XHfa5sgAHhcDvhDEUgAblg4l2NniNJg6paZyspKNDQ0pDxWX1+P3bt3D/gep9MJr9ebciMiGozR3TzVpR4sbpgMCfFWGgnAWdMnM5EhSpOpBwCfdtpp+Oijj1Ie+/jjjzFx4kSDIiKibJl5pk6im6e82J3s5ukIBNERCOYt1kX1tWioLDdtGRGZmamTmR/96EeYP38+fvnLX+LCCy/Epk2b8Pjjj+Pxxx83OjQiyoDZZ+qUF7vhsinwhyLJbh6XTUF5sTuvcVSXepjEEGXB1N1MJ598MlauXInnn38eM2bMwM9//nM8+OCDuOSSS4wOjYjSZHQXTjrYzUNkbaYeAKwHn8+HkpISDgAmMsjWtgP47Vtbk104mhDoCARx5WmzMbt6nNHhpTBzVxgRDczU3UxEZH1m6cJJB7t5iKzJ1N1MRGR97MIholxjywwR5Rxn6hBRLjGZIaKMZTO2hF04RJQrTGaIKCNmn2ZNRCNP1slMY2MjmpubIUkS6urqMGfOHD3jIiIT6jvN2h+KYHVTCxoqy0dsqwtnQBEZL+Nkpq2tDRdffDHeeustlJaWAgC6urowf/58PP/889zYkaiAmWGlXDNhKxWROWQ8m+nyyy9HNBpFc3MzOjs70dnZiebmZgghcMUVV+QiRiIyid7TrDUhTD3NOtessBgg0UiRcTLz5ptv4tFHH8W0adOSj02bNg2//vWv8eabb+oaHBGZC6dZH5VopfK4HMlWqlBMRUcgaHRoRCNOxt1MEyZMQDQaPebxWCyG4447TpegiMi8OM06zkqLARIVuoxbZu69914sXboUjY2NSOyE0NjYiGXLluFXv/qV7gESkflUl3owu3rciE1kALZSEZlJxnszjR49GocPH0YsFoPNFm/YSfx/UVFRyms7Ozv1izRL3JuJiHKJs5mIjJdxN9ODDz6YgzCIiKyJiwESGY+7ZhMREZGlpdUy4/P5komAz+cb
9LVMGIiIiCif0kpmRo8ejfb2dowbNw6lpaWQJOmY1wghIEkSVFXVPUgiIiKigaSVzPz9739HWVkZAOCpp55CTU0NFEVJeY2madi9e7f+ERIRERENIuMxM4qiJFtpejt48CDGjRtnupYZjpkhIiIqbBmvM5PoTuorEAjA5XLpEhQRERFRutKemn399dcDACRJwq233opRo0Yln1NVFW+//TZmz56te4BEREREg0k7mdmyZQuAeMvMtm3b4HA4ks85HA7MmjULP/7xj/WPkIiIhoUL+1Ghy3jMzJIlS/DQQw9ZZvwJx8wQWQ8rX/2saW7F6qYWBCJR2CQJC6ZOwEUn1RsdFpGuuGhegWJlQFaVqHxDMRUum4LFDZOxqL7W6LAAWO/fVVuXH/e9tgn+cASHozGomgYJEr46ayouOpEJDRWOjLczIPMzc2VANJi2Lj9WN7VAIL4rtT8UweqmFjRUlhuePFjx31VHIIhAJIrD0RggAIciIxLTsPbj3ThtcrXhZUqkl4xnM5G59a0MBIDVTS1o6/IbHRrRkDoCQYRiKjwuB2RJgsflQCimoiMQNDQuq/67Ki92wyZJUDUNigxoAlBkCaoQhpcpkZ6YzBQYs1YGROkoL3bDZVPgD0WgCQF/KAKXTUF5sdvQuBL/rpw2GYfDUThtsiX+XVWXerBg6gRIkBCJaRBCoMhhR5HDbniZEumJ3UwFpndl4HE5TFMZEKWjutSDxQ2TsbqpBR2BIFw2BWdNn2x4d0h5sRsxVcWuzqPJi8fpsMS/q4tOqgckYO3Hu6EeSWbMUKZEemIyU2DMWhlQ/gxnkGq+B7j2d75F9bVoqCxPO458xRyfKSFBkgAhEveHlk18el/TRSfW47TJ1SnHtNpg5kzpdX1WLyerx58uJjMFKNPKgArHcAap5nuA62Dnqy71pPW5zVfMHYEg7IqCCWUuqJqAIkvwh6LoCAQHjTOb+HJ1Tb3L1IqDmTOh1/VZvZysHn8mOGamQFWXejC7ehwTmRFkOINU8z3AVY/z5TPmRPdtJKZhlMOOSEwbsvs2m/jycU1WHcycLr2uz+rlZPX4M8VkhqhADGfwd74HjutxvnzGnOi+lY6cVwKG7L7NJr58XFOhTxLQ6/qsXk5Wjz9T7GYiKhDDGfyd74Hjepwv3zFn2n2bTXz5uKZCnySg1/VZvZysHn+m2DJDVCCyaT3Q4735jtWomBPnTLf7Npv48nFNRpRbPul1fVYvJ6vHnyluZ0BUYKw+m8mIY+SSGWYzGXUOI3E2U5zV408XkxkiIiKyNHYzERERkaUxmSEiIiJLYzJDRERElsap2URkmJEyOJH0w88M9YfJDBEZYiQttU764GeGBsJuJiLKu5G21LpVtHX5sbXtgCn/DvzM0GDYMkNEeZdYar282J1car0jEBxy40bKHbO3evAzQ4NhywwR5V3vpdY1IQp+qXWzs0KrBz8zNBgmM0SUdyNtqXWzs8KmhPzM0GDYzUREhsh040bKHatsSsjPDA2E2xkQEdExY2bOmj4ZZ9aZZ8wM0WDYMkNElEdmXSeFrR5kZUxmiIjyxOwzhqpLPUxiyJI4AJiIKA+sMGOIyKqYzBAR5YEVZgwRWRWTGSKiPOA6KUS5w2SGiCgPuE4KUe5wADARmYJZZ/noiTOGiHKDyQwRGc7ss3z0xBlDRPpjNxNRBsy8q3CmzHAtbV1+/O2jnVi1bQdn+RBR1tgyQ5SmQmo9MMO1JGLwhSIIRCIYW8TdkIkoO2yZIUpDIa0RYoZrSY3BFR8U2xNCMBrlLB8iyhiTGaI0FNIaIWa4lt4xuOx2lBe5ISBwMBAy7SwfM3TLEVH/2M1ElAYz7So83Fk/ZriWvjHYFAVji0fh/JlTUVcxxnSJjBm65YhoYGyZIUqDWdYIWdPcivte24TfvrUV9722CWuaWzM+hhmupb8Yzps5BWfUTTJdImOGbjkiGhxbZojSZPQaIX0rVX8ogtVNLWioLM84FqOvxSwx
pCPRJVZezAHKRGbFZIYoA0auEaJ3pWqG9U7MEMNQzNAtR0SDYzcTUZ4MdwBppnv7cMCqPszQLUdEg2PLDFEe6DGANFGprm5qQUcgCJdNSalUew8Mbmrv4IBVHVmlS4xopJKEEMLoINK1fPly3HzzzVi2bBkefPDBtN7j8/lQUlKC7u5ueL3e3AZI1I+2Lj/ue20TBJDsppAA3LBwblaVYn+zmXonS4osIRSNodjp0OV8RERmZ5mWmc2bN+Pxxx/H5z//eaNDGdGsuBlgLmJOHDOmabDJ8qDH7m+syz5fD97e2Q5Mire4ZBJj33EmiYHBEVWFyx7vhvKHIygrcuk+YDXXf/9sj9/W5cf2fQcBCagbb76p3SOZFb8zyHoskcwEAgFccskleOKJJ3DXXXcZHc6IZcW1NnIRc+KYh4IhhKLx444e5Rrw2H0HkLZ3BxAIR/FKcwv+8ekeTCzzYlenL+sYOwJBHDocQkTTkGho1TQBXzACp82m24DVXP/9sz3+muZWvLhlO/zhCADA63Tg6yfUmf6zORJY8TuDrMkSA4CvueYanH322TjjjDOGfG04HIbP50u50fBZca2NXMTcuxUkElMBCEQ0DRFVHfDYvQeQ7vP1IBCOwuO0Y7y3CBFVxcad7YioatYxxjQNoZgKTdNgkyUIISBJEiCEbgNWc/33z/b4bV1+rNq2A4FwFIokQZEk+MMRrPrgE1N/NkcCK35nkHWZPpl54YUX8O6772L58uVpvX758uUoKSlJ3mpqanIc4chghiXwM5WLmBPHdCoKBAC7IkMIAYdNGfTYi+prccPCufiX+snwuhyoKCmGLElw2BRoQsCpKFnHaJNluOzx98c0AVmSMMphw9dOrMOVp83GDQvn4sy6o7+Gs5nllOu/f7bH7wgEEYyqkCRAkWXYlPhXWjASM/VncySw4ncGWZepu5n27NmDZcuWYc2aNXC5XGm956abbsL111+fvO/z+ZjQ6MCKa23kIubEMcOqCglAVNUgyzIiR5rRBzt2dakHmAT849M9yZgiMRWyJCGsqihKY7r1QDGNdrsQUeNJVlhV4VCUfseOZNvsn+u/f7bHLy92w21X0BOJQtW05ONuh83Un82RwIrfGWRdpm6Zeeedd3DgwAGcdNJJsNlssNlsWLduHR5++GHYbDaoqnrMe5xOJ7xeb8qNhs+Ka23kIubEMR2KAodNASDBIctwKEpax+4bk0NRMG9SJRyKknWMvWMKxtQBYxlOs3+u//7ZHr+61INzZ05FsdMOVQioQsDjdOC8mVNM/dkcCaz4nUHWZeqp2X6/H7t27Up5bMmSJairq8ONN96IGTNmDHkMTs3WlxVnJhg9mymdmPSIcahjbG07gN++tTU5q0o7MqbmytNmY3b1OF3OMVyczVR4rPidQdZj6mSmPwsWLMDs2bO5zgxRhvRe74aIyCxM3c1ERPphsz8RFSrLtcxkii0zZEaNu/dhV6cPE8u8mDOhIq/nNkOzvxliIKLCYerZTESF6MHXN2PjznZoIj6N+tRJlbjuSyfn7fxG71TNhdSISG/sZiJKkx67UDfu3oeNO9sBCDhtMgCBjTvb0bh7n25xmhkXUiOiXGDLDFEa9GpN2NXpiy+SZ5MhSTLsChCOadjV6ct7d5MR+tunSq99o6h/7NKjkYDJDNEQ+rYm+EMRrG5qQUNlecaVw8QyL2RJQlTVYFeOLLonSZhYNjLGc3Ehtfxilx6NFOxmIhqCnsuyz5lQgVMnVQKQEI5pACTMq60s+FaZRBcdAM6oyhN26dFIwpYZoiHo3Zpw3ZdONnQ2U6717dbor3XghoVz2fWRY+zSo5GEyQzREBLrs6xuakFHIAiXLb2tCwYzZ0JFwSUxwLHdGqfWVmFj695+u+jSXXWYssMuPRpJmMwQpWFRfS0aKsvZmjCI/sYWrf14N2JCoMJbxNaBPEsk4as++AR7uwJwO2zcs4oKFpMZojQNtj5L764VAIPu25TN7JJs
37N9/0FAAHUVR/crSudY2Zyvv26Nw5EobJLE1gEjJdZFFQKFvUSqdXCGmf6YzJClmeFLoXfXSlRVISE+SykUU+GyKxjtdiVnkWQzuyTb9/xpy3b4whEAgMfpwIUn1AHAkMf64zvNWLsj3qJS7LCnPQOmv26NIocdDRVj8O6e/eiJRFHssHPAb54kWspsioKqouHNwiP9cIZZbnA2E1nWmuZW3PfaJvz2ra2477VNWNPcmvcYeneteFx2BMJR+EJhhFUVgEAkpiKiqljd1ILG3fsynl2SzYyUti4/Vn3wCfzhCBRJgiJJCISjWLH1I6z64JNBj/XHd5ux8v0d6AyG0BOJwh+OpD0Dpr+9nyaVedG07yBUIWCTJJxaW4Uz6/jFnQ96zsIjfXCGWe4wmSFLMsuXQu8KQ9UEJAkQADQhYFdkCABORUEopmJXpy/jyiWbCqkjEEQwEgMA2BQZiixDkoDDURXBSGzAY7V1+bH2490QQsChyIAADkdjCESiaVeAi+prccPCubjytNm45OTp2NnpgwAw3lsEp92Gja17+cWdJ71byjQh2MVnAkwwc4fJDFmSGb4U2rr86Og5DEUC/KEIFFmCEIAEJBfGkwCE1Xhz8sQyb8aVSzYVUnmxG25HvAc5pmpQNQ1CAKPsCtwO24DH6ggEoQoBRZagCUCRAVXTYJOkjCrA6lIPZlePg02Wc/I30mNbiZGAu6SbDxPM3OGYGTLMcMa7GD3ttHe/d0xVEYqpsMcUFDvtKWNmHDYFDiU+lXvOhAp09gQzmuKdzbTw6lIPzp0x5ZgxM//3hGkQAgMeq7zYjSKHHZoQOByJIqIKSJKELx0/IasKMBd/I443yAxn4ZlLLpZ5oDhJiMIe3+7z+VBSUoLu7m54vSNjyXgr0KNS6nuMs6ZPzst4jLYuP+57bdORcTLxSjqmaTh/5lTUVYwBYOxspsRrY5qGrmAoo9lMiTLtiUShHElkLjyxPoPSSaXn36i/cpcA3LBwLisDshQzTFwoNGyZobzTa68jo351DrSyannxqGQMQ7WcZBpruu9JJ0kc7Fh6l6mex+OKtlQosvkOoMExmaG807NSMuJLweguroHolSRmWqZD/crU629k1nInIuNxADDlndUHwWUzsDIfg1aNGBSdz+nxHNBKRANhywzlXSEMgsuk+6S/rp9cdI/louVisFYXvVqCMsEBrUTUHyYzZAgjK6XhDr7r/f6hNkvsr8J/cct2uD6wQdWELjNyesejZ5I41Pgbo8awcLwBEfXFZIYMY0SlNNxZVJm+v2+F77DJ8PsisMkyxnpGZdWa0Tt5aWrvOCaeGxbOHXaSmE6rC8ewZI+zWYj0xWSGRoxMu0X6VjjZdKv0rfB9wfi6LyVuR1atGb2TKUUCwjEVRU7HMfEM1WI0lHRaXQqhu9AIXCuHSH9MZmjEyKRbpL8KZ5ynKONulb4Vvk2W4HU6EI5pcNgyG/zcN5lq7w7AFwqj2GnXvZsn3VYXK3cXGsGIcUZEIwGTGRox0q2gB6pwLjl5elbdKn0r/ETXUKatGb2Tsa7D8Y0gNQHs7e5BVNVgUxTdunkyaXUxqrtw1QefIBiJwe2w4dwZUyzRupHpOCMrJmxERmAyQyNGuhX0QBWOTZaz7lZJvKYjEERDZXlWrRmJZKyzJ4hDwTCEEFCk+HOf9QQxtsiN8z4/VbdKz6wzh9q6/PjTlu3wH9mqoScSwZ+2bM9r60a2SUYm44zYHUWUPiYzVHAGq2jSqaAHq3BmV4875v3pVGx6VEyJZGzlex8jpmqwyRLKi0fB7bChIxDC+bOOxxnTJmV0zHTOaZYkJmH7/oPwhSNQJAk2RUZM1eALR7B9/8G8xDqcv2W6CTW7o4gyw2SGCkq6y/kDSC4m17dyGKrC6V3Bp3M+PSumRfW1KCty48n170GWJZSMcsEfisDrcqBu/JiMjmVZvXaTS9lZLg+7zOnxt0wnoebWDUSZYTJDBaN3ReNx2eELRrBq
245jKpq+M4JOrKlIJiDb9x0EJKCsyI2zZ3wOEEDpKBdssoy2Lv8xG0b2rtg6e4JY+d7HKCtyY86EimSLTUfgcLJiiqoqZADdoQje3tmOfWU9yc0oAfRbwfXeONImy6jwFuGrs44/JtkCgK1tB1Le39blx/b9B4/ZbDLTcjVTV1NdxRh4nA4EwlEITYMQ8V3BE5t8DmSo60jnOvVKMoZq8Sqkae8DlatenyuzfT7JGExmqGAkKhpFAj4LHIYmBDQh8EpzK66Y93kAqQmIIgEdPSG8sr0Vb37ahpiqIqJpiG8kL2GU3Qa7IkMAsB8ZXNu75aXvgNxDwTBiqoYn17+Hf3y6B7s6ffF4ZAlRVUV7dwCHI1FEVQ0CwP+/pRkCElw2BTZFhgQkB/EmzpNIvA4FQwhF460/o0e5jllPpqm9A/e9timlhQgA/rRlO3xHxpZ4nA5ceEJdTtfVyYfqUg8uPKEOq7btQDCqwm1XhhwrNNR1pHud+UoyCmXa+0DlqtfnyoyfTzIGkxkqGOXF7mSCIkuAhPhtS9v+ZKvK0f2L7PgscBiyBKiaQE84Ag2ATZagCgAQCMVUBKNRSJKMCWUuRGJaSpdC3wG5mhYfx6IJgY072zGmyJXsitCEQE84muwJkQDENAFZEgirQDAWgwRgYpkX4SPnKStyY3VTCyKqikhMBSAQ0TREVDVlPZn+uj5WffAJYqoG/5GxJQAQCEf7bakaiJnHbWQyOHmo68jkOvOZZJh1AHa6BirXxOd6uJ8rM38+Kf+40SQVjOpSD06sqYBAvEVGkiSMLXJD1URyfEwiAfEF4wkGAMiSHM8ujkj8ryYEBADpSMLTd+PGRMWmaQIxVYMsSSgvcmOU0w5NCDgVJdkVIUGC22HDmFEuKLKUbPFRZDkZBwDE1KPnSbTsOBXlSOuQDCEEHDYlJY7+NpgMRmI4HIkBAGyKDEWWIUlAMJr+xpPZblyZj001gXj5z64eN2TFNdR1ZHqdi+prccPCubjytNm4YeFcnFmXu5aAdK/RjAYq18TnergbohqxsSqZF1tmqKAsqq/Fu237EVM1lLjji9PZgGQ3QCIBWbVtRzzhAVDqcuJQMIzECNJEaiFLEoQQEAJQZKnfLoX+BuR29gQhSxLCqoqiI7uCux024EhyJEsSVC1+blXTIPdKaGzK0fNMLPPCZVMQVlVIAKJq/LWRI03qiTj66/pwO2ywqxoiwRhiqha/LgG47el3iWTTpWLGZv+hriOb6zTjLC+zGahcE5/r4XbVFdK4Iho+tsxQQaku9eDcGVPgttvgC0UhAcd0Ayyqr8VPFs3DorpalBePgiTL8LgccNuUI8lGPOFw2RR4XU4UO+3wD3AsAJgzoQJfnXU8HIqCjkAQDkXBvEmVyfsSgPNmTsG5M6fCoShw2BRIUrx1RpJkOBUFHqcDHqcjJeY5EyqwuGFy8j2ABIcsw6EcO7tqccNkSEDK+S6YdTw8TgdUIaAKgWKnfdCxJX1bVPo77mBdKn2b/QWA1U0tOW+hGcpQ15HpdVJ6BirXxOd6uOXNvxv1JgmRMrmx4Ph8PpSUlKC7uxter9focChP0p3h0Pt1wNHZTKVu15CzjIY6Z38x9J2ZlPhvJrOZMnk+3dlMg7WopFuWW9sO4LdvbU3O9NFEvHvvytNmD3uvKD3oMZuJMsfZTJQPTGaILEyPbp22Lj/ue23TkSnt8eZ6CcANC+dmPCBTj+MQEWWK3UxEFqVXt45eAynZ7E9ERuEAYLIkNi0fTUK8LjsOh6Nw2mT4QtGMF3DTcyCl1acTE5E1MZkhyzHjjJne8pVolRe7EVNV7Oo82oLicToyTkL0XjuFM32IKN+YzJClpLtlgVHWNLdi1QefIBiJwe2w4dwZU3KaaMUHvEmQpPjU62wHwLFFhYisjGNmyFISXSsxVcU/uwLoDoXxWU8QrzS3Gh0a2rr8+NOW7egIHEZPJIKOwGH8acv2nE1N7ggEYVcUTCjzoKqkGBPK
PLAfmQ6eDSsv0EZEIxuTGbKU3lsWiCOL3vXessBI2/cfhC8cgSxJcNjiq//6wpH41OgcSIx1icQ0jHLYEYlpXDSMiEYkJjNkKelsWWCYXn08yQUPBLC705+TRIuzh4iI4jhmhnSTr4GvvbcscNttCEZjsCmy4S0SdRVj4HE6EAhHITQNqiYghMCGnf/Etr0HcjJQmWNdiIjYMkM6WdPcivte24TfvrUV9722CWtyOIYlsWWBqmnY6wvgUDCEUDSGpvaOnJ0z3bguPKEO5UUuuGwKJAAlbicqvEU5XdqfY12IaKRjMkPDZsSePA2V5XDaFIx2uzCxzItip8MU+wAl9n364pQauO02jClyjagdffO1Y/ZwWSVOIkoPu5lo2BIzjBJ78nhcDnQEghkv3tbXYN1WHYEgVAGM9YyCLElw2oQu59RDU3sH3tm9D8FYDLs6fSgvcsOmKAU/ONfs6/8kWCVOIkofkxkatkxXkE1nbE2iwumJRCGEwIyqsfi/s6clX5+Y1fSZ/zC8bscxM3mG2pwxG+ls2phopbIpCsYWudHRE8JnPUGMLR6F82ZOyclGe2bQt3XOH4pgdVOLadb/SbBKnGRNhfRv2mqYzNCwZbKCbDq/ihMVjj8cgT8UhiqANz9tw3v/PIBvnFiPRfW1aGrvQDimwheO4FAwBI/TgYtOrEN1qSd5jvhYmvh5Ro9yDesX+JrmVvxpy3b4whFAAG67gsXTP4eLTqxPeV3fVqpRTjsOBkI4f+ZUnFE3KaNysJJctc7pzSpxkvUU2r9pq2EyQ7pIZ1ZNur+KOwJB9ESi6AlHoIn4OjICQE84ilUffIKyIjdWN7WgyOlAWZEL3cEIbIqM+ory5DkiqopITAUgENE0RFQ161/gbV1+rPrgE/jDEUAIqAIIRGJYsfVjQAAXnXQ0oenbShWJafC6HKirGJNxOWTLiF+Heu7vlEtWiZOshS1+xuMAYNLNULNq0t2dOd6FJEHV4ou1CACSFH8uGImPQ0kcx2W3Y6xnVHKdmcQ5nIoCAcCuyBBCwGFTsh6A2xEIIhiJQQjgSEhHEiyBtTt2pwwiTWftF712qe5PPmeV9WaVNW+sEidZSy7/TVN62DJDeZPur+LqUg8WHD8BK7Z+DO3I6nMy4gmN22HDxDLvoMdx2RSEVRUSgKiqQZZlRI40/WbzC7y82A23w4ZAOBJPrI48rsgyYkIc00UxVCtVrloHjP51aJU1b6wSJ1kHW/yMx5YZyptMfhVfdGI9Lph1PIodNsiSBEgSPE4Hzps5BXMmVAx4nMQ5HIoCh00BIMEhy3Ao2e8EnVjXpshpBxBvKZIlCUUOO4od9n6/sAZrpcpV64AZfh1aZc0bq8RJ1sAWP+NJQohsN9q1BJ/Ph5KSEnR3d8Pr9RodDiGzMR1tXX5s33cQkIC68akziAY7Tq5mM/15y0f4oP0zQJJQ7LDjrOmTcWbd0IP8+otV77EtbV1+3PfapiM7isd/HUoAblg4l1+qRHnA2UzGYTJDhrLiP/7G3fuwq9OHiWVezJlQMeTr053BpUc59D1XuskWEZGVccwMGcaKUxn7xtzZExw05nTGsehZDhwPQkQjEcfMkCGM2AIhW4ml7xt378s45qHGseSiHKpLPSgvdqMjEDRleRIR6Y0tM2SITBcvM6I7qq3Lj1eaW7GlbX98mrgQCMZiqBntTXvBtaFmOeRiETcrtngREQ0HkxkyxECVfEzTsLXtQErSonflnO52Cqu27cBnPfGZCeVFbghJQiiqorMniLIid1rTL4daHVnvKZ1GT88mIjICkxkyRH+V/KQyL/6w+cOUpKWhslzXyjmT7RRimoAsSZAAHAqGUV1ajJ5wFNqRBfoG27aht8HGsWSyFUQ6ctHSY8VB2kQ0sjCZIcP0ruRjmoY/bP7wmKRFliTdKudMtlMIxVR43Q74wxEIIaBqGrqDEYwe5cIlJ0/PeLp3Yg2cocphuAmD3i097LIiIivgAGAyVGLx
Mpss9ztQFhKSlbMmxLAq50y2U3DZFERiGka7ndBEfKE8myLjrOmTMWdChe4Lrum1iJuei3dZaZA2EY1sbJkhUxioRaFu/BhomtClG6bvOTp7goAQiGlayut6d/2oAigvcuHEmgosqq+1RDeLXi093GGaiKyCyQyZwmBjR6pLPbpUzr3PseeQD6FovOvkD5s/PGa9GKuv1zJYt1a6uN8MEVmFqVcAXr58OVasWIHt27fD7XZj/vz5uOeeezBt2rS0j8EVgK0lH4NNG3fvw5Pr34MsS8lZSYMt+z+SB8AOZ0Vhs5eb2eMjovSZumVm3bp1uOaaa3DyyScjFovhlltuwaJFi9DU1ISioiKjw7OMTPdCyvQLPvGefb4e9ESimFjmRYW3aMh9k/p7LnE/MY6lutSTsj9TTBVo9/mhaQITx5Sg1O1C1+FQcu+mfb6e5FYDvWMAgO37D8YHvwCAFE9k+nafJM6diO2P7zTjbx/tRFTT4FIU1FeWo8pbjNFFrmP2ikq3nIZTeeZjj6feGirLIcsSIIC6iqGvNxHLR/sPYuPOvQjFVCiyhBOqx+P/5LCbLtMyGM7A5r7nYlJEZDxTt8z09dlnn2HcuHFYt24dvvjFL6b1npHeMpPJl3Y2X/CJ9xwIHEY4pgKI7yjtVGQUu5wpU6w7AkF8dOAgNrbuHfAcfWOYWObFB+0d8Icj8YXr+iFLgCRJkCUJMVWDJEkAkIwhqqqIxFSE1Xh8o+x2KLKEYqcjZUPGU2urUmIbW+zGlrYD6O+sEoBStxNfP6EurUpQj1lB/R0DQM5mG2Uac+L1PZEofKEIip12uGwKOnqCEADGFrlx7sypus+GyjTO4WzI2d/nc1enj7O9iAxmqdlM3d3dAICysrIBXxMOh+Hz+VJuI1Ums1GymbmSeM/haCyZyEgANCEQjKlw2WQIAH/ash13r9mA/3jzXax8bwf84Ui/52jcvQ8r3/sYETU+6DSiqtjQuhf+cARSvylFXHxxXoGoqkEAUOSjMSgS4A9HcDgagwRAkSQEozFomoaYqmK/rwfhaAzTK8dgY+ve5PVHVBVb//nZIGcFfKEwVn3wyZCze/SYFdTfMVZ98AlWbduRk9lGmcbc+/XFDjuEEOiJRHHwcAhyItE8MpBbz9lQ2ZRturPahjpXRFWxcWd78vPK2V5ExrFMMiOEwPXXX48vfOELmDFjxoCvW758OUpKSpK3mpqaPEZpLpl8aXcEgghEopAkIHrkPUN9wSeOn/gQSUBK5R+OaXDaZPjCEcQ0gWKnHQIChyNRRNXUc6xpbsWT69/DwcMhHDocQtfhEJyKAi3ZcCgNeq2J1hgAgDj6/8Gomvx/RZahyDIkCZAkGVPHlkGRJMSEwDu79+NQMJQsK6eiYKhGSwEgGIkNWQlmW3kOdYxgJIZgdHjH1Svm3q932BUosgRVi6/PA8Rb67xu/eLLNk4gdWBzJtP9+54r8fl02BTdy5+IMmOZZObaa6/F+++/j+eff37Q1910003o7u5O3vbs2ZOnCM0nky/tjw4chD8Uwd7uAHZ3+dHeHRjyCz5x/MTEZoHUlMNpk9EdjAAAvG4HnIoCRZahagKRqJqyhcHqphbIsgSbIkMTAgd7guiJxiAnk5QhEoveiYd09P/ddiX5/6qmQdU0CAHYFRk7PuuE025DhbcIsnx0qwJNCITV+FiPwVIoCYDbYRuyEsy28hzqGG6HDW67PmvwDDfm3q+3KwpGOeyQjxSeJoDRbiciMU332VDZlG22a/H0PVdYVSFLEiIxVffyJ6LMWCKZWbp0KVatWoXXX38d1dXVg77W6XTC6/Wm3EaqdL+027r82Ni6F8VOOxRJgqpqCISjmFdbNegXfOL4o+w2OG3xpEEg/ivcbVMQimmwKTI8TgciMQ12m4JRdhskSYq3Ah2JJ7FgXlmRG2NGuSDLcnwrAQDza6vgcTogBkkrEmNmHIoMCYCqHY1BFYDH6cAouw0CgCriLURzJ1bG
nzvyK7us6EhidmSrAoeiYN6kKpS4nVCOJFQSUpM1r8uJ82ZOGbIS1GMhu/6Ocd7MKTh35lRdFsgbbsx9X+9xOnDBrOPxL/WTUV7kgiqga3zZxpmwqL4WNyyciytPm40bFs5Na4ZW33PFPyOVcCiK7uVPRJkx9QBgIQSWLl2KlStXYu3atZg6dWrGxxjpA4CBoWd6bG07gN++tRXlxe74YNmoikAkiqv/vxMxu3pc2scfaDZTU3tHyqDJebVVOH7cmJTZIL0HZHb2BKFpAlfMn4U5Eyp0n81UVzEGAPodBNp3q4Le5y51u9AVDOFQT2jEzGbK9Nj5ji/bOPU8F2czERnP1MnM1Vdfjeeeew4vv/xyytoyJSUlcLvTa8plMjO04czuSLx/qC/zoV4znPVMsmXEOYmISH+mTmZSBnX28tRTT+Gyyy5L6xhMZtKTbcWu50aERvzi5a9qIiLrM3UyowcmM+nLpksh0xaddM/B3ZqJiChdpl4BmPIr0/18Mt2IMN0Epe96Hv5QBKubWtBQWW7Z1hO2ABER5Q6TGcpKW5cfHT2H44vSDbELdeL16SYohbZbM1uZiIhyi8kMZax35RxTVYRiKrqCoUF3oc4kQclkt2azt3gUYisTEZHZWGKdGTKPvpVzkdMBmyzDLssoctjgtttwOBI9Zln3TBY3S3ftkDXNrbjvtU347Vtbcd9rm7CmuTW3F58FPVb/JSKiwbFlhjLSXwuLPxRBTyS+EWRiFWB/OJLS6pJIUFY3taAjEEzOmBpoLZJF9bXJzSkH2nlb7xaPXLTyZNLKRERE2WEyQxnpr3KWJCB2ZEfrxP5MEVVD4+59KYnBQAnKQGNKBhuQrPe4mlyNaxkqiSMiouFjMkMZ6a9ynjp2NBp37ztm96Q3P92DD9s/S0kM+iYo2bawJJKqzp4gHDYFkSNJSDYtHrke1zJUKxMREQ0PkxnKWKJyTmwPENMEtrQdgBACsgREj7TSlI1yQQCDJgbZtrBUl3owscyLjTvboQkBWZIwb1JlVolCIgavy47D4Wh8p+9QVNfZU5lOeyciovQxmaGs9N1vaeJoD3Yd8ienZXuddnjdTmhCDJqcZDumpK3Lj12dPowpcsGpKAirKnZ2+tDW5c84aSgvdiOmqtjVeXRQrsfp4LgWIiKL4GymEa6ty4+tbQdSZh6l857e3TKJMTKXnTITixs+h/IiN9wOO/zBMDp7gsnkpO+5EgNuT51UBQnAfl8PwtFYcrfuwWJLtKaUFbnhcTtRVuQe1iyhxIif+BYaEmKahu37D2ZULnrK5u9S6FgmRDQQtsyMYL0HvSqyhBOqx+P/HBl4O5iBuoYqvEX4l4Za+EIhbNzZDlXTIEkSTjhu7LEtOWVe7Or0Je+PLXYjdCiGmBDY0LoXe7sDKc/3HZCr5yyhjkAQdkXBhDIXVE0gEI6gKxjG843N8LoceV/kjovsHYtlQkSDYcvMCJD4Rdu4e1/yl23v1hVFAjoCh7FmeyvuXrNh0PVa4q0ph6HIUr9rxiS6f9x2BYoc/3i9t/czPP9Oc7IlJ6Kq2LizHRFVTd5/f28HZFlChbfomOcT4256/yJPdy2adCQSo0hMgyJL6A7G95gqL3b1e+5c6q/VK5/nNyOWCRENhS0zBS7xi/bQ4VD8V61dwWi3CzOqxiYHvX4WOAxZkiAQH8w70IDd3r+Oo6oKCUC41y7b1aUebG07gEAkirCqQZYk2BUJoZiKnkgUYz3xlhyHTYEmBJyKAlmS4FTi9x02pd/nBxoUrNcsod4ztA4GQhAQGFfkhstuh8M2+JgfvRXaVg56YJkQ0VCYzBSwxC/aiKoiomkABCIxFRFVxZa2/VAkoDsYgarFEw9ZkuB1O+DvZyZPf9OXY6qK82cdj7rxY5KvLS92wyZJUDUNDkWGJgCbJEETgC8YgdNmQySmQpYkhFUVRUIgrMbvR2IqNCGOeX6o1YL1nD69fd9BvLRtBxRZHnKl4lzgInvHYpkQ
0VDYzVTAEr9oHTYFQgjYFRkCgFNRoGoCJ9ZUwHbkMU0Ao91ORGJavxVFf8vyqwIoLxqVkkxUl3qwYOoESJAQiWkQQqDY6YDX5YBNltARCMKhKJg3qRIORUnr/nC6kDJRXerBGXWTcO6MKbp0X2Ui0RUIQLfus0KhZ5ciERUmtswUsKNjQVRIkoToka6fsBrvGlpUX4tF9bVY09yKd/fsgyoAO/qvKDL5dXzRSfWABKz9eDdUIVDksOOs6ZNRX5HaJdR3+4Ch7udLvhe5629w6w0L53KRvV648CARDUYSQvRduLWg+Hw+lJSUoLu7G16v1+hw8m6gMTNnTZ+MM+uOzgZJJ3FIHCsQicImSfjS8RNw4Yn1A57b7Dtam0Fblx/3vbYJAji6PQSAGxbOZZkREaWJLTMFrvcv2pimwSbL/SYX6Yw9WVRfi0PBULLFZUPrXpS6XQNOkeWqt0Pj4FYiouFjMjMCpJNUpNOK0tblx8bWvXDabclWhNVNLSgrcqckSWyRSR8HtxIRDR+TGUp7QbL+WhH2dPrw5Pr3AEnqdzE8Lm42OO6qTUQ0fExmRrhMdozu24rQ2RNPboqcdpQVudHZE8TGne0YU+TKye7ThYqDW4mIhodTs0e4/qZcD7THUd8pspom4LIrKCty97v43WDHolTVpR7Mrh7HRIZScD8qovSwZWaEy3TMRt8BxX/Y/GHyvX0Xv+P4D6LscT8qovSxZWaEy2ZBskQrwpwJFSnvNWqxO6JCw/2oiDLDlhka1piN/t6bq9lMnCVFIwWn7BNlhskMARjemjB935uL9WXY5E4jCafsE2WG3Uxkemxyp5GG+1ERZYYtM2R6VmhyZxcY6Y1T9onSx2RmGPJRgWV7jkKqXGOaBgiBzp4gyorcpmtyZxcY5Qq3BCFKD5OZLOWjAsv2HIVUuSauJRiLIRRV0ROOYvQol2ma3DNZdJCIiHKDY2aykI8xHNmeo5DGl/S+lprRXowpcsFtt+GSk6en7PhtpEwWHSQiotxgMpOFfFRg2Z6jkCrXvtdSVuQGJAk22Twf296zTrhQIBGRMcxTK1hIPiqwbM9RSJWrFa6Fs06IiIzHZCYL+ajAsj1HIVWuVrmWRfW1uGHhXFx52mzcsHCuabrAiIhGCkkIIYwOIpd8Ph9KSkrQ3d0Nr9er67E5myk/CulaiIhIf0xmiIiIyNLYzURERESWxmSGiIiILI3JDBEREVkakxkiIiKyNCYzREREZGlMZoiIiMjSmMwQERGRpTGZISIiIktjMkNERESWxmSGiIiILI3JDBEREVmazegAci2x9ZTP5zM4EiIiIsqUx+OBJEmDvqbgkxm/3w8AqKmpMTgSIiIiylQ6G0UX/K7ZmqZh7969aWV2lDmfz4eamhrs2bOHu5LnEcvdOCx7Y7DcjWN02bNlBoAsy6iurjY6jILn9Xr5BWMAlrtxWPbGYLkbx8xlzwHAREREZGlMZoiIiMjSmMzQsDidTtx+++1wOp1GhzKisNyNw7I3BsvdOFYo+4IfAExERESFjS0zREREZGlMZoiIiMjSmMwQERGRpTGZISIiIktjMkNpeeONN3DOOeegqqoKkiThpZdeSnleCIE77rgDVVVVcLvdWLBgAT788ENjgi0gy5cvx8knnwyPx4Nx48bh/PPPx0cffZTyGpa9/h599FF8/vOfTy4SNm/ePKxevTr5PMs8P5YvXw5JknDdddclH2PZ58Ydd9wBSZJSbhUVFcnnzV7uTGYoLT09PZg1axYeeeSRfp+/9957cf/99+ORRx7B5s2bUVFRgTPPPDO5NxZlZ926dbjmmmuwceNGvPrqq4jFYli0aBF6enqSr2HZ66+6uhp33303Ghsb0djYiC9/+cs477zzkl/eLPPc27x5Mx5//HF8/vOfT3mcZZ8706dPR3t7e/K2bdu25HOmL3dBlCEAYuXKlcn7mqaJiooKcffddycfC4VCoqSkRDz22GMGRFi4Dhw4IACIdevW
CSFY9vk0evRo8bvf/Y5lngd+v19MnTpVvPrqq+L0008Xy5YtE0Lw855Lt99+u5g1a1a/z1mh3NkyQ8PW2tqKffv2YdGiRcnHnE4nTj/9dKxfv97AyApPd3c3AKCsrAwAyz4fVFXFCy+8gJ6eHsybN49lngfXXHMNzj77bJxxxhkpj7Psc2vHjh2oqqpCbW0tvvGNb6ClpQWANcq94DeapNzbt28fAGD8+PEpj48fPx67du0yIqSCJITA9ddfjy984QuYMWMGAJZ9Lm3btg3z5s1DKBRCcXExVq5ciYaGhuSXN8s8N1544QW8++672Lx58zHP8fOeO6eccgp+//vf4/jjj8f+/ftx1113Yf78+fjwww8tUe5MZkg3fbdoF0IMuW07pe/aa6/F+++/j3/84x/HPMey19+0adOwdetWdHV14c9//jMuvfRSrFu3Lvk8y1x/e/bswbJly7BmzRq4XK4BX8ey19/ixYuT/z9z5kzMmzcPn/vc5/DMM8/g1FNPBWDucmc3Ew1bYsR7IntPOHDgwDGZPGVn6dKlWLVqFV5//XVUV1cnH2fZ547D4cCUKVMwZ84cLF++HLNmzcJDDz3EMs+hd955BwcOHMBJJ50Em80Gm82GdevW4eGHH4bNZkuWL8s+94qKijBz5kzs2LHDEp95JjM0bLW1taioqMCrr76afCwSiWDdunWYP3++gZFZnxAC1157LVasWIG///3vqK2tTXmeZZ8/QgiEw2GWeQ4tXLgQ27Ztw9atW5O3OXPm4JJLLsHWrVsxefJkln2ehMNhNDc3o7Ky0hqfeePGHpOV+P1+sWXLFrFlyxYBQNx///1iy5YtYteuXUIIIe6++25RUlIiVqxYIbZt2yYuvvhiUVlZKXw+n8GRW9tVV10lSkpKxNq1a0V7e3vydvjw4eRrWPb6u+mmm8Qbb7whWltbxfvvvy9uvvlmIcuyWLNmjRCCZZ5PvWczCcGyz5UbbrhBrF27VrS0tIiNGzeKr3zlK8Lj8YidO3cKIcxf7kxmKC2vv/66AHDM7dJLLxVCxKfu3X777aKiokI4nU7xxS9+UWzbts3YoAtAf2UOQDz11FPJ17Ds9Xf55ZeLiRMnCofDIcaOHSsWLlyYTGSEYJnnU99khmWfGxdddJGorKwUdrtdVFVViQsuuEB8+OGHyefNXu6SEEIY0yZERERENHwcM0NERESWxmSGiIiILI3JDBEREVkakxkiIiKyNCYzREREZGlMZoiIiMjSmMwQERGRpTGZISLLuuyyy3D++een9doFCxbguuuuG/Q1kyZNwoMPPpi8L0kSXnrpJQDAzp07IUkStm7dmlWsRJQ7TGaISFfpJA16vCcXNm/ejO9///tGh0FEGbIZHQARkVmMHTvW6BCIKAtsmSEi3Vx22WVYt24dHnroIUiSBEmSsHPnTqxbtw5z586F0+lEZWUlfvKTnyAWiw36HlVVccUVV6C2thZutxvTpk3DQw89NKz4YrEYrr32WpSWlmLMmDH46U9/it47uvTtZiIia2AyQ0S6eeihhzBv3jx873vfQ3t7O9rb22G323HWWWfh5JNPxnvvvYdHH30UTz75JO66664B31NTUwNN01BdXY0XX3wRTU1NuO2223DzzTfjxRdfzDq+Z555BjabDW+//TYefvhhPPDAA/jd736n1+UTkUHYzUREuikpKYHD4cCoUaNQUVEBALjllltQU1ODRx55BJIkoa6uDnv37sWNN96I2267rd/3AICiKLjzzjuT92tra7F+/Xq8+OKLuPDCC7OKr6amBg888AAkScK0adOwbds2PPDAA/je9743vAsnIkOxZYaIcqq5uRnz5s2DJEnJx0477TQEAgG0tbUN+t7HHnsMc+bMwdixY1FcXIwnnngCu3fvzjqWU089NSWOefPmYceOHVBVNetjEpHxmMwQUU4JIVISiMRjAI55vLcXX3wRP/rRj3D55ZdjzZo12Lp1K5YsWYJIJJLTeInI
etjNRES6cjgcKS0dDQ0N+POf/5yS1Kxfvx4ejwfHHXdcv+8BgDfffBPz58/H1VdfnXzs008/HVZsGzduPOb+1KlToSjKsI5LRMZiywwR6WrSpEl4++23sXPnTnR0dODqq6/Gnj17sHTpUmzfvh0vv/wybr/9dlx//fWQZbnf92iahilTpqCxsRGvvPIKPv74Y9x6663YvHnzsGLbs2cPrr/+enz00Ud4/vnn8etf/xrLli3T47KJyEBMZohIVz/+8Y+hKAoaGhowduxYRKNR/PWvf8WmTZswa9Ys/OAHP8AVV1yBn/70pwO+Z/fu3fjBD36ACy64ABdddBFOOeUUHDx4MKWVJhvf+c53EAwGMXfuXFxzzTVYunQpF8kjKgCS6L3IAhEREZHFsGWGiIiILI3JDBFZ3u7du1FcXDzgbTjTuYnI/NjNRESWF4vFsHPnzgGfnzRpEmw2Tt4kKlRMZoiIiMjS2M1ERERElsZkhoiIiCyNyQwRERFZGpMZIiIisjQmM0RERGRpTGaIiIjI0pjMEBERkaUxmSEiIiJL+3+PlrX5iQfEhwAAAABJRU5ErkJggg==\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "scatterplot(x_data = data['total_bill']\n",
+ " , y_data = data['tip']\n",
+ " , x_label = 'total_bill'\n",
+ " , y_label = 'tip'\n",
+ " , title = 'total_bill vs tip')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "0b931bd7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import statsmodels.api as sm\n",
+ "from statsmodels.stats.outliers_influence import summary_table\n",
+ "x = sm.add_constant(data['total_bill']) # add an intercept (constant) term for the linear regression\n",
+ "y = data['tip']\n",
+ "regr = sm.OLS(y, x) # ordinary least squares (OLS) model\n",
+ "res = regr.fit()\n",
+ "\n",
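+ "# Get the fitted values from the model\n",
+ "st, result_data, ss2 = summary_table(res, alpha=0.05) # alpha=0.05 gives 95% confidence intervals; st: summary table, result_data: per-observation values, ss2: column names\n",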
+ "fitted_values = result_data[:,2]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "93c9e7c8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "predict_mean_ci_low, predict_mean_ci_upp = result_data[:,4:6].T\n",
+ "import pandas as pd\n",
+ "# Build a DataFrame holding the lower and upper confidence bounds\n",
+ "CI_df = pd.DataFrame(columns = ['x_data', 'low_CI', 'upper_CI'])\n",
+ "CI_df['x_data'] = data['total_bill']\n",
+ "CI_df['low_CI'] = predict_mean_ci_low\n",
+ "CI_df['upper_CI'] = predict_mean_ci_upp\n",
+ "CI_df.sort_values('x_data', inplace = True) # sort by x_data\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "86febbe6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def lineplotCI(x_data, y_data, sorted_x, low_CI, upper_CI, x_label, y_label, title):\n",
+ " # Create the figure and axes\n",
+ " _, ax = plt.subplots()\n",
+ "\n",
+ " # Plot the fitted line\n",
+ " ax.plot(x_data, y_data, lw = 1, color = '#539caf', alpha = 1, label = 'Fit')\n",
+ " # Shade the confidence band; the inputs must be sorted by x\n",
+ " ax.fill_between(sorted_x, low_CI, upper_CI, color = '#539caf', alpha = 0.4, label = '95% CI')\n",
+ " # Add the title and axis labels\n",
+ " ax.set_title(title)\n",
+ " ax.set_xlabel(x_label)\n",
+ " ax.set_ylabel(y_label)\n",
+ "\n",
+ " # Show the legend (uses the label arguments); loc='best' picks the placement automatically\n",
+ " ax.legend(loc = 'best')\n",
+ " return ax"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "332f2ad6",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "ax = lineplotCI(x_data = data['total_bill']\n",
+ " , y_data = fitted_values\n",
+ " , sorted_x = CI_df['x_data']\n",
+ " , low_CI = CI_df['low_CI']\n",
+ " , upper_CI = CI_df['upper_CI']\n",
+ " , x_label = 'total_bill'\n",
+ " , y_label = 'tip'\n",
+ " , title = 'Line of Best Fit for total_bill vs tip')\n",
+ "\n",
+ "scatterplot(x_data = data['total_bill']\n",
+ " , y_data = data['tip']\n",
+ " , x_label = 'total_bill'\n",
+ " , y_label = 'tip'\n",
+ " , title = 'Line of Best Fit for total_bill vs tip'\n",
+ " , ax=ax)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "c2e72f50",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\"><th></th><th>total_bill</th><th>tip</th><th>sex</th><th>smoker</th><th>day</th><th>time</th><th>size</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>0</th><td>16.99</td><td>1.01</td><td>Female</td><td>No</td><td>Sun</td><td>Dinner</td><td>2</td></tr>\n",
+ " <tr><th>1</th><td>10.34</td><td>1.66</td><td>Male</td><td>No</td><td>Sun</td><td>Dinner</td><td>3</td></tr>\n",
+ " <tr><th>2</th><td>21.01</td><td>3.50</td><td>Male</td><td>No</td><td>Sun</td><td>Dinner</td><td>3</td></tr>\n",
+ " <tr><th>3</th><td>23.68</td><td>3.31</td><td>Male</td><td>No</td><td>Sun</td><td>Dinner</td><td>2</td></tr>\n",
+ " <tr><th>4</th><td>24.59</td><td>3.61</td><td>Female</td><td>No</td><td>Sun</td><td>Dinner</td><td>4</td></tr>\n",
+ " <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
+ " <tr><th>239</th><td>29.03</td><td>5.92</td><td>Male</td><td>No</td><td>Sat</td><td>Dinner</td><td>3</td></tr>\n",
+ " <tr><th>240</th><td>27.18</td><td>2.00</td><td>Female</td><td>Yes</td><td>Sat</td><td>Dinner</td><td>2</td></tr>\n",
+ " <tr><th>241</th><td>22.67</td><td>2.00</td><td>Male</td><td>Yes</td><td>Sat</td><td>Dinner</td><td>2</td></tr>\n",
+ " <tr><th>242</th><td>17.82</td><td>1.75</td><td>Male</td><td>No</td><td>Sat</td><td>Dinner</td><td>2</td></tr>\n",
+ " <tr><th>243</th><td>18.78</td><td>3.00</td><td>Female</td><td>No</td><td>Thur</td><td>Dinner</td><td>2</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "<p>244 rows × 7 columns</p>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " total_bill tip sex smoker day time size\n",
+ "0 16.99 1.01 Female No Sun Dinner 2\n",
+ "1 10.34 1.66 Male No Sun Dinner 3\n",
+ "2 21.01 3.50 Male No Sun Dinner 3\n",
+ "3 23.68 3.31 Male No Sun Dinner 2\n",
+ "4 24.59 3.61 Female No Sun Dinner 4\n",
+ ".. ... ... ... ... ... ... ...\n",
+ "239 29.03 5.92 Male No Sat Dinner 3\n",
+ "240 27.18 2.00 Female Yes Sat Dinner 2\n",
+ "241 22.67 2.00 Male Yes Sat Dinner 2\n",
+ "242 17.82 1.75 Male No Sat Dinner 2\n",
+ "243 18.78 3.00 Female No Thur Dinner 2\n",
+ "\n",
+ "[244 rows x 7 columns]"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "d35578b5-b9d5-4a2b-99a7-fc596e498865",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "1 1.66\n",
+ "2 3.50\n",
+ "3 3.31\n",
+ "5 4.71\n",
+ "6 2.00\n",
+ " ... \n",
+ "236 1.00\n",
+ "237 1.17\n",
+ "239 5.92\n",
+ "241 2.00\n",
+ "242 1.75\n",
+ "Name: tip, Length: 157, dtype: float64"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.loc[data['sex'] == 'Male', 'tip']"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "id": "59a646d6-59c6-444c-bddc-32b9b232b9f8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Define a function that draws a box plot of tip for each category of a feature\n",
+ "def plot_feature_vs_tip(feature):\n",
+ " plt.boxplot([data.loc[data[feature] == i,'tip'] for i in set(data[feature])],\n",
+ " labels=[i for i in set(data[feature])],\n",
+ " patch_artist=True,\n",
+ " boxprops=dict(color='blue')\n",
+ " )\n",
+ " plt.xlabel(feature)\n",
+ " plt.ylabel(\"tip\")\n",
+ " plt.title(feature+\" for tip\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "id": "2ac65625-6574-4d6a-8d10-5e40a1584a05",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plot_feature_vs_tip(\"sex\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "id": "02f2d1da",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plot_feature_vs_tip(\"smoker\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "id": "31098a48-d053-4666-bf01-4ed59fb94601",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plot_feature_vs_tip(\"day\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "id": "f2fa7ef2-68ff-4900-9ef9-353fac1c6d19",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plot_feature_vs_tip(\"time\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 42,
+ "id": "ae4d913b",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plot_feature_vs_tip(\"size\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "9050a71c-23c4-4286-8188-86b0fc107026",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 100,
+ "id": "880db10e-d3f4-42ec-9b69-5c4fff7d710e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def groupedbarplot(x_data, y_data_list, y_data_names, colors, x_label, y_label, title):\n",
+ "    _, ax = plt.subplots()\n",
+ "    # Base x position of each group of bars\n",
+ "    x_data_len = np.arange(len(x_data))\n",
+ "    total_width = 0.8\n",
+ "    # Width of each individual bar within a group\n",
+ "    ind_width = total_width / len(y_data_list)\n",
+ "    # Offset of each bar's center from the center of its group\n",
+ "    alteration = (np.arange(len(y_data_list)) - (len(y_data_list) - 1) / 2) * ind_width\n",
+ "\n",
+ "    # Draw one series at a time, shifted sideways by its offset\n",
+ "    for i in range(len(y_data_list)):\n",
+ "        ax.bar(x_data_len + alteration[i], y_data_list[i], color=colors[i], label=y_data_names[i], width=ind_width)\n",
+ "    # Place one tick at the center of each group and label it with its category\n",
+ "    ax.set_xticks(x_data_len)\n",
+ "    ax.set_xticklabels(x_data)\n",
+ "    ax.set_ylabel(y_label)\n",
+ "    ax.set_xlabel(x_label)\n",
+ "    ax.set_title(title)\n",
+ "    ax.legend(loc='upper right')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "99f133df-b29a-45d8-8f14-c962bc2b311d",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "740f78a7-d039-407f-8888-fee0806e0a6e",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 49,
+ "id": "a8c7e586-a181-4c83-9900-8566f1a9fedb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "mean_data = data[['tip', 'smoker', 'sex']].groupby(['smoker','sex']).mean()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 61,
+ "id": "4a06c2ab-0629-4847-bd03-c1021b4a4a4f",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "pandas.core.frame.DataFrame"
+ ]
+ },
+ "execution_count": 61,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "type(mean_data)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 69,
+ "id": "86cf51f9-8bcd-4bd1-a474-94ec764d4ba2",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "3.0511666666666666"
+ ]
+ },
+ "execution_count": 69,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "mean_data['tip'].iloc[0]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 70,
+ "id": "89d39ff3-7267-4fc6-8ddd-198f373e7ad7",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
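+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th></th>\n",
+ "      <th>tip</th>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>smoker</th>\n",
+ "      <th>sex</th>\n",
+ "      <th></th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th rowspan=\"2\" valign=\"top\">Yes</th>\n",
+ "      <th>Male</th>\n",
+ "      <td>3.051167</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>Female</th>\n",
+ "      <td>2.931515</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th rowspan=\"2\" valign=\"top\">No</th>\n",
+ "      <th>Male</th>\n",
+ "      <td>3.113402</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>Female</th>\n",
+ "      <td>2.773519</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"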
+ ],
+ "text/plain": [
+ " tip\n",
+ "smoker sex \n",
+ "Yes Male 3.051167\n",
+ " Female 2.931515\n",
+ "No Male 3.113402\n",
+ " Female 2.773519"
+ ]
+ },
+ "execution_count": 70,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "mean_data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 104,
+ "id": "b28e80ee-39b1-48b6-b900-ed6a3634f372",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "categories = [\"male\", \"female\"]\n",
+ "# mean_data rows are ordered (Yes, Male), (Yes, Female), (No, Male), (No, Female)\n",
+ "values1 = [mean_data['tip'].iloc[0], mean_data['tip'].iloc[1]]  # smokers\n",
+ "values2 = [mean_data['tip'].iloc[2], mean_data['tip'].iloc[3]]  # non-smokers\n",
+ "\n",
+ "# Width of each bar\n",
+ "bar_width = 0.35\n",
+ "\n",
+ "# x positions of the two side-by-side bar series\n",
+ "x_data1 = np.arange(len(categories))\n",
+ "x_data2 = x_data1 + bar_width\n",
+ "\n",
+ "# Draw the side-by-side bars\n",
+ "\n",
+ "plt.bar(x_data1, values1, width=bar_width, color='#539caf', label='Yes')\n",
+ "plt.bar(x_data2, values2, width=bar_width, color='#7663b0', label='No')\n",
+ "\n",
+ "# Add axis labels and a title\n",
+ "plt.xlabel('Sex')\n",
+ "plt.ylabel('Mean tip')\n",
+ "plt.title('Mean tip by sex and smoker')\n",
+ "plt.xticks(x_data1 + bar_width / 2, categories)  # center each tick between the paired bars\n",
+ "\n",
+ "# Add the legend (smoker: Yes / No)\n",
+ "plt.legend()\n",
+ "\n",
+ "# Show the figure\n",
+ "plt.show()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4404cefc-b606-4f66-a1e9-1ddf2fa239eb",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2c16e9a3-1b67-4a08-b6e8-3bba502d6ab5",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e729c2ab-42fb-45e5-bb75-f0db5ffa244d",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3a978a67-991a-4dcd-9654-2931e8104c9b",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "6951f73f-8961-4457-a6f3-afc3b30dab8a",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "8d1b9708-a824-4160-bcf7-9e9977183cae",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "47193c0a-3a5a-4e9f-bda6-d5bfb603bbb2",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "5cdcdc34-b582-4d18-b103-cd6728a727c4",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "6d370196-265b-48bb-917f-c44ba3e20c8e",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEWCAYAAABrDZDcAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAH3JJREFUeJzt3X14XGWd//H3pyHSUiqsUos8tAFXMTaXoER56iJBAVdA1LVifWKXuAX1FwH3tyrGn+BDUC9ckI27KBKtCI0i+ACuyoMN1KCgLSIUgqDQAipQwPLQJRLK9/fHOSnTIZlMk5k5mTmf13XNlZkzM+f+npnJ+Z77vs+5b0UEZmaWXzOyDsDMzLLlRGBmlnNOBGZmOedEYGaWc04EZmY550RgZpZzTgQNQNLpki6s0LqWSfpcJdZVCZJulXRIieevkfT+KpU9T9JKSY9L+o9qlJEFSbMkXS7pUUnfq1IZFftNjrP+kr8L2zrbZB2ATUzSEwUPtwP+BmxKH59Q+4hqJyIWjt6XdDrw9xHxnsmuT9J2wJeAdwDNwO8i4uBxXr4UeAh4fkzxghtJy4D7IuKTU1lPhbwdmAe8MCKenurK0h3yhRGx21TXNc76l1H02RX+LmzqXCOoAxGx/egNuAc4umDZRVnHV2fOA14AtKZ/Tynx2gXAbVNNApUgqZIHbQuAOyaTBCoch00XEeFbHd2AtcAbipadDlwMXAA8DtwKtBc8vwtwKbAeuBv4cIn1LwO+ClyVrutaYEHB8wcCvwEeTf8emC5/AXAfSZIC2B74A/C+McroAG4peHw18OuCx4PAWwq3F3gj8BQwAjxBciQPcA3wWeC6NN4rgZ3G2ba9gMdIjvAn+pyXpWU9lZb3BpIDp48DfwQeTj/zFxS853vA/elnsxJYmC5fWrSuy9PlQVLDKSzzc+n9Q9LP82PpOr+dLj8KuAnYAPwSeGXB+z8G/Cn9HH4PvH6M7fp00efYmW7XJ4F1wIPp72iH9PUtaZydJAchK4vWNxt4EngmXd8TJL+306nAb7LEZ7eW9P8gLesS4LtpWTcCe2f9v1pPt8wD8G0rv7DxE8Ew8CagCfg8cH363AxgNfAp4HnAnsBdwBHjrH9Z+s90MLAtcA4wmD73AuCvwHtJmhWXpI9fmD5/eLrTehHwdeCSccqYme48dkrXcz/wZ2AOMCt97oXF25tu54VF67qGZMf8svS91wBfGKfc9wG3AGeTNPncAvxTic96GemOOX18MnA9sFv62XwN6C94/vh0G7YFvgzcNN660mUTJYKngS+m65sFvJpkR71f+j0fl34+25IkuXuBXdL3twAvGWe7tvgc07j/kP42tge+z7OJpyWN8wKSnf6sMdZ3CEnTTTV/k8WfXfHvYoSkyasZ+L8kyaU56//Xerm5aahxDEbETyJiE/BtYO90+WuAuRHxmYh4KiLuItlJv7PEuv4nIlZGxN+AbuAASbsDRwJ3RsS3I+LpiOgHbgeOBoiIK0mOin+evnbM/ouIGAZWkSSbduBmklrAQcD+aRkPb8W2fzMi7oiIJ0mOQvcZ53W7AW0kR+y7AP8H+Jak1jLLOQHojoj70s/mdODto80lEfGNiHi84Lm9Je2wFdtR7BngtIj4W7pt/wp8LSJuiIhNEfEtkv6i/Un6jLYFXiGpOSLWRsQfyyzn3cBZEXFXRDwBnAq8s6gZ6PSI2JjGUa5K/iYnsjoiLomIEeAskoON/aewvlxxe1/juL/g/v8CM9N/5AXALpI2FDzfBPyixLruHb0TEU9IeoRkx7kLSfNBoXXArgWPzyPZwZ4xwc78Wp5t/riWpGbxOpId27Ul3jeW4m3ffpzXPUly5Pi5SNrHr5U0QFKTGSqjnAXADyQ9U7BsEzBP0v1AD7AYmEuyE4ek1vNouRtSZH2aNAvLP05SV8Gy55HUAq6VdDJJAloo6QrgIxHx5zLKKf5e15Hs
G+YVLLuXrVfJ3+RECn+zz0i6j2S7rAyuETS+e4G7I2LHgtuciHhTiffsPnpH0vYkTUJ/Tm8Lil47n6RdGklNJM0lFwAfkPT3JcoYTQQHp/evJUkEr2P8RDDVTtubp/j+e4F/LPosZ0bEn4B3AceQ9CXsQNKkAqD071ix/y/JWWCjdi56vvg99wI9ReVvl9bMiIjlEbGI5DsKkmalchR/r/NJmqUeKBFLqTgnsrW/yXLWX/ibnUFS+ysnCRpOBHnwa+AxSR9Lzx9vktQm6TUl3vMmSYskPY+kI/aGiLgX+AnwMknvkrSNpGOBVwA/Tt/3ifTv8SSnaF6QJoex/JKkXfu1JB3Ft5LsjPYj6WgdywNAS/qPPhkrSTo8T03jP4gkGV1R5vu/CvRIWgAgaa6kY9Ln5pDUZh4m2bmfMUbsexYtuwl4V/qdvJEkCZbydeBESfspMVvSkZLmSNpL0qGStiVpm3+SZ08xnkg/cIqkPdLEfwbw3Sj/rKIHgBduRTPY1v4mx/rsiu0r6W1pjeNkku/i+jLjyT0nggaXts8eTdJufjdJJ+n5JEet41kOnAY8AuxL0oZM2tRzFPBvJDu8jwJHRcRDkvYFPkJyltAmkqPRIDnLZqy4NpKc3XFrRDyVLv4VsC4iHhwnrtGLnx6WdOMEmz5WmSMkR+1vImmu+Xoa7+1lruIc4DLgSkmPk+xo9kufu4CkSeVPwG08dyfUR9J+v0HSD9NlJ5F8NxtIPuMfUkJErCLpJ/gKSVPaH4B/Tp/eFvgCyfc72mH/ieeuZUzfIGnDX0nyGxkGukq+Y8u4bidJJnel21eySWYSv8mxPrtiPwKO5dmTGd6Wft9WBkVkfoq0mdmkVeJCw7xzjcDMLOecCMzMcs5NQ2ZmOecagZlZztXFBWU77bRTtLS0ZB2GmVldWb169UMRMXei19VFImhpaWHVqlVZh2FmVlckFY8EMCY3DZmZ5ZwTgZlZzjkRmJnlnBOBmVnOORGYmeVc1RKBpG9IelDSmoJlL5B0laQ7079/V63y7bn6+/tpa2ujqamJtrY2+vv7sw7JzKaBatYIlpHMM1vo48DPI+KlJLNYjTkypVVef38/3d3d9Pb2Mjw8TG9vL93d3U4GZlbdISYktQA/joi29PHvgUMi4i+SXgxcExF7TbSe9vb28HUEU9PW1kZvby8dHR2blw0MDNDV1cWaNWtKvNPM6pWk1RHRPuHrapwINkTEjgXP/zUixmwekrQUWAowf/78fdetK+u6CBtHU1MTw8PDNDc3b142MjLCzJkz2bSp3PlLzKyelJsIpm1ncUScFxHtEdE+d+6EV0jbBFpbWxkcHNxi2eDgIK2t5c7bbmaNqtaJ4IG0SYj073gzUVmFdXd309nZycDAACMjIwwMDNDZ2Ul3d3fWoZlZxmo91tBlwHEkU+odRzK9nNXAkiVLAOjq6mJoaIjW1lZ6eno2Lzez/KpaH4GkfpKJwXcimXz6NJI5WS8G5pNMIr44Ih6ZaF3uLDYz23rl9hFUrUYQEeMdar6+WmWamdnWm7adxWZmVhtOBGZmOedEYGaWc04EZmY550RgZpZzTgRmZjnnRGBmlnNOBGZmOedEYGaWc04EZmY550RgZpZzTgRmZjnnRGBmlnNOBNYQ+vv7aWtro6mpiba2Nvr7+7MOyaxu1HpiGrOK6+/vp7u7m76+PhYtWsTg4CCdnZ0AnnjHrAxVnby+UjwxjZXS1tZGb28vHR0dm5cNDAzQ1dXFmjVrMozMLFvlTkzjRGB1r6mpieHhYZqbmzcvGxkZYebMmWzatCnDyMyyVW4icB+B1b3W1lYGBwe3WDY4OEhra2tGEZnVFycCq3vd3d10dnYyMDDAyMgIAwMDdHZ20t3dnXVoZnXBncVW90Y7hLu6uhgaGqK1tZWenh53FJuVyX0EZmYNyn0EZmZWFicCM7OccyIwM8s5JwIzs5xzIjAzyzknAjOznHMiMDPLOScCM7OccyIwM8s5
JwIzs5xzIjAzyzknAjOznHMiMDPLOScCM7OcyyQRSDpF0q2S1kjqlzQzizjMzCyDRCBpV+DDQHtEtAFNwDtrHYeZmSWyahraBpglaRtgO+DPGcVhZpZ7NU8EEfEn4EvAPcBfgEcj4sri10laKmmVpFXr16+vdZhmZrmRRdPQ3wHHAHsAuwCzJb2n+HURcV5EtEdE+9y5c2sdpplZbmTRNPQG4O6IWB8RI8D3gQMziMPMzMgmEdwD7C9pO0kCXg8MZRCHmZmRTR/BDcAlwI3ALWkM59U6DjMzS2yTRaERcRpwWhZlm5nZlnxlsZlZzjkRmJnlnBOBmVnOZdJHYDYV0uTfG1G5OMwahROB1Z2SO3PJe3uzreSmITOznHMiMDPLOScCM7OccyIwM8s5JwIzs5xzIjAzyzknAjOznHMiMDPLOScCM7OccyIwM8s5JwIzs5xzIjAzyzknAjOznHMiMDPLOScCM7OcmzARSNpT0uWSHpL0oKQfSdqzFsGZWT709/fT1tZGU1MTbW1t9Pf3Zx1SrpRTI1gOXAzsDOwCfA/wt2RmFdHf3093dze9vb0MDw/T29tLd3e3k0ENlZMIFBHfjoin09uFgKeAMrOK6Onpoa+vj46ODpqbm+no6KCvr4+enp6sQ8sNxQTT+kn6ArAB+A5JAjgW2Bb4L4CIeKTKMdLe3h6rVq2qdjHWCDxVZd1pampieHiY5ubmzctGRkaYOXMmmzZtyjCy+idpdUS0T/S6cmoExwInAAPANcAHgOOB1YD3znXE7bA2HbW2tjI4OLjFssHBQVpbWzOKKH8mnLw+IvaoRSBWXaPtsH19fSxatIjBwUE6OzsBWLJkScbRWZ51d3fT2dn5nN+mm4ZqZ9ymIUmHRsQKSW8b6/mI+H5VIyvgpqGpa2tro7e3l46Ojs3LBgYG6OrqYs2aNRlGVmFuGqpL/f399PT0MDQ0RGtrK93d3T5AqYBym4ZKJYJPR8Rpkr45xtMREcdPNchyORFMXW7aYZ0IzDYrNxGM2zQUEaeldz8TEXcXrdzNRXVmtB22sEbgdlgzg/I6iy8dY9kllQ7Eqmu0HXZgYICRkREGBgbo7Oyku7s769DMLGPj1ggkvRxYCOxQ1E/wfGBmtQOzyhptb+3q6trcDtvT0+N2WDMredbQXsBRwI7A0QXLHwf+tZpBWXUsWbLEO34ze45SfQQ/An4k6YCI+FUNYzIzsxqasI/AScDMrLF5GGozs5wbNxFIOin9e1ClC5W0o6RLJN0uaUjSAZUuw8zMylOqRvAv6d/eKpR7DvCziHg5sDcwVIUyzMysDKUSwZCktcBekm4uuN0i6ebJFijp+cDBQB9ARDwVERsmuz4rnwedM7OxjJsIImIJsD/wB5LTR0dvR7Hl6aRba09gPfBNSb+VdL6k2cUvkrRU0ipJq9avXz+F4srXyDtKT/5hZuOKiAlvwPOAtvTWXM57SqyrHXga2C99fA7w2VLv2XfffaPali9fHnvssUesWLEinnrqqVixYkXssccesXz58qqXXQsLFy6MFStWbLFsxYoVsXDhwowiqpJkHCwziwhgVZSxXy5nYprXARcAawEBuwPHRcTKySQeSTsD10dES/r4H4CPR8SR472nFoPONfronB50zix/KjkxzVnA4RHxuog4GDgCOHuygUXE/cC9kvZKF70euG2y66uUoaEhFi1atMWyRYsWMTTUGP3YnvzDzMZTTiJojojfjz6IiDuA5hKvL0cXcFHa6bwPcMYU1zdljb6j9KBzZjaeCWcoA1ZJ6gO+nT5+N8k0lZMWETeR9BVMG93d3Rx77LHMnj2be+65h/nz57Nx40bOOeecrEOrCA86Z2bjKScRfAD4EPBhkj6ClcB/VzOorE3Ub1KvPOicmY1lws7i6cCdxVY2dxabbTblqSqnk1okgtycVdPonAjMNqvkWUO50OidxWZm49mqRCBpRjpERMPxWTVmllcTdhZLWg6cCGwiOVtoB0ln
RcSZ1Q6ulnxWjZnlVTlXFt8UEftIejewL/AxYHVEvLIWAUJt+gisQbiPwGyzSvYRNEtqBt4C/CgiRgD/p5nZuFp2HkaiZreWnYez3uS6Vs51BF8jGWfod8BKSQuAx6oZlJnVt3UPzCRQzcrTAz42nYoJE0FE/CfwnwWL1knqGO/1ZmZWXyZsGpI0T1KfpJ+mj18BHFf1yDLQyPMRmJmNp5w+gmXAFcAu6eM7gJOrFVBWPHGLmeVVOYlgp4i4GHgGICKeJjmVtKH09PTQ19dHR0cHzc3NdHR00NfXR09PT9ahmZlVVTmJYKOkF5KeKSRpf+DRqkaVgaGhIc444wxmzJiBJGbMmMEZZ5zRMPMRmFltTOXsp6yUkwg+AlwGvETSdSSzlXVVNaoMzJo1i6uvvpoTTzyRDRs2cOKJJ3L11Vcza9asrEMzszqSzJc6zg2VfD4rEyaCiLgReB1wIHACsDAibq52YLW2ceNG5syZw+LFi9luu+1YvHgxc+bMYePGjVmHlkuTPg+d8HnoZlupnCEm3le06NWSiIgLqhRTZs4+++wthpg4++yzef/73591WJMylWrmdLgw1+eh1z818HWnk///Cib7s67m/2U5F5S9puD+TJI5hm8kaSJqGJJYvXr1FnMPfPCDH0RZNtxNQckfjYdhsBqoaSLPIOk00vaVc0HZFv0Bknbg2WkrG8Zhhx3GueeeC8DnP/95Tj31VM4991wOP/zwjCMzM6uurZ6YJh136OaIqNlA/bUadO6II47gqquuIiKQxGGHHcYVV1xR9XJrrg5qBFLtj7im+UdSVxr9+6uX7St30Lly+ggu59lB5mYArwAu3vqQpr+G3OmbZWDBvOGa9rssmDdM0nJtk1FOH8GXCu4/DayLiPuqFI+ZNYC1909ypzzp2qqTwFSU00dwbS0CMTOzbIybCCQ9ztjzDgiIiGjIKSvNzPJm3EQQEXNqGYiZmWWjnD4CACS9iIKGuIi4pyoRmZlZTZUzH8GbJd0J3A1cSzJb2U+rHJeZmdVIOYPOfRbYH7gjIvYgubL4uqpGZWZmNVNOIhiJiIeBGZJmRMQAsE+V4zIzsxopp49gg6TtgZXARZIeJLmewMzMGkA5NYJjgCeBU4CfAX8Ejq5mUGZmVjulriP4CrA8In5ZsPhb1Q+pNup9mGYzs0opVSO4E/gPSWslfVFSQ/UL1OMsQmZm1TBuIoiIcyLiAJLZyR4BvilpSNKnJL2sZhGamVlVlTNV5bqI+GJEvAp4F/BWwDO6m9mkTGWqUauOci4oa5Z0tKSLSC4kuwP4p6kWLKlJ0m8l/Xiq6zKz+lGyWXaCm1VHqc7iw4AlwJHAr4HvAEsjolKzuZ9EUrPw4HVmZhkqVSP4BPAroDUijo6IiyqVBCTtRpJgzq/E+szMbPJKjT7aUcVyvwx8FBh3hFNJS4GlAPPnz69iKGZm+VbOBWUVJeko4MGIWF3qdRFxXkS0R0T73LlzaxSdmVn+1DwRAAcBb5a0lqTf4VBJF2YQh5mZkUEiiIhTI2K3iGgB3gmsiIj31DoOMzNLZFEjMDOzaaTsGcqqISKuAa7JMgabvjTmlNlmVmmZJgKzUoLaXUrqpGN55qahOtay83Dpy/UneRn/WLeWnYez3lwzqxLXCOrYugdm1uyoWQ/4iNmsUblGYGaWc04EZmY550RgZpZzTgRmZjnnRGBmlnNOBGZmOedEYGaWc76OoM75ilgzmyongjpXswvKnHDMGpabhszMcq6hE0Etx+LxeDxmVq8aummolmPxgMfjMbP61NA1AjMzm1hD1wisfi2YN1zTGtaCecPAzJqVZ/Wt0X6fTgQ2La29f5I/egliMv+gTgJWvkb7fbppyMws55wIzMxyzonAzCznnAjMzHLOicDMLOecCMzMcs6JwMws55wIzMxyzonAzCznGv7KYo+jb2ZWWsMngpqOPuqkY2Z1yE1DZmY51/A1gkZWyxEQPTqnWeNyIqhjtR0B0UnA
rFG5acjMLOecCMzMcs6JwMws52qeCCTtLmlA0pCkWyWdVOsYzMzsWVl0Fj8N/FtE3ChpDrBa0lURcVsGsZiZ5V7NawQR8ZeIuDG9/zgwBOxa6zjMzCyR6emjklqAVwE3jPHcUmApwPz58ye1/lqeZz9ank+zrD6VvFg8KHUx+aTmDTfbCvX4+8wsEUjaHrgUODkiHit+PiLOA84DaG9vn9THU9vz7MFJoDa8M7fprB5/n5kkAknNJEngooj4fhYxmE1XpY8oS6vHnZBlL4uzhgT0AUMRcVatyzebDlp2HkZizNtUjLfOlp2HKxO4NaQsagQHAe8FbpF0U7rsExHxkwxiMcvEugdm1nZk3Br2lVn9qXkiiIhBSnaXmJlZLfnKYjOznHMiMDPLOScCM7OccyIwM8s5JwIzs5xzIjAzyzknAjOznPOcxWYZEb7Iy6YHJ4IGVY8jIOZNTa8sdtKxEpwIGpR35mZWLvcRmJnlnBOBmVnOORGYmeWcE4GZWc45EZiZ5ZwTgZlZzuX29FGfZ29ZWjBvuKazhi2YNwzMrFl5Vl9ymwi8M7csrb2/1jtlJwEbn5uGzMxyzonAzCznnAjMzHLOicDMLOecCMzMcs6JwMws55wIzMxyzonAzCznFHVwZZWk9cC6Gha5E/BQDcurtUbevkbeNvD21btab9+CiJg70YvqIhHUmqRVEdGedRzV0sjb18jbBt6+ejddt89NQ2ZmOedEYGaWc04EYzsv6wCqrJG3r5G3Dbx99W5abp/7CMzMcs41AjOznHMiMDPLOSeClKTdJQ1IGpJ0q6STso6pkiTNlPRrSb9Lt+/TWcdUDZKaJP1W0o+zjqWSJH1D0oOS1mQdSzVI2lHSJZJuT/8HD8g6pkqStFbSLZJukrQq63iKuY8gJenFwIsj4kZJc4DVwFsi4raMQ6sISQJmR8QTkpqBQeCkiLg+49AqStJHgHbg+RFxVNbxVIqkg4EngAsioi3reCpN0reAX0TE+ZKeB2wXERuyjqtSJK0F2iNiWl4s5xpBKiL+EhE3pvcfB4aAXbONqnIi8UT6sDm9NdRRgKTdgCOB87OOpdIiYiXwSNZxVIOk5wMHA30AEfFUIyWBeuBEMAZJLcCrgBuyjaSy0maTm4AHgasioqG2D/gy8FHgmawDsa2yJ7Ae+GbarHe+pNlZB1VhAVwpabWkpVkHU8yJoIik7YFLgZMj4rGs46mkiNgUEfsAuwGvldQwTQySjgIejIjVWcdiW20b4NXAuRHxKmAj8PFsQ6q4gyLi1cA/Ah9Km/qmDSeCAmnb+aXARRHx/azjqZa02n0N8MaMQ6mkg4A3p22x3wEOlXRhtiFZme4D7iuooV5CkhgaRkT8Of37IPAD4LXZRrQlJ4JU2pnaBwxFxFlZx1NpkuZK2jG9Pwt4A3B7tlFVTkScGhG7RUQL8E5gRUS8J+OwrAwRcT9wr6S90kWvBxriJA0ASbPTE1BIm7wOB6bV2V/bZB3ANHIQ8F7glrQdHeATEfGTDGOqpBcD35LURHIAcHFENNQplo1MUj9wCLCTpPuA0yKiL9uoKqoLuCg9Y+gu4F8yjqeS5gE/SI412QZYHhE/yzakLfn0UTOznHPTkJlZzjkRmJnlnBOBmVnOORGYmeWcE4GZWc45EVgmJG1KR2IcvbVMYh07Svpg5aObHElnpiO7nlm0/BBJBxY8Xibp7RUu+zOS3lDJdVp++DoCy8qT6XAXU7Ej8EHgv7fmTZKaImLTFMseywnA3Ij4W9HyQ0hGDv1lFcoEICI+Va11W+NzjcCmjXRQvDMl/UbSzZJOSJdvL+nnkm5Mx3Q/Jn3LF4CXpDWKM9Mj7x8XrO8rkv45vb9W0qckDQKLJb1E0s/SQcB+Ienl6esWS1qTztuwcowYlZa1Jo3l2HT5ZcBs4IbRZenyFuBE4JQ0zn9InzpY0i8l3VVYO5D07wXb/5w5I9LPaFlB+aeky5dJeruk9oJa1i2SIn1+zO01
A9cILDuzCq7gvjsi3gp0Ao9GxGskbQtcJ+lK4F7grRHxmKSdgOvTHe/HgbbRmoWkQyYoczgiFqWv/TlwYkTcKWk/klrFocCngCMi4k+jQ3IUeRuwD7A3sBPwG0krI+LNkp4oruVExFpJXwWeiIgvpWV3klzpvQh4OXAZcImkw4GXkoxDI+AySQenQ1CP2gfYdXROguIYI2JV+hrSJqrRK1jPG2d7zZwILDNjNQ0dDryy4Ah5B5Id433AGemIjc+QzBMxbxJlfhc2jzB7IPC99LJ/gG3Tv9cByyRdDIw18OAioD9tWnpA0rXAa0h25lvjhxHxDHCbpNFtOTy9/TZ9vD3J9hcmgruAPSX1Av8DXDnWyiW9g2TgtsMn2F4zJwKbVgR0RcQVWyxMmnfmAvtGxIiSEUZnjvH+p9myubP4NRvTvzOADWP1UUTEiekR85HATZL2iYiHi2KshMJ+BBX8/XxEfG28N0XEXyXtDRwBfAh4B3B84WskLQQ+DRwcEZskjbu9ZuA+AptergA+oGQ4cCS9LB2tcQeSuQZGJHUAC9LXPw7MKXj/OuAVkraVtAPJKJbPkc4zcbekxWk5SneuSHpJRNyQdr4+BOxe9PaVwLFpW/1ckpm1fj3BdhXHWWr7j0+P4JG0q6QXFb4gbRqbERGXAv+PouGa0+3+DvC+iFg/0faagWsENr2cD7QANyppw1gPvAW4CLhcyaTfN5EOnx0RD0u6TsmE7j+NiH9Pm3RuBu7k2SaWsbwbOFfSJ0mm7fwO8DvgTEkvJTk6/3m6rNAPgAPS5QF8NB1GuZTLSfoAjiEZZXNMEXGlpFbgV2kTzhPAe0hmlBu1K8lMXqMHcacWreYtJIny66PNQGlNYLztNfPoo2ZmeeemITOznHMiMDPLOScCM7OccyIwM8s5JwIzs5xzIjAzyzknAjOznPv/8fpstKd55w8AAAAASUVORK5CYII=\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "b894329d-88e7-48c1-886e-1b6131b913b9",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "smoker       Yes        No\n",
+ "sex                       \n",
+ "Male    3.051167  3.113402\n",
+ "Female  2.931515  2.773519\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "aefe2f74-5174-42cd-b32d-83991f0f1980",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ab2fba5b-ad22-43a3-a196-c0ae9a03a523",
+ "metadata": {},
+ "source": [
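+ "## Exercise 4: Titanic survival analysis (8 questions)\n",
+ "\n",
+ "1. Proportion of survivors and non-survivors in each passenger class (hint: box plot or violin plot)\n",
+ "2. Survival rate by sex (hint: box plot or violin plot)\n",
+ "3. Fare distribution of survivors and non-survivors (hint: box plot or violin plot)\n",
+ "4. Age distribution of survivors and non-survivors (hint: box plot or violin plot)\n",
+ "5. Passenger class distribution by port of embarkation (hint: box plot or violin plot)\n",
+ "6. Distribution of the number of siblings/spouses aboard (`sibsp`) for survivors and non-survivors (hint: box plot or violin plot)\n",
+ "7. Distribution of the number of parents/children aboard (`parch`) for survivors and non-survivors (hint: box plot or violin plot)\n",
+ "8. Is there a relationship between travelling alone and survival? (hint: count bar chart)"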
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 105,
+ "id": "171e617f-9983-445a-b872-ad925ee239a4",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
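+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>survived</th>\n",
+ "      <th>pclass</th>\n",
+ "      <th>sex</th>\n",
+ "      <th>age</th>\n",
+ "      <th>sibsp</th>\n",
+ "      <th>parch</th>\n",
+ "      <th>fare</th>\n",
+ "      <th>embarked</th>\n",
+ "      <th>class</th>\n",
+ "      <th>who</th>\n",
+ "      <th>adult_male</th>\n",
+ "      <th>deck</th>\n",
+ "      <th>embark_town</th>\n",
+ "      <th>alive</th>\n",
+ "      <th>alone</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>0</th>\n",
+ "      <td>0</td>\n",
+ "      <td>3</td>\n",
+ "      <td>male</td>\n",
+ "      <td>22.0</td>\n",
+ "      <td>1</td>\n",
+ "      <td>0</td>\n",
+ "      <td>7.2500</td>\n",
+ "      <td>S</td>\n",
+ "      <td>Third</td>\n",
+ "      <td>man</td>\n",
+ "      <td>True</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>Southampton</td>\n",
+ "      <td>no</td>\n",
+ "      <td>False</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>1</th>\n",
+ "      <td>1</td>\n",
+ "      <td>1</td>\n",
+ "      <td>female</td>\n",
+ "      <td>38.0</td>\n",
+ "      <td>1</td>\n",
+ "      <td>0</td>\n",
+ "      <td>71.2833</td>\n",
+ "      <td>C</td>\n",
+ "      <td>First</td>\n",
+ "      <td>woman</td>\n",
+ "      <td>False</td>\n",
+ "      <td>C</td>\n",
+ "      <td>Cherbourg</td>\n",
+ "      <td>yes</td>\n",
+ "      <td>False</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>2</th>\n",
+ "      <td>1</td>\n",
+ "      <td>3</td>\n",
+ "      <td>female</td>\n",
+ "      <td>26.0</td>\n",
+ "      <td>0</td>\n",
+ "      <td>0</td>\n",
+ "      <td>7.9250</td>\n",
+ "      <td>S</td>\n",
+ "      <td>Third</td>\n",
+ "      <td>woman</td>\n",
+ "      <td>False</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>Southampton</td>\n",
+ "      <td>yes</td>\n",
+ "      <td>True</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>3</th>\n",
+ "      <td>1</td>\n",
+ "      <td>1</td>\n",
+ "      <td>female</td>\n",
+ "      <td>35.0</td>\n",
+ "      <td>1</td>\n",
+ "      <td>0</td>\n",
+ "      <td>53.1000</td>\n",
+ "      <td>S</td>\n",
+ "      <td>First</td>\n",
+ "      <td>woman</td>\n",
+ "      <td>False</td>\n",
+ "      <td>C</td>\n",
+ "      <td>Southampton</td>\n",
+ "      <td>yes</td>\n",
+ "      <td>False</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>4</th>\n",
+ "      <td>0</td>\n",
+ "      <td>3</td>\n",
+ "      <td>male</td>\n",
+ "      <td>35.0</td>\n",
+ "      <td>0</td>\n",
+ "      <td>0</td>\n",
+ "      <td>8.0500</td>\n",
+ "      <td>S</td>\n",
+ "      <td>Third</td>\n",
+ "      <td>man</td>\n",
+ "      <td>True</td>\n",
+ "      <td>NaN</td>\n",
+ "      <td>Southampton</td>\n",
+ "      <td>no</td>\n",
+ "      <td>True</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"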
+ ],
+ "text/plain": [
+ " survived pclass sex age sibsp parch fare embarked class \\\n",
+ "0 0 3 male 22.0 1 0 7.2500 S Third \n",
+ "1 1 1 female 38.0 1 0 71.2833 C First \n",
+ "2 1 3 female 26.0 0 0 7.9250 S Third \n",
+ "3 1 1 female 35.0 1 0 53.1000 S First \n",
+ "4 0 3 male 35.0 0 0 8.0500 S Third \n",
+ "\n",
+ " who adult_male deck embark_town alive alone \n",
+ "0 man True NaN Southampton no False \n",
+ "1 woman False C Cherbourg yes False \n",
+ "2 woman False NaN Southampton yes True \n",
+ "3 woman False C Southampton yes False \n",
+ "4 man True NaN Southampton no True "
+ ]
+ },
+ "execution_count": 105,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data = sns.load_dataset(\"titanic\")\n",
+ "data.head()\n",
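+ "# Columns: survived (0/1), passenger class, sex, age, siblings/spouses aboard, parents/children aboard, fare, embarkation port code, passenger class (label), person category (man/woman/child), adult male flag, deck, embarkation town, survived (yes/no), travelling alone"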
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "20fe467f-9267-47d8-b645-841c6cf50a25",
+ "metadata": {},
+ "outputs": [],
+ "source": [
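+ "# Survivors per passenger class (survived is 0/1, so the sum is a count)\n",
+ "data1 = data.groupby(\"pclass\")[\"survived\"].sum()\n",
+ "\n",
+ "# Total passengers per class\n",
+ "data2 = data.groupby(\"pclass\")[\"pclass\"].count()\n",
+ "\n",
+ "# Combine both counts into one table indexed by pclass\n",
+ "data_df = pd.concat([data1, data2], axis=1, keys=[\"survived\", \"total\"])\n",
+ "\n",
+ "# Non-survivors, then both proportions (they sum to 1 within each class)\n",
+ "data_df[\"unsurvived\"] = data_df[\"total\"] - data_df[\"survived\"]\n",
+ "\n",
+ "data_df[\"survived_prop\"] = data_df[\"survived\"] / data_df[\"total\"]\n",
+ "\n",
+ "data_df[\"unsurvived_prop\"] = data_df[\"unsurvived\"] / data_df[\"total\"]"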
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 129,
+ "id": "1414f1ba-9194-429b-8c04-a2e13b0ef580",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
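+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>survived</th>\n",
+ "      <th>total</th>\n",
+ "      <th>unsurvived</th>\n",
+ "      <th>survived_prop</th>\n",
+ "      <th>unsurvived_prop</th>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>pclass</th>\n",
+ "      <th></th>\n",
+ "      <th></th>\n",
+ "      <th></th>\n",
+ "      <th></th>\n",
+ "      <th></th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>1</th>\n",
+ "      <td>136</td>\n",
+ "      <td>216</td>\n",
+ "      <td>80</td>\n",
+ "      <td>0.629630</td>\n",
+ "      <td>0.370370</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>2</th>\n",
+ "      <td>87</td>\n",
+ "      <td>184</td>\n",
+ "      <td>97</td>\n",
+ "      <td>0.472826</td>\n",
+ "      <td>0.527174</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>3</th>\n",
+ "      <td>119</td>\n",
+ "      <td>491</td>\n",
+ "      <td>372</td>\n",
+ "      <td>0.242363</td>\n",
+ "      <td>0.757637</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"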
+ ],
+ "text/plain": [
+ "        survived  total  unsurvived  survived_prop  unsurvived_prop\n",
+ "pclass                                                             \n",
+ "1            136    216          80       0.629630         0.370370\n",
+ "2             87    184          97       0.472826         0.527174\n",
+ "3            119    491         372       0.242363         0.757637"
+ ]
+ },
+ "execution_count": 129,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data_df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4f40e9c0-5b2c-46d1-b4cc-34d167677cdb",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 130,
+ "id": "8af6710f-78a9-4021-8c5b-0608ffaadddf",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def stackedbarplot(x_data, y_data_list, y_data_names, colors, x_label, y_label, title):\n",
+ "    _, ax = plt.subplots()\n",
+ "    # Running top of the stack: each series starts where all previous ones end,\n",
+ "    # so the plot stays correct with more than two series\n",
+ "    bottom = [0] * len(x_data)\n",
+ "    # Draw the stacked bars one series at a time\n",
+ "    for i in range(len(y_data_list)):\n",
+ "        # bottom sets where each bar starts vertically;\n",
+ "        # if the inputs are proportions, each finished stack totals 1\n",
+ "        ax.bar(x_data, y_data_list[i], color=colors[i], bottom=bottom, align='center', label=y_data_names[i])\n",
+ "        bottom = [b + y for b, y in zip(bottom, y_data_list[i])]\n",
+ "    ax.set_ylabel(y_label)\n",
+ "    ax.set_xlabel(x_label)\n",
+ "    ax.set_title(title)\n",
+ "    ax.legend(loc='upper right')  # legend position"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 132,
+ "id": "e321b06a-767e-4630-8122-fea7e442a189",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlEAAAHFCAYAAADSY6wWAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAA9hAAAPYQGoP6dpAABurklEQVR4nO3dd1QU19sH8O/Sm4BURZEiFiyIYlTsWLDFgN0Yu9iwlxh7V9RYY4+NmKhBDRq7YsHesUvUKAgqiGAELCAs9/3Dl/1lXcQdXGSJ3885ew57586dZ3Zndx/unbkjE0IIEBEREZEkOgUdABEREVFhxCSKiIiIKA+YRBERERHlAZMoIiIiojxgEkVERESUB0yiiIiIiPKASRQRERFRHjCJIiIiIsoDJlFEREREeaC1SdT169fRq1cvuLi4wMjICGZmZqhWrRrmzZuH58+fK+o5Ozvj66+//uzxyWQyDB48OM/rv3r1CnPmzEHVqlVhZmYGU1NTeHp6Yvbs2Xj16lWe2z1z5gymTp2KFy9e5LmN9zVs2BAymUzx0NfXh7OzM/r06YOHDx9qbDvh4eFK29HV1YWtrS1at26NS5cuaWw7H9KzZ084OztLWufJkyeYOnUqrl69mi8x5fZ+NmzYEA0bNsyX7X5MbGwsAgMDUbZsWRgbG8PKygqVK1dG3759ERsbq6i3b98+TJ069ZO2tXnzZixevPjTAi4EnJ2d0bNnz4/We/8zUrRoUVSpUgX9+/fHuXPn8j9QDZg4cSJKlSoFPT09WFpa5lhn/vz5kMlkOH/+vFJ5VlYWrKysIJPJcOfOHaVlb9++hYmJCdq2bZsvcffs2RNmZmaf1Ebv3r3RvHlzlfKlS5eifPnyMDQ0hIuLC6ZNm4aMjIxP2lZeqPvZ/pzy8t0sRUZGBkqXLp237xmhhX7++Wehp6cnKlasKJYvXy6OHTsmDh06JGbPni1cXFyEv7+/oq6Tk5No1arVZ48RgBg0aFCe1o2PjxeVKlUSxsbG4ocffhCHDh0Shw4dEmPHjhXGxsaiUqVKIj4+Pk9t//jjjwKAiIqKytP6OWnQoIFwdXUVZ8+eFWfPnhXHjx8Xy5cvF3Z2dsLR0VG8evVKI9s5duyYACBmz54tzp49K06cOCGWLFkirKyshImJibh7965GtvMhf//9t4iIiJC0zsWLFwUAsWHDhnyJKbf389atW+LWrVv5st3cxMbGChsbG+Hm5iZWrlwpjh49Knbs2CFmzZolqlSpIsLDwxV1Bw0aJD71a6ZVq1bCycnpE6PWfk5OTqJHjx4frQdAtG/fXpw9e1acOXNGHDhwQMyfP194eHgIAGLo0KH5H+wn2LlzpwAgJkyYIE6dOiUuXryYY71Lly4JACIoKEipPCIiQgAQpqamYuXKlUrLTpw4IQCIn376KV9i79GjhzA1Nc3z+hEREUJHR0dln2fOnClkMpkYN26cOHbsmJg3b54wMDAQffv2/dSQJZHy2f6cevToke/fAcHBwaJo0aIiMTFR0npal0SdOXNG6OrqiubNm4u0tDSV5enp6eLPP/9UPC+MSZSvr6/Q09MTJ0+eVFl28uRJoaenJ5o1a5antvMriapYsaJK+bp16wQAcfDgQY1sJzuJ2rZtm1L5L7/8IgCIyZMna2Q7miQ1iZKacObH+/mpJk+eLACIBw8e5LhcLpcr/mYSpT4pSVRO3z2ZmZmid+/eAoBYsWJFPkSoGTNnzhQAxNOnT3OtJ5fLhaWlpcp34cKFC4WDg4P49ttvRceOHZWWTZ8+XQAQN27c0HjcQnx6EtWxY0dRq1YtpbLExERhZGQk+vXrp1Q+a9YsIZPJPus/SlI+25/T50ii0tPThZWVlZg1a5ak9bRuOG/27NmQyWT4+eefYWhoqLLcwMAA33zzjUr5gQMHUK1aNRgbG6N8+fJYv369Sp34+Hj0798fJUuWhIGBgaLLNDMzU6leeno6pk+fDnd3dxgZ
GcHa2ho+Pj44c+bMB+MWQmD8+PHQ19fHmjVrPljv0qVLOHToEPr06YO6deuqLK9bty569+6NgwcP4vLlywCA6OhoyGQyBAcHq9SXyWSK4ZKpU6fi+++/BwC4uLgouvzDw8MBAEePHkXDhg1hbW0NY2NjlCpVCu3atcPr168/GG9uLCwsAAD6+voAgJMnT0Imk2HLli0qdTdu3AiZTIaLFy9K3k716tUBAE+fPlUqv3fvHrp06QI7OzsYGhrC3d0dy5cvV1n/1q1b8PX1hYmJCWxtbTFo0CDs3btX6bUBcu4y3rZtG2rWrAkLCwuYmJjA1dUVvXv3BvBu+PGrr74CAPTq1Uvxeme/H9ld/zdu3ICvry+KFCmCxo0bAwDCwsLg5+eHkiVLwsjICG5ubujfvz8SExMV2/7Y+5nTcN7z588RGBiIEiVKwMDAAK6urpgwYQLS09OV6mUPR//6669wd3eHiYkJqlSpgj179nzk3QCSkpKgo6MDOzu7HJfr6Ogo9j/7/fj3EFR0dDQAYPny5ahfvz7s7OxgamqKypUrY968eUpDGA0bNsTevXvx8OFDpTayX//330Mg58/LgwcP0LlzZzg4OMDQ0BD29vZo3LjxR4dhL126hM6dO8PZ2RnGxsZwdnbGt99+qzKMHRwcDJlMhmPHjmHgwIGwsbGBtbU12rZtiydPnijVzcjIwJgxY1CsWDGYmJigbt26uHDhQq5xqENXVxfLli2DjY0NfvzxR0V5WloaRo0aBU9PT1hYWMDKygre3t74888/ldZv3LgxypcvD/HePemFEHBzc0OrVq1y3X5WVhbmzZunGJKys7ND9+7d8ejRI0UdZ2dnTJw4EQBgb2+v9Hl5n46ODurXr4/Tp08rfUeHh4ejYcOGaNCggcp7Hx4eDltbW1SsWBHAu+G9mTNnKmKytbVFr1698OzZM5XthYSEwNvbG6ampjAzM0OzZs1w5cqVXPcZAE6fPg0bGxt8/fXXuZ6K8fTpU+zYsQPdunVTKj9w4ADS0tLQq1cvpfJevXpBCIGdO3d+NAZNUfezDXz4dIL3v0ezP4/z58/HwoUL4eLiAjMzM3h7e+c4/BwcHIxy5copvtM3btyYYyzTpk1DzZo1YWVlBXNzc1SrVg3r1q1TOn779OkDKyurHH/jGjVqpDhOgHe5RadOnfDzzz+rfAZypfF07hNkZmYKExMTUbNmTbXXcXJyEiVLlhQVKlQQGzduFAcPHhQdOnQQAMTx48cV9eLi4oSjo6NwcnISq1evFocPHxYzZswQhoaGomfPnop6GRkZwsfHR+jp6YnRo0eLffv2iV27donx48eLLVu2KOrhX/8NpqWlic6dO4siRYqI/fv35xrv7NmzBYBc6+3bt0+pGzsqKuqDvR0AxJQpU4QQ77pihwwZIgCI0NBQxfBbcnKyiIqKEkZGRqJp06Zi586dIjw8XGzatEl069ZN/PPPP7nGnN0TlZGRITIyMsSrV6/E+fPnhYeHh3B1dVXqMaxataqoU6eOShtfffWV+Oqrr3Ldzod6ovbs2SMAiAULFijKbt26JSwsLETlypXFxo0bxaFDh8SoUaOEjo6OmDp1qqLekydPhLW1tShVqpQIDg4W+/btE926dRPOzs4CgDh27Jii7vv/7Zw5c0bIZDLRuXNnsW/fPnH06FGxYcMG0a1bNyGEEMnJyWLDhg0CgJg4caLi9Y6NjVW0p6+vL5ydnUVQUJA4cuSIotdu5cqVIigoSOzatUscP35c/PLLL6JKlSqiXLly4u3bt0KI3N/P7PelQYMGinjfvHkjPDw8hKmpqZg/f744dOiQmDRpktDT0xMtW7ZUek0BCGdnZ1GjRg2xdetWsW/fPtGwYUOhp6cn7t+/n+v79NtvvwkAwtfXVxw4cEARz/v+/vtv0b59ewFAEfvZs2cVx8uIESPEypUrxYEDB8TRo0fFokWLhI2NjejVq5fS+1ynTh1RrFgxpTaE+N/x8u/3UIic
Py/lypUTbm5u4tdffxXHjx8Xf/zxhxg1apTKuu/btm2bmDx5stixY4c4fvy4+P3330WDBg2Era2tePbsmaJe9nHg6uoqhgwZIg4ePCjWrl0rihYtKnx8fJTa7NGjh5DJZOL7778Xhw4dEgsXLhQlSpQQ5ubmn9QTla1z584CgOI4fPHihejZs6f49ddfxdGjR8WBAwfE6NGjhY6Ojvjll18U6/35558CgAgLC1Nqb+/evQKA2Lt3b65x9evXTwAQgwcPFgcOHBCrVq0Stra2wtHRUfFaRUREiD59+ggA4sCBA0qfl5wsWrRIABBnzpwRQvyvd2r16tUiMjJSAFD01KSnpwtjY2PRoUMHRd3mzZsLU1NTMW3aNBEWFibWrl0rSpQoISpUqCBev36t2E52r0/v3r3Fnj17RGhoqPD29hampqZKPUHv90SFhIQIQ0NDMXDgQJGZmZnr67Nx40YBQNy+fVupfOzYsQKAePnypco6NjY24ttvv821XSGE4rv5Y4+srKxc21H3sy2E6vdPtve/R7M/j87OzqJ58+Zi586dYufOnaJy5cqiaNGi4sWLF4q62Z8jPz8/sXv3bvHbb78JNzc3xW/3v/Xs2VOsW7dOhIWFibCwMDFjxgxhbGwspk2bpqhz7do1AUCsWbNGad1bt24JAGL58uVK5SEhIQKAuH79eq6v079pVRIVHx8vAIjOnTurvY6Tk5MwMjISDx8+VJS9efNGWFlZif79+yvK+vfvL8zMzJTqCSHE/PnzlT6I2Qf6+y/6+7K/yJKSkkTdunVFiRIlxNWrVz8a74ABAwQA8ddff32wTvaXw8CBA4UQ6idRQnx4+Gf79u0CgFoxvq9BgwYCgMqjbNmyIjIyUqlu9ofgypUrirILFy4IAEpf2DnJ/lEMCQkRGRkZ4vXr1+L06dOiXLlyokKFCkrJXrNmzUTJkiVVPuSDBw8WRkZG4vnz50IIIb7//vscu8SbNWv20SQq+9j494f8fbkN5/Xo0UMAEOvXr891v7OyskRGRoZ4+PChAKA0XJ3bcN77X2KrVq0SAMTWrVuV6s2dO1cAEIcOHVKUARD29vYiJSVFURYfHy90dHRUzkHJKd7+/fsLHR0dAUDIZDLh7u4uRowYoRKnusN5crlcZGRkiI0bNwpdXV3F+yfEh4fz1E2iEhMTBQCxePHij8bxMZmZmeLly5fC1NRULFmyRFGefdwHBgYq1Z83b54AIOLi4oQQ//tsjxgxQqnepk2bBACNJFE//PCDACDOnz//wX3IyMgQffr0EVWrVlWUy+Vy4erqKvz8/JTqt2jRQpQuXTrXH+Ds/Xp//8+fPy8AiPHjxyvKpkyZIgAoJaEfcvXqVcV5kkIIcfnyZaXvT3t7e7Fs2TIhhBDHjx9XGsrcsmWLACD++OMPpTazP7PZ9WJiYoSenp4YMmSIUr3U1FRRrFgxpSHDfydRc+bMEbq6umLu3Lkf3Q8hhBg4cKAwNjZWeR379u0rDA0Nc1ynbNmywtfXN9d2s493dR4f+6dBymdbahJVuXJlpUQz+3chu3NCLpcLBwcHUa1aNaXXKDo6Wujr6+c6nJf9/TF9+nRhbW2ttH6DBg2Ep6enUv2BAwcKc3NzkZqaqlR+7949AUDlXLvcaN1wXl54enqiVKlSiudGRkYoW7asUpf7nj174OPjAwcHB2RmZioeLVq0AAAcP34cALB//34YGRkphmxyExUVBW9vb6SkpODcuXOoUqWKRvZH/H9XYvawhSZ4enrCwMAA/fr1wy+//IIHDx5IWr906dK4ePEiLl68iLNnz2Lz5s0wNjZG48aNce/ePUW9b7/9FnZ2dkrDakuXLoWtrS06deqk1rY6deoEfX19mJiYoE6dOkhJScHevXsVV/GkpaXhyJEjaNOmDUxMTJTez5YtWyItLU3RTXz8+HFUqlQJFSpUUNrGt99++9E4sofqOnbsiK1bt+Lx48dqxf++du3aqZQlJCRgwIABcHR0
hJ6eHvT19eHk5AQAiIyMzNN2jh49ClNTU7Rv316pPPuKryNHjiiV+/j4oEiRIorn9vb2sLOz++gVlzKZDKtWrcKDBw+wYsUK9OrVCxkZGVi0aBEqVqyo+Cx9zJUrV/DNN9/A2toaurq60NfXR/fu3SGXy3H37l212lCHlZUVSpcujR9//BELFy7ElStXkJWVpda6L1++xA8//AA3Nzfo6elBT08PZmZmePXqVY7v0/unGnh4eACA4jU9duwYAOC7775TqtexY0fo6elJ3reciByGIrZt24Y6derAzMxMcbytW7dOaR90dHQwePBg7NmzBzExMQCA+/fv48CBAwgMDMz1+yh7v96/urBGjRpwd3dXOfbU5eHhAWtra8WwXXh4OIoVK4Zy5coBAOrXr6/YdnYdHx8fAO++8y0tLdG6dWul7whPT08UK1ZMUf/gwYPIzMxE9+7dleoZGRnlOGQohED//v0xZcoUbN68GWPGjFFrX548eQJbW9scX8fcXtuP/Q44ODgovps/9vDy8sq1LU19tnPSqlUr6OrqKp6//9m4c+cOnjx5gi5duijts5OTE2rXrq3S3tGjR9GkSRNYWFgovj8mT56MpKQkJCQkKOoNGzYMV69exenTpwEAKSkp+PXXX9GjRw+VKy2zhzGlfNdrVRJlY2MDExMTREVFSVrP2tpapczQ0BBv3rxRPH/69Cl2794NfX19pUf2mGj2uSjPnj2Dg4OD0tjvh1y4cAF3795Fp06dULJkSbVizU72ctvH7HNGHB0d1WpTHaVLl8bhw4dhZ2eHQYMGoXTp0ihdujSWLFmi1vpGRkaoXr06qlevjlq1auHbb7/F/v37ERcXh8mTJyvqGRoaon///ti8eTNevHiBZ8+eYevWrQgICMjxHLeczJ07FxcvXsTx48cxYcIEPH36FP7+/orzepKSkpCZmYmlS5eqvJ8tW7YE8L/3MykpCfb29irbyKnsffXr18fOnTsVX7AlS5ZEpUqVcjzn60NMTExgbm6uVJaVlQVfX1+EhoZizJgxOHLkCC5cuKBI/P593EqRlJSEYsWKqXzp2tnZQU9PD0lJSUrl6nxucuPk5ISBAwdi3bp1uHfvHkJCQpCWlqY4jys3MTExqFevHh4/fowlS5bg5MmTuHjxoiL5zutrkBOZTIYjR46gWbNmmDdvHqpVqwZbW1sMHToUqampua7bpUsXLFu2DAEBATh48CAuXLiAixcvwtbWNscY339Ns4/57LrZ70GxYsWU6unp6eX4fuRF9o+Sg4MDACA0NBQdO3ZEiRIl8Ntvv+Hs2bO4ePEievfujbS0NKV1e/fuDWNjY6xatQrAu/PWjI2NP/pPZfZ+FS9eXGWZg4ODyrGnLplMhgYNGuD06dPIyMjAsWPH0KBBA8XyBg0a4Pjx4xBC4NixYyhWrBjKly8P4N13/osXL2BgYKDyPREfH6/4jsg+1/Krr75SqRcSEqJ0niLw7jyrkJAQVKxYUfFPuDrevHkDIyMjlXJra2ukpaXleN7O8+fPYWVllWu7BgYG8PT0VOuh7vQMn/LZ/pC8fjZyKrtw4QJ8fX0BAGvWrMHp06dx8eJFTJgwQalNAPDz84Ozs7PiuyU4OBivXr3CoEGDVLaT/f5I+f7RzL8+GqKrq4vGjRtj//79ePTokdqJiTpsbGzg4eGBWbNm5bg8+wvH1tYWp06dQlZW1kcTqU6dOqFYsWKYMGECsrKyFCdM5qZp06YYP348du7cmeNcIQAUJxI2bdoUwP/e2PdPDpb6xVSvXj3Uq1cPcrkcly5dwtKlSzF8+HDY29ujc+fOktoC3n1h2tjY4Nq1a0rlAwcOxJw5c7B+/XqkpaUhMzMTAwYMULtdV1dXxcnk9evXh7GxMSZOnIilS5di9OjRKFq0KHR1ddGtW7ccPwjAuxOxgXcf3PdPSAfeXWSgDj8/P/j5+SE9PR3nzp1DUFAQunTpAmdnZ3h7e390/Zz+i7x58yauXbuG
4OBg9OjRQ1H+999/qxXTh1hbW+P8+fMQQihtNyEhAZmZmbCxsfmk9j+mY8eOCAoKws2bNz9ad+fOnXj16hVCQ0MVPXAAJM239aHPxfs/esC7H4V169YBAO7evYutW7di6tSpePv2rSJheF9ycjL27NmDKVOmYOzYsYry9PR0pbnqpMj+IYmPj0eJEiUU5ZmZmXlONP7tzZs3OHz4MEqXLq34/vztt9/g4uKCkJAQpePi/dcNeHexSI8ePbB27VqMHj0aGzZsQJcuXT44l9P7+xUXF6fyvf3kyZNPOvZ8fHwQGhqK8+fP4+TJkwgKClIsa9CgARITE3H58mWcO3cObdq0USzLPrn/wIEDObab3QubHdv27duVjsUPMTQ0xLFjx9CsWTM0adIEBw4cQNGiRT+6no2NDSIiIlTKK1euDAC4ceMGatasqSjPTvQqVaqUa7vR0dGK77uPOXbsWJ7mlsvps21kZITk5GSVujl9/tTx78/G+94v+/3336Gvr489e/YoJaY5nYSvo6ODQYMGYfz48ViwYAFWrFiBxo0bK3oz/y37cy3leNWqnigAGDduHIQQ6Nu3L96+fauyPCMjA7t375bc7tdff42bN2+idOnSih6Vfz+yk6gWLVogLS0txyvhcjJx4kQsXrwYkydPxrhx4z5av3r16vD19cW6desU3Yv/durUKaxfvx7NmzdXdL3a29vDyMgI169fV6r7/tU1gGp2nxNdXV3UrFlTkZnn9MFWx6NHj5CYmKhyJUfx4sXRoUMHrFixAqtWrULr1q2VhlulGjNmDNzc3DBnzhykpqbCxMQEPj4+uHLlCjw8PHJ8P7M/kA0aNMDNmzdx+/ZtpTZ///13STEYGhqiQYMGmDt3LgAortpR5/V+X/YP2fs9c6tXr85xu+q237hxY7x8+VLliyT76pbsKwM/VVxcXI7lL1++RGxsrOKzBHw4/pxeAyFEjle2fqh3LPsKoPc/F7t27co1/rJly2LixImoXLlyrse+TCaDEELlfVq7di3kcnmu2/iQ7B+wTZs2KZVv3bpV5SphqeRyOQYPHoykpCT88MMPinKZTAYDAwOlBCo+Pj7H7w8AGDp0KBITE9G+fXu8ePFCrUmFGzVqBOBdwvZvFy9eRGRk5Ccde9nDc4sWLUJycrJSElCxYkVYW1sjKCgIaWlpirrAu+/8pKQkyOXyHL8jsn9EmzVrBj09Pdy/fz/Hetn/0P1b1apVcfz4cTx69AgNGzZUGj76kPLlyyMpKUkl8WjevDmMjIxUfnOyr/j09/fPtV1NDudJ+Ww7Ozvj7t27Ssl4UlJSrlex56ZcuXIoXrw4tmzZojQk/fDhQ5U2ZTIZ9PT0lIYH37x5g19//TXHtgMCAmBgYIDvvvsOd+7c+eAxnX2ay/unf+RGq3qiAMDb2xsrV65EYGAgvLy8MHDgQFSsWBEZGRm4cuUKfv75Z1SqVAmtW7eW1O706dMRFhaG2rVrY+jQoShXrhzS0tIQHR2Nffv2YdWqVShZsiS+/fZbbNiwAQMGDMCdO3fg4+ODrKwsnD9/Hu7u7jn22AwbNgxmZmbo168fXr58iZ9++inXceyNGzeiSZMm8PX1xdChQxVfMEePHsWSJUtQvnx5pQ+UTCZD165dsX79epQuXRpVqlTBhQsXsHnzZpW2s/+rWbJkCXr06AF9fX2UK1cOmzZtwtGjR9GqVSuUKlUKaWlpimkgmjRp8tHX782bN4rhJrlcjqioKMybNw8AMHz48Bxfk+z/qjZs2PDR9nOjr6+P2bNno2PHjliyZAkmTpyIJUuWoG7duqhXrx4GDhwIZ2dnpKam4u+//8bu3btx9OhRRWzr169HixYtMH36dNjb22Pz5s3466+/ACDX3sbJkyfj0aNHaNy4MUqWLIkXL15gyZIl0NfXVwwplC5dGsbGxti0aRPc3d1hZmYGBwcHpS+b95UvXx6lS5fG2LFjIYSAlZUVdu/ejbCwMJW6H3o//30u
U7bu3btj+fLl6NGjB6Kjo1G5cmWcOnUKs2fPRsuWLdV6n9Uxa9YsnD59Gp06dYKnpyeMjY0RFRWFZcuWISkpSeny+uz4586dixYtWkBXVxceHh5o2rQpDAwM8O2332LMmDFIS0vDypUr8c8//+T4GoSGhmLlypXw8vKCjo4OqlevjmLFiqFJkyYICgpC0aJF4eTkhCNHjiA0NFRp/evXr2Pw4MHo0KEDypQpAwMDAxw9ehTXr19X6mF6n7m5OerXr48ff/wRNjY2cHZ2xvHjx7Fu3bqP9sx8iLu7O7p27YrFixdDX18fTZo0wc2bNzF//nyVYd/cPH36FOfOnYMQAqmpqbh58yY2btyIa9euYcSIEejbt6+i7tdff43Q0FAEBgaiffv2iI2NxYwZM1C8eHGl8xmzlS1bFs2bN8f+/ftRt25dtc71LFeuHPr164elS5dCR0cHLVq0QHR0NCZNmgRHR0eMGDFC7X17X8WKFWFnZ4cdO3bA1tYW7u7uimUymQz169fHjh07AEApiercuTM2bdqEli1bYtiwYahRowb09fXx6NEjHDt2DH5+fmjTpg2cnZ0xffp0TJgwAQ8ePEDz5s1RtGhRPH36FBcuXICpqSmmTZumEpe7uztOnjyJJk2aoH79+jh8+HCuoycNGzaEEALnz59XDEUB787ZmzhxIiZNmgQrKyv4+vri4sWLmDp1KgICAj76g25gYJBjopcXUj7b3bp1w+rVq9G1a1f07dsXSUlJmDdvnqTj+N90dHQwY8YMBAQEoE2bNujbty9evHiBqVOnqgzntWrVCgsXLkSXLl3Qr18/JCUlYf78+R88ZcTS0hLdu3fHypUr4eTk9MH84dy5c9DV1UX9+vXVD1ztU9A/s6tXr4oePXqIUqVKCQMDA2FqaiqqVq0qJk+eLBISEhT1PjTZZk5XDjx79kwMHTpUuLi4CH19fWFlZSW8vLzEhAkTlC4vffPmjZg8ebIoU6aMMDAwENbW1qJRo0aKy2yFyPkKmS1btgg9PT3Rq1evj05K9vLlSzF79mzh6ekpTExMhImJifDw8BAzZ87M8VLX5ORkERAQIOzt7YWpqalo3bq1iI6OVrk6Twghxo0bJxwcHBRXWBw7dkycPXtWtGnTRjg5OQlDQ0NhbW0tGjRoIHbt2pVrnNmvJf51hYeOjo5wcHAQLVq0yHUGW2dnZ+Hu7v7R9rN9aIqDbDVr1lS6JDYqKkr07t1blChRQujr6wtbW1tRu3ZtMXPmTKX1bt68KZo0aSKMjIyElZWV6NOnj2ICz2vXrinqvX9VyZ49e0SLFi1EiRIlhIGBgbCzsxMtW7ZUmSR1y5Ytonz58kJfX1/p/chtYr7bt2+Lpk2biiJFioiiRYuKDh06iJiYGLXfTyFyPsaTkpLEgAEDRPHixYWenp5wcnIS48aNU5m4NqfjVwj1Jnw8d+6cGDRokKhSpYqwsrISurq6wtbWVjRv3lzs27dPqW56eroICAgQtra2QiaTKV1puHv3blGlShVhZGQkSpQoIb7//nuxf/9+lauInj9/Ltq3by8sLS0VbWSLi4sT7du3F1ZWVsLCwkJ07dpVMdN19tV5T58+FT179hTly5cXpqamwszMTHh4eIhFixZ99LL0R48eiXbt2omiRYuKIkWKiObNm4ubN2+qvE7ZV+e9PxN1TlcQpqeni1GjRgk7OzthZGQkatWqJc6ePStpss1/fxbNzc1F5cqVRb9+/RTTP7xvzpw5wtnZWRgaGgp3d3exZs0axVVyOQkODhYAxO+///7ReLLJ5XIxd+5cUbZsWaGvry9sbGxE165dVaYwkHJ1XraOHTsK4N1M7e9bvHixACBKlCihsiwjI0PMnz9fcZyZmZmJ8uXLi/79+4t79+4p1d25c6fw8fER5ubmwtDQUDg5OYn27duLw4cPK+rk9Jl+9OiRKF++vHB2ds51ehC5XC6cnZ1VrmDMtmTJElG2bFlhYGAgSpUqJaZMmaKY
7uRzkfLZFuLdRMju7u7CyMhIVKhQQYSEhHzw6rwff/xRZf2cvu/Wrl2r+O0tW7asWL9+fY6Tba5fv16UK1dOGBoaCldXVxEUFKSYADqnq5nDw8MFADFnzpwP7n+9evVE69atc32N3if7/x0h0qjr16+jSpUqWL58OQIDAws6HBX9+vXDli1bkJSUBAMDg4IOh0irtGvXDufOnUN0dLRiMl36dAsWLMCsWbPw+PFjGBsbF3Q4X5RRo0Zh5cqViI2NzfEijvv376NMmTI4ePCg4nxkdWjdcB4Vbvfv38fDhw8xfvx4FC9eXK0bqua36dOnw8HBAa6urnj58iX27NmDtWvXYuLEiUygiP5feno6IiIicOHCBezYsQMLFy5kAqVhgwYNwrJly7B8+XKMHj26oMP5Ipw7dw53797FihUr0L9//w9eBTtz5kw0btxYUgIFMIkiDZsxY4biViLbtm2DiYlJQYcEfX19/Pjjj3j06BEyMzNRpkwZLFy4EMOGDSvo0Ii0RlxcHGrXrg1zc3P0798fQ4YMKeiQ/nOMjIzw66+/qnU7GdIMb29vmJiY4Ouvv8bMmTNzrJOZmYnSpUurdXHY+zicR0RERJQHWjfFAREREVFhwCSKiIiIKA+YRBERERHlwRd3YnlWVhaePHmCIkWKaPQGv0RERJR/xP9PLqvu/W0/hy8uiXry5IlGb+xLREREn09sbKxG7637Kb64JCr7dhmxsbF5np6eiIiIPq+UlBQ4OjrmeNurgvLFJVHZQ3jm5uZMooiIiAoZbToVRzsGFYmIiIgKGSZRRERERHnAJIqIiIgoD764c6KIiKhwkMvlyMjIKOgw6DMyMDDQmukL1MEkioiItIoQAvHx8Xjx4kVBh0KfmY6ODlxcXGBgYFDQoaiFSRQREWmV7ATKzs4OJiYmWnU1FuWf7Mmw4+LiUKpUqULxvjOJIiIirSGXyxUJlLW1dUGHQ5+Zra0tnjx5gszMTOjr6xd0OB9VeAYeiYjoPy/7HCgTE5MCjoQKQvYwnlwuL+BI1MMkioiItE5hGMohzSts77uk4bw7d+5gy5YtOHnyJKKjo/H69WvY2tqiatWqaNasGdq1awdDQ8P8ipWIiIhIa6jVE3XlyhU0bdoUVapUwYkTJ/DVV19h+PDhmDFjBrp27QohBCZMmAAHBwfMnTsX6enpam38xIkTaN26NRwcHCCTybBz586PrnP8+HF4eXnByMgIrq6uWLVqlVrbIiIi+i8KDw+HTCbL96sZe/bsCX9//3zdRmGjVk+Uv78/vv/+e4SEhMDKyuqD9c6ePYtFixZhwYIFGD9+/EfbffXqFapUqYJevXqhXbt2H60fFRWFli1bom/fvvjtt99w+vRpBAYGwtbWVq31iYio8Gq/budn29b2Pv6S10lISMCkSZOwf/9+PH36FEWLFkWVKlUwdepUeHt7az7I/1e7dm3ExcXBwsIi37ZBOVMribp3755aczZ4e3vD29sbb9++VWvjLVq0QIsWLdSqCwCrVq1CqVKlsHjxYgCAu7s7Ll26hPnz5zOJIiKiAtWuXTtkZGTgl19+gaurK54+fYojR47g+fPneWpPCAG5XA49vdx/qg0MDFCsWLE8bYM+jVrDeVInvcqvSbLOnj0LX19fpbJmzZrh0qVLnNWWiIgKzIsXL3Dq1CnMnTsXPj4+cHJyQo0aNTBu3Di0atUK0dHRkMlkuHr1qtI6MpkM4eHhAP43LHfw4EFUr14dhoaGWLduHWQyGf766y+l7S1cuBDOzs4QQigN5yUnJ8PY2BgHDhxQqh8aGgpTU1O8fPkSAPD48WN06tQJRYsWhbW1Nfz8/BAdHa2oL5fLMXLkSFhaWsLa2hpjxoyBECJfXrvCTNLVeUlJSTh27Jgiq05MTMTcuXMxffp0REZG5kuA/xYfHw97e3ulMnt7e2RmZiIxMTHHddLT05GSkqL0ICIi0iQzMzOYmZlh586dap8X/CFjxoxBUFAQIiMj
0b59e3h5eWHTpk1KdTZv3owuXbqoXM1mYWGBVq1a5Vjfz88PZmZmeP36NXx8fGBmZoYTJ07g1KlTMDMzQ/PmzRUjSQsWLMD69euxbt06nDp1Cs+fP8eOHTs+ab/+i9S+Ou/ChQvw9fVFSkoKLC0tERYWhg4dOkBPTw9CCMyZMwenTp1CtWrV8jNelQMmOzP+0GWRQUFBmDZtWr7G9G8/BPz52bZF2mnuWr8C3T6PQSrMx2ARSz008i8GY4MU6OmmaTAqaR5Fv5C8zoJ5yzFm3DCsXLkKlSt5oGbNOvD7ui3c3Ssh7tG7f+CfPknFI8t3bSenJAMAnsW/xKPoF3gW/66XaOigH+Be5isAwJtUoFWLtgjeuAb9+4wCADx48DcuX76MeUHLlNZ7HJOMly+AZk39MWLUQNyLfAJjYxOkpqZgz569+HnlL3gU/QK/b/0NWXJg6sT5it/OGVMWoWIVZ2z7fQ8a1G+EBQsWIXDAcNT0agwAmDh2Dvbt2483rzPUem1KOltKfv0KI7V7oiZMmIAOHTogOTkZ48ePh7+/Pxo3boy7d+/i3r176NKlC2bMmJGfsaJYsWKIj49XKktISICent4HZ7YdN24ckpOTFY/Y2Nh8jZGIiL5MLVt8g0vnI7F+zWY0qN8Y586dQovWDbF1+2ZJ7Xh4eCo9/6Z1Wzx+HIuIKxcBADv+3IaKFSqjbJnyOa7f2McXerq6OHR4PwBg34HdMDMzQ/16jQAAN25cRfTDByhfyRHlKpZEuYolUbmqK9LT0/AwJgopKclISIhHtapfKdrU09ODR+WqkvbjS6B2EnX58mWMHDkSRYoUwbBhw/DkyRP07dtXsXzQoEG4ePFivgSZzdvbG2FhYUplhw4dQvXq1T84PbyhoSHMzc2VHkRERPnByNAI9ev5YPjQMdj5xyF0aNcFCxcFQef/e3z+fV5R5gfO5TUxMVV6bm9XDLVr1cPOP7cDAP7c9Qfa+Hf8YAwGBgZo2cLvf/X/3I7WrdooTlDPElmoXMkTB/aeUHocP3oJ/t+0z/vOf4HUTqLevn0LY2NjAIC+vj5MTExgY2OjWG5tbY2kpCRJG3/58iWuXr2qONEuKioKV69eRUxMDIB3vUjdu3dX1B8wYAAePnyIkSNHIjIyUjFeO3r0aEnbJSIi+hzKlCmH129ew8r63e9lQsL/RlNu3b6hdjv+/h2we88OXI64gIcxUfimddtc67fx74DjJ47gzt1InDl3Em38OyiWVa5YBVHR92FjbQMXZ1elh7m5BczNLWBnVwxXrlxSrJOZmYkbN6+qHe+XQu0kytHREQ8ePFA8//3331G8eHHF87i4OKWkSh2XLl1C1apVUbXquy7CkSNHomrVqpg8ebKizeyECgBcXFywb98+hIeHw9PTEzNmzMBPP/3E6Q2IiKhA/fPPc3Tq8g1Cd4QgMvImYmIfYs/enVi1+if4Nm0JYyNjVKv6FZavXIy79/7CufOn8eOCWWq336LZ10h9mYrxE0ehtnc9FC/mkGv9WjXrwMbGFkOH90PJkqWUhuba+HeAVVFr9On3Hc5fOIOY2Ic4e+40pkwbi7i4xwCAPr36Y/mqxdh/cA/+vn8XEyaN5oVZOVD7xPLOnTsjISFB8bxVq1ZKy3ft2oUaNWpI2njDhg1zvWQyODhYpaxBgwaIiIiQtB0iIqL8ZGJiiqqeXli7fiUePoxCRmYmHIqXwLedu2PwoJEAgPnzlmL0mCFo9U0jlHZ1w/ix0/Bd99x7lLIVKWKOpo2bY8++nZg/b9lH68tkMvi1bodVPy/F8KFjlJYZG5tg+9a9CJozFf0Gdserly9hX6w46tZuADOzIgCAfgGDkZDwFKO+D4SOTAcdO3RFc99WSEllIvVvMqGhiR9ev34NXV1drb93XkpKCiwsLJCcnJwv50fxyigqzFdG0X9DYT4Gs6/OcyjuCD3d/JlzkPJfXq/OS0tLQ1RUFFxcXGBkZKS0LL9/v/NC0g2Ic2Ni
YqKppoiIiIi0nqTJNj9m+vTpOHHihCabJCIiItJKGk2iNmzYgObNm6N169aabJaIiIhI62hsOA94N0VBWloajh8/rslmiYiIiLSORnuiAMDIyAjNmjXTdLNEREREWkXjSRQRERHRl0DtJCojIwNjxoyBm5sbatSogQ0bNigtf/r0KXR1dTUeIBEREZE2UjuJmjVrFjZu3IgBAwbA19cXI0aMQP/+/ZXqaGjKKSIiIiKtp/aJ5Zs2bcLatWvx9ddfAwB69eqFFi1aoFevXli/fj2AdzOkEhEREX0J1O6Jevz4MSpVqqR4Xrp0aYSHh+Ps2bPo1q0b5HJ5vgRIRERE+c+7rgfWrl+Zr9sIDw+HTCbDixcv8nU7n4vaPVHFihXD/fv34ezsrChzcHDA0aNH4ePjgx49euRHfERERACApTM/3/Q5QyY2kLxOh85fo2KFypg6OUip/MChvejbvytio/7RVHj5Ys+fR3n3EYnU7olq1KgRNm/erFKenUhFR0drMi4iIiLSgLdv36pVz9raBsbGTKKkUDuJmjRpEjp27JjjshIlSuDEiROKc6OIiIhI1cLFc9CsZT38Efo7vOt6oIJHKQQO6Y2XL1MVdfbu+xNNmteGW/niqFzVFd929cfr168AvOvtmjp9nFKbffp9hxGjAxXPvet6YMnS+RgxOhAVPErhh3HD4NfWF0Fzpyqtl5SUCJcytjhz9qRivezhvEFD+yBwSG+l+hkZGfCoVhoh2zYBeHcx2cpVS1CnvifcyheHb4u62LtP+ebT+/btQ9myZWFsbAwfH5//XIeL2kmUk5NTrpNoFi9enEN6REREH/EwJhoHw/Zhw7rfsWHt7zh//gyWr1wMAHiaEI/BwwLQqUNXHDt8Hlu37EbzZl9Lvvp99ZqfUK6sO/buCsfQId+jjV97/LnrD6V2du8JhY2NHWrVrKOyfhu/Dgg7fACvXr1UlB0/cQSvX79Gy+bvbu02b/5MbN2+GbNmLMCRQ2cR0CcQw0b0x9lzpwEAsbGxaNu2LVq2bImrV68iICAAY8eOlfpyaTWN3vblxIkTqFKlCiwsLDTZLBER0X9GVlYWFv64HGZmRQAAbdt0xOkzJwAACQlPkZmZiRbNvkbJkqUAAO7lK0reRm3v+hjQb4jiuXmRtpg2cwIuXDyLmjVqAwB27toO/2/aQUdHtT+lQf3GMDExwYGDe9CubWdF/aaNm6NIEXO8fv0Ka9atQMjmP+FVrQYAwKmUMy5ePIdNWzagQ+dWWLlyJVxdXbFo0SLIZDKUK1cON27cwNy5cyXvj7bS6IzlDRs2hKurKxYsWKDJZomIiP4zHEuWUiRQAGBnVwxJSc8AABXcK6FunQZo2qIuBgT2xOYtv+BF8gvJ2/Co7Kn03NraBvXqNsTOP7cBAGJiH+JyxEW08euQ4/r6+vpo1dIPO/7cDgB4/foVDoXth///17977w7S09PQpVtblKtYUvH4Y8fvePgwGgAQGRmJWrVqKU1/5O3tLXlftJnGb0AcFRWFgwcParJZIiIirWdmVgQpqSkq5SkpyShS5H9Jk56e8k+vTCZDVlYWAEBXVxebf92BS5fP48TJY9jwy8+Yt2Amdu04jFKOTtDR0VEZ2svMzFDZpomJqUpZG78OmDJ9HKZPnYedf25D2bLlUaFC5Q/uTxu/DujQ+WskJj7DyVPHYGhoCJ+GTQAA4v/jDV4XgmLFiiutZ2hg8K7OFzABt0Z7opycnNCwYUMEBQV9vDIREdF/iFvpMrh+44pK+bVrEXB1KaN2OzKZDF9Vr4VRI8bhwN4T0Nc3wIGDewAAVlY2SEh4qqgrl8tx506kWu02822F9PR0hB8/jD93/YG2/jlfLJatuldNOBQvgd17dmDHn9vwdUs/GPx/glSmTDkYGhjiyZNYuDi7Kj0cHEoCACpUqIBz584ptfn+88KONyAmIiLSgO7dAvDwYTQmTBqN27dv4MGDvxG8cQ1Ctv6mdH5Sbq5cuYSl
yxfg2vUrePw4FvsP7Mbz54ko41YWAFCndj0cOXYIR44exN/372LCpFFISU1Wq20TE1P4Nm2B+Qtn497fd+D/Tftc68tkMvh90x6/bd6Ak6fC0eZfSZeZWRH06zsY02ZOwLY/tiD6YRRu3rqO4I1rsO2PLQCAAQMG4P79+xg5ciTu3LmDzZs3Izg4WK1YCwuNDeddu3YN1apV48zlRET0RXIsWQp/bN2HefNn4rvu7ZD+Ng0uLm5YMH85vm7lr1YbZkWK4PyFs1i3YRVepqaiRAlHTBo/Az4NmwIAOnXoituRNzF81EDo6eohoM9AeNeqp3aMbfw6oEfvTqhZozZKlHD8eH3/Dli2YiFKlnDEV9VrKS37ftQE2FjbYvmKRYiJjYa5uQUqVayCwYEjAAClSpXCH3/8gREjRmDFihWoUaMGZs+ejd69e+e0qUJJJjQ0aHnt2jVUrVpVMa6rrVJSUmBhYYHk5GSYm5trvP0fAv78eCX6T5u71q9At89jkArzMVjEUg+N/IvBobgj9HQNNBgVfU4lnS3ztF5aWhqioqLg4uICIyMjpWX5/fudF2r3RLVt2zbX5cnJybwBMREREX0x1E6idu/ejaZNm8Le3j7H5RzGIyIioi+J2kmUu7s72rVrhz59+uS4/OrVq9izZ4/GAiMiIiLSZmpfnefl5YWIiIgPLjc0NESpUqU0EhQRERGRtlO7J2rVqlW5Dtm5u7sjKipKI0EREdGXSQgA4suYqJFUFbb3Xe0kytDQMD/jICIiQtprOeTyLGTK30Jfj787X5q3b98CeDdze2GgVhL16tUrmJqqTiGvqfpEREQAkJkhcD8yFQaGerCyAvR0DXjldyGUlpYmeZ2srCw8e/YMJiYmKrfG0VZqRenm5oYhQ4agZ8+ecHBwyLGOEAKHDx/GwoULUb9+fYwbN06jgRIR0Zch8nIqAKC0eyZ0dXUA5lCFzpu3JnlaT0dHB6VKlSo0ibNaSVR4eDgmTpyIadOmwdPTE9WrV4eDgwOMjIzwzz//4Pbt2zh79iz09fUxbtw49OvXL7/jJiKi/7DIy6m4d/0ljEx0UUh+T+lfRs9snKf1DAwMoKNTeO5Ip1YSVa5cOWzbtg2PHj3Ctm3bcOLECZw5cwZv3ryBjY0NqlatijVr1qBly5aFaueJiEh7ZWYIvEzOLOgwKA/en238v0rSoGPJkiUxYsQIjBgxIr/iISIiIioUPrnbSC6X4+rVq/jnn380EQ8RERFRoSA5iRo+fDjWrVsH4F0CVb9+fVSrVg2Ojo4IDw/XdHxEREREWklyErV9+3ZUqVIFwLv76UVHR+Ovv/7C8OHDMWHCBI0HSERERKSNJCdRiYmJKFasGABg37596NChA8qWLYs+ffrgxo0bGg+QiIiISBtJTqLs7e1x+/ZtyOVyHDhwAE2aNAEAvH79utDMMEpERET0qSRPCdqrVy907NgRxYsXh0wmQ9OmTQEA58+fR/ny5TUeIBEREZE2kpxETZ06FZUqVUJsbCw6dOiguKeerq4uxo4dq/EAiYiIiLRRnm5O0759e6XnL168QI8ePTQSEBEREVFhIPmcqLlz5yIkJETxvGPHjrC2tkbJkiVx/fp1jQZHREREpK0kJ1GrV6+Go6MjACAsLAxhYWHYv38/mjdvjtGjR2s8QCIiIiJtJHk4Ly4uTpFE7dmzBx07doSvry+cnZ1Rs2ZNjQdIREREpI0k90QVLVoUsbGxAKA0xYEQAnK5XLPREREREWkpyT1Rbdu2RZcuXVCmTBkkJSWhRYsWAICrV6/Czc1N4wESERERaSPJSdSiRYvg7OyM2NhYzJs3D2ZmZgDeDfMFBgZqPEAiIiIibSQ5idLX18/xBPLhw4drIh4iIiKiQiFP80QBwO3btxETE4O3b98qlX/zzTefHBQRERGRtpOcRD148ABt2rTBjRs3IJPJIIQAAMhkMgDgyeVERET0RZB8dd6wYcPg4uKCp0+fwsTEBLdu3cKJ
EydQvXp1hIeH50OIRERERNpHck/U2bNncfToUdja2kJHRwc6OjqoW7cugoKCMHToUFy5ciU/4iQiIiLSKpJ7ouRyueKKPBsbGzx58gQA4OTkhDt37mg2OiIiIiItJbknqlKlSrh+/TpcXV1Rs2ZNzJs3DwYGBvj555/h6uqaHzESERERaR3JSdTEiRPx6tUrAMDMmTPx9ddfo169erC2tla6MTERERHRf5nkJKpZs2aKv11dXXH79m08f/4cRYsWVVyhR0RERPRfl+d5ov7NyspKE80QERERFRpqJVFt27ZVu8HQ0NA8B0NERERUWKiVRFlYWOR3HERERESFilpJ1IYNG/I7DiIiIqJCRfI8UVFRUbh3755K+b179xAdHS05gBUrVsDFxQVGRkbw8vLCyZMnc62/adMmVKlSBSYmJihevDh69eqFpKQkydslIiIi+hSSk6iePXvizJkzKuXnz59Hz549JbUVEhKC4cOHY8KECbhy5Qrq1auHFi1aICYmJsf6p06dQvfu3dGnTx/cunUL27Ztw8WLFxEQECB1N4iIiIg+ieQk6sqVK6hTp45Kea1atXD16lVJbS1cuBB9+vRBQEAA3N3dsXjxYjg6OmLlypU51j937hycnZ0xdOhQuLi4oG7duujfvz8uXbokdTeIiIiIPonkJEomkyE1NVWlPDk5GXK5XO123r59i8uXL8PX11ep3NfXN8eeLgCoXbs2Hj16hH379kEIgadPn2L79u1o1arVB7eTnp6OlJQUpQcRERHRp5KcRNWrVw9BQUFKCZNcLkdQUBDq1q2rdjuJiYmQy+Wwt7dXKre3t0d8fHyO69SuXRubNm1Cp06dYGBggGLFisHS0hJLly794HaCgoJgYWGheDg6OqodIxEREdGHSE6i5s2bh6NHj6JcuXLo1asXevXqhXLlyuHEiRP48ccfJQfw/iznQogPznx++/ZtDB06FJMnT8bly5dx4MABREVFYcCAAR9sf9y4cUhOTlY8YmNjJcdIRERE9D7JM5ZXqFAB165dw/Lly3Ht2jUYGxuje/fuGDx4sKSZy21sbKCrq6vS65SQkKDSO5UtKCgIderUwffffw8A8PDwgKmpKerVq4eZM2eiePHiKusYGhrC0NBQwh4SERERfVyebvtSokQJzJ49+5M2bGBgAC8vL4SFhaFNmzaK8rCwMPj5+eW4zuvXr6Gnpxyyrq4ugHc9WERERESfi9rDea9fv8agQYNQokQJ2NnZoUuXLkhMTPykjY8cORJr167F+vXrERkZiREjRiAmJkYxPDdu3Dh0795dUb9169YIDQ3FypUr8eDBA5w+fRpDhw5FjRo14ODg8EmxEBEREUmhdk/UlClTEBwcjO+++w5GRkbYsmULBg4ciG3btuV54506dUJSUhKmT5+OuLg4VKpUCfv27YOTkxMAIC4uTmnOqJ49eyI1NRXLli3DqFGjYGlpiUaNGmHu3Ll5joGIiIgoL9ROokJDQ7Fu3Tp07twZANC1a1fUqVMHcrlcMaSWF4GBgQgMDMxxWXBwsErZkCFDMGTIkDxvj4iIiEgT1B7Oi42NRb169RTPa9SoAT09PTx58iRfAiMiIiLSZmonUXK5HAYGBkplenp6yMzM1HhQRERERNpO7eE8IQR69uypNF1AWloaBgwYAFNTU0VZaGioZiMkIiIi0kJqJ1E9evRQKevatatGgyEiIiIqLNROojZs2JCfcRAREREVKpJv+0JERERETKKIiIiI8oRJFBEREVEeMIkiIiIiygPJSdSrV6/yIw4iIiKiQkVyEmVvb4/evXvj1KlT+REPERERUaEgOYnasmULkpOT0bhxY5QtWxZz5szhrV+IiIjoiyM5iWrdujX++OMPPHnyBAMHDsSWLVvg5OSEr7/+GqGhobwNDBEREX0R8nxiubW1NUaMGIFr165h4cKFOHz4MNq3bw8HBwdMnjwZr1+/1mScRERERFpF7RnL3xcfH4+NGzdiw4YN
iImJQfv27dGnTx88efIEc+bMwblz53Do0CFNxkpERESkNSQnUaGhodiwYQMOHjyIChUqYNCgQejatSssLS0VdTw9PVG1alVNxklERESkVSQnUb169ULnzp1x+vRpfPXVVznWcXV1xYQJEz45OCIiIiJtJTmJiouLg4mJSa51jI2NMWXKlDwHRURERKTtJCdRJiYmkMvl2LFjByIjIyGTyVC+fHn4+/tDTy/Pp1gRERERFSqSs56bN2/im2++wdOnT1GuXDkAwN27d2Fra4tdu3ahcuXKGg+SiIiISNtInuIgICAAlSpVwqNHjxAREYGIiAjExsbCw8MD/fr1y48YiYiIiLSO5J6oa9eu4dKlSyhatKiirGjRopg1a9YHTzQnIiIi+q+R3BNVrlw5PH36VKU8ISEBbm5uGgmKiIiISNuplUSlpKQoHrNnz8bQoUOxfft2PHr0CI8ePcL27dsxfPhwzJ07N7/jJSIiItIKag3nWVpaQiaTKZ4LIdCxY0dFmRACwLv76snl8nwIk4iIiEi7qJVEHTt2LL/jICIiIipU1EqiGjRokN9xEBERERUqkk8sJyIiIiImUURERER5wiSKiIiIKA+YRBERERHlAZMoIiIiojxQ6+q8qlWrKs0TlZuIiIhPCoiIiIioMFArifL391f8nZaWhhUrVqBChQrw9vYGAJw7dw63bt1CYGBgvgRJREREpG3USqKmTJmi+DsgIABDhw7FjBkzVOrExsZqNjoiIiIiLSX5nKht27ahe/fuKuVdu3bFH3/8oZGgiIiIiLSd5CTK2NgYp06dUik/deoUjIyMNBIUERERkbZTazjv34YPH46BAwfi8uXLqFWrFoB350StX78ekydP1niARERERNpIchI1duxYuLq6YsmSJdi8eTMAwN3dHcHBwejYsaPGAyQiIiLSRpKTKADo2LEjEyYiIiL6ouVpss0XL15g7dq1GD9+PJ4/fw7g3fxQjx8/1mhwRERERNpKck/U9evX0aRJE1hYWCA6OhoBAQGwsrLCjh078PDhQ2zcuDE/4iQiIiLSKpJ7okaOHImePXvi3r17SlfjtWjRAidOnNBocERERETaSnISdfHiRfTv31+lvESJEoiPj9dIUERERETaTnISZWRkhJSUFJXyO3fuwNbWViNBEREREWk7yUmUn58fpk+fjoyMDACATCZDTEwMxo4di3bt2mk8QCIiIiJtJDmJmj9/Pp49ewY7Ozu8efMGDRo0gJubG4oUKYJZs2blR4xEREREWkfy1Xnm5uY4deoUjh49ioiICGRlZaFatWpo0qRJfsRHREREpJUkJ1HR0dFwdnZGo0aN0KhRo/yIiYiIiEjrSR7Oc3V1Rd26dbF69WrFRJtEREREXxrJSdSlS5fg7e2NmTNnwsHBAX5+fti2bRvS09PzIz4iIiIirSQ5iapWrRp+/PFHxMTEYP/+/bCzs0P//v1hZ2eH3r1750eMRERERFonT/fOA95NbeDj44M1a9bg8OHDcHV1xS+//KLJ2IiIiIi0luQTy7PFxsZiy5Yt2Lx5M27cuAFvb28sW7ZMk7EVSve9RUGHQERERJ+B5CTq559/xqZNm3D69GmUK1cO3333HXbu3AlnZ+d8CI+IiIhIO0lOombMmIHOnTtjyZIl8PT0zIeQiIiIiLSf5CQqJiYGMpksP2IhIiIiKjTUSqKuX7+OSpUqQUdHBzdu3Mi1roeHh0YCIyIiItJmaiVRnp6eiI+Ph52dHTw9PSGTySDE/06gzn4uk8kgl8vzLVgiIiIibaHWFAdRUVGwtbVV/P3gwQNERUUpHtnPHzx4IDmAFStWwMXFBUZGRvDy8sLJkydzrZ+eno4JEybAyckJhoaGKF26NNavXy95u0RERESfQq2eKCcnJ8Xftra2MDEx0cjGQ0JCMHz4cKxYsQJ16tTB6tWr0aJFC9y+fRulSpXKcZ2OHTvi6dOnWLduHdzc3JCQkIDMzEyNxENERESkLskn
ltvZ2cHf3x/dunVD06ZNoaOT5/k6sXDhQvTp0wcBAQEAgMWLF+PgwYNYuXIlgoKCVOofOHAAx48fx4MHD2BlZQUAnFqBiIiICoTkDGjjxo1IT09HmzZt4ODggGHDhuHixYuSN/z27VtcvnwZvr6+SuW+vr44c+ZMjuvs2rUL1atXx7x581CiRAmULVsWo0ePxps3byRvn4iIiOhTSO6Jatu2Ldq2bYvU1FRs374dW7ZsQe3ateHi4oKuXbti8uTJarWTmJgIuVwOe3t7pXJ7e3vEx8fnuM6DBw9w6tQpGBkZYceOHUhMTERgYCCeP3/+wfOi0tPTlW6OnJKSouaeEhEREX1YnsfiihQpgl69euHQoUO4du0aTE1NMW3aNMntvD/nVPZVfjnJysqCTCbDpk2bUKNGDbRs2RILFy5EcHDwB3ujgoKCYGFhoXg4OjpKjpGIiIjofXlOotLS0rB161b4+/ujWrVqSEpKwujRo9Ve38bGBrq6uiq9TgkJCSq9U9mKFy+OEiVKwMLCQlHm7u4OIQQePXqU4zrjxo1DcnKy4hEbG6t2jEREREQfIjmJOnToEHr06AF7e3sMGDAAdnZ2OHjwIGJiYjB37ly12zEwMICXlxfCwsKUysPCwlC7du0c16lTpw6ePHmCly9fKsru3r0LHR0dlCxZMsd1DA0NYW5urvQgIiIi+lSSkyh/f3+8efMGv/zyC54+fYqff/4ZDRo0yNPGR44cibVr12L9+vWIjIzEiBEjEBMTgwEDBgB414vUvXt3Rf0uXbrA2toavXr1wu3bt3HixAl8//336N27N4yNjfMUAxEREVFeSDqxPDMzE3PmzEGHDh1QvHjxT954p06dkJSUhOnTpyMuLg6VKlXCvn37FPNSxcXFISYmRlHfzMwMYWFhGDJkCKpXrw5ra2t07NgRM2fO/ORYiIiIiKSQlETp6elh7Nix8PPz01gAgYGBCAwMzHFZcHCwSln58uVVhgCJiIiIPjfJw3k1a9bElStX8iMWIiIiokJD8jxRgYGBGDVqFB49egQvLy+YmpoqLffw8NBYcERERETaSnIS1alTJwDA0KFDFWUymUwxv5NcLtdcdERERERaSnISFRUVlR9xEBERERUqkpOo7CvniIiIiL5kkpOojRs35rr83/M6EREREf1XSU6ihg0bpvQ8IyMDr1+/hoGBAUxMTJhEERER0RdB8hQH//zzj9Lj5cuXuHPnDurWrYstW7bkR4xEREREWifPNyD+tzJlymDOnDkqvVRERERE/1UaSaIAQFdXF0+ePNFUc0RERERaTfI5Ubt27VJ6LoRAXFwcli1bhjp16mgsMCIiIiJtJjmJ8vf3V3ouk8lga2uLRo0aYcGCBZqKi4iIiEirSU6isrKy8iMOIiIiokLlk8+JksvluHr1Kv755x9NxENERERUKEhOooYPH45169YBeJdA1a9fH9WqVYOjoyPCw8M1HR8RERGRVpKcRG3fvh1VqlQBAOzevRvR0dH466+/MHz4cEyYMEHjARIRERFpI8lJVGJiIooVKwYA2LdvHzp06ICyZcuiT58+uHHjhsYDJCIiItJGkpMoe3t73L59G3K5HAcOHECTJk0AAK9fv4aurq7GAyQiIiLSRpKvzuvVqxc6duyI4sWLQyaToWnTpgCA8+fPo3z58hoPkIiIiEgbSU6ipk6dikqVKiE2NhYdOnSAoaEhgHczlo8dO1bjARIRERFpI8lJFAC0b99epaxHjx6fHAwRERFRYZGnJOrIkSM4cuQIEhISVCbfXL9+vUYCIyIiItJmkpOoadOmYfr06ahevbrivCgiIiKiL43kJGrVqlUIDg5Gt27d8iMeIiIiokJB8hQHb9++Re3atfMjFiIiIqJCQ3ISFRAQgM2bN+dHLERERESFhuThvLS0NPz88884fPgwPDw8oK+vr7R84cKFGguOiIiISFtJTqKuX78OT09PAMDNmzeVlvEkcyIiIvpSSE6ijh07
lh9xEBERERUqks+JIiIiIqI89ET5+PjkOmx39OjRTwqIiIiIqDCQnERlnw+VLSMjA1evXsXNmzd56xciIiL6YkhOohYtWpRj+dSpU/Hy5ctPDoiIiIioMNDYOVFdu3blffOIiIjoi6GxJOrs2bMwMjLSVHNEREREWk3ycF7btm2VngshEBcXh0uXLmHSpEkaC4yIiIhIm0lOoiwsLJSe6+jooFy5cpg+fTp8fX01FhgRERGRNlM7ibp79y7Kli2LDRs25Gc8RERERIWC2klU1apVUapUKXzzzTfw9/eHt7d3fsZFRHl031sUdAhERF8EtU8sT0pKwrx585CUlIQ2bdrA3t4effr0wa5du5CWlpafMRIRERFpHbWTKCMjI7Ru3Rpr165FXFwcduzYAVtbW4wdOxbW1tbw8/PD+vXrkZCQkJ/xEhEREWmFPE1xIJPJULt2bcyZMwe3b9/G1atXUb9+fQQHB8PR0RHLly/XdJxEREREWkXy1Xk5KVOmDEaNGoVRo0YhKSkJz58/10SzRERERFpLrSRq165dajUmk8nQunVrWFtbf1JQRERERNpOrSTK399f6blMJoMQQul5NrlcrpnIiIiIiLSYWudEZWVlKR6HDh2Cp6cn9u/fjxcvXiA5ORn79u1DtWrVcODAgfyOl4iIiEgrSD4navjw4Vi1ahXq1q2rKGvWrBlMTEzQr18/REZGajRAIiIiIm0k+eq8+/fvq9z6BXh3O5jo6GhNxERERESk9SQnUV999RWGDx+OuLg4RVl8fDxGjRqFGjVqaDQ4IiIiIm0lOYnKnlDTyckJbm5ucHNzQ6lSpRAXF4d169blR4xEREREWkfyOVFubm64fv06wsLC8Ndff0EIgQoVKqBJkyZKV+kRERER/ZflabJNmUwGX19f1K9fH4aGhkyeiIiI6IsjeTgvKysLM2bMQIkSJWBmZoaoqCgAwKRJkzicR0RERF8MyUnUzJkzERwcjHnz5sHAwEBRXrlyZaxdu1ajwRERERFpK8lJ1MaNG/Hzzz/ju+++g66urqLcw8MDf/31l0aDIyIiItJWkpOox48fw83NTaU8KysLGRkZGgmKiIiISNtJTqIqVqyIkydPqpRv27YNVatW1UhQRERERNpO8tV5U6ZMQbdu3fD48WNkZWUhNDQUd+7cwcaNG7Fnz578iJGIiIhI60juiWrdujVCQkKwb98+yGQyTJ48GZGRkdi9ezeaNm2aHzESERERaZ08zRPVrFkzNGvWTNOxEBERERUaknuievXqhSNHjkAIoZEAVqxYARcXFxgZGcHLyyvH861ycvr0aejp6cHT01MjcRARERFJITmJSkpKQqtWrVCyZEmMGjUKV65cyfPGQ0JCMHz4cEyYMAFXrlxBvXr10KJFC8TExOS6XnJyMrp3747GjRvnedtEREREn0JyErVr1y7Ex8djypQpuHz5MqpXr44KFSpg9uzZiI6OltTWwoUL0adPHwQEBMDd3R2LFy+Go6MjVq5cmet6/fv3R5cuXeDt7S01fCIiIiKNkJxEAYClpSX69euH8PBwPHz4EL169cKvv/6a4/xRH/L27VtcvnwZvr6+SuW+vr44c+bMB9fbsGED7t+/jylTpqi1nfT0dKSkpCg9iIiIiD5VnpKobBkZGbh06RLOnz+P6Oho2Nvbq71uYmIi5HK5yjr29vaIj4/PcZ179+5h7Nix2LRpE/T01DsnPigoCBYWFoqHo6Oj2jESERERfUiekqhjx46hb9++sLe3R48ePVCkSBHs3r0bsbGxktuSyWRKz4UQKmUAIJfL0aVLF0ybNg1ly5ZVu/1x48YhOTlZ8chLjERERETvkzzFQcmSJZGUlIRmzZph9erVaN26NYyMjCRv2MbGBrq6uiq9TgkJCTn2aKWmpuLSpUu4cuUKBg8eDODdrWaEENDT08OhQ4fQqFEjlfUMDQ1haGgoOT4iIiKi3EhOoiZPnowOHTqgaNGin7RhAwMDeHl5
ISwsDG3atFGUh4WFwc/PT6W+ubk5bty4oVS2YsUKHD16FNu3b4eLi8snxUNEREQkheQkql+/fhrb+MiRI9GtWzdUr14d3t7e+PnnnxETE4MBAwYAeDcU9/jxY2zcuBE6OjqoVKmS0vp2dnYwMjJSKSciIiLKb2olUW3btkVwcDDMzc3Rtm3bXOuGhoaqvfFOnTohKSkJ06dPR1xcHCpVqoR9+/bByckJABAXF/fROaOIiIiICoJaSZSFhYXiZG9zc/McT/zOq8DAQAQGBua4LDg4ONd1p06diqlTp2osFiIiIiJ1qZVEbdiwQfH3xxIbIiIioi+B5CkOpk2bhvv37+dHLERERESFhuQk6o8//kDZsmVRq1YtLFu2DM+ePcuPuIiIiIi0muQk6vr167h+/ToaNWqEhQsXokSJEmjZsiU2b96M169f50eMRERERFonTzOWV6xYEbNnz8aDBw9w7NgxuLi4YPjw4ShWrJim4yMiIiLSSp907zwAMDU1hbGxMQwMDJCRkaGJmIiIiIi0Xp6SqKioKMyaNQsVKlRA9erVERERgalTp37wxsFERERE/zWSZyz39vbGhQsXULlyZfTq1QtdunRBiRIl8iM2IiIiIq0lOYny8fHB2rVrUbFixfyIh4iIiKhQkDScl5GRgd9//12jM5YTERERFUaSkih9fX2kp6cziSIiIqIvnuQTy4cMGYK5c+ciMzMzP+IhIiIiKhQknxN1/vx5HDlyBIcOHULlypVhamqqtDw0NFRjwRERERFpK8lJlKWlJdq1a5cfsRAREREVGpKTqA0bNuRHHERERESFyifPWE5ERET0JZLcE+Xi4pLr1XkPHjz4pICIiIiICgPJSdTw4cOVnmdkZODKlSs4cOAAvv/+e03FRURERKTVJCdRw4YNy7F8+fLluHTp0icHRERERFQYaOycqBYtWuCPP/7QVHNEREREWk1jSdT27dthZWWlqeaIiIiItJrk4byqVasqnVguhEB8fDyePXuGFStWaDQ4IiIiIm0lOYny9/dXeq6jowNbW1s0bNgQ5cuX11RcRERERFpNchI1ZcqU/IiDiIiIqFCRfE5UREQEbty4oXj+559/wt/fH+PHj8fbt281GhwRERGRtpKcRPXv3x93794F8G5izU6dOsHExATbtm3DmDFjNB4gERERkTaSnETdvXsXnp6eAIBt27ahQYMG2Lx5M4KDgznFAREREX0xJCdRQghkZWUBAA4fPoyWLVsCABwdHZGYmKjZ6IiIiIi0lOQkqnr16pg5cyZ+/fVXHD9+HK1atQIAREVFwd7eXuMBEhEREWkjyUnU4sWLERERgcGDB2PChAlwc3MD8G6yzdq1a2s8QCIiIiJtJHmKAw8PD6Wr87L9+OOP0NXV1UhQRERERNpOchL1IUZGRppqioiIiEjraezeeURERERfEiZRRERERHmgVhKVkpKS33EQERERFSpqJVFFixZFQkICAKBRo0Z48eJFfsZEREREpPXUSqLMzMyQlJQEAAgPD0dGRka+BkVERESk7dS6Oq9Jkybw8fGBu7s7AKBNmzYwMDDIse7Ro0c1Fx0RERGRllIrifrtt9/wyy+/4P79+zh+/DgqVqwIExOT/I6NiIiISGuplUQZGxtjwIABAIBLly5h7ty5sLS0zM+4iIiIiLSa5Mk2jx07pvhbCAEAkMlkmouIiIiIqBDI0zxRGzduROXKlWFsbAxjY2N4eHjg119/1XRsRERERFpLck/UwoULMWnSJAwePBh16tSBEAKnT5/GgAEDkJiYiBEjRuRHnERERERaRXIStXTpUqxcuRLdu3dXlPn5+aFixYqYOnUqkygiIiL6IkgezouLi0Pt2rVVymvXro24uDiNBEVERESk7SQnUW5ubti6datKeUhICMqUKaORoIiIiIi0neThvGnTpqFTp044ceIE6tSpA5lMhlOnTuHIkSM5JldERERE/0WSe6LatWuH8+fPw8bGBjt37kRo
aChsbGxw4cIFtGnTJj9iJCIiItI6knuiAMDLywu//fabpmMhIqL/gPveoqBDIPos8jRPFBEREdGXjkkUERERUR4wiSIiIiLKAyZRRERERHnAJIqIiIgoDyRfnffq1SvMmTMHR44cQUJCArKyspSWP3jwQGPBEREREWkryUlUQEAAjh8/jm7duqF48eKQyWT5ERcRERGRVpOcRO3fvx979+5FnTp18iMeIiIiokJB8jlRRYsWhZWVVX7EQkRERFRoSE6iZsyYgcmTJ+P169f5EQ8RERFRoSA5iVqwYAEOHjwIe3t7VK5cGdWqVVN6SLVixQq4uLjAyMgIXl5eOHny5AfrhoaGomnTprC1tYW5uTm8vb1x8OBBydskIiIi+lSSz4ny9/fX2MZDQkIwfPhwrFixAnXq1MHq1avRokUL3L59G6VKlVKpf+LECTRt2hSzZ8+GpaUlNmzYgNatW+P8+fOoWrWqxuIiIiIi+hjJSdSUKVM0tvGFCxeiT58+CAgIAAAsXrwYBw8exMqVKxEUFKRSf/HixUrPZ8+ejT///BO7d+9mEkVERESfleQkKtvly5cRGRkJmUyGChUqSE5i3r59i8uXL2Ps2LFK5b6+vjhz5oxabWRlZSE1NTXXE93T09ORnp6ueJ6SkiIpTiIiIqKcSE6iEhIS0LlzZ4SHh8PS0hJCCCQnJ8PHxwe///47bG1t1WonMTERcrkc9vb2SuX29vaIj49Xq40FCxbg1atX6Nix4wfrBAUFYdq0aWq1R0RERKQuySeWDxkyBCkpKbh16xaeP3+Of/75Bzdv3kRKSgqGDh0qOYD3J+sUQqg1geeWLVswdepUhISEwM7O7oP1xo0bh+TkZMUjNjZWcoxERERE75PcE3XgwAEcPnwY7u7uirIKFSpg+fLl8PX1VbsdGxsb6OrqqvQ6JSQkqPROvS8kJAR9+vTBtm3b0KRJk1zrGhoawtDQUO24iIiIiNQhuScqKysL+vr6KuX6+voq99HLjYGBAby8vBAWFqZUHhYWhtq1a39wvS1btqBnz57YvHkzWrVqpX7gRERERBokOYlq1KgRhg0bhidPnijKHj9+jBEjRqBx48aS2ho5ciTWrl2L9evXIzIyEiNGjEBMTAwGDBgA4N1QXPfu3RX1t2zZgu7du2PBggWoVasW4uPjER8fj+TkZKm7QURERPRJJCdRy5YtQ2pqKpydnVG6dGm4ubnBxcUFqampWLp0qaS2OnXqhMWLF2P69Onw9PTEiRMnsG/fPjg5OQEA4uLiEBMTo6i/evVqZGZmYtCgQShevLjiMWzYMKm7QURERPRJJJ8T5ejoiIiICISFheGvv/6CEAIVKlT46LlJHxIYGIjAwMAclwUHBys9Dw8Pz9M2iIiIiDQtz/NENW3aFE2bNtVkLERERESFhlpJ1E8//YR+/frByMgIP/30U6518zLNAREREVFho1YStWjRInz33XcwMjLCokWLPlhPJpMxiSIiIqIvglpJVFRUVI5/ExEREX2pJF+dN336dLx+/Vql/M2bN5g+fbpGgiIiIiLSdpKTqGnTpuHly5cq5a9fv+Y96oiIiOiLITmJ+tC97a5duwYrKyuNBEVERESk7dSe4qBo0aKQyWSQyWQoW7asUiIll8vx8uVLxUzjRERERP91aidRixcvhhACvXv3xrRp02BhYaFYZmBgAGdnZ3h7e+dLkERERETaRu0kqkePHsjMzAQANGnSBCVLlsy3oIiIiIi0naRzovT09BAYGAi5XJ5f8RAREREVCpJPLK9ZsyauXLmSH7EQERERFRqS750XGBiIUaNG4dGjR/Dy8oKpqanScg8PD40FR0RERKStJCdRnTp1AqB8jzyZTKaY+oBDfURERPQlkJxE8bYvRERERHlIopycnPIjDiIiIqJCRXISBQD379/H4sWLERkZCZlMBnd3dwwbNgylS5fWdHxEREREWkny1XkHDx5EhQoVcOHCBXh4eKBSpUo4f/48
KlasiLCwsPyIkYiIiEjrSO6JGjt2LEaMGIE5c+aolP/www9o2rSpxoIjIiIi0laSe6IiIyPRp08flfLevXvj9u3bGgmKiIiISNtJTqJsbW1x9epVlfKrV6/Czs5OEzERERERaT3Jw3l9+/ZFv3798ODBA9SuXRsymQynTp3C3LlzMWrUqPyIkYiIiEjrSE6iJk2ahCJFimDBggUYN24cAMDBwQFTp05VmoCTiIiI6L9MchIlk8kwYsQIjBgxAqmpqQCAIkWKaDwwIiIiIm2Wp3miACAhIQF37tyBTCZDuXLlYGtrq8m4iIiIiLSa5BPLU1JS0K1bNzg4OKBBgwaoX78+HBwc0LVrVyQnJ+dHjERERERaR3ISFRAQgPPnz2Pv3r148eIFkpOTsWfPHly6dAl9+/bNjxiJiIiItI7k4by9e/fi4MGDqFu3rqKsWbNmWLNmDZo3b67R4IiIiIi0leSeKGtra1hYWKiUW1hYoGjRohoJioiIiEjbSU6iJk6ciJEjRyIuLk5RFh8fj++//x6TJk3SaHBERERE2krycN7KlSvx999/w8nJCaVKlQIAxMTEwNDQEM+ePcPq1asVdSMiIjQXKREREZEWkZxE+fv750MYRERERIWL5CRqypQp+REHERERUaGS58k2L1++jMjISMhkMlSoUAFVq1bVZFxEREREWk1yEpWQkIDOnTsjPDwclpaWEEIgOTkZPj4++P333zlzOREREX0RJF+dN2TIEKSkpODWrVt4/vw5/vnnH9y8eRMpKSm8ATERERF9MST3RB04cACHDx+Gu7u7oqxChQpYvnw5fH19NRocERERkbaS3BOVlZUFfX19lXJ9fX1kZWVpJCgiIiIibSc5iWrUqBGGDRuGJ0+eKMoeP36MESNGoHHjxhoNjoiIiEhbSU6ili1bhtTUVDg7O6N06dJwc3ODi4sLUlNTsXTp0vyIkYiIiEjrSD4nytHREREREQgLC8Nff/0FIQQqVKiAJk2a5Ed8RERERFpJUhKVmZkJIyMjXL16FU2bNkXTpk3zKy4iIiIirSZpOE9PTw9OTk6Qy+X5FQ8RERFRoSD5nKiJEydi3LhxeP78eX7EQ0RERFQoSD4n6qeffsLff/8NBwcHODk5wdTUVGl5RESExoIjIiIi0laSkyg/Pz/IZLL8iIWIiIio0JCcRE2dOjUfwiAiIiIqXNQ+J+r169cYNGgQSpQoATs7O3Tp0gWJiYn5GRsRERGR1lI7iZoyZQqCg4PRqlUrdO7cGWFhYRg4cGB+xkZERESktdQezgsNDcW6devQuXNnAEDXrl1Rp04dyOVy6Orq5luARERERNpI7Z6o2NhY1KtXT/G8Ro0a0NPTU7qHHhEREdGXQu0kSi6Xw8DAQKlMT08PmZmZGg+KiIiISNupPZwnhEDPnj1haGioKEtLS8OAAQOU5ooKDQ3VbIREREREWkjtJKpHjx4qZV27dtVoMERERESFhdpJ1IYNG/IzDiIiIqJCRfK984iIiIiISRQRERFRnjCJIiIiIsoDJlFEREREeVDgSdSKFSvg4uICIyMjeHl54eTJk7nWP378OLy8vGBkZARXV1esWrXqM0VKRERE9D8FmkSFhIRg+PDhmDBhAq5cuYJ69eqhRYsWiImJybF+VFQUWrZsiXr16uHKlSsYP348hg4dij/++OMzR05ERERfugJNohYuXIg+ffogICAA7u7uWLx4MRwdHbFy5coc669atQqlSpXC4sWL4e7ujoCAAPTu3Rvz58//zJETERHRl67Akqi3b9/i8uXL8PX1VSr39fXFmTNnclzn7NmzKvWbNWuGS5cuISMjI99iJSIiInqf2pNtalpiYiLkcjns7e2Vyu3t7REfH5/jOvHx8TnWz8zMRGJiIooXL66yTnp6OtLT0xXPk5OTAQApKSmfugs5ynjzOl/apcIjv44tdfEYJB6DVNDy4xjMblMIofG286rAkqhsMplM6bkQQqXsY/VzKs8W
FBSEadOmqZQ7OjpKDZVILRZDCjoC+tLxGKSClp/HYGpqKiwsLPJvAxIUWBJlY2MDXV1dlV6nhIQEld6mbMWKFcuxvp6eHqytrXNcZ9y4cRg5cqTieVZWFp4/fw5ra+tckzWSLiUlBY6OjoiNjYW5uXlBh0NfIB6DVNB4DOYfIQRSU1Ph4OBQ0KEoFFgSZWBgAC8vL4SFhaFNmzaK8rCwMPj5+eW4jre3N3bv3q1UdujQIVSvXh36+vo5rmNoaAhDQ0OlMktLy08LnnJlbm7OLw8qUDwGqaDxGMwf2tIDla1Ar84bOXIk1q5di/Xr1yMyMhIjRoxATEwMBgwYAOBdL1L37t0V9QcMGICHDx9i5MiRiIyMxPr167Fu3TqMHj26oHaBiIiIvlAFek5Up06dkJSUhOnTpyMuLg6VKlXCvn374OTkBACIi4tTmjPKxcUF+/btw4gRI7B8+XI4ODjgp59+Qrt27QpqF4iIiOgLJRPadJo7FWrp6ekICgrCuHHjVIZQiT4HHoNU0HgMflmYRBERERHlQYHfO4+IiIioMGISRURERJQHTKKIiIiI8oBJFOUoOjoaMpkMV69eLehQ6AvG45AKGo9Byg2TKCpwaWlp6NmzJypXrgw9PT34+/sXdEj0BQoPD4efnx+KFy8OU1NTeHp6YtOmTQUdFn1B7ty5Ax8fH9jb28PIyAiurq6YOHEiMjIyCjo0+oACv3cekVwuh7GxMYYOHYo//vijoMOhL9SZM2fg4eGBH374Afb29ti7dy+6d+8Oc3NztG7duqDDoy+Avr4+unfvjmrVqsHS0hLXrl1D3759kZWVhdmzZxd0eJQD9kR94bKysjB37ly4ubnB0NAQpUqVwqxZs1TqyeVy9OnTBy4uLjA2Nka5cuWwZMkSpTrh4eGoUaMGTE1NYWlpiTp16uDhw4cAgGvXrsHHxwdFihSBubk5vLy8cOnSJQCAqakpVq5cib59+6JYsWL5v9OkdbThOBw/fjxmzJiB2rVro3Tp0hg6dCiaN2+OHTt25P8LQAVOG45BV1dX9OrVC1WqVIGTkxO++eYbfPfddzh58mT+vwCUJ+yJ+sKNGzcOa9aswaJFi1C3bl3ExcXhr7/+UqmXlZWFkiVLYuvWrbCxscGZM2fQr18/FC9eHB07dkRmZib8/f3Rt29fbNmyBW/fvsWFCxcUN3n+7rvvULVqVaxcuRK6urq4evXqB+93SF8ebT0Ok5OT4e7unm/7TdpDG4/Bv//+GwcOHEDbtm3zdd/pEwj6YqWkpAhDQ0OxZs0alWVRUVECgLhy5coH1w8MDBTt2rUTQgiRlJQkAIjw8PAc6xYpUkQEBwd/NKYePXoIPz8/teKn/wZtPA6FEGLbtm3CwMBA3Lx5U636VHhp2zHo7e0tDA0NBQDRr18/IZfL1d8Z+qw4nPcFi4yMRHp6Oho3bqxW/VWrVqF69eqwtbWFmZkZ1qxZo7i3oZWVFXr27IlmzZqhdevWWLJkCeLi4hTrjhw5EgEBAWjSpAnmzJmD+/fv58s+UeGjjcdheHg4evbsiTVr1qBixYqfvpOk1bTtGAwJCUFERAQ2b96MvXv3Yv78+ZrZUdK8gs7iqOBcv35dABAPHjxQWfb+f18hISHCyMhILF++XERERIh79+6Jfv36iSpVqiitFxERIWbPni28vb2FmZmZOHv2rGLZnTt3xMKFC0XTpk2FgYGBCA0NVdkue6K+PNp2HIaHhwszMzOxevVqje8raSdtOwb/7ddffxXGxsYiMzNTI/tKmsUk6gv25s0bYWxsrFYX9uDBg0WjRo2U6jRu3Fjli+PfatWqJYYMGZLjss6dO4vWrVurlDOJ+vJo03F47NgxYWpqKpYtWyZ9R6jQ0qZj8H0bN24Uenp6IiMj4+M7Qp8dTyz/ghkZGeGHH37AmDFjYGBggDp16uDZs2e4deuWSre2m5sbNm7ciIMHD8LFxQW//vorLl68CBcXFwBAVFQUfv75Z3zz
zTdwcHDAnTt3cPfuXXTv3h1v3rzB999/j/bt28PFxQWPHj3CxYsX0a5dO0X7t2/fxtu3b/H8+XOkpqYqJrbz9PT8XC8HFRBtOQ7Dw8PRqlUrDBs2DO3atUN8fDwAwMDAAFZWVp/3RaHPSluOwU2bNkFfXx+VK1eGoaEhLl++jHHjxqFTp07Q0+PPtVYq6CyOCpZcLhczZ84UTk5OQl9fX5QqVUrMnj1b5b+vtLQ00bNnT2FhYSEsLS3FwIEDxdixYxX/fcXHxwt/f39RvHhxYWBgIJycnMTkyZOFXC4X6enponPnzsLR0VEYGBgIBwcHMXjwYPHmzRtFHE5OTgKAyoO+DNpwHPbo0SPHY7BBgwYF86LQZ6UNx+Dvv/8uqlWrJszMzISpqamoUKGCmD17ttJ3JWkXmRBCFEj2RkRERFSI8eo8IiIiojxgEkVERESUB0yiiIiIiPKASRQRERFRHjCJIiIiIsoDJlFEREREecAkioiIiCgPmEQRUaEVHh4OmUyGFy9eFHQoRPQFYhJFRERElAdMooiIiIjygEkUERWohg0bYvDgwRg8eDAsLS1hbW2NiRMnIvuOVOnp6RgzZgwcHR1haGiIMmXKYN26dTm2lZSUhG+//RYlS5aEiYkJKleujC1btijV2b59OypXrgxjY2NYW1ujSZMmePXqFYB3w4M1atSAqakpLC0tUadOHTx8+DB/XwAiKrR4W2giKnC//PIL+vTpg/Pnz+PSpUvo168fnJyc0LdvX3Tv3h1nz57FTz/9hCpVqiAqKgqJiYk5tpOWlgYvLy/88MMPMDc3x969e9GtWze4urqiZs2aiIuLw7fffot58+ahTZs2SE1NxcmTJyGEQGZmJvz9/dG3b19s2bIFb9++xYULFyCTyT7zq0FEhQVvQExEBaphw4ZISEjArVu3FAnL2LFjsWvXLuzcuRPlypVDWFgYmjRporJueHg4fHx88M8//8DS0jLH9lu1agV3d3fMnz8fERER8PLyQnR0NJycnJTqPX/+HNbW1ggPD0eDBg00vp9E9N/D4TwiKnC1atVS6vHx9vbGvXv3cOXKFejq6qqd1MjlcsyaNQseHh6wtraGmZkZDh06hJiYGABAlSpV0LhxY1SuXBkdOnTAmjVr8M8//wAArKys0LNnTzRr1gytW7fGkiVLEBcXp/mdJaL/DCZRRKS1jIyMJNVfsGABFi1ahDFjxuDo0aO4evUqmjVrhrdv3wIAdHV1ERYWhv3796NChQpYunQpypUrh6ioKADAhg0bcPbsWdSuXRshISEoW7Yszp07p/H9IqL/BiZRRFTg3k9Uzp07hzJlyqBKlSrIysrC8ePH1Wrn5MmT8PPzQ9euXVGlShW4urri3r17SnVkMhnq1KmDadOm4cqVKzAwMMCOHTsUy6tWrYpx48bhzJkzqFSpEjZv3vzpO0hE/0lMooiowMXGxmLkyJG4c+cOtmzZgqVLl2LYsGFwdnZGjx490Lt3b+zcuRNRUVEIDw/H1q1bc2zHzc0NYWFhOHPmDCIjI9G/f3/Ex8crlp8/fx6zZ8/GpUuXEBMTg9DQUDx79gzu7u6IiorCuHHjcPbsWTx8+BCHDh3C3bt34e7u/rleBiIqZHh1HhEVuO7du+PNmzeoUaMGdHV1MWTIEPTr1w8AsHLlSowfPx6BgYFISkpCqVKlMH78+BzbmTRpEqKiotCsWTOYmJigX79+8Pf3R3JyMgDA3NwcJ06cwOLFi5GSkgInJycsWLAALVq0wNOnT/HXX3/hl19+QVJSEooXL47Bgwejf//+n+11IKLChVfnEVGBatiwITw9PbF48eKCDoWISBIO5xERERHlAZMoIiIiojzgcB4RERFRHrAnioiIiCgPmEQRERER5QGTKCIiIqI8YBJFRERElAdMooiIiIjygEkUERERUR4wiSIiIiLKAyZRRERERHnAJIqIiIgoD/4PJhs+5baHsTMAAAAASUVORK5CYII=\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "stackedbarplot(x_data = [\"class1\",\"class2\",\"class3\"]\n",
+ " , y_data_list = [data_df[\"survived_prop\"], data_df[\"unsurvived_prop\"]]\n",
+ " , y_data_names = ['Survived', 'Unsurvived']\n",
+ " , colors = ['#539caf', '#7663b0']\n",
+ " , x_label = \"pclass\"\n",
+ " , y_label = 'Proportion of survived/unsurvived by Pclass(1,2,3)'\n",
+ " , title = 'Proportion of survived/unsurvived by Pclass(1,2,3)')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0bc62adf-193e-415c-bd9d-42b3432a8643",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 133,
+ "id": "62ad1d42-2779-4354-802c-3653ac65f079",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data1 = data.groupby(\"sex\")[\"survived\"].sum()\n",
+ "\n",
+ "data2= data.groupby(\"sex\")[\"pclass\"].count()\n",
+ "\n",
+ "data_df = pd.concat([data1, data2], axis=1, keys=[\"survived\",\"total\"])\n",
+ "\n",
+ "data_df[\"unsurvived\"] = data_df[\"total\"] - data_df[\"survived\"]\n",
+ "\n",
+ "data_df[\"survived_prop\"] = data_df[\"survived\"]/data_df[\"total\"]\n",
+ "\n",
+ "data_df[\"unsurvived_prop\"] = data_df[\"unsurvived\"]/data_df[\"total\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 137,
+ "id": "8e1aaed2-6465-4ccd-a355-3645f0e3c117",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th>sex</th><th>survived</th><th>total</th><th>unsurvived</th><th>survived_prop</th><th>unsurvived_prop</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>female</th><td>233</td><td>314</td><td>81</td><td>0.742038</td><td>0.257962</td></tr>\n",
+ "    <tr><th>male</th><td>109</td><td>577</td><td>468</td><td>0.188908</td><td>0.811092</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " survived total unsurvived survived_prop unsurvived_prop\n",
+ "sex \n",
+ "female 233 314 81 0.742038 0.257962\n",
+ "male 109 577 468 0.188908 0.811092"
+ ]
+ },
+ "execution_count": 137,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data_df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 136,
+ "id": "f5931df5-7705-4865-9296-68398e6e0ae1",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "stackedbarplot(x_data = [\"female\",\"male\"]\n",
+ " , y_data_list = [data_df[\"survived_prop\"], data_df[\"unsurvived_prop\"]]\n",
+ " , y_data_names = ['Survived', 'Unsurvived']\n",
+ " , colors = ['#539caf', '#7663b0']\n",
+ " , x_label = \"sex\"\n",
+ " , y_label = 'Proportion of survived/unsurvived by sex'\n",
+ " , title = 'Proportion of survived/unsurvived by sex')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ce087023-153c-4531-bbd5-b625753019b6",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4284a40c-e56e-4098-9306-6d43ec00bdb9",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "id": "fb9c51de-39f4-432a-b877-74f5a31f5256",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " survived unservived total survived_prop unsurvived_prop\n",
+ "sex \n",
+ "female 233 314 314 0.742038 1.0\n",
+ "male 109 577 577 0.188908 1.0\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "08f88221-ed64-4529-a709-9c5cb0f1c54b",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 143,
+ "id": "5ff83825-c30c-4561-80c3-ad74a208ab95",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def plot_survived_vs_feature(feature):\n",
+ "    # One box per survival class (0 = did not survive, 1 = survived), in sorted order\n",
+ "    classes = sorted(data[\"survived\"].unique())\n",
+ "    plt.boxplot([data.loc[data[\"survived\"] == i, feature] for i in classes],\n",
+ "                labels=classes,\n",
+ "                patch_artist=True,\n",
+ "                boxprops=dict(color='blue')\n",
+ "                )\n",
+ "    plt.xlabel(\"survived\")\n",
+ "    plt.ylabel(feature)\n",
+ "    plt.title(\"Box plot of \" + feature + \" by survived\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 144,
+ "id": "8b67fef7-81ff-416f-be5a-e30635f9b515",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plot_survived_vs_feature(\"fare\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a11ec0f6-b1f5-4ec7-bd24-73de959e7f74",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a9e6cb91-876d-4f2b-83bd-69b7d2fa08be",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "6cdf21fe-898f-450d-b69a-107c481bc8eb",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3ad17db6-8e44-497c-90b9-e9f4f9e38e8e",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6db68cc5-3813-4099-8af4-f45cc2efe161",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 152,
+ "id": "d9c75118-e39c-4948-b51e-e64394096df5",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# age contains NaN values, so drop the rows where age is missing\n",
+ "data = data.dropna(subset=['age'])\n",
+ "plot_survived_vs_feature(\"age\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a315af12-89de-4bba-81a2-7820e3985258",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "id": "12dd5a11-623a-4c05-927e-355508daf357",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "745e2e00-98f8-4001-b630-a52c54527838",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 153,
+ "id": "d0265ec9-b07e-4130-8342-7af34e887797",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>survived</th><th>pclass</th><th>sex</th><th>age</th><th>sibsp</th><th>parch</th><th>fare</th><th>embarked</th><th>class</th><th>who</th><th>adult_male</th><th>deck</th><th>embark_town</th><th>alive</th><th>alone</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>0</th><td>0</td><td>3</td><td>male</td><td>22.0</td><td>1</td><td>0</td><td>7.2500</td><td>S</td><td>Third</td><td>man</td><td>True</td><td>NaN</td><td>Southampton</td><td>no</td><td>False</td></tr>\n",
+ "    <tr><th>1</th><td>1</td><td>1</td><td>female</td><td>38.0</td><td>1</td><td>0</td><td>71.2833</td><td>C</td><td>First</td><td>woman</td><td>False</td><td>C</td><td>Cherbourg</td><td>yes</td><td>False</td></tr>\n",
+ "    <tr><th>2</th><td>1</td><td>3</td><td>female</td><td>26.0</td><td>0</td><td>0</td><td>7.9250</td><td>S</td><td>Third</td><td>woman</td><td>False</td><td>NaN</td><td>Southampton</td><td>yes</td><td>True</td></tr>\n",
+ "    <tr><th>3</th><td>1</td><td>1</td><td>female</td><td>35.0</td><td>1</td><td>0</td><td>53.1000</td><td>S</td><td>First</td><td>woman</td><td>False</td><td>C</td><td>Southampton</td><td>yes</td><td>False</td></tr>\n",
+ "    <tr><th>4</th><td>0</td><td>3</td><td>male</td><td>35.0</td><td>0</td><td>0</td><td>8.0500</td><td>S</td><td>Third</td><td>man</td><td>True</td><td>NaN</td><td>Southampton</td><td>no</td><td>True</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " survived pclass sex age sibsp parch fare embarked class \\\n",
+ "0 0 3 male 22.0 1 0 7.2500 S Third \n",
+ "1 1 1 female 38.0 1 0 71.2833 C First \n",
+ "2 1 3 female 26.0 0 0 7.9250 S Third \n",
+ "3 1 1 female 35.0 1 0 53.1000 S First \n",
+ "4 0 3 male 35.0 0 0 8.0500 S Third \n",
+ "\n",
+ " who adult_male deck embark_town alive alone \n",
+ "0 man True NaN Southampton no False \n",
+ "1 woman False C Cherbourg yes False \n",
+ "2 woman False NaN Southampton yes True \n",
+ "3 woman False C Southampton yes False \n",
+ "4 man True NaN Southampton no True "
+ ]
+ },
+ "execution_count": 153,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data = sns.load_dataset(\"titanic\")\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 182,
+ "id": "c12388be-bddf-40d6-a91a-2ec111d975d6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df1 = data[data[\"pclass\"] == 1]\n",
+ "df2 = data[data[\"pclass\"] == 2]\n",
+ "df3 = data[data[\"pclass\"] == 3]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 183,
+ "id": "42155529-a456-45a7-9196-783c5c96086f",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "df1 = df1.groupby(\"embarked\").count()[\"pclass\"]\n",
+ "df2 = df2.groupby(\"embarked\").count()[\"pclass\"]\n",
+ "df3 = df3.groupby(\"embarked\").count()[\"pclass\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 194,
+ "id": "ab81d4a4-f290-42df-9e83-50822ba21a8c",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "embarked\n",
+ "C 66\n",
+ "Q 72\n",
+ "S 353\n",
+ "Name: pclass, dtype: int64"
+ ]
+ },
+ "execution_count": 194,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df3"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 196,
+ "id": "0b5364cc-b391-47ef-9fa2-de4565049b00",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
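+ "# df1, df2 and df3 are indexed by port of embarkation (C, Q, S);\n",
+ "# each one holds the passenger counts for a single pclass\n",
+ "categories = [\"C\", \"Q\", \"S\"]\n",
+ "values1 = df1.reindex(categories, fill_value=0)\n",
+ "values2 = df2.reindex(categories, fill_value=0)\n",
+ "values3 = df3.reindex(categories, fill_value=0)\n",
+ "# Width of each bar\n",
+ "bar_width = 0.2\n",
+ "\n",
+ "# x positions of the side-by-side bars\n",
+ "x_data1 = np.arange(len(categories))\n",
+ "x_data2 = x_data1 + bar_width\n",
+ "x_data3 = x_data2 + bar_width\n",
+ "# Draw the grouped bar chart, one series per pclass\n",
+ "\n",
+ "plt.bar(x_data1, values1, width=bar_width, color='#539caf', label='pclass 1')\n",
+ "plt.bar(x_data2, values2, width=bar_width, color='#7663b0', label='pclass 2')\n",
+ "plt.bar(x_data3, values3, width=bar_width, color='#1663b0', label='pclass 3')\n",
+ "\n",
+ "# Add axis labels and a title\n",
+ "plt.xlabel('Embarked')\n",
+ "plt.ylabel('Number of passengers')\n",
+ "plt.title('Number of passengers in each pclass by port of embarkation')\n",
+ "plt.xticks(x_data1 + bar_width, categories)  # center each tick under its group of bars\n",
+ "\n",
+ "# Add the legend\n",
+ "plt.legend()\n",
+ "\n",
+ "# Show the figure\n",
+ "plt.show()"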
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "58e5cc86-6d81-4970-9b24-91371f623e02",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f0983a81-48ce-42db-b6ce-13521863ee67",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "id": "7c1d49c4-9a90-420c-a477-cbbb6997e40b",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "pclass 1 2 3\n",
+ "embarked \n",
+ "C 85 17 66\n",
+ "Q 2 3 72\n",
+ "S 127 164 353\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b80fc143-bf3e-430a-bcd1-f541be6d931f",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 197,
+ "id": "45b29fd1-03dd-460a-a837-df0f0c1dafa4",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plot_survived_vs_feature(\"sibsp\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "id": "6f788970-a071-4986-b529-694f85aeb2b5",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 608\n",
+ "1 209\n",
+ "2 28\n",
+ "4 18\n",
+ "3 16\n",
+ "8 7\n",
+ "5 5\n",
+ "Name: sibsp, dtype: int64"
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8fc069d6-29dd-4c23-b657-2391baf2dcdf",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 198,
+ "id": "6afc1541-2a9b-4118-bcf6-4e7bd349744d",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plot_survived_vs_feature(\"parch\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "id": "6770cfff-e584-4d79-87b5-b37dce7e2cfb",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 678\n",
+ "1 118\n",
+ "2 80\n",
+ "5 5\n",
+ "3 5\n",
+ "4 4\n",
+ "6 1\n",
+ "Name: parch, dtype: int64"
+ ]
+ },
+ "execution_count": 27,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "df09fcdc-f6fb-4039-9a5b-6f20927a2f09",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 215,
+ "id": "adcca888-7fe0-48c6-bc26-e2e8e2646cf9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df1 = data[data[\"survived\"] == 0].groupby(\"alone\")[\"alone\"].count()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 217,
+ "id": "e6120007-73b1-4015-bbbd-5feebfca542b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df2 = data[data[\"survived\"] == 1].groupby(\"alone\")[\"alone\"].count()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 222,
+ "id": "cc9465f1-c689-4ebe-9ffd-30a998cdf7a4",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "categories = [\"False\",\"True\"]\n",
+ "values1 = df1\n",
+ "values2 = df2\n",
+ "\n",
+ "# Width of each bar\n",
+ "bar_width = 0.2\n",
+ "\n",
+ "# x positions of the side-by-side bars\n",
+ "x_data1 = np.arange(len(categories))\n",
+ "x_data2 = x_data1 + bar_width\n",
+ "\n",
+ "# Draw the grouped bar chart: one bar per survival outcome for each value of alone\n",
+ "\n",
+ "plt.bar(x_data1, values1, width=bar_width, color='#539caf', label='survived = 0')\n",
+ "plt.bar(x_data2, values2, width=bar_width, color='#7663b0', label='survived = 1')\n",
+ "\n",
+ "\n",
+ "# Axis labels and title\n",
+ "plt.xlabel('alone')\n",
+ "plt.ylabel('Number of passengers')\n",
+ "plt.title('Number of passengers by survival and whether travelling alone')\n",
+ "plt.xticks(x_data1 + bar_width / 2, categories) # center each tick under its pair of bars\n",
+ "\n",
+ "# Add the legend\n",
+ "plt.legend()\n",
+ "\n",
+ "# Show the figure\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "40a8bf94-cfea-48bd-a7b5-4ab956970d9d",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "id": "d5bddee4-1758-435f-95b1-d23657417d16",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "survived 0 1\n",
+ "alone \n",
+ "False 175 179\n",
+ "True 374 163\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "aa7a5851-e261-4048-b85d-96af26ff8a4a",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "base_wtl",
+ "language": "python",
+ "name": "base_wtl"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/2023/homework/TinglanWang/numpy_submit.txt b/2023/homework/TinglanWang/numpy_submit.txt
new file mode 100644
index 00000000..c557c51f
--- /dev/null
+++ b/2023/homework/TinglanWang/numpy_submit.txt
@@ -0,0 +1,10 @@
+C
+D
+B
+E
+C
+C
+E
+C
+C
+E
diff --git a/2023/homework/TinglanWang/pandas_submit.txt b/2023/homework/TinglanWang/pandas_submit.txt
new file mode 100644
index 00000000..bcf22b49
--- /dev/null
+++ b/2023/homework/TinglanWang/pandas_submit.txt
@@ -0,0 +1,12 @@
+B
+D
+C
+A
+B
+D
+A
+C
+C
+B
+B
+B
diff --git a/2023/homework/TinglanWang/python_submit.txt b/2023/homework/TinglanWang/python_submit.txt
new file mode 100644
index 00000000..79d54642
--- /dev/null
+++ b/2023/homework/TinglanWang/python_submit.txt
@@ -0,0 +1,108 @@
+C
+B
+C
+C
+C
+A
+A
+B
+B
+B
+B
+C
+A
+D
+A
+A
+A
+C
+D
+B
+A
+D
+C
+A
+D
+B
+C
+B
+A
+C
+C
+C
+B
+C
+D
+B
+A
+C
+D
+A
+B
+A
+C
+A
+C
+D
+B
+A
+B
+A
+A
+A
+B
+D
+D
+B
+A
+D
+B
+C
+B
+B
+A
+D
+A
+D
+A
+B
+A
+A
+D
+B
+C
+A
+C
+C
+B
+B
+C
+A
+B
+A
+C
+B
+A
+C
+C
+B
+B
+B
+A
+C
+A
+B
+B
+A
+A
+A
+C
+C
+A
+B
+B
+A
+A
+B
+A
+A
diff --git a/2023/homework/TinglanWang/readme.md b/2023/homework/TinglanWang/readme.md
new file mode 100644
index 00000000..71849ae2
--- /dev/null
+++ b/2023/homework/TinglanWang/readme.md
@@ -0,0 +1,3 @@
+# hello
+
+*OK*
diff --git a/2023/homework/Wencong_Hong/numpy_submit.txt b/2023/homework/Wencong_Hong/numpy_submit.txt
new file mode 100644
index 00000000..92869791
--- /dev/null
+++ b/2023/homework/Wencong_Hong/numpy_submit.txt
@@ -0,0 +1,10 @@
+C
+D
+B
+E
+C
+C
+E
+C
+C
+E
\ No newline at end of file
diff --git a/2023/homework/Wencong_Hong/pandas_submit.txt b/2023/homework/Wencong_Hong/pandas_submit.txt
new file mode 100644
index 00000000..18face45
--- /dev/null
+++ b/2023/homework/Wencong_Hong/pandas_submit.txt
@@ -0,0 +1,12 @@
+B
+D
+C
+A
+B
+D
+A
+C
+C
+B
+B
+B
\ No newline at end of file
diff --git a/2023/homework/Wencong_Hong/python_submit.txt b/2023/homework/Wencong_Hong/python_submit.txt
new file mode 100644
index 00000000..27593e1d
--- /dev/null
+++ b/2023/homework/Wencong_Hong/python_submit.txt
@@ -0,0 +1,108 @@
+C
+B
+C
+C
+C
+A
+A
+B
+B
+B
+B
+C
+A
+D
+A
+A
+A
+C
+D
+B
+A
+D
+C
+A
+D
+B
+C
+B
+A
+C
+C
+C
+B
+C
+D
+B
+A
+C
+D
+A
+B
+A
+C
+A
+C
+D
+B
+A
+B
+A
+A
+A
+B
+D
+D
+B
+A
+D
+B
+C
+B
+B
+A
+D
+A
+D
+A
+B
+A
+A
+D
+B
+C
+A
+C
+C
+B
+B
+C
+A
+B
+A
+C
+B
+A
+C
+C
+B
+B
+B
+A
+C
+A
+B
+B
+A
+A
+A
+C
+C
+A
+B
+B
+A
+A
+B
+A
+A
\ No newline at end of file
diff --git a/2023/homework/XuWeiZhang-UCAS/homework_credit_scoring.ipynb b/2023/homework/XuWeiZhang-UCAS/homework_credit_scoring.ipynb
new file mode 100644
index 00000000..55e120a6
--- /dev/null
+++ b/2023/homework/XuWeiZhang-UCAS/homework_credit_scoring.ipynb
@@ -0,0 +1,1006 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Fighting Monsters Together: Credit Scoring Exercise"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
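+ "## Assignment Instructions\n",
+ "\n",
+ "- Answering steps:\n",
+ " - When answering, **please keep every step** of your work; do not give only the final answer\n",
+ " - Please make a habit of commenting your code\n",
+ "\n",
+ "- Solution hints:\n",
+ " - To help everyone understand the questions accurately and get the most out of the exercises, this document provides solution hints\n",
+ " - The hints are **for reference only**; original approaches are encouraged\n",
+ " - To encourage you to think for yourselves, the hints are set in **white** text; if needed, drag the mouse from just after the colon to reveal them\n",
+ "\n",
+ "- Data used:\n",
+ " - After loading the data, first **inspect it and understand its basic properties**; later questions will not repeat this reminder"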
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## machine learning for credit scoring\n",
+ "\n",
+ "\n",
+ "Banks play a crucial role in market economies. They decide who can get finance and on what terms and can make or break investment decisions. For markets and society to function, individuals and companies need access to credit. \n",
+ "\n",
+ "Credit scoring algorithms, which make a guess at the probability of default, are the method banks use to determine whether or not a loan should be granted. This competition requires participants to improve on the state of the art in credit scoring, by predicting the probability that somebody will experience financial distress in the next two years. [Dataset](https://www.kaggle.com/c/GiveMeSomeCredit)\n",
+ "\n",
+ "Attribute Information:\n",
+ "\n",
+ "|Variable Name\t|\tDescription\t|\tType|\n",
+ "|----|----|----|\n",
+ "|SeriousDlqin2yrs\t|\tPerson experienced 90 days past due delinquency or worse \t|\tY/N|\n",
+ "|RevolvingUtilizationOfUnsecuredLines\t|\tTotal balance on credit divided by the sum of credit limits\t|\tpercentage|\n",
+ "|age\t|\tAge of borrower in years\t|\tinteger|\n",
+ "|NumberOfTime30-59DaysPastDueNotWorse\t|\tNumber of times borrower has been 30-59 days past due |\tinteger|\n",
+ "|DebtRatio\t|\tMonthly debt payments, alimony, and living costs divided by monthly gross income\t|\tpercentage|\n",
+ "|MonthlyIncome\t|\tMonthly income\t|\treal|\n",
+ "|NumberOfOpenCreditLinesAndLoans\t|\tNumber of Open loans |\tinteger|\n",
+ "|NumberOfTimes90DaysLate\t|\tNumber of times borrower has been 90 days or more past due.\t|\tinteger|\n",
+ "|NumberRealEstateLoansOrLines\t|\tNumber of mortgage and real estate loans\t|\tinteger|\n",
+ "|NumberOfTime60-89DaysPastDueNotWorse\t|\tNumber of times borrower has been 60-89 days past due |integer|\n",
+ "|NumberOfDependents\t|\tNumber of dependents in family\t|\tinteger|\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----------\n",
+ "## Read the data into Pandas "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
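+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>SeriousDlqin2yrs</th>\n",
+ "      <th>RevolvingUtilizationOfUnsecuredLines</th>\n",
+ "      <th>age</th>\n",
+ "      <th>NumberOfTime30-59DaysPastDueNotWorse</th>\n",
+ "      <th>DebtRatio</th>\n",
+ "      <th>MonthlyIncome</th>\n",
+ "      <th>NumberOfOpenCreditLinesAndLoans</th>\n",
+ "      <th>NumberOfTimes90DaysLate</th>\n",
+ "      <th>NumberRealEstateLoansOrLines</th>\n",
+ "      <th>NumberOfTime60-89DaysPastDueNotWorse</th>\n",
+ "      <th>NumberOfDependents</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>0</th>\n",
+ "      <td>1</td>\n",
+ "      <td>0.766127</td>\n",
+ "      <td>45.0</td>\n",
+ "      <td>2.0</td>\n",
+ "      <td>0.802982</td>\n",
+ "      <td>9120.0</td>\n",
+ "      <td>13.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>6.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>2.0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>1</th>\n",
+ "      <td>0</td>\n",
+ "      <td>0.957151</td>\n",
+ "      <td>40.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.121876</td>\n",
+ "      <td>2600.0</td>\n",
+ "      <td>4.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>1.0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>2</th>\n",
+ "      <td>0</td>\n",
+ "      <td>0.658180</td>\n",
+ "      <td>38.0</td>\n",
+ "      <td>1.0</td>\n",
+ "      <td>0.085113</td>\n",
+ "      <td>3042.0</td>\n",
+ "      <td>2.0</td>\n",
+ "      <td>1.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>3</th>\n",
+ "      <td>0</td>\n",
+ "      <td>0.233810</td>\n",
+ "      <td>30.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.036050</td>\n",
+ "      <td>3300.0</td>\n",
+ "      <td>5.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>4</th>\n",
+ "      <td>0</td>\n",
+ "      <td>0.907239</td>\n",
+ "      <td>49.0</td>\n",
+ "      <td>1.0</td>\n",
+ "      <td>0.024926</td>\n",
+ "      <td>63588.0</td>\n",
+ "      <td>7.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>1.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"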
+ ],
+ "text/plain": [
+ " SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 1 0.766127 45.0 \n",
+ "1 0 0.957151 40.0 \n",
+ "2 0 0.658180 38.0 \n",
+ "3 0 0.233810 30.0 \n",
+ "4 0 0.907239 49.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 "
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "pd.set_option('display.max_columns', 500)\n",
+ "import zipfile\n",
+ "with zipfile.ZipFile('/root/GWData-Bootcamp/2023/machine_learning/KaggleCredit2.csv.zip', 'r') as z:\n",
+ " f = z.open('KaggleCredit2.csv')\n",
+ " data = pd.read_csv(f, index_col=0)\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(112915, 11)"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "------------\n",
+ "## Drop na"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "SeriousDlqin2yrs 0\n",
+ "RevolvingUtilizationOfUnsecuredLines 0\n",
+ "age 4267\n",
+ "NumberOfTime30-59DaysPastDueNotWorse 0\n",
+ "DebtRatio 0\n",
+ "MonthlyIncome 0\n",
+ "NumberOfOpenCreditLinesAndLoans 0\n",
+ "NumberOfTimes90DaysLate 0\n",
+ "NumberRealEstateLoansOrLines 0\n",
+ "NumberOfTime60-89DaysPastDueNotWorse 0\n",
+ "NumberOfDependents 4267\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.isnull().sum(axis=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(108648, 11)"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.dropna(inplace=True)\n",
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---------\n",
+ "## Create X and y"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.06742876076872101"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "y.mean()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Exercise 1: Split the data into a training set and a test set\n",
+ "- Hint: from sklearn.model_selection import train_test_split "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) # hold out 20% of the rows for testing"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----\n",
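+ "## Exercise 2: Classify using sklearn classification algorithms such as logistic regression, decision tree, SVM, KNN, ...\n",
+ "Try consulting the sklearn API to understand what the model parameters mean, and experiment with different parameter settings"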
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import seaborn as sns"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.metrics import confusion_matrix\n",
+ "# Helper that plots a confusion matrix as an annotated heatmap\n",
+ "def confusion_matrix_draw(cm):\n",
+ " sns.heatmap(cm, annot=True, fmt='d')\n",
+ " plt.xlabel('Predicted')\n",
+ " plt.ylabel('Actual')\n",
+ " plt.title('Confusion Matrix')\n",
+ " plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Logistic regression\n",
+ "- Hint: from sklearn import linear_model "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/root/miniconda3/envs/ictp-ap/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
+ "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
+ "\n",
+ "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
+ " https://scikit-learn.org/stable/modules/preprocessing.html\n",
+ "Please also refer to the documentation for alternative solver options:\n",
+ " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
+ " n_iter_i = _check_optimize_result(\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from sklearn.linear_model import LogisticRegression\n",
+ "\n",
+ "\n",
+ "# Create the logistic regression classifier\n",
+ "classifier = LogisticRegression()\n",
+ "\n",
+ "# Fit the classifier on the training set\n",
+ "classifier.fit(X_train, y_train)\n",
+ "\n",
+ "# Predict on the test set\n",
+ "y_pred = classifier.predict(X_test)\n",
+ "\n",
+ "# Build the confusion matrix from the predictions\n",
+ "cm_Logisticregression = confusion_matrix(y_test, y_pred)\n",
+ "\n",
+ "# Draw the confusion matrix\n",
+ "confusion_matrix_draw(cm_Logisticregression)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Decision Tree\n",
+ "- Hint: `from sklearn.tree import DecisionTreeClassifier`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "\n",
+ "# Create the decision tree classifier\n",
+ "classifier = DecisionTreeClassifier()\n",
+ "\n",
+ "# Fit the classifier on the training set\n",
+ "classifier.fit(X_train, y_train)\n",
+ "\n",
+ "# Predict on the test set\n",
+ "y_pred = classifier.predict(X_test)\n",
+ "\n",
+ "# Build the confusion matrix from the predictions\n",
+ "cm_DecisionTreeClassifier = confusion_matrix(y_test, y_pred)\n",
+ "\n",
+ "# Draw the confusion matrix\n",
+ "confusion_matrix_draw(cm_DecisionTreeClassifier)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Random Forest\n",
+ "- Hint: `from sklearn.ensemble import RandomForestClassifier`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "\n",
+ "# Create the random forest classifier\n",
+ "classifier = RandomForestClassifier()\n",
+ "\n",
+ "# Fit the classifier on the training set\n",
+ "classifier.fit(X_train, y_train)\n",
+ "\n",
+ "# Predict on the test set\n",
+ "y_pred = classifier.predict(X_test)\n",
+ "\n",
+ "# Build the confusion matrix from the predictions\n",
+ "cm_RandomForestClassifier = confusion_matrix(y_test, y_pred)\n",
+ "\n",
+ "# Draw the confusion matrix\n",
+ "confusion_matrix_draw(cm_RandomForestClassifier)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### SVM\n",
+ "- Hint: `from sklearn.svm import SVC`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from sklearn.svm import SVC\n",
+ "\n",
+ "# Create the support vector machine classifier\n",
+ "classifier = SVC()\n",
+ "\n",
+ "# Fit the classifier on the training set\n",
+ "classifier.fit(X_train, y_train)\n",
+ "\n",
+ "# Predict on the test set\n",
+ "y_pred = classifier.predict(X_test)\n",
+ "\n",
+ "# Build the confusion matrix from the predictions\n",
+ "cm_SVC = confusion_matrix(y_test, y_pred)\n",
+ "\n",
+ "# Draw the confusion matrix\n",
+ "confusion_matrix_draw(cm_SVC)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### KNN\n",
+ "- Hint: `from sklearn.neighbors import KNeighborsClassifier`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "\n",
+ "# Create the k-nearest-neighbors classifier with 3 neighbors\n",
+ "classifier = KNeighborsClassifier(n_neighbors=3)\n",
+ "\n",
+ "# Fit the classifier on the training set\n",
+ "classifier.fit(X_train, y_train)\n",
+ "\n",
+ "# Predict on the test set\n",
+ "y_pred = classifier.predict(X_test)\n",
+ "\n",
+ "# Build the confusion matrix from the predictions\n",
+ "cm_KNN = confusion_matrix(y_test, y_pred)\n",
+ "\n",
+ "# Draw the confusion matrix\n",
+ "confusion_matrix_draw(cm_KNN)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "## Exercise 3: Predict on the test set and compute the accuracy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Logistic regression\n",
+ "- Hint: `y_pred_LR = clf_LR.predict(x_test)`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.metrics import accuracy_score\n",
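+ "# Wrap the accuracy computation in a helper\n",
+ "def calAccuracy(cm):\n",
+ "    # Unpack the four cells of the binary confusion matrix\n",
+ "    tn, fp, fn, tp = cm.ravel()\n",
+ "\n",
+ "    # Accuracy = correct predictions / all predictions\n",
+ "    accuracy = (tp + tn) / (tp + tn + fp + fn)\n",
+ "\n",
+ "    # Print the accuracy (the label 准确率 means \"accuracy\")\n",
+ "    print(\"准确率:\", accuracy)"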
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "准确率: 0.9319374137137598\n"
+ ]
+ }
+ ],
+ "source": [
+ "calAccuracy(cm_Logisticregression)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Decision Tree\n",
+ "- Hint: `y_pred_tree = tree.predict(x_test)`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "准确率: 0.8946157386102163\n"
+ ]
+ }
+ ],
+ "source": [
+ "calAccuracy(cm_DecisionTreeClassifier)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Random Forest\n",
+ "- Hint: `y_pred_forest = forest.predict(x_test)`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "准确率: 0.9337321675103544\n"
+ ]
+ }
+ ],
+ "source": [
+ "calAccuracy(cm_RandomForestClassifier)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### SVM\n",
+ "- Hint: `y_pred_SVC = clf_svc.predict(x_test)`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "准确率: 0.9317073170731708\n"
+ ]
+ }
+ ],
+ "source": [
+ "calAccuracy(cm_SVC)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### KNN\n",
+ "- Hint: `y_pred_KNN = neigh.predict(x_test)`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "准确率: 0.926829268292683\n"
+ ]
+ }
+ ],
+ "source": [
+ "calAccuracy(cm_KNN)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Exercise 4: Read the official sklearn documentation on evaluation metrics for classification, and evaluate this example"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Learning resources on the confusion matrix**\n",
+ "\n",
+ "- Blog: \n",
+ "http://blog.csdn.net/vesper305/article/details/44927047 \n",
+ "- WiKi: \n",
+ "http://en.wikipedia.org/wiki/Confusion_matrix \n",
+ "- sklearn doc: \n",
+ "http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "## your code here"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
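+ "## Exercise 5: Adjust the model's decision criterion\n",
+ "\n",
+ "Banks usually have stricter requirements because the consequences of fraud tend to be severe, so the model's decision criterion is often adjusted.\n",
+ "\n",
+ "For example, logistic regression normally uses a probability decision boundary of 0.5, but we can lower the threshold to raise the model's \"sensitivity\". Try setting the threshold to 0.3 and look at the evaluation metrics again (mainly accuracy and recall).\n",
+ "\n",
+ "- Hint: many sklearn classifiers provide `predict_proba`, which returns the predicted probabilities; compare them with the chosen threshold to decide the final class."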
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Accuracy: 0.9312931431201105\n",
+ "Recall: 0.03706199460916442\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/root/miniconda3/envs/ictp-ap/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
+ "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
+ "\n",
+ "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
+ " https://scikit-learn.org/stable/modules/preprocessing.html\n",
+ "Please also refer to the documentation for alternative solver options:\n",
+ " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
+ " n_iter_i = _check_optimize_result(\n"
+ ]
+ }
+ ],
+ "source": [
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.metrics import accuracy_score, recall_score\n",
+ "\n",
+ "# Train the logistic regression model\n",
+ "lr_model = LogisticRegression()\n",
+ "lr_model.fit(X_train, y_train)\n",
+ "\n",
+ "# Move the decision boundary from the default 0.5 to the chosen threshold\n",
+ "threshold = 0.3\n",
+ "y_pred_prob = lr_model.predict_proba(X_test)[:, 1] # predicted probability of the positive class\n",
+ "y_pred = (y_pred_prob >= threshold).astype(int) # label as positive when the probability reaches the threshold\n",
+ "\n",
+ "# Compute the evaluation metrics\n",
+ "accuracy = accuracy_score(y_test, y_pred)\n",
+ "recall = recall_score(y_test, y_pred)\n",
+ "\n",
+ "# Print the evaluation metrics\n",
+ "print(\"Accuracy:\", accuracy)\n",
+ "print(\"Recall:\", recall)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "ictp-ap",
+ "language": "python",
+ "name": "ictp-ap"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/2023/homework/XuWeiZhang-UCAS/homework_credit_scoring_finetune_ensemble.ipynb b/2023/homework/XuWeiZhang-UCAS/homework_credit_scoring_finetune_ensemble.ipynb
new file mode 100644
index 00000000..ae710693
--- /dev/null
+++ b/2023/homework/XuWeiZhang-UCAS/homework_credit_scoring_finetune_ensemble.ipynb
@@ -0,0 +1,1908 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Leveling Up Together: Credit Scoring Exercises"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "-------\n",
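+ "## >>> Instructions:\n",
+ "### 1. Answering steps:\n",
+ "- When answering, **keep every step** of your work; do not give only the final answer\n",
+ "- Make a habit of commenting your code\n",
+ "\n",
+ "### 2. Solution hints:\n",
+ "- To help you understand each problem accurately and get the most out of the exercises, this document provides solution hints\n",
+ "- The hints are **for reference only**; original solutions are encouraged\n",
+ "- To encourage independent thinking, the hints are written as **comments**; please look for them\n",
+ "\n",
+ "### 3. Data used:\n",
+ "- The exercises use several datasets; after importing each one, first **inspect and understand its basic properties**. Later questions will not repeat this reminder"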
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "--------\n",
+ "## Hands-on exercises"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Credit card fraud project"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Data import, preview, and preprocessing (do not modify this part; the data files involved do not need to be copied or moved)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 1 0.766127 45.0 \n",
+ "1 0 0.957151 40.0 \n",
+ "2 0 0.658180 38.0 \n",
+ "3 0 0.233810 30.0 \n",
+ "4 0 0.907239 49.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 "
+ ]
+ },
+ "execution_count": 1,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "pd.set_option('display.max_columns', 500)\n",
+ "import zipfile\n",
+ "with zipfile.ZipFile('/root/GWData-Bootcamp/2023/machine_learning/KaggleCredit2.csv.zip', 'r') as z:\n",
+ " f = z.open('KaggleCredit2.csv')\n",
+ " data = pd.read_csv(f, index_col=0)\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(112915, 11)"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Check the dimensions of the data\n",
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "SeriousDlqin2yrs 0\n",
+ "RevolvingUtilizationOfUnsecuredLines 0\n",
+ "age 4267\n",
+ "NumberOfTime30-59DaysPastDueNotWorse 0\n",
+ "DebtRatio 0\n",
+ "MonthlyIncome 0\n",
+ "NumberOfOpenCreditLinesAndLoans 0\n",
+ "NumberOfTimes90DaysLate 0\n",
+ "NumberRealEstateLoansOrLines 0\n",
+ "NumberOfTime60-89DaysPastDueNotWorse 0\n",
+ "NumberOfDependents 4267\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Count the missing values in each column\n",
+ "data.isnull().sum(axis=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Drop rows with missing values\n",
+ "data.dropna(inplace=True)\n",
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.06742876076872101"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Extract the features X and the label y\n",
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)\n",
+ "# Mean of the label, i.e. the average fraud rate\n",
+ "y.mean()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 1\n",
+ "1 0\n",
+ "2 0\n",
+ "3 0\n",
+ "4 0\n",
+ "Name: SeriousDlqin2yrs, dtype: int64"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "y.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 0.766127 45.0 \n",
+ "1 0.957151 40.0 \n",
+ "2 0.658180 38.0 \n",
+ "3 0.233810 30.0 \n",
+ "4 0.907239 49.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 "
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "X.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Exercises"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 1. Split the data into training and test sets"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) # hold out 20% for testing"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "SeriousDlqin2yrs\n",
+ "0 101322\n",
+ "1 7326\n",
+ "Name: count, dtype: int64\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Inspect the positive/negative class distribution via the SeriousDlqin2yrs column\n",
+ "# Hint: value_counts\n",
+ "# Draw a bar chart of the two classes\n",
+ "# Hint: a DataFrame can call plot(kind='bar') directly\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "# Count the samples in each class\n",
+ "class_counts = y.value_counts()\n",
+ "\n",
+ "# Print the counts\n",
+ "print(class_counts)\n",
+ "\n",
+ "# Draw the bar chart\n",
+ "class_counts.plot(kind='bar')\n",
+ "plt.xlabel('Class')\n",
+ "plt.ylabel('Count')\n",
+ "plt.title('Distribution of Classes')\n",
+ "plt.show()\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 2. Preprocessing: discretization"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "103\n",
+ "0 15\n",
+ "1 13\n",
+ "2 12\n",
+ "3 10\n",
+ "4 16\n",
+ " ..\n",
+ "112910 16\n",
+ "112911 24\n",
+ "112912 14\n",
+ "112913 10\n",
+ "112914 21\n",
+ "Name: age_discretized, Length: 108648, dtype: int64\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Discretize age into 3-year intervals\n",
+ "# Hint: compute the bin edges first, then discretize (bin) with the pandas cut function\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Convert age to integers\n",
+ "\n",
+ "X['age'] = X['age'].astype(int)\n",
+ "\n",
+ "print(max(X['age']))\n",
+ "# Compute the bin edges\n",
+ "age_bins = range(0, max(X['age']) + 4, 3)\n",
+ "\n",
+ "# Discretize: each bin is left-closed, and labels=False returns the bin index\n",
+ "X['age_discretized'] = pd.cut(X['age'], bins=age_bins, labels=False, right=False)\n",
+ "\n",
+ "# Print the discretized result\n",
+ "print(X['age_discretized'])\n",
+ "\n",
+ "# Count the samples in each age bin\n",
+ "age_counts = X['age_discretized'].value_counts().sort_index()\n",
+ "\n",
+ "# Draw the bar chart\n",
+ "plt.bar(age_counts.index, age_counts.values)\n",
+ "plt.xlabel('Age Discretized')\n",
+ "plt.ylabel('Count')\n",
+ "plt.title('Distribution of Age Discretized')\n",
+ "plt.show()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 3. Preprocessing: one-hot encoding"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 0.766127 45 \n",
+ "1 0.957151 40 \n",
+ "2 0.658180 38 \n",
+ "3 0.233810 30 \n",
+ "4 0.907239 49 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents age_discretized age_0 age_7 age_8 age_9 age_10 \\\n",
+ "0 2.0 15 False False False False False \n",
+ "1 1.0 13 False False False False False \n",
+ "2 0.0 12 False False False False False \n",
+ "3 0.0 10 False False False False True \n",
+ "4 0.0 16 False False False False False \n",
+ "\n",
+ " age_11 age_12 age_13 age_14 age_15 age_16 age_17 age_18 age_19 \\\n",
+ "0 False False False False True False False False False \n",
+ "1 False False True False False False False False False \n",
+ "2 False True False False False False False False False \n",
+ "3 False False False False False False False False False \n",
+ "4 False False False False False True False False False \n",
+ "\n",
+ " age_20 age_21 age_22 age_23 age_24 age_25 age_26 age_27 age_28 \\\n",
+ "0 False False False False False False False False False \n",
+ "1 False False False False False False False False False \n",
+ "2 False False False False False False False False False \n",
+ "3 False False False False False False False False False \n",
+ "4 False False False False False False False False False \n",
+ "\n",
+ " age_29 age_30 age_31 age_32 age_33 age_34 \n",
+ "0 False False False False False False \n",
+ "1 False False False False False False \n",
+ "2 False False False False False False \n",
+ "3 False False False False False False \n",
+ "4 False False False False False False "
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# One-hot encode the binned age from the previous step\n",
+ "# Hint: use pandas get_dummies\n",
+ "import pandas as pd\n",
+ "\n",
+ "# One-hot encode the discretized age bins\n",
+ "age_encoded = pd.get_dummies(X['age_discretized'], prefix='age')\n",
+ "\n",
+ "# Concatenate the encoded columns with the original data\n",
+ "X_encoded = pd.concat([X, age_encoded], axis=1)\n",
+ "\n",
+ "# Show the encoded result\n",
+ "X_encoded.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 4. Preprocessing: feature scaling"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[-0.0201068 -0.44012195 0.45992706 ... -0.05427348 0.99468893\n",
+ " -0.37056727]\n",
+ " [-0.0193715 -0.78605882 -0.1074063 ... -0.05427348 0.12476948\n",
+ " -0.78509046]\n",
+ " [-0.02052231 -0.92443357 0.17626038 ... -0.05427348 -0.74514997\n",
+ " -0.99235205]\n",
+ " ...\n",
+ " [-0.02190201 -0.50930932 -0.1074063 ... -0.05427348 0.99468893\n",
+ " -0.57782887]\n",
+ " [-0.0230558 -1.47793256 -0.1074063 ... -0.05427348 -0.74514997\n",
+ " -1.40687523]\n",
+ " [-0.01978286 0.87443816 -0.1074063 ... -0.05427348 -0.74514997\n",
+ " 0.87300228]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Scale the continuous features\n",
+ "# Hint: a scaler such as StandardScaler can be used\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "\n",
+ "# Create a StandardScaler object\n",
+ "scaler = StandardScaler()\n",
+ "\n",
+ "# Standardize the features (fit_transform is applied to every column of X here)\n",
+ "X_scaled = scaler.fit_transform(X)\n",
+ "\n",
+ "# Print the scaled result\n",
+ "print(X_scaled)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 5. Fit a logistic regression model, print its coefficients, and analyze feature importance"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Intercept: [-0.13710928]\n",
+ "Coefficients:\n",
+ "NumberOfTime60-89DaysPastDueNotWorse: -0.32889330018818197\n",
+ "NumberOfTime30-59DaysPastDueNotWorse: 0.31716574622696886\n",
+ "NumberOfDependents: 0.18901828399205395\n",
+ "DebtRatio: 0.05218217069190731\n",
+ "age: -0.0459144360578327\n",
+ "NumberOfTimes90DaysLate: 0.043856000455064506\n",
+ "NumberOfOpenCreditLinesAndLoans: -0.04309815541758128\n",
+ "NumberRealEstateLoansOrLines: 0.028029112770613533\n",
+ "RevolvingUtilizationOfUnsecuredLines: -0.009140646278000845\n",
+ "MonthlyIncome: -4.117863504515253e-05\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/root/miniconda3/envs/ictp-ap/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
+ "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
+ "\n",
+ "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
+ " https://scikit-learn.org/stable/modules/preprocessing.html\n",
+ "Please also refer to the documentation for alternative solver options:\n",
+ " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
+ " n_iter_i = _check_optimize_result(\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Hint: fit the model, then read its coef_ attribute\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "\n",
+ "# Create the logistic regression model\n",
+ "model = LogisticRegression()\n",
+ "\n",
+ "# Fit the model on the training data\n",
+ "model.fit(X_train, y_train)\n",
+ "\n",
+ "# Extract the coefficients and the intercept\n",
+ "coefficients = model.coef_\n",
+ "intercept = model.intercept_\n",
+ "\n",
+ "# Rank features by absolute coefficient, largest first\n",
+ "feature_importance = abs(coefficients[0])\n",
+ "sorted_indices = feature_importance.argsort()[::-1]\n",
+ "sorted_features = X_train.columns[sorted_indices]\n",
+ "\n",
+ "# Print the intercept and the coefficients in ranked order\n",
+ "print(\"Intercept:\", intercept)\n",
+ "print(\"Coefficients:\")\n",
+ "for feature, coef in zip(sorted_features, coefficients[0, sorted_indices]):\n",
+ "    print(f\"{feature}: {coef}\")\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
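+ "#### 6. Tune hyperparameters with grid-search cross-validation\n",
+ "Tune the `penalty` and `C` parameters: the candidates for `penalty` are \"l1\" and \"l2\", and the candidates for `C` are [1, 10, 100, 500]."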
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/root/miniconda3/envs/ictp-ap/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
+ "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
+ "\n",
+ "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
+ " https://scikit-learn.org/stable/modules/preprocessing.html\n",
+ "Please also refer to the documentation for alternative solver options:\n",
+ " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
+ " n_iter_i = _check_optimize_result(\n",
+ "/root/miniconda3/envs/ictp-ap/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:425: FitFailedWarning: \n",
+ "20 fits failed out of a total of 40.\n",
+ "The score on these train-test partitions for these parameters will be set to nan.\n",
+ "If these failures are not expected, you can try to debug them by setting error_score='raise'.\n",
+ "\n",
+ "Below are more details about the failures:\n",
+ "--------------------------------------------------------------------------------\n",
+ "20 fits failed with the following error:\n",
+ "Traceback (most recent call last):\n",
+ " File \"/root/miniconda3/envs/ictp-ap/lib/python3.10/site-packages/sklearn/model_selection/_validation.py\", line 729, in _fit_and_score\n",
+ " estimator.fit(X_train, y_train, **fit_params)\n",
+ " File \"/root/miniconda3/envs/ictp-ap/lib/python3.10/site-packages/sklearn/base.py\", line 1152, in wrapper\n",
+ " return fit_method(estimator, *args, **kwargs)\n",
+ " File \"/root/miniconda3/envs/ictp-ap/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py\", line 1169, in fit\n",
+ " solver = _check_solver(self.solver, self.penalty, self.dual)\n",
+ " File \"/root/miniconda3/envs/ictp-ap/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py\", line 56, in _check_solver\n",
+ " raise ValueError(\n",
+ "ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.\n",
+ "\n",
+ " warnings.warn(some_fits_failed_message, FitFailedWarning)\n",
+ "/root/miniconda3/envs/ictp-ap/lib/python3.10/site-packages/sklearn/model_selection/_search.py:979: UserWarning: One or more of the test scores are non-finite: [ nan 0.93290228 nan 0.93292529 nan 0.9329483\n",
+ " nan 0.93293679]\n",
+ " warnings.warn(\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Best parameters: {'C': 100, 'penalty': 'l2'}\n",
+ "Best score: 0.9329482968361296\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/root/miniconda3/envs/ictp-ap/lib/python3.10/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
+ "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
+ "\n",
+ "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
+ " https://scikit-learn.org/stable/modules/preprocessing.html\n",
+ "Please also refer to the documentation for alternative solver options:\n",
+ " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
+ " n_iter_i = _check_optimize_result(\n"
+ ]
+ }
+ ],
+ "source": [
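+ "# Hint: build the parameter grid as specified above, then tune with GridSearchCV\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.model_selection import GridSearchCV\n",
+ "\n",
+ "# Create the logistic regression model\n",
+ "# Note: the default lbfgs solver supports only the l2 penalty, so every l1 candidate\n",
+ "# fails to fit and scores NaN; a solver such as liblinear or saga is needed to evaluate l1\n",
+ "model = LogisticRegression()\n",
+ "\n",
+ "# Define the candidate parameter grid\n",
+ "param_grid = {\n",
+ "    'penalty': ['l1', 'l2'],\n",
+ "    'C': [1, 10, 100, 500]\n",
+ "}\n",
+ "\n",
+ "# Create the grid-search cross-validation object (5 folds)\n",
+ "grid_search = GridSearchCV(model, param_grid, cv=5)\n",
+ "\n",
+ "# Run the search on the training data\n",
+ "grid_search.fit(X_train, y_train)\n",
+ "\n",
+ "# Print the best parameters and the best cross-validated score\n",
+ "print(\"Best parameters:\", grid_search.best_params_)\n",
+ "print(\"Best score:\", grid_search.best_score_)\n"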
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Best parameters: {'C': 100, 'penalty': 'l2'}\n",
+ "Best model: LogisticRegression(C=100)\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Print the best hyperparameters and the best model\n",
+ "best_params = grid_search.best_params_\n",
+ "print(\"Best parameters:\", best_params)\n",
+ "\n",
+ "# Best model, refit by GridSearchCV on the full training set\n",
+ "best_model = grid_search.best_estimator_\n",
+ "print(\"Best model:\", best_model)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 7. Predict on the test set and compute precision, recall, AUC, the confusion matrix, F1, and other evaluation metrics"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Precision: 0.5581395348837209\n",
+ "Recall: 0.016172506738544475\n",
+ "AUC: 0.6776398279597607\n",
+ "Confusion Matrix:\n",
+ "[[20227 19]\n",
+ " [ 1460 24]]\n",
+ "F1 Score: 0.031434184675834975\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Hint: use predict to make predictions on the test set\n",
+ "# Hint: sklearn.metrics provides the evaluation metrics, such as accuracy_score, recall_score, auc, confusion_matrix, and f1_score\n",
+ "from sklearn.metrics import precision_score, recall_score, roc_auc_score, confusion_matrix, f1_score\n",
+ "\n",
+ "# Predict class labels on the test set\n",
+ "y_pred = best_model.predict(X_test)\n",
+ "\n",
+ "# Compute precision\n",
+ "precision = precision_score(y_test, y_pred)\n",
+ "print(\"Precision:\", precision)\n",
+ "\n",
+ "# Compute recall\n",
+ "recall = recall_score(y_test, y_pred)\n",
+ "print(\"Recall:\", recall)\n",
+ "\n",
+ "# Compute AUC from the predicted probability of the positive class\n",
+ "y_pred_proba = best_model.predict_proba(X_test)[:, 1]\n",
+ "auc = roc_auc_score(y_test, y_pred_proba)\n",
+ "print(\"AUC:\", auc)\n",
+ "\n",
+ "# Compute the confusion matrix\n",
+ "confusion = confusion_matrix(y_test, y_pred)\n",
+ "print(\"Confusion Matrix:\")\n",
+ "print(confusion)\n",
+ "\n",
+ "# Compute the F1 score\n",
+ "f1 = f1_score(y_test, y_pred)\n",
+ "print(\"F1 Score:\", f1)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
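+ "#### 8. Further optimization\n",
+ "Banks usually have stricter requirements, because the consequences of fraud tend to be severe, so the model's decision criterion is typically adjusted.  \n",
+ "\n",
+ "For example, logistic regression normally uses 0.5 as the probability decision boundary, but the threshold can be lowered to increase the model's sensitivity.  \n",
+ "Try setting the threshold to 0.3, then look at the confusion matrix and the other evaluation metrics again."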
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Threshold: 0.1\n",
+ "Confusion Matrix:\n",
+ "[[15428 4818]\n",
+ " [ 787 697]]\n",
+ "Precision: 0.12638259292837714\n",
+ "Recall: 0.4696765498652291\n",
+ "F1 Score: 0.19917131018716958\n",
+ "\n",
+ "Threshold: 0.2\n",
+ "Confusion Matrix:\n",
+ "[[19635 611]\n",
+ " [ 1250 234]]\n",
+ "Precision: 0.27692307692307694\n",
+ "Recall: 0.15768194070080863\n",
+ "F1 Score: 0.20094461142121078\n",
+ "\n",
+ "Threshold: 0.3\n",
+ "Confusion Matrix:\n",
+ "[[20147 99]\n",
+ " [ 1406 78]]\n",
+ "Precision: 0.4406779661016949\n",
+ "Recall: 0.05256064690026954\n",
+ "F1 Score: 0.09391932570740517\n",
+ "\n",
+ "Threshold: 0.4\n",
+ "Confusion Matrix:\n",
+ "[[20211 35]\n",
+ " [ 1451 33]]\n",
+ "Precision: 0.4852941176470588\n",
+ "Recall: 0.02223719676549865\n",
+ "F1 Score: 0.04252577319587628\n",
+ "\n",
+ "Threshold: 0.5\n",
+ "Confusion Matrix:\n",
+ "[[20227 19]\n",
+ " [ 1460 24]]\n",
+ "Precision: 0.5581395348837209\n",
+ "Recall: 0.016172506738544475\n",
+ "F1 Score: 0.031434184675834975\n",
+ "\n",
+ "Threshold: 0.6\n",
+ "Confusion Matrix:\n",
+ "[[20232 14]\n",
+ " [ 1469 15]]\n",
+ "Precision: 0.5172413793103449\n",
+ "Recall: 0.010107816711590296\n",
+ "F1 Score: 0.019828155981493716\n",
+ "\n",
+ "Threshold: 0.7\n",
+ "Confusion Matrix:\n",
+ "[[20237 9]\n",
+ " [ 1472 12]]\n",
+ "Precision: 0.5714285714285714\n",
+ "Recall: 0.008086253369272238\n",
+ "F1 Score: 0.015946843853820596\n",
+ "\n",
+ "Threshold: 0.8\n",
+ "Confusion Matrix:\n",
+ "[[20239 7]\n",
+ " [ 1478 6]]\n",
+ "Precision: 0.46153846153846156\n",
+ "Recall: 0.004043126684636119\n",
+ "F1 Score: 0.008016032064128256\n",
+ "\n",
+ "Threshold: 0.9\n",
+ "Confusion Matrix:\n",
+ "[[20246 0]\n",
+ " [ 1484 0]]\n",
+ "Precision: 0.0\n",
+ "Recall: 0.0\n",
+ "F1 Score: 0.0\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/root/miniconda3/envs/ictp-ap/lib/python3.10/site-packages/sklearn/metrics/_classification.py:1471: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.\n",
+ " _warn_prf(average, modifier, msg_start, len(result))\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Hint: thresholds = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]\n",
+ "# Compare the predict_proba output against each threshold to get the labels, then evaluate the metrics\n",
+ "from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score\n",
+ "\n",
+ "# Thresholds to evaluate\n",
+ "thresholds = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]\n",
+ "\n",
+ "# Results keyed by threshold\n",
+ "results = {}\n",
+ "\n",
+ "# Evaluate each threshold\n",
+ "for threshold in thresholds:\n",
+ "    # Predict the positive class when its probability exceeds the threshold\n",
+ "    y_pred = (y_pred_proba > threshold).astype(int)\n",
+ "\n",
+ "    # Compute the confusion matrix\n",
+ "    confusion = confusion_matrix(y_test, y_pred)\n",
+ "\n",
+ "    # Compute precision\n",
+ "    precision = precision_score(y_test, y_pred)\n",
+ "\n",
+ "    # Compute recall\n",
+ "    recall = recall_score(y_test, y_pred)\n",
+ "\n",
+ "    # Compute the F1 score\n",
+ "    f1 = f1_score(y_test, y_pred)\n",
+ "\n",
+ "    # Store the metrics for this threshold\n",
+ "    results[threshold] = {\n",
+ "        'Confusion Matrix': confusion,\n",
+ "        'Precision': precision,\n",
+ "        'Recall': recall,\n",
+ "        'F1 Score': f1\n",
+ "    }\n",
+ "\n",
+ "# Print the results\n",
+ "for threshold, result in results.items():\n",
+ "    print(\"Threshold:\", threshold)\n",
+ "    print(\"Confusion Matrix:\")\n",
+ "    print(result['Confusion Matrix'])\n",
+ "    print(\"Precision:\", result['Precision'])\n",
+ "    print(\"Recall:\", result['Recall'])\n",
+ "    print(\"F1 Score:\", result['F1 Score'])\n",
+ "    print()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 9. Rank the features by importance, filter them with feature selection, then rebuild the model and examine accuracy and the other evaluation metrics."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Feature Importance Ranking:\n",
+ "NumberOfTime60-89DaysPastDueNotWorse : 0.48487436893337343\n",
+ "NumberOfTime30-59DaysPastDueNotWorse : -7.52759672119782e-05\n",
+ "NumberOfTimes90DaysLate : -0.0399102786681723\n",
+ "Accuracy (New Model): 0.9329038196042337\n",
+ "Precision (New Model): 0.6160714285714286\n",
+ "Recall (New Model): 0.04649595687331536\n",
+ "F1 Score (New Model): 0.08646616541353384\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Features can be ranked by the absolute value of the logistic regression coefficients, or by tree-based feature importance\n",
+ "# Feature selection can use RFE or SelectFromModel\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "from sklearn.feature_selection import SelectFromModel\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "\n",
+ "# Select features with a logistic regression model\n",
+ "selector = SelectFromModel(LogisticRegression(max_iter=1000))\n",
+ "selector.fit(X_train, y_train)\n",
+ "\n",
+ "# Column indices of the selected features\n",
+ "selected_features = selector.get_support(indices=True)\n",
+ "\n",
+ "# Make sure at least one feature was selected\n",
+ "if len(selected_features) == 0:\n",
+ "    print(\"No features selected.\")\n",
+ "else:\n",
+ "    # Names of the selected features\n",
+ "    selected_feature_names = X_train.columns[selected_features]\n",
+ "\n",
+ "    # Coefficients of the selected features only, so names and values stay aligned\n",
+ "    feature_importances = selector.estimator_.coef_[0][selected_features]\n",
+ "    # Rank by absolute coefficient, largest first\n",
+ "    sorted_indices = abs(feature_importances).argsort()[::-1]\n",
+ "\n",
+ "    # Print the feature importance ranking\n",
+ "    print(\"Feature Importance Ranking:\")\n",
+ "    for i in sorted_indices:\n",
+ "        print(selected_feature_names[i], \":\", feature_importances[i])\n",
+ "\n",
+ "    # Keep only the selected features\n",
+ "    X_train_selected = X_train.iloc[:, selected_features]\n",
+ "    X_test_selected = X_test.iloc[:, selected_features]\n",
+ "\n",
+ "    # Train a new logistic regression model on the selected features\n",
+ "    new_model = LogisticRegression(max_iter=1000)\n",
+ "    new_model.fit(X_train_selected, y_train)\n",
+ "\n",
+ "    # Predict and evaluate on the test set\n",
+ "    y_pred_new = new_model.predict(X_test_selected)\n",
+ "    accuracy_new = accuracy_score(y_test, y_pred_new)\n",
+ "    precision_new = precision_score(y_test, y_pred_new)\n",
+ "    recall_new = recall_score(y_test, y_pred_new)\n",
+ "    f1_new = f1_score(y_test, y_pred_new)\n",
+ "\n",
+ "    # Print the evaluation metrics\n",
+ "    print(\"Accuracy (New Model):\", accuracy_new)\n",
+ "    print(\"Precision (New Model):\", precision_new)\n",
+ "    print(\"Recall (New Model):\", recall_new)\n",
+ "    print(\"F1 Score (New Model):\", f1_new)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
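+ "#### 10. Trying other model algorithms\n",
+ "Classify with sklearn classifiers such as RandomForestClassifier, SVM, and KNN, and apply the hyperparameter tuning procedure described above."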
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "from sklearn.svm import SVC\n",
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "from sklearn.model_selection import GridSearchCV"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Random Forest - Best Parameters: {'max_depth': 5, 'min_samples_leaf': 1, 'min_samples_split': 5, 'n_estimators': 100}\n",
+ "Random Forest - Best Score: 0.9353298470034953\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Random forest\n",
+ "rf_model = RandomForestClassifier()\n",
+ "rf_param_grid = {\n",
+ " 'n_estimators': [100, 200],\n",
+ " 'max_depth': [None, 5],\n",
+ " 'min_samples_split': [2, 5],\n",
+ " 'min_samples_leaf': [1, 2]\n",
+ "}\n",
+ "rf_grid_search = GridSearchCV(rf_model, rf_param_grid, cv=3)\n",
+ "rf_grid_search.fit(X_train, y_train)\n",
+ "\n",
+ "# Print the best parameters and the corresponding score\n",
+ "print(\"Random Forest - Best Parameters:\", rf_grid_search.best_params_)\n",
+ "print(\"Random Forest - Best Score:\", rf_grid_search.best_score_)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# SVM\n",
+ "svm_model = SVC()\n",
+ "svm_param_grid = {\n",
+ " 'C': [1, 10],\n",
+ " 'kernel': ['linear', 'rbf'],\n",
+ " 'gamma': ['scale', 'auto']\n",
+ "}\n",
+ "svm_grid_search = GridSearchCV(svm_model, svm_param_grid, cv=3)\n",
+ "svm_grid_search.fit(X_train, y_train)\n",
+ "\n",
+ "# Print the best parameters and the corresponding score\n",
+ "print(\"SVM - Best Parameters:\", svm_grid_search.best_params_)\n",
+ "print(\"SVM - Best Score:\", svm_grid_search.best_score_)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# K-nearest neighbors\n",
+ "knn_model = KNeighborsClassifier()\n",
+ "knn_param_grid = {\n",
+ " 'n_neighbors': [3, 5],\n",
+ " 'weights': ['uniform', 'distance'],\n",
+ " 'p': [1, 2]\n",
+ "}\n",
+ "knn_grid_search = GridSearchCV(knn_model, knn_param_grid, cv=3)\n",
+ "knn_grid_search.fit(X_train, y_train)\n",
+ "\n",
+ "# Print the best parameters and the corresponding score\n",
+ "print(\"KNN - Best Parameters:\", knn_grid_search.best_params_)\n",
+ "print(\"KNN - Best Score:\", knn_grid_search.best_score_)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Random forest\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "# Support vector machine\n",
+ "from sklearn.svm import SVC\n",
+ "# K-nearest neighbors\n",
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "ictp-ap",
+ "language": "python",
+ "name": "ictp-ap"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/2023/homework/Yuhao_Dong-LZU/homework_credit_scoring_finetune_ensemble.ipynb b/2023/homework/Yuhao_Dong-LZU/homework_credit_scoring_finetune_ensemble.ipynb
new file mode 100644
index 00000000..aec85327
--- /dev/null
+++ b/2023/homework/Yuhao_Dong-LZU/homework_credit_scoring_finetune_ensemble.ipynb
@@ -0,0 +1,1937 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Leveling Up Together: Credit Scoring Exercise"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
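+ "-------\n",
+ "## >>>Instructions:\n",
+ "### 1. Answering steps:\n",
+ "- When answering, **keep every step** of your work; do not give only the final answer\n",
+ "- Make a habit of commenting your code\n",
+ "\n",
+ "### 2. Solution hints:\n",
+ "- To help everyone understand the problems accurately and get the most out of the exercises, this document provides solution hints\n",
+ "- The hints are **for reference only**; original approaches are encouraged\n",
+ "- To encourage independent thinking, the hints are given as **comments**; please look for them\n",
+ "\n",
+ "### 3. Data used:\n",
+ "- The problems use several datasets; after importing each one, first **inspect it and understand its basic properties**. Later problems will not repeat this reminder"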
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "--------\n",
+ "## Hands-on problems"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Credit card fraud project"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ " #### Initial data import, preview, and processing (do not modify this part; the data files involved do not need to be copied or moved)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
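+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>SeriousDlqin2yrs</th>\n",
+ "      <th>RevolvingUtilizationOfUnsecuredLines</th>\n",
+ "      <th>age</th>\n",
+ "      <th>NumberOfTime30-59DaysPastDueNotWorse</th>\n",
+ "      <th>DebtRatio</th>\n",
+ "      <th>MonthlyIncome</th>\n",
+ "      <th>NumberOfOpenCreditLinesAndLoans</th>\n",
+ "      <th>NumberOfTimes90DaysLate</th>\n",
+ "      <th>NumberRealEstateLoansOrLines</th>\n",
+ "      <th>NumberOfTime60-89DaysPastDueNotWorse</th>\n",
+ "      <th>NumberOfDependents</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>0</th>\n",
+ "      <td>1</td>\n",
+ "      <td>0.766127</td>\n",
+ "      <td>45.0</td>\n",
+ "      <td>2.0</td>\n",
+ "      <td>0.802982</td>\n",
+ "      <td>9120.0</td>\n",
+ "      <td>13.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>6.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>2.0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>1</th>\n",
+ "      <td>0</td>\n",
+ "      <td>0.957151</td>\n",
+ "      <td>40.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.121876</td>\n",
+ "      <td>2600.0</td>\n",
+ "      <td>4.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>1.0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>2</th>\n",
+ "      <td>0</td>\n",
+ "      <td>0.658180</td>\n",
+ "      <td>38.0</td>\n",
+ "      <td>1.0</td>\n",
+ "      <td>0.085113</td>\n",
+ "      <td>3042.0</td>\n",
+ "      <td>2.0</td>\n",
+ "      <td>1.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>3</th>\n",
+ "      <td>0</td>\n",
+ "      <td>0.233810</td>\n",
+ "      <td>30.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.036050</td>\n",
+ "      <td>3300.0</td>\n",
+ "      <td>5.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>4</th>\n",
+ "      <td>0</td>\n",
+ "      <td>0.907239</td>\n",
+ "      <td>49.0</td>\n",
+ "      <td>1.0</td>\n",
+ "      <td>0.024926</td>\n",
+ "      <td>63588.0</td>\n",
+ "      <td>7.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>1.0</td>\n",
+ "      <td>0.0</td>\n",
+ "      <td>0.0</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"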
+ ],
+ "text/plain": [
+ " SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 1 0.766127 45.0 \n",
+ "1 0 0.957151 40.0 \n",
+ "2 0 0.658180 38.0 \n",
+ "3 0 0.233810 30.0 \n",
+ "4 0 0.907239 49.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 "
+ ]
+ },
+ "execution_count": 1,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "pd.set_option('display.max_columns', 500)\n",
+ "import zipfile\n",
+ "import numpy as np\n",
+ "import matplotlib\n",
+ "import matplotlib.pyplot as plt\n",
+ "matplotlib.rc(\"font\", family=['YouYuan', 'Times New Roman'])  # font fallback list; YouYuan can render Chinese text in plots\n",
+ "with zipfile.ZipFile('KaggleCredit2.csv.zip', 'r') as z:\n",
+ " f = z.open('KaggleCredit2.csv')\n",
+ " data = pd.read_csv(f, index_col=0)\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(112915, 11)"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Check the dimensions of the data\n",
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "SeriousDlqin2yrs 0\n",
+ "RevolvingUtilizationOfUnsecuredLines 0\n",
+ "age 4267\n",
+ "NumberOfTime30-59DaysPastDueNotWorse 0\n",
+ "DebtRatio 0\n",
+ "MonthlyIncome 0\n",
+ "NumberOfOpenCreditLinesAndLoans 0\n",
+ "NumberOfTimes90DaysLate 0\n",
+ "NumberRealEstateLoansOrLines 0\n",
+ "NumberOfTime60-89DaysPastDueNotWorse 0\n",
+ "NumberOfDependents 4267\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Count the missing values in each column\n",
+ "data.isnull().sum(axis=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Drop rows with missing values\n",
+ "data.dropna(inplace=True)\n",
+ "data.shape\n",
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.06742876076872101"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
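+ "# Extract the features X and the target y\n",
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)\n",
+ "# Check the mean of the target, i.e. the share of borrowers with a serious delinquency\n",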
+ "y.mean()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Hands-on exercises"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 1. Split the data into training and test sets"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "((76053, 10), (32595, 10), (76053,), (32595,))"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Hint: see the train_test_split function\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True, random_state=0)\n",
+ "\n",
+ "# Check the shapes of the resulting splits\n",
+ "X_train.shape, X_test.shape, y_train.shape, y_test.shape"
+ ]
+ },
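The split above is random but not stratified. With a positive rate of roughly 6.7% (the `y.mean()` output), a stratified split keeps that rate about equal in the training and test parts; in scikit-learn, `train_test_split` accepts a `stratify` argument for this purpose. The following is only a minimal sketch of the idea in plain Python: `stratified_split` is a hypothetical helper written for illustration, and the labels are made up rather than taken from the dataset.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size, seed=0):
    """Split row indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)

    # Group the row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)  # per-class test quota
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 7 positives out of 100 rows.
labels = [1] * 7 + [0] * 93
train_idx, test_idx = stratified_split(labels, test_size=0.3)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), round(train_rate, 3), round(test_rate, 3))
```

Because the quota is computed per class, both parts end up with a positive rate close to the overall one, which makes metrics on a small minority class less sensitive to the luck of the draw.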
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Inspect the distribution of positive and negative samples via the SeriousDlqin2yrs field\n",
+ "# Hint: value_counts\n",
+ "Positive_Negative = data['SeriousDlqin2yrs'].value_counts()\n",
+ "Positive_Negative\n",
+ "\n",
+ "# Draw a bar chart of the two classes\n",
+ "# Hint: a pandas object can be plotted directly with plot(kind='bar')\n",
+ "plt.figure(figsize=(5, 3), dpi=200)  # set the figure size\n",
+ "Positive_Negative.plot(kind='bar', color='skyblue')\n",
+ "plt.title('Distribution of SeriousDlqin2yrs')\n",
+ "plt.xlabel('SeriousDlqin2yrs')\n",
+ "plt.ylabel('Number of samples')\n",
+ "plt.xticks(ticks=[0, 1], labels=['No serious delinquency', 'Serious delinquency'], rotation=0)\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 2. Data preprocessing: discretization"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "age\n",
+ "(0, 3] 0\n",
+ "(3, 6] 0\n",
+ "(6, 9] 0\n",
+ "(9, 12] 0\n",
+ "(12, 15] 0\n",
+ "(15, 18] 0\n",
+ "(18, 21] 104\n",
+ "(21, 24] 1313\n",
+ "(24, 27] 2620\n",
+ "(27, 30] 4083\n",
+ "(30, 33] 5053\n",
+ "(33, 36] 5272\n",
+ "(36, 39] 6288\n",
+ "(39, 42] 7085\n",
+ "(42, 45] 7697\n",
+ "(45, 48] 8443\n",
+ "(48, 51] 8438\n",
+ "(51, 54] 7960\n",
+ "(54, 57] 7490\n",
+ "(57, 60] 7049\n",
+ "(60, 63] 7467\n",
+ "(63, 66] 5524\n",
+ "(66, 69] 4439\n",
+ "(69, 72] 3334\n",
+ "(72, 75] 2708\n",
+ "(75, 78] 2166\n",
+ "(78, 81] 1718\n",
+ "(81, 84] 990\n",
+ "(84, 87] 725\n",
+ "(87, 90] 446\n",
+ "(90, 93] 173\n",
+ "(93, 96] 43\n",
+ "(96, 99] 15\n",
+ "(99, 102] 3\n",
+ "(102, 105] 1\n",
+ "Name: count, dtype: int64"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Discretize age into 3-year intervals\n",
+ "# Hint: compute the bin edges first, then discretize (bin) with pandas' cut function\n",
+ "\n",
+ "# Compute the bin edges for age\n",
+ "age_min = data['age'].min()\n",
+ "age_max = data['age'].max()\n",
+ "bins = list(range(int(age_min), int(age_max) + 3, 3))  # one bin per 3 years\n",
+ "\n",
+ "# Discretize with pandas' cut function\n",
+ "# Note: the intervals are left-open, so a row with age == age_min falls outside every bin and becomes NaN\n",
+ "# data['age_bins'] = pd.cut(data['age'], bins=bins)\n",
+ "# X_train['age_bins'] = pd.cut(X_train['age'], bins=bins)\n",
+ "# X_test['age_bins'] = pd.cut(X_test['age'], bins=bins)\n",
+ "\n",
+ "data['age'] = pd.cut(data['age'], bins=bins)\n",
+ "X_train['age'] = pd.cut(X_train['age'], bins=bins)\n",
+ "X_test['age'] = pd.cut(X_test['age'], bins=bins)\n",
+ "\n",
+ "# Inspect the result of the discretization\n",
+ "# data['age_bins'].value_counts().sort_index()\n",
+ "data['age'].value_counts().sort_index()\n",
+ "\n",
+ "\n"
+ ]
+ },
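One detail of the binning above: `pd.cut` builds right-closed, left-open intervals by default, so a value equal to the first edge belongs to no bin (the `include_lowest` parameter changes this). The bin counts printed above add up to 108,647, one fewer than the 108,648 rows left after `dropna`, which is consistent with a single row at age 0. The sketch below reproduces that interval rule in plain Python; `right_closed_bin` is a hypothetical helper for illustration, not a pandas function.

```python
from bisect import bisect_left


def right_closed_bin(value, edges):
    """Return the (left, right] interval that contains value, or None if it is outside."""
    if value <= edges[0] or value > edges[-1]:
        return None
    i = bisect_left(edges, value)  # index of the first edge >= value
    return (edges[i - 1], edges[i])


# Same edges as the notebook's bins: 0, 3, 6, ..., 105
edges = list(range(0, 106, 3))

for age in (0, 21, 40, 45, 103):
    print(age, right_closed_bin(age, edges))
```

Ages 40 and 45 land in (39, 42] and (42, 45], matching the one-hot columns shown for the first rows of the data, while age 0 maps to no interval.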
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 3. Data preprocessing: one-hot encoding"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines \\\n",
+ "0 1 0.766127 \n",
+ "1 0 0.957151 \n",
+ "2 0 0.658180 \n",
+ "3 0 0.233810 \n",
+ "4 0 0.907239 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents age_(0, 3] age_(3, 6] age_(6, 9] age_(9, 12] \\\n",
+ "0 2.0 False False False False \n",
+ "1 1.0 False False False False \n",
+ "2 0.0 False False False False \n",
+ "3 0.0 False False False False \n",
+ "4 0.0 False False False False \n",
+ "\n",
+ " age_(12, 15] age_(15, 18] age_(18, 21] age_(21, 24] age_(24, 27] \\\n",
+ "0 False False False False False \n",
+ "1 False False False False False \n",
+ "2 False False False False False \n",
+ "3 False False False False False \n",
+ "4 False False False False False \n",
+ "\n",
+ " age_(27, 30] age_(30, 33] age_(33, 36] age_(36, 39] age_(39, 42] \\\n",
+ "0 False False False False False \n",
+ "1 False False False False True \n",
+ "2 False False False True False \n",
+ "3 True False False False False \n",
+ "4 False False False False False \n",
+ "\n",
+ " age_(42, 45] age_(45, 48] age_(48, 51] age_(51, 54] age_(54, 57] \\\n",
+ "0 True False False False False \n",
+ "1 False False False False False \n",
+ "2 False False False False False \n",
+ "3 False False False False False \n",
+ "4 False False True False False \n",
+ "\n",
+ " age_(57, 60] age_(60, 63] age_(63, 66] age_(66, 69] age_(69, 72] \\\n",
+ "0 False False False False False \n",
+ "1 False False False False False \n",
+ "2 False False False False False \n",
+ "3 False False False False False \n",
+ "4 False False False False False \n",
+ "\n",
+ " age_(72, 75] age_(75, 78] age_(78, 81] age_(81, 84] age_(84, 87] \\\n",
+ "0 False False False False False \n",
+ "1 False False False False False \n",
+ "2 False False False False False \n",
+ "3 False False False False False \n",
+ "4 False False False False False \n",
+ "\n",
+ " age_(87, 90] age_(90, 93] age_(93, 96] age_(96, 99] age_(99, 102] \\\n",
+ "0 False False False False False \n",
+ "1 False False False False False \n",
+ "2 False False False False False \n",
+ "3 False False False False False \n",
+ "4 False False False False False \n",
+ "\n",
+ " age_(102, 105] \n",
+ "0 False \n",
+ "1 False \n",
+ "2 False \n",
+ "3 False \n",
+ "4 False "
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# One-hot encode the binned age groups from the previous step\n",
+ "# Hint: use pandas' get_dummies\n",
+ "\n",
+ "# One-hot encode the binned age groups\n",
+ "# data_age_bins_dummies = pd.get_dummies(data, columns=['age_bins'], prefix='Age')\n",
+ "# X_train_age_bins_dummies = pd.get_dummies(X_train, columns=['age_bins'], prefix='Age')\n",
+ "# X_test_age_bins_dummies = pd.get_dummies(X_test, columns=['age_bins'], prefix='Age')\n",
+ "\n",
+ "# data = pd.get_dummies(data, columns=['age_bins'], prefix='Age')\n",
+ "# X_train = pd.get_dummies(X_train, columns=['age_bins'], prefix='Age')\n",
+ "# X_test = pd.get_dummies(X_test, columns=['age_bins'], prefix='Age')\n",
+ "\n",
+ "data = pd.get_dummies(data, columns=['age'], prefix_sep='_',dummy_na=False,drop_first=False)\n",
+ "X_train = pd.get_dummies(X_train, columns=['age'], prefix_sep='_',dummy_na=False,drop_first=False)\n",
+ "X_test = pd.get_dummies(X_test, columns=['age'], prefix_sep='_',dummy_na=False,drop_first=False)\n",
+ "\n",
+ "# Show part of the encoded data\n",
+ "# age_bins_dummies.head()\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 4. Data preprocessing: feature scaling"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Scale the continuous features\n",
+ "# Hint: a scaler such as StandardScaler can be used\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "\n",
+ "# Select the continuous features to scale\n",
+ "continuous_features = ['RevolvingUtilizationOfUnsecuredLines', 'DebtRatio', 'MonthlyIncome','NumberOfTime30-59DaysPastDueNotWorse','NumberOfTimes90DaysLate',\n",
+ " 'NumberOfOpenCreditLinesAndLoans', 'NumberRealEstateLoansOrLines','NumberOfTime60-89DaysPastDueNotWorse']\n",
+ "\n",
+ "# Initialize the scaler\n",
+ "sc = StandardScaler()\n",
+ "\n",
+ "# Scale the continuous features\n",
+ "# data_scaled = data.copy()\n",
+ "# data_scaled[continuous_features] = sc.fit_transform(data[continuous_features])\n",
+ "# data_scaled = sc.fit_transform(data)\n",
+ "\n",
+ "# X_train_std = sc.fit_transform(X_train)\n",
+ "# X_test_std = sc.fit_transform(X_test)\n",
+ "\n",
+ "# Fit the scaler on the training set only, then reuse its mean and standard deviation on the test set\n",
+ "# Note: only the continuous columns are kept from here on; the age dummies and NumberOfDependents are dropped\n",
+ "X_train = sc.fit_transform(X_train[continuous_features])\n",
+ "X_test = sc.transform(X_test[continuous_features])\n",
+ "X_train_std = X_train\n",
+ "X_test_std = X_test\n",
+ "\n",
+ "# Show part of the scaled data\n",
+ "# data_scaled.head()"
+ ]
+ },
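A scaler is normally fitted on the training set only, and the same mean and standard deviation are then applied to the test set, so that no test-set statistics leak into preprocessing and both sets share one scale. The sketch below shows the arithmetic in plain Python, assuming standardization with the population standard deviation (which is what `StandardScaler` computes). The helper names are hypothetical, and the values are the five `MonthlyIncome` entries from `data.head()`, split arbitrarily for illustration.

```python
from statistics import fmean, pstdev


def fit_standardizer(values):
    """Learn the mean and population standard deviation from the training values."""
    mean = fmean(values)
    std = pstdev(values) or 1.0  # guard against a zero-variance column
    return mean, std


def standardize(values, mean, std):
    """Apply previously learned statistics to any set of values."""
    return [(v - mean) / std for v in values]


# MonthlyIncome values from data.head(); the 4/1 split is an arbitrary illustration.
train_income = [9120.0, 2600.0, 3042.0, 3300.0]
test_income = [63588.0]

mean, std = fit_standardizer(train_income)            # statistics come from train only
train_scaled = standardize(train_income, mean, std)
test_scaled = standardize(test_income, mean, std)     # reuse the train statistics

# Refitting on the test set instead hides how unusual this value is.
refit_scaled = standardize(test_income, *fit_standardizer(test_income))

print(test_scaled, refit_scaled)
```

With the training statistics the large income stays a clear outlier, whereas refitting on the test set maps it to 0.0, a value the model would read as "average".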
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 5. Build a logistic regression model, print its coefficients, and analyze feature importance"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>Feature</th>\n",
+ " <th>Coefficient</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>RevolvingUtilizationOfUnsecuredLines</td>\n",
+ " <td>-0.011995</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>DebtRatio</td>\n",
+ " <td>0.360023</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>MonthlyIncome</td>\n",
+ " <td>-0.097519</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>NumberOfTime30-59DaysPastDueNotWorse</td>\n",
+ " <td>1.778568</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>4</th>\n",
+ " <td>NumberOfTimes90DaysLate</td>\n",
+ " <td>1.736049</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>5</th>\n",
+ " <td>NumberOfOpenCreditLinesAndLoans</td>\n",
+ " <td>-0.180605</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>6</th>\n",
+ " <td>NumberRealEstateLoansOrLines</td>\n",
+ " <td>-0.228602</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>7</th>\n",
+ " <td>NumberOfTime60-89DaysPastDueNotWorse</td>\n",
+ " <td>-3.339019</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " Feature Coefficient\n",
+ "0 RevolvingUtilizationOfUnsecuredLines -0.011995\n",
+ "1 DebtRatio 0.360023\n",
+ "2 MonthlyIncome -0.097519\n",
+ "3 NumberOfTime30-59DaysPastDueNotWorse 1.778568\n",
+ "4 NumberOfTimes90DaysLate 1.736049\n",
+ "5 NumberOfOpenCreditLinesAndLoans -0.180605\n",
+ "6 NumberRealEstateLoansOrLines -0.228602\n",
+ "7 NumberOfTime60-89DaysPastDueNotWorse -3.339019"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Hint: fit the model, then read its coef_ attribute\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "\n",
+ "# X_scaled = data_scaled[continuous_features]\n",
+ "# y = data['SeriousDlqin2yrs']\n",
+ "\n",
+ "# Split the preprocessed data into training and test sets\n",
+ "# X_train_std, X_test_std, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=0)\n",
+ "\n",
+ "# Initialize the logistic regression model\n",
+ "# lr = LogisticRegression()\n",
+ "lr = LogisticRegression(solver='liblinear', max_iter=100)  # use the liblinear solver because it supports L1 regularization\n",
+ "\n",
+ "# Train the model\n",
+ "lr.fit(X_train_std, y_train)\n",
+ "\n",
+ "# Get the model coefficients\n",
+ "coefficients = lr.coef_[0]\n",
+ "\n",
+ "# Build a DataFrame of feature names and coefficients\n",
+ "feature_coefficients = pd.DataFrame({\n",
+ " # 'Feature': X_train.columns[0:],\n",
+ " 'Feature': continuous_features,\n",
+ " 'Coefficient': coefficients\n",
+ "})\n",
+ "\n",
+ "feature_coefficients\n",
+ "# coefficients"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
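+ "#### 6. Tune hyperparameters with grid-search cross-validation\n",
+ "Tune the penalty and C parameters: the candidates for penalty are \"l1\" and \"l2\", and the candidates for C are [1, 10, 100, 500]"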
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "GridSearchCV(cv=5, estimator=LogisticRegression(solver='liblinear'),\n",
+ " param_grid={'C': [1, 10, 100, 500], 'penalty': ['l1', 'l2']})"
+ ]
+ },
+ "execution_count": 12,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Hint: prepare the parameter grid as required above, then tune with GridSearchCV\n",
+ "\n",
+ "from sklearn.model_selection import GridSearchCV\n",
+ "\n",
+ "# Prepare the parameter grid as required\n",
+ "param_grid = {\n",
+ "    'penalty': ['l1', 'l2'],\n",
+ "    'C': [1, 10, 100, 500]\n",
+ "}\n",
+ "\n",
+ "# Initialize the logistic regression model\n",
+ "# lr = LogisticRegression(solver='liblinear',max_iter=100) # use the liblinear solver because it supports L1 regularization\n",
+ "\n",
+ "# Initialize the grid search (with no scoring argument, the estimator's default score, accuracy, is used)\n",
+ "# grid_search = GridSearchCV(lr, param_grid, cv=5, scoring='accuracy', verbose=1)\n",
+ "grid_search = GridSearchCV(lr, param_grid, cv=5)\n",
+ "\n",
+ "# Run the grid search\n",
+ "grid_search.fit(X_train_std, y_train)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Best hyperparameters: {'C': 10, 'penalty': 'l1'}\n",
+ "Best score: 0.9332702219763709\n",
+ "Best model: LogisticRegression(C=10, penalty='l1', solver='liblinear')\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Print the best hyperparameters\n",
+ "# Print the best model\n",
+ "\n",
+ "best_params = grid_search.best_params_\n",
+ "best_score = grid_search.best_score_\n",
+ "best_model = grid_search.best_estimator_\n",
+ "\n",
+ "best_params, best_score, best_model\n",
+ "\n",
+ "print(\"Best hyperparameters:\", best_params)\n",
+ "print(\"Best score:\", best_score)\n",
+ "print(\"Best model:\", best_model)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 7. Predict on the test set and compute evaluation metrics such as precision, recall, AUC, the confusion matrix, and the F1 score"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "accuracy:\t 0.9337628470624328\n",
+ "recall:\t 0.5209430120321262\n",
+ "f1:\t 0.08244793880152997\n",
+ "conf_matrix:\n",
+ " [[30339 85]\n",
+ " [ 2074 97]]\n",
+ "auc:\t 0.6782203584699369\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Hint: use predict to make predictions on the test set\n",
+ "# Hint: the evaluation metrics can be found in sklearn.metrics: accuracy_score, recall_score, auc, confusion_matrix, f1_score\n",
+ "\n",
+ "from sklearn.metrics import accuracy_score, recall_score, roc_auc_score, confusion_matrix, f1_score\n",
+ "\n",
+ "y_pred = best_model.predict(X_test_std)\n",
+ "\n",
+ "# Compute the evaluation metrics\n",
+ "accuracy = accuracy_score(y_test, y_pred)\n",
+ "# Note: average='macro' averages the recall of both classes, so this is not the recall of the positive class alone\n",
+ "recall = recall_score(y_test, y_pred,average='macro')\n",
+ "f1 = f1_score(y_test, y_pred)\n",
+ "conf_matrix = confusion_matrix(y_test, y_pred)\n",
+ "\n",
+ "# Compute the AUC from the predicted probability of the positive class\n",
+ "y_pred_proba = best_model.predict_proba(X_test_std)[:, 1]\n",
+ "auc = roc_auc_score(y_test, y_pred_proba)\n",
+ "\n",
+ "accuracy, recall, auc, conf_matrix, f1\n",
+ "\n",
+ "print(\"accuracy:\\t\", accuracy)\n",
+ "print(\"recall:\\t\", recall)\n",
+ "print(\"f1:\\t\", f1)\n",
+ "print(\"conf_matrix:\\n\", conf_matrix)\n",
+ "print(\"auc:\\t\", auc)\n"
+ ]
+ },
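The exercise asks for precision as well, and the `recall` printed above is macro-averaged over both classes. Both quantities can be recovered by hand from the confusion matrix printed above, whose layout in scikit-learn is `[[tn, fp], [fn, tp]]`. The sketch below is plain arithmetic on those four counts and makes no further assumptions about the model.

```python
# Counts copied from the confusion matrix printed above (threshold 0.5).
tn, fp = 30339, 85
fn, tp = 2074, 97

precision = tp / (tp + fp)    # share of flagged borrowers who really were delinquent
recall_pos = tp / (tp + fn)   # share of delinquent borrowers the model caught
recall_neg = tn / (tn + fp)   # share of non-delinquent borrowers left unflagged
macro_recall = (recall_pos + recall_neg) / 2
f1 = 2 * precision * recall_pos / (precision + recall_pos)
accuracy = (tp + tn) / (tn + fp + fn + tp)

print(f"precision     {precision:.4f}")
print(f"recall (pos)  {recall_pos:.4f}")
print(f"recall (macro){macro_recall:.4f}")
print(f"f1            {f1:.4f}")
print(f"accuracy      {accuracy:.4f}")
```

The macro value reproduces the 0.5209 printed above, while the positive-class recall is only about 0.045: the model catches fewer than one in twenty of the borrowers who became seriously delinquent, which the 93% accuracy hides.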
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
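+ "#### 8. Further optimization\n",
+ "Banks usually have stricter requirements, because the consequences of a default are typically severe, so the model's decision criterion is generally adjusted.  \n",
+ "\n",
+ "For example, logistic regression normally uses 0.5 as the probability decision boundary, but the threshold can be set lower to increase the model's sensitivity.  \n",
+ "Try setting the threshold to 0.3 and look at the confusion matrix and the other evaluation metrics again."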
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "0.8712685994784476\n",
+ "0.6495828101478226\n",
+ "[[27544 2880]\n",
+ " [ 1316 855]]\n",
+ "0.2895360650186251\n",
+ "[[27544 2880]\n",
+ " [ 1316 855]]\n",
+ "\n",
+ "\n",
+ "0.9283018867924528\n",
+ "0.5800412136143579\n",
+ "[[29871 553]\n",
+ " [ 1784 387]]\n",
+ "0.24879459980713595\n",
+ "[[29871 553]\n",
+ " [ 1784 387]]\n",
+ "\n",
+ "\n",
+ "0.9335480901978831\n",
+ "0.5497009909265794\n",
+ "[[30197 227]\n",
+ " [ 1939 232]]\n",
+ "0.176425855513308\n",
+ "[[30197 227]\n",
+ " [ 1939 232]]\n",
+ "\n",
+ "\n",
+ "0.9340696425832183\n",
+ "0.5324426896121792\n",
+ "[[30296 128]\n",
+ " [ 2021 150]]\n",
+ "0.12249897917517355\n",
+ "[[30296 128]\n",
+ " [ 2021 150]]\n",
+ "\n",
+ "\n",
+ "0.9337628470624328\n",
+ "0.5209430120321262\n",
+ "[[30339 85]\n",
+ " [ 2074 97]]\n",
+ "0.08244793880152997\n",
+ "[[30339 85]\n",
+ " [ 2074 97]]\n",
+ "\n",
+ "\n",
+ "0.9341003221352968\n",
+ "0.5159908090935991\n",
+ "[[30374 50]\n",
+ " [ 2098 73]]\n",
+ "0.06364428945074106\n",
+ "[[30374 50]\n",
+ " [ 2098 73]]\n",
+ "\n",
+ "\n",
+ "0.9337935266145114\n",
+ "0.5106934838831813\n",
+ "[[30388 36]\n",
+ " [ 2122 49]]\n",
+ "0.04343971631205674\n",
+ "[[30388 36]\n",
+ " [ 2122 49]]\n",
+ "\n",
+ "\n",
+ "0.9333946924374904\n",
+ "0.5029942390749963\n",
+ "[[30410 14]\n",
+ " [ 2157 14]]\n",
+ "0.01273306048203729\n",
+ "[[30410 14]\n",
+ " [ 2157 14]]\n",
+ "\n",
+ "\n",
+ "0.9333640128854118\n",
+ "0.5010529367043134\n",
+ "[[30418 6]\n",
+ " [ 2166 5]]\n",
+ "0.00458295142071494\n",
+ "[[30418 6]\n",
+ " [ 2166 5]]\n",
+ "\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Hint: thresholds = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]\n",
+ "# Derive the predicted label by comparing the predict_proba output with the threshold, then evaluate the metrics\n",
+ "\n",
+ "from sklearn.metrics import classification_report, confusion_matrix\n",
+ "\n",
+ "y_pred_proba = best_model.predict_proba(X_test_std)\n",
+ "\n",
+ "# Candidate thresholds to sweep (the exercise highlights 0.3)\n",
+ "threshold = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]\n",
+ "\n",
+ "# Decide the final label according to each threshold\n",
+ "for i in range(len(threshold)):\n",
+ "    y_pred_adjusted = (y_pred_proba[:, 1] > threshold[i]).astype(int)\n",
+ "\n",
+ "    # Recompute the evaluation metrics\n",
+ "    # classification_report_adjusted = classification_report(y_test, y_pred_adjusted)\n",
+ "    confusion_matrix_adjusted = confusion_matrix(y_test, y_pred_adjusted)\n",
+ "\n",
+ "    # Compute the evaluation metrics (recall is macro-averaged over both classes)\n",
+ "    accuracy = accuracy_score(y_test, y_pred_adjusted)\n",
+ "    recall = recall_score(y_test, y_pred_adjusted,average='macro')\n",
+ "    f1 = f1_score(y_test, y_pred_adjusted)\n",
+ "    conf_matrix = confusion_matrix(y_test, y_pred_adjusted)\n",
+ "\n",
+ "    print(accuracy)\n",
+ "    print(recall)\n",
+ "    print(conf_matrix)\n",
+ "    print(f1)\n",
+ "\n",
+ "    # Classification report\n",
+ "    # print(classification_report_adjusted)\n",
+ "    # Confusion matrix (same values as conf_matrix printed above)\n",
+ "    print(confusion_matrix_adjusted)\n",
+ "    print(\"\\n\")"
+ ]
+ },
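The matrices printed above show the trade-off: moving the threshold from 0.5 to 0.3 raises the true positives from 97 to 232 out of 2,171, at the cost of false positives rising from 85 to 227. The sketch below isolates the thresholding step in plain Python. `confusion_at` is a hypothetical helper, and the labels and scores are invented for illustration rather than taken from the model.

```python
def confusion_at(y_true, scores, threshold):
    """Return (tn, fp, fn, tp) when score > threshold is predicted as class 1."""
    tn = fp = fn = tp = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s > threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Invented labels and predicted probabilities for ten borrowers.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
scores = [0.05, 0.10, 0.15, 0.20, 0.35, 0.45, 0.25, 0.40, 0.70, 0.08]

for threshold in (0.5, 0.3):
    tn, fp, fn, tp = confusion_at(y_true, scores, threshold)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    print(threshold, (tn, fp, fn, tp), round(precision, 2), round(recall, 2))
```

Lowering the threshold can only move predictions from class 0 to class 1, so recall of the positive class never decreases, while precision usually falls; which point to pick depends on the relative cost of a missed default versus a wrongly rejected applicant.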
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 9. Rank the features by importance, filter them through feature selection, then rebuild the model and observe its accuracy and other evaluation metrics."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " Feature Coefficient Absolute_Coefficient\n",
+ "7 NumberOfTime60-89DaysPastDueNotWorse -3.375595 3.375595\n",
+ "3 NumberOfTime30-59DaysPastDueNotWorse 1.790722 1.790722\n",
+ "4 NumberOfTimes90DaysLate 1.760637 1.760637\n",
+ "1 DebtRatio 0.360080 0.360080\n",
+ "6 NumberRealEstateLoansOrLines -0.228471 0.228471\n",
+ "5 NumberOfOpenCreditLinesAndLoans -0.180534 0.180534\n",
+ "2 MonthlyIncome -0.097630 0.097630\n",
+ "0 RevolvingUtilizationOfUnsecuredLines -0.011943 0.011943\n",
+ "\n",
+ "\n",
+ "Selected features: ['NumberOfTime30-59DaysPastDueNotWorse', 'NumberOfTimes90DaysLate', 'NumberOfTime60-89DaysPastDueNotWorse']\n",
+ "Accuracy: 0.9347826086956522\n",
+ "Recall: 0.5192533861174952\n",
+ "\n",
+ "\n",
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.94 1.00 0.97 21309\n",
+ " 1 0.59 0.04 0.08 1507\n",
+ "\n",
+ " accuracy 0.93 22816\n",
+ " macro avg 0.76 0.52 0.52 22816\n",
+ "weighted avg 0.91 0.93 0.91 22816\n",
+ "\n",
+ "Confusion Matrix:\n",
+ " [[21267 42]\n",
+ " [ 1446 61]]\n",
+ "Score: 0.9347826086956522\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Features can be ranked by the absolute value of the logistic regression coefficients, or by the feature importances of a tree model\n",
+ "# Feature selection can use RFE or SelectFromModel\n",
+ "\n",
+ "from sklearn.feature_selection import SelectFromModel\n",
+ "\n",
+ "# sorted_indices = abs(coefficients).argsort()[::-1]\n",
+ "# coefficients, sorted_indices\n",
+ "\n",
+ "# Compute the absolute value of each coefficient\n",
+ "feature_coefficients['Absolute_Coefficient'] = feature_coefficients['Coefficient'].abs()\n",
+ "\n",
+ "# Sort the features by absolute coefficient, largest first\n",
+ "sorted_features = feature_coefficients.sort_values(by='Absolute_Coefficient', ascending=False)\n",
+ "sorted_features\n",
+ "print(sorted_features)\n",
+ "print(\"\\n\")\n",
+ "\n",
+ "# # lr_for_selection = LogisticRegression(max_iter=1000)\n",
+ "# lr_for_selection = LogisticRegression(C=10, penalty='l2' ,solver='liblinear')\n",
+ "# sfm_lr = SelectFromModel(lr_for_selection)\n",
+ "# sfm_lr.fit(X_train_std, y_train)\n",
+ "\n",
+ "# # Feature selection\n",
+ "# # selected_features_lr = X_train_std.columns[sfm_lr.get_support()]\n",
+ "# selected_features_lr = X_train_std\n",
+ "\n",
+ "# selected_features_lr\n",
+ "\n",
+ "# # Selecting only the features identified by SelectFromModel\n",
+ "# X_selected = X_train_std[selected_features_lr]\n",
+ "\n",
+ "# # Splitting the data into training and testing sets\n",
+ "# X_train_selected, X_test_selected, y_train_selected, y_test_selected = train_test_split(X_selected, y_train, test_size=0.3, random_state=0)\n",
+ "\n",
+ "# # Creating a new logistic regression model\n",
+ "# lr_new = LogisticRegression(C=10, penalty='l2' ,solver='liblinear')\n",
+ "# lr_new.fit(X_train_selected, y_train_selected)\n",
+ "\n",
+ "# # Predicting on the test set\n",
+ "# y_pred = lr_new.predict(X_test_selected)\n",
+ "\n",
+ "# # Calculating accuracy and other metrics\n",
+ "# accuracy = accuracy_score(y_test_selected, y_pred)\n",
+ "# recall = recall_score(y_test_selected, y_pred,average='macro')\n",
+ "# conf_matrix = confusion_matrix(y_test_selected, y_pred)\n",
+ "# class_report = classification_report(y_test_selected, y_pred)\n",
+ "\n",
+ "# accuracy, conf_matrix\n",
+ "\n",
+ "# print(accuracy)\n",
+ "# print(recall)\n",
+ "# print(class_report)\n",
+ "# print(conf_matrix)\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "# Train a logistic regression model\n",
+ "lr_for_selection = LogisticRegression(C=10, penalty='l2', solver='liblinear')\n",
+ "lr_for_selection.fit(X_train_std, y_train)\n",
+ "\n",
+ "# Get the coefficients from the model\n",
+ "coefficients = lr_for_selection.coef_[0]\n",
+ "\n",
+ "# Build a DataFrame holding the coefficients and their absolute values\n",
+ "feature_coefficients = pd.DataFrame({'Feature': feature_coefficients['Feature'], 'Coefficient': coefficients})\n",
+ "feature_coefficients['Absolute_Coefficient'] = feature_coefficients['Coefficient'].abs()\n",
+ "\n",
+ "# Select features with SelectFromModel\n",
+ "sfm_lr = SelectFromModel(lr_for_selection, threshold='mean')  # the threshold can be adjusted\n",
+ "sfm_lr.fit(X_train_std, y_train)\n",
+ "\n",
+ "# Get the selected features\n",
+ "# selected_features_lr = X_train_std.columns[sfm_lr.get_support()]\n",
+ "# Get the boolean mask of the selected features\n",
+ "selected_features_bool = sfm_lr.get_support()\n",
+ "\n",
+ "# Select the features with the boolean mask\n",
+ "X_selected = X_train_std[:, selected_features_bool]\n",
+ "\n",
+ "# Check which features were selected\n",
+ "selected_feature_names = [feature_coefficients['Feature'][i] for i in range(len(feature_coefficients['Feature'])) if selected_features_bool[i]]\n",
+ "print(\"Selected features:\", selected_feature_names)\n",
+ "\n",
+ "# Keep only the selected features\n",
+ "# X_selected = X_train_std[selected_features_lr]\n",
+ "\n",
+ "# Split the data (note: this re-splits the training set, so the metrics below come from a hold-out part of it, not from X_test_std)\n",
+ "X_train_selected, X_test_selected, y_train_selected, y_test_selected = train_test_split(X_selected, y_train, test_size=0.3, random_state=0)\n",
+ "\n",
+ "# Create a new logistic regression model\n",
+ "lr_new = LogisticRegression(C=10, penalty='l2', solver='liblinear')\n",
+ "lr_new.fit(X_train_selected, y_train_selected)\n",
+ "\n",
+ "# Predict on the hold-out set\n",
+ "y_pred = lr_new.predict(X_test_selected)\n",
+ "\n",
+ "# Compute accuracy and the other metrics\n",
+ "accuracy = accuracy_score(y_test_selected, y_pred)\n",
+ "recall = recall_score(y_test_selected, y_pred, average='macro')\n",
+ "conf_matrix = confusion_matrix(y_test_selected, y_pred)\n",
+ "class_report = classification_report(y_test_selected, y_pred)\n",
+ "score = lr_new.score(X_test_selected,y_test_selected)\n",
+ "\n",
+ "# Print the results\n",
+ "print(\"Accuracy:\", accuracy)\n",
+ "print(\"Recall:\", recall)\n",
+ "print(\"\\n\")\n",
+ "print(class_report)\n",
+ "print(\"Confusion Matrix:\\n\", conf_matrix)\n",
+ "print(\"Score:\", score)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 10. Try other model algorithms\n",
+ "Classify with other sklearn classifiers such as RandomForestClassifier, SVM, and KNN, and repeat the hyperparameter-tuning procedure above."
+ ]
+ },
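+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A minimal sketch of what such a tuning run could look like, using `GridSearchCV` over a random forest. This is not the notebook's own solution: the synthetic data from `make_classification`, the parameter grid, and the `roc_auc` scoring choice are all assumptions made for illustration. In the real exercise, `X_train_std` and `y_train` would take the place of the synthetic arrays."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "from sklearn.model_selection import GridSearchCV, train_test_split\n",
+ "\n",
+ "# Synthetic, imbalanced stand-in for the credit data (assumption: about 7% positives)\n",
+ "X_demo, y_demo = make_classification(n_samples=2000, n_features=10, weights=[0.93, 0.07], random_state=0)\n",
+ "X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0)\n",
+ "\n",
+ "# Hypothetical grid; widen or narrow it to fit the available time budget\n",
+ "param_grid = {'n_estimators': [10, 50], 'max_depth': [3, 5, None]}\n",
+ "search = GridSearchCV(RandomForestClassifier(random_state=1), param_grid, scoring='roc_auc', cv=3)\n",
+ "search.fit(X_tr, y_tr)\n",
+ "\n",
+ "# With scoring='roc_auc', search.score() also reports ROC AUC rather than accuracy\n",
+ "print('Best parameters:', search.best_params_)\n",
+ "print('Cross-validated ROC AUC:', search.best_score_)\n",
+ "print('Hold-out ROC AUC:', search.score(X_te, y_te))"
+ ]
+ },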
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Random forest\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "# Support vector machine\n",
+ "from sklearn.svm import SVC\n",
+ "# K-nearest neighbors\n",
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Selected features: ['RevolvingUtilizationOfUnsecuredLines', 'DebtRatio', 'MonthlyIncome']\n",
+ "Accuracy: 0.9276823281907434\n",
+ "Recall: 0.5203853071917455\n",
+ "\n",
+ "\n",
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.94 0.99 0.96 21309\n",
+ " 1 0.26 0.05 0.09 1507\n",
+ "\n",
+ " accuracy 0.93 22816\n",
+ " macro avg 0.60 0.52 0.52 22816\n",
+ "weighted avg 0.89 0.93 0.90 22816\n",
+ "\n",
+ "Confusion Matrix:\n",
+ " [[21089 220]\n",
+ " [ 1430 77]]\n",
+ "Score: 0.9276823281907434\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Random forest\n",
+ "\n",
+ "# Train a random forest model for feature selection\n",
+ "forest_for_selection = RandomForestClassifier(criterion='entropy', \n",
+ " n_estimators=10, # The number of trees in the forest.\n",
+ " random_state=1,\n",
+ " n_jobs=2)\n",
+ "forest_for_selection.fit(X_train_std, y_train)\n",
+ "\n",
+ "# Get the feature importances from the model (a random forest has no coef_)\n",
+ "# coefficients = forest_for_selection.coef_[0]\n",
+ "importances = forest_for_selection.feature_importances_\n",
+ "\n",
+ "# # Build a DataFrame holding each coefficient and its absolute value\n",
+ "# feature_coefficients = pd.DataFrame({'Feature': feature_coefficients['Feature'], 'Coefficient': coefficients})\n",
+ "# feature_coefficients['Absolute_Coefficient'] = feature_coefficients['Coefficient'].abs()\n",
+ "\n",
+ "# Select features with SelectFromModel\n",
+ "sfm_forest = SelectFromModel(forest_for_selection, threshold='mean') # the threshold can be tuned\n",
+ "sfm_forest.fit(X_train_std, y_train)\n",
+ "\n",
+ "# Get the selected features\n",
+ "# selected_features_lr = X_train_std.columns[sfm_lr.get_support()]\n",
+ "# Get the boolean mask of the selected features\n",
+ "selected_features_bool = sfm_forest.get_support()\n",
+ "\n",
+ "# Use the boolean mask to keep only the selected columns\n",
+ "X_selected = X_train_std[:, selected_features_bool]\n",
+ "\n",
+ "# Report which features were selected\n",
+ "selected_feature_names = [feature_coefficients['Feature'][i] for i in range(len(feature_coefficients['Feature'])) if selected_features_bool[i]]\n",
+ "print(\"Selected features:\", selected_feature_names)\n",
+ "\n",
+ "# Keep only the selected features\n",
+ "# X_selected = X_train_std[selected_features_lr]\n",
+ "\n",
+ "# Split the selected-feature data into training and hold-out subsets\n",
+ "X_train_selected, X_test_selected, y_train_selected, y_test_selected = train_test_split(X_selected, y_train, test_size=0.3, random_state=0)\n",
+ "\n",
+ "# Fit a new random forest model on the selected features\n",
+ "forest_new = RandomForestClassifier(criterion='entropy', \n",
+ " n_estimators=10, # The number of trees in the forest.\n",
+ " random_state=1,\n",
+ " n_jobs=2)\n",
+ "forest_new.fit(X_train_selected, y_train_selected)\n",
+ "\n",
+ "# Predict on the hold-out subset\n",
+ "y_pred = forest_new.predict(X_test_selected)\n",
+ "\n",
+ "# Compute accuracy and other metrics\n",
+ "accuracy = accuracy_score(y_test_selected, y_pred)\n",
+ "recall = recall_score(y_test_selected, y_pred, average='macro') # macro-averaged over both classes\n",
+ "conf_matrix = confusion_matrix(y_test_selected, y_pred)\n",
+ "class_report = classification_report(y_test_selected, y_pred)\n",
+ "score = forest_new.score(X_test_selected,y_test_selected)\n",
+ "\n",
+ "# Print the results\n",
+ "print(\"Accuracy:\", accuracy)\n",
+ "print(\"Recall:\", recall)\n",
+ "print(\"\\n\")\n",
+ "print(class_report)\n",
+ "print(\"Confusion Matrix:\\n\", conf_matrix)\n",
+ "print(\"Score:\", score)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Selected features: ['NumberOfTimes90DaysLate']\n",
+ "Accuracy: 0.934125175315568\n",
+ "Recall: 0.5056436303647567\n",
+ "\n",
+ "\n",
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.93 1.00 0.97 21309\n",
+ " 1 0.56 0.01 0.02 1507\n",
+ "\n",
+ " accuracy 0.93 22816\n",
+ " macro avg 0.75 0.51 0.49 22816\n",
+ "weighted avg 0.91 0.93 0.90 22816\n",
+ "\n",
+ "Confusion Matrix:\n",
+ " [[21295 14]\n",
+ " [ 1489 18]]\n",
+ "Score: 0.934125175315568\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Support vector machine\n",
+ "\n",
+ "# Train a linear SVM for feature selection\n",
+ "svm_for_selection = SVC(kernel='linear', C=0.1, random_state=0, probability = True)\n",
+ "svm_for_selection.fit(X_train_std, y_train)\n",
+ "\n",
+ "# Extract the fitted coefficients from the model\n",
+ "coefficients = svm_for_selection.coef_[0]\n",
+ "# importances = svm_for_selection.feature_importances_\n",
+ "\n",
+ "# # Build a DataFrame holding each coefficient and its absolute value\n",
+ "# feature_coefficients = pd.DataFrame({'Feature': feature_coefficients['Feature'], 'Coefficient': coefficients})\n",
+ "# feature_coefficients['Absolute_Coefficient'] = feature_coefficients['Coefficient'].abs()\n",
+ "\n",
+ "# Select features with SelectFromModel\n",
+ "sfm_svm = SelectFromModel(svm_for_selection, threshold='mean') # the threshold can be tuned\n",
+ "sfm_svm.fit(X_train_std, y_train)\n",
+ "\n",
+ "# Get the selected features\n",
+ "# selected_features_lr = X_train_std.columns[sfm_lr.get_support()]\n",
+ "# Get the boolean mask of the selected features\n",
+ "selected_features_bool = sfm_svm.get_support()\n",
+ "\n",
+ "# Use the boolean mask to keep only the selected columns\n",
+ "X_selected = X_train_std[:, selected_features_bool]\n",
+ "\n",
+ "# Report which features were selected\n",
+ "selected_feature_names = [feature_coefficients['Feature'][i] for i in range(len(feature_coefficients['Feature'])) if selected_features_bool[i]]\n",
+ "print(\"Selected features:\", selected_feature_names)\n",
+ "\n",
+ "# Keep only the selected features\n",
+ "# X_selected = X_train_std[selected_features_lr]\n",
+ "\n",
+ "# Split the selected-feature data into training and hold-out subsets\n",
+ "X_train_selected, X_test_selected, y_train_selected, y_test_selected = train_test_split(X_selected, y_train, test_size=0.3, random_state=0)\n",
+ "\n",
+ "# Fit a new SVM on the selected features\n",
+ "svm_new = SVC(kernel='linear', C=0.1, random_state=0, probability = True)\n",
+ "svm_new.fit(X_train_selected, y_train_selected)\n",
+ "\n",
+ "# Predict on the hold-out subset\n",
+ "y_pred = svm_new.predict(X_test_selected)\n",
+ "\n",
+ "# Compute accuracy and other metrics\n",
+ "accuracy = accuracy_score(y_test_selected, y_pred)\n",
+ "recall = recall_score(y_test_selected, y_pred, average='macro') # macro-averaged over both classes\n",
+ "conf_matrix = confusion_matrix(y_test_selected, y_pred)\n",
+ "class_report = classification_report(y_test_selected, y_pred)\n",
+ "score = svm_new.score(X_test_selected,y_test_selected)\n",
+ "\n",
+ "# Print the results\n",
+ "print(\"Accuracy:\", accuracy)\n",
+ "print(\"Recall:\", recall)\n",
+ "print(\"\\n\")\n",
+ "print(class_report)\n",
+ "print(\"Confusion Matrix:\\n\", conf_matrix)\n",
+ "print(\"Score:\", score)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Selected features: ['RevolvingUtilizationOfUnsecuredLines', 'DebtRatio', 'MonthlyIncome']\n",
+ "Accuracy: 0.9293478260869565\n",
+ "Recall: 0.5024693841180347\n",
+ "\n",
+ "\n",
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.93 0.99 0.96 21309\n",
+ " 1 0.12 0.01 0.02 1507\n",
+ "\n",
+ " accuracy 0.93 22816\n",
+ " macro avg 0.53 0.50 0.49 22816\n",
+ "weighted avg 0.88 0.93 0.90 22816\n",
+ "\n",
+ "Confusion Matrix:\n",
+ " [[21188 121]\n",
+ " [ 1491 16]]\n",
+ "Score: 0.9293478260869565\n"
+ ]
+ }
+ ],
+ "source": [
+ "# K-nearest neighbors\n",
+ "from sklearn.feature_selection import RFE\n",
+ "\n",
+ "# Use a random forest for feature selection\n",
+ "forest_for_selection = RandomForestClassifier(criterion='entropy', \n",
+ " n_estimators=10, # The number of trees in the forest.\n",
+ " random_state=1,\n",
+ " n_jobs=2)\n",
+ "forest_for_selection.fit(X_train_std, y_train)\n",
+ "\n",
+ "# Get the feature importances from the model (a random forest has no coef_)\n",
+ "# coefficients = forest_for_selection.coef_[0]\n",
+ "importances = forest_for_selection.feature_importances_\n",
+ "\n",
+ "# # Build a DataFrame holding each coefficient and its absolute value\n",
+ "# feature_coefficients = pd.DataFrame({'Feature': feature_coefficients['Feature'], 'Coefficient': coefficients})\n",
+ "# feature_coefficients['Absolute_Coefficient'] = feature_coefficients['Coefficient'].abs()\n",
+ "\n",
+ "# Select features with SelectFromModel\n",
+ "sfm_forest = SelectFromModel(forest_for_selection, threshold='mean') # the threshold can be tuned\n",
+ "sfm_forest.fit(X_train_std, y_train)\n",
+ "\n",
+ "# Get the selected features\n",
+ "# selected_features_lr = X_train_std.columns[sfm_lr.get_support()]\n",
+ "# Get the boolean mask of the selected features\n",
+ "selected_features_bool = sfm_forest.get_support()\n",
+ "\n",
+ "# Use the boolean mask to keep only the selected columns\n",
+ "X_selected = X_train_std[:, selected_features_bool]\n",
+ "\n",
+ "# Report which features were selected\n",
+ "selected_feature_names = [feature_coefficients['Feature'][i] for i in range(len(feature_coefficients['Feature'])) if selected_features_bool[i]]\n",
+ "print(\"Selected features:\", selected_feature_names)\n",
+ "\n",
+ "# Keep only the selected features\n",
+ "# X_selected = X_train_std[selected_features_lr]\n",
+ "\n",
+ "# Split the selected-feature data into training and hold-out subsets\n",
+ "X_train_selected, X_test_selected, y_train_selected, y_test_selected = train_test_split(X_selected, y_train, test_size=0.3, random_state=0)\n",
+ "\n",
+ "# Fit a K-nearest neighbors model on the selected features\n",
+ "knn_new = KNeighborsClassifier(n_neighbors=5, p=2, metric='minkowski')\n",
+ "knn_new.fit(X_train_selected, y_train_selected)\n",
+ "\n",
+ "# Predict on the hold-out subset\n",
+ "y_pred = knn_new.predict(X_test_selected)\n",
+ "\n",
+ "# Compute accuracy and other metrics\n",
+ "accuracy = accuracy_score(y_test_selected, y_pred)\n",
+ "recall = recall_score(y_test_selected, y_pred, average='macro') # macro-averaged over both classes\n",
+ "conf_matrix = confusion_matrix(y_test_selected, y_pred)\n",
+ "class_report = classification_report(y_test_selected, y_pred)\n",
+ "score = knn_new.score(X_test_selected,y_test_selected)\n",
+ "\n",
+ "# Print the results\n",
+ "print(\"Accuracy:\", accuracy)\n",
+ "print(\"Recall:\", recall)\n",
+ "print(\"\\n\")\n",
+ "print(class_report)\n",
+ "print(\"Confusion Matrix:\\n\", conf_matrix)\n",
+ "print(\"Score:\", score)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/2023/homework/Zou_Zenghui/homework_credit_scoring_Zou.ipynb b/2023/homework/Zou_Zenghui/homework_credit_scoring_Zou.ipynb
new file mode 100644
index 00000000..819fef60
--- /dev/null
+++ b/2023/homework/Zou_Zenghui/homework_credit_scoring_Zou.ipynb
@@ -0,0 +1,1533 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Let's Level Up Together: Credit Scoring Exercise"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
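+ "## Assignment instructions\n",
+ "\n",
+ "- Answering the questions:\n",
+ " - **Keep every step** of your work when answering; do not give only the final answer\n",
+ " - Make a habit of commenting your code\n",
+ "\n",
+ "- Solution hints:\n",
+ " - To help you understand each problem accurately and get the most out of the exercises, this document provides solution hints\n",
+ " - The hints are **for reference only**; original approaches are encouraged\n",
+ " - To encourage you to think for yourself, the hint text is set in **white**; if needed, drag the mouse from just after the colon to reveal it\n",
+ "\n",
+ "- Data used\n",
+ " - After loading the data, first **inspect it and understand its basic properties**; later questions will not repeat this reminder"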
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## machine learning for credit scoring\n",
+ "\n",
+ "\n",
+ "Banks play a crucial role in market economies. They decide who can get finance and on what terms and can make or break investment decisions. For markets and society to function, individuals and companies need access to credit. \n",
+ "\n",
+ "Credit scoring algorithms, which make a guess at the probability of default, are the method banks use to determine whether or not a loan should be granted. This competition requires participants to improve on the state of the art in credit scoring, by predicting the probability that somebody will experience financial distress in the next two years. [Dataset](https://www.kaggle.com/c/GiveMeSomeCredit)\n",
+ "\n",
+ "Attribute Information:\n",
+ "\n",
+ "|Variable Name\t|\tDescription\t|\tType|\n",
+ "|----|----|----|\n",
+ "|SeriousDlqin2yrs\t|\tPerson experienced 90 days past due delinquency or worse \t|\tY/N|\n",
+ "|RevolvingUtilizationOfUnsecuredLines\t|\tTotal balance on credit divided by the sum of credit limits\t|\tpercentage|\n",
+ "|age\t|\tAge of borrower in years\t|\tinteger|\n",
+ "|NumberOfTime30-59DaysPastDueNotWorse\t|\tNumber of times borrower has been 30-59 days past due |\tinteger|\n",
+ "|DebtRatio | Monthly debt payments, alimony, and living costs divided by monthly gross income | percentage|\n",
+ "|MonthlyIncome\t|\tMonthly income\t|\treal|\n",
+ "|NumberOfOpenCreditLinesAndLoans\t|\tNumber of Open loans |\tinteger|\n",
+ "|NumberOfTimes90DaysLate\t|\tNumber of times borrower has been 90 days or more past due.\t|\tinteger|\n",
+ "|NumberRealEstateLoansOrLines\t|\tNumber of mortgage and real estate loans\t|\tinteger|\n",
+ "|NumberOfTime60-89DaysPastDueNotWorse\t|\tNumber of times borrower has been 60-89 days past due |integer|\n",
+ "|NumberOfDependents\t|\tNumber of dependents in family\t|\tinteger|\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "import seaborn as sns"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----------\n",
+ "## Read the data into Pandas "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 1 0.766127 45.0 \n",
+ "1 0 0.957151 40.0 \n",
+ "2 0 0.658180 38.0 \n",
+ "3 0 0.233810 30.0 \n",
+ "4 0 0.907239 49.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 "
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "pd.set_option('display.max_columns', 500)\n",
+ "import zipfile\n",
+ "with zipfile.ZipFile('KaggleCredit2.csv.zip', 'r') as z:\n",
+ " f = z.open('KaggleCredit2.csv')\n",
+ " data = pd.read_csv(f, index_col=0)\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(112915, 11)"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "------------\n",
+ "## Drop na"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "SeriousDlqin2yrs 0\n",
+ "RevolvingUtilizationOfUnsecuredLines 0\n",
+ "age 4267\n",
+ "NumberOfTime30-59DaysPastDueNotWorse 0\n",
+ "DebtRatio 0\n",
+ "MonthlyIncome 0\n",
+ "NumberOfOpenCreditLinesAndLoans 0\n",
+ "NumberOfTimes90DaysLate 0\n",
+ "NumberRealEstateLoansOrLines 0\n",
+ "NumberOfTime60-89DaysPastDueNotWorse 0\n",
+ "NumberOfDependents 4267\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.isnull().sum(axis=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(108648, 11)"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.dropna(inplace=True)\n",
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---------\n",
+ "## Create X and y"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 0.766127 45.0 \n",
+ "1 0.957151 40.0 \n",
+ "2 0.658180 38.0 \n",
+ "3 0.233810 30.0 \n",
+ "4 0.907239 49.0 \n",
+ "... ... ... \n",
+ "112910 0.385742 50.0 \n",
+ "112911 0.040674 74.0 \n",
+ "112912 0.299745 44.0 \n",
+ "112913 0.000000 30.0 \n",
+ "112914 0.850283 64.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "... ... ... ... \n",
+ "112910 0.0 0.404293 3400.0 \n",
+ "112911 0.0 0.225131 2100.0 \n",
+ "112912 0.0 0.716562 5584.0 \n",
+ "112913 0.0 0.000000 5716.0 \n",
+ "112914 0.0 0.249908 8158.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "... ... ... \n",
+ "112910 7.0 0.0 \n",
+ "112911 4.0 0.0 \n",
+ "112912 4.0 0.0 \n",
+ "112913 4.0 0.0 \n",
+ "112914 8.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "... ... ... \n",
+ "112910 0.0 0.0 \n",
+ "112911 1.0 0.0 \n",
+ "112912 1.0 0.0 \n",
+ "112913 0.0 0.0 \n",
+ "112914 2.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 \n",
+ "... ... \n",
+ "112910 0.0 \n",
+ "112911 0.0 \n",
+ "112912 2.0 \n",
+ "112913 0.0 \n",
+ "112914 0.0 \n",
+ "\n",
+ "[108648 rows x 10 columns]"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "X"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.06742876076872101"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "y.mean()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "---\n",
+ "## Exercise 1: Split the data into training and test sets\n",
+ "- Hint: from sklearn.model_selection import train_test_split"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "((86918, 10), (21730, 10), (86918,), (21730,))"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=20)\n",
+ "\n",
+ "# Check the shapes of the resulting splits\n",
+ "X_train.shape, X_test.shape, y_train.shape, y_test.shape"
+ ]
+ },
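+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Because only about 6.7% of the labels are positive (see `y.mean()` above), a purely random split can leave the two subsets with slightly different positive rates. One possible variation, sketched here on synthetic data, is to pass `stratify` to `train_test_split`. The toy arrays `X_toy` and `y_toy` are assumptions made for illustration, not the notebook's data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "# Toy labels with roughly 7% positives (assumption for illustration)\n",
+ "rng = np.random.RandomState(20)\n",
+ "y_toy = (rng.rand(10000) < 0.07).astype(int)\n",
+ "X_toy = rng.rand(10000, 3)\n",
+ "\n",
+ "# stratify=y_toy keeps the class proportions nearly identical in both subsets\n",
+ "X_a, X_b, y_a, y_b = train_test_split(X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=20)\n",
+ "\n",
+ "print('Positive rate overall:', y_toy.mean())\n",
+ "print('Positive rate in train:', y_a.mean())\n",
+ "print('Positive rate in test:', y_b.mean())"
+ ]
+ },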
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----\n",
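+ "## Exercise 2: Classify with sklearn algorithms such as logistic regression, decision tree, SVM, and KNN\n",
+ "Consult the sklearn API to learn what each model parameter means, and try different settings."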
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Logistic regression\n",
+ "- Hint: from sklearn import linear_model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "## your code here\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "\n",
+ "model = LogisticRegression(max_iter=1000)  # the default max_iter=100 is too small for the solver to converge here\n",
+ "model.fit(X_train, y_train)\n",
+ "\n",
+ "y_pred = model.predict(X_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### see confusion matrix\n",
+ "https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[20247, 53],\n",
+ " [ 1359, 71]])"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.metrics import confusion_matrix\n",
+ "cm = confusion_matrix(y_test, y_pred)\n",
+ "cm"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "sns.heatmap(cm, annot = True, fmt = 'd')# annot = True: show the numbers in each heatmap cell\n",
+ " # fmt = 'd': show numbers as integers. \n",
+ "plt.xlabel('Predicted')\n",
+ "plt.ylabel('Actual')\n",
+ "plt.title('Confusion Matrix')\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.935020708697653"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "R = cm[0,0] + cm[1,1]\n",
+ "All = cm[0,0] + cm[0,1] + cm[1,0] + cm[1,1]\n",
+ "R/All  # compute the accuracy (fraction of correct predictions) manually"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.935020708697653"
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "model.score(X_test, y_test) # Return the mean accuracy on the given test data and labels"
+ ]
+ },
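+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Accuracy alone can hide weak performance on the minority class. As a rough sketch, the per-class figures can be read off the logistic regression confusion matrix printed above. The matrix values are copied from that output; the names `cm_lr`, `tn`, `fp`, `fn`, and `tp` are introduced here for illustration only."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# Confusion matrix copied from the logistic regression output above\n",
+ "# (rows are actual classes, columns are predicted classes)\n",
+ "cm_lr = np.array([[20247, 53], [1359, 71]])\n",
+ "tn, fp, fn, tp = cm_lr.ravel()\n",
+ "\n",
+ "accuracy_lr = (tn + tp) / cm_lr.sum()\n",
+ "recall_pos = tp / (tp + fn)      # share of actual positives that were detected\n",
+ "precision_pos = tp / (tp + fp)   # share of predicted positives that were correct\n",
+ "\n",
+ "print('Accuracy:', accuracy_lr)\n",
+ "print('Recall (class 1):', recall_pos)\n",
+ "print('Precision (class 1):', precision_pos)"
+ ]
+ },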
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Decision Tree\n",
+ "- Hint: from sklearn.tree import DecisionTreeClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "## your code here\n",
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "D_tree = DecisionTreeClassifier(criterion='entropy', random_state=0, max_depth=3)  # capping max_depth helps prevent overfitting\n",
+ "D_tree.fit(X_train, y_train)\n",
+ "y_pred = D_tree.predict(X_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[20091, 209],\n",
+ " [ 1196, 234]])"
+ ]
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "D_tree_cm = confusion_matrix(y_test, y_pred)\n",
+ "D_tree_cm"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "sns.heatmap(D_tree_cm, annot = True, fmt = 'd')# annot = True: show the numbers in each heatmap cell\n",
+ " # fmt = 'd': show numbers as integers. \n",
+ "plt.xlabel('Predicted')\n",
+ "plt.ylabel('Actual')\n",
+ "plt.title('Confusion Matrix')\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9353428439944776"
+ ]
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "D_tree.score(X_test, y_test) # Return the mean accuracy on the given test data and labels"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Random Forest\n",
+ "- Hint: from sklearn.ensemble import RandomForestClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[0 0 0 ... 0 0 0]\n",
+ "The score: 0.9319834330418776\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "\n",
+ "forest = RandomForestClassifier(criterion='entropy', \n",
+ "                                n_estimators=4, # The number of trees in the forest.\n",
+ "                                random_state=1,\n",
+ "                                n_jobs=2)\n",
+ "forest.fit(X_train, y_train)\n",
+ "\n",
+ "y_pred = forest.predict(X_test)\n",
+ "score_RF = forest.score(X_test, y_test) # Return the mean accuracy on the given test data and labels\n",
+ "print(y_pred)\n",
+ "print('The score:', score_RF)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[20061 239]\n",
+ " [ 1239 191]]\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAiwAAAHHCAYAAACcHAM1AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8g+/7EAAAACXBIWXMAAA9hAAAPYQGoP6dpAABPCUlEQVR4nO3de1yO9/8H8Ndd6u6gg6TTUDksIjkNzZQmhZiwr9NGyIzlVA4tcwibLMv50GzIzHmjzWFIlmZyipxGX4dixp1DSOHucF+/P/y6vrsVyn1fdavX8/u4Hl/353pfn+tztZm39+fzuW6ZIAgCiIiIiHSYXkUPgIiIiOhVmLAQERGRzmPCQkRERDqPCQsRERHpPCYsREREpPOYsBAREZHOY8JCREREOo8JCxEREek8JixERESk85iwEEno0qVL8PX1hYWFBWQyGeLi4rTaf0ZGBmQyGWJjY7Xa75usY8eO6NixY0UPg4i0jAkLVXpXrlzBp59+inr16sHIyAjm5uZo3749Fi1ahCdPnkh678DAQJw9exZfffUV1q1bh9atW0t6v/I0ZMgQyGQymJubl/hzvHTpEmQyGWQyGb755psy93/z5k1EREQgNTVVC6MlojddtYoeAJGUdu3ahf/85z+Qy+UYPHgwmjZtiry8PBw6dAiTJk3C+fPnsXLlSknu/eTJEyQnJ+OLL77A6NGjJbmHo6Mjnjx5AgMDA0n6f5Vq1arh8ePH2LFjB/r27at2bv369TAyMsLTp09fq++bN29i5syZcHJyQvPmzUt93b59+17rfkSk25iwUKWVnp6O/v37w9HREQcOHIC9vb14Ljg4GJcvX8auXbsku/+dO3cAAJaWlpLdQyaTwcjISLL+X0Uul6N9+/bYuHFjsYRlw4YN8Pf3x88//1wuY3n8+DFMTExgaGhYLvcjovLFKSGqtKKiopCTk4NVq1apJStFGjRogHHjxomfCwoKMHv2bNSvXx9yuRxOTk6YMmUKlEql2nVOTk7o3r07Dh06hDZt2sDIyAj16tXDDz/8IMZERETA0dERADBp0iTIZDI4OTkBeDaVUvTrf4uIiIBMJlNri4+Px3vvvQdLS0tUr14dLi4umDJlinj+RWtYDhw4gA4dOsDU1BSWlpbo2bMnLly4UOL9Ll++jCFDhsDS0hIWFhYYOnQoHj9+/OIf7HMGDhyI3377DQ8ePBDbjh8/jkuXLmHgwIHF4rOysjBx4kS4ubmhevXqMDc3R9euXXH69GkxJjExEe+88w4AYOjQoeLUUtFzduzYEU2bNkVKSgo8PT1hYmIi/lyeX8MSGBgIIyOjYs/v5+eHGjVq4ObNm6V+ViKqOExYqNLasWMH6tWrh3fffbdU8cOHD8f06dPRsmVLLFiwAF5eXoiMjET//v2LxV6+fBkffvghOnfujOjoaNSoUQNDhgzB+fPnAQC9e/fGggULAAADBgzAunXrsHDhwjKN//z58+jevTuUSiVmzZqF6OhofPDBB/jzzz9fet3+/fvh5+eH27dvIyIiAqGhoTh8+DDat2+PjIyMYvF9+/bFo0ePEBkZib59+yI2NhYzZ84s9Th79+4NmUyGbdu2iW0bNmxAo0aN0LJly2LxV69eRVxcHLp374758+dj0qRJOHv2LLy8vMTkoXHjxpg1axYAYMSIEVi3bh3WrVsHT09PsZ979+6ha9euaN68ORYuXAhvb+8Sx7do0SLUqlULgYGBKCwsBAB8++232LdvH5YsWQIHB4dSPysRVSCBqBJ6+PChAEDo2bNnqeJTU1MFAMLw4cPV2idOnCgAEA4cOCC2OTo6CgCEpKQkse327duCXC4XJkyYILalp6cLAIR58+ap9RkYGCg4OjoWG8OMGTOEf/+WXLBggQBAuHPnzgvHXXSPNWvWiG3NmzcXbGxshHv37oltp0+fFvT09ITBgwcXu9+wYcPU+uzVq5dQs2bNF97z389hamoqCIIgfPjh
h0KnTp0EQRCEwsJCwc7OTpg5c2aJP4OnT58KhYWFxZ5DLpcLs2bNEtuOHz9e7NmKeHl5CQCEmJiYEs95eXmpte3du1cAIHz55ZfC1atXherVqwsBAQGvfEYi0h2ssFCllJ2dDQAwMzMrVfzu3bsBAKGhoWrtEyZMAIBia11cXV3RoUMH8XOtWrXg4uKCq1evvvaYn1e09uWXX36BSqUq1TW3bt1CamoqhgwZAisrK7G9WbNm6Ny5s/ic/zZy5Ei1zx06dMC9e/fEn2FpDBw4EImJiVAoFDhw4AAUCkWJ00HAs3UvenrP/tNTWFiIe/fuidNdJ0+eLPU95XI5hg4dWqpYX19ffPrpp5g1axZ69+4NIyMjfPvtt6W+FxFVPCYsVCmZm5sDAB49elSq+GvXrkFPTw8NGjRQa7ezs4OlpSWuXbum1l63bt1ifdSoUQP3799/zREX169fP7Rv3x7Dhw+Hra0t+vfvjy1btrw0eSkap4uLS7FzjRs3xt27d5Gbm6vW/vyz1KhRAwDK9CzdunWDmZkZNm/ejPXr1+Odd94p9rMsolKpsGDBAjRs2BByuRzW1taoVasWzpw5g4cPH5b6nm+99VaZFth+8803sLKyQmpqKhYvXgwbG5tSX0tEFY8JC1VK5ubmcHBwwLlz58p03fOLXl9EX1+/xHZBEF77HkXrK4oYGxsjKSkJ+/fvx6BBg3DmzBn069cPnTt3LharCU2epYhcLkfv3r2xdu1abN++/YXVFQCYM2cOQkND4enpiR9//BF79+5FfHw8mjRpUupKEvDs51MWp06dwu3btwEAZ8+eLdO1RFTxmLBQpdW9e3dcuXIFycnJr4x1dHSESqXCpUuX1NozMzPx4MEDccePNtSoUUNtR02R56s4AKCnp4dOnTph/vz5+Ouvv/DVV1/hwIED+P3330vsu2icaWlpxc5dvHgR1tbWMDU11ewBXmDgwIE4deoUHj16VOJC5SI//fQTvL29sWrVKvTv3x++vr7w8fEp9jMpbfJYGrm5uRg6dChcXV0xYsQIREVF4fjx41rrn4ikx4SFKq3JkyfD1NQUw4cPR2ZmZrHzV65cwaJFiwA8m9IAUGwnz/z58wEA/v7+WhtX/fr18fDhQ5w5c0Zsu3XrFrZv364Wl5WVVezaoheoPb/Vuoi9vT2aN2+OtWvXqiUA586dw759+8TnlIK3tzdmz56NpUuXws7O7oVx+vr6xao3W7duxT///KPWVpRYlZTclVVYWBiuX7+OtWvXYv78+XByckJgYOALf45EpHv44jiqtOrXr48NGzagX79+aNy4sdqbbg8fPoytW7diyJAhAAB3d3cEBgZi5cqVePDgAby8vHDs2DGsXbsWAQEBL9wy+zr69++PsLAw9OrVC2PHjsXjx4+xYsUKvP3222qLTmfNmoWkpCT4+/vD0dERt2/fxvLly1G7dm289957L+x/3rx56Nq1Kzw8PBAUFIQnT55gyZIlsLCwQEREhNae43l6enqYOnXqK+O6d++OWbNmYejQoXj33Xdx9uxZrF+/HvXq1VOLq1+/PiwtLRETEwMzMzOYmpqibdu2cHZ2LtO4Dhw4gOXLl2PGjBniNus1a9agY8eOmDZtGqKiosrUHxFVkArepUQkuf/+97/CJ598Ijg5OQmGhoaCmZmZ0L59e2HJkiXC06dPxbj8/Hxh5syZgrOzs2BgYCDUqVNHCA8PV4sRhGfbmv39/Yvd5/nttC/a1iwIgrBv3z6hadOmgqGhoeDi4iL8+OOPxbY1JyQkCD179hQcHBwEQ0NDwcHBQRgwYIDw3//+t9g9nt/6u3//fqF9+/aCsbGxYG5uLvTo0UP466+/1GKK7vf8tuk1a9YIAIT09PQX/kwFQX1b84u8aFvzhAkTBHt7e8HY2Fho3769kJycXOJ25F9++UVwdXUVqlWrpvacXl5eQpMmTUq857/7yc7OFhwdHYWWLVsK+fn5anEhISGCnp6ekJyc/NJn
ICLdIBOEMqysIyIiIqoAXMNCREREOo8JCxEREek8JixERESk85iwEBERkc5jwkJERFQJRUZG4p133oGZmRlsbGwQEBBQ7KWST58+RXBwMGrWrInq1aujT58+xd5bdf36dfj7+8PExAQ2NjaYNGkSCgoK1GISExPRsmVLyOVyNGjQALGxscXGs2zZMjg5OcHIyAht27bFsWPHyvQ8TFiIiIgqoYMHDyI4OBhHjhxBfHw88vPz4evrq/Z9YiEhIdixYwe2bt2KgwcP4ubNm+jdu7d4vrCwEP7+/uL7q9auXYvY2FhMnz5djElPT4e/vz+8vb2RmpqK8ePHY/jw4di7d68Ys3nzZoSGhmLGjBk4efIk3N3d4efnJ35dRqlU9L5qIiIikt7t27cFAMLBgwcFQRCEBw8eCAYGBsLWrVvFmAsXLggAxPcT7d69W9DT0xMUCoUYs2LFCsHc3FxQKpWCIAjC5MmTi70XqV+/foKfn5/4uU2bNkJwcLD4ubCwUHBwcBAiIyNLPf5K+abb/LtXK3oIRDrJ2KFDRQ+BSOcU5P3z6iANaevPJZXZW8W+UkIul0Mul7/y2qJvQ7eysgIApKSkID8/Hz4+PmJMo0aNULduXSQnJ6Ndu3ZITk6Gm5sbbG1txRg/Pz+MGjUK58+fR4sWLZCcnKzWR1HM+PHjAQB5eXlISUlBeHi4eF5PTw8+Pj6l+q438ZpSRxIREVGFioyMhIWFhdoRGRn5yutUKhXGjx+P9u3bo2nTpgAAhUIBQ0NDWFpaqsXa2tpCoVCIMf9OVorOF517WUx2djaePHmCu3fvorCwsMSYoj5Ko1JWWIiIiHSKqlAr3YSHhyM0NFStrTTVleDgYJw7dw6HDh3SyjgqAhMWIiIiqQkqrXRT2umffxs9ejR27tyJpKQk1K5dW2y3s7NDXl4eHjx4oFZlyczMFL9x3c7OrthunqJdRP+OeX5nUWZmJszNzWFsbAx9fX3o6+uXGPOyb3Z/HqeEiIiIpKZSaecoA0EQMHr0aGzfvh0HDhwo9k3nrVq1goGBARISEsS2tLQ0XL9+HR4eHgAADw8PnD17Vm03T3x8PMzNzeHq6irG/LuPopiiPgwNDdGqVSu1GJVKhYSEBDGmNFhhISIiqoSCg4OxYcMG/PLLLzAzMxPXi1hYWMDY2BgWFhYICgpCaGgorKysYG5ujjFjxsDDwwPt2rUDAPj6+sLV1RWDBg1CVFQUFAoFpk6diuDgYLHSM3LkSCxduhSTJ0/GsGHDcODAAWzZsgW7du0SxxIaGorAwEC0bt0abdq0wcKFC5Gbm4uhQ4eW+nkq5bc1c5cQUcm4S4iouPLYJZR387xW+jF0aFLqWJlMVmL7mjVrMGTIEADPXhw3YcIEbNy4EUqlEn5+fli+fLnaVM21a9cwatQoJCYmwtTUFIGBgZg7dy6qVftfzSMxMREhISH466+/ULt2bUybNk28R5GlS5di3rx5UCgUaN68ORYvXoy2bduW/nmYsBBVHUxYiIorl4Tlxlmt9GNY200r/byJuIaFiIiIdB7XsBAREUlNS7uEqjImLERERFLT0ntYqjJOCREREZHOY4WFiIhIapwS0hgTFiIiIqmV8aVvVBynhIiIiEjnscJCREQkMYFTQhpjwkJERCQ1TglpjAkLERGR1Fhh0RjXsBAREZHOY4WFiIhIanxxnMaYsBAREUmNU0Ia45QQERER6TxWWIiIiKTGXUIaY8JCREQkNU4JaYxTQkRERKTzWGEhIiKSGqeENMaEhYiISGKCwG3NmuKUEBEREek8VliIiIikxkW3GmPCQkREJDWuYdEYExYiIiKpscKiMa5hISIiIp3HCgsREZHU+OWHGmPCQkREJDVOCWmMU0JERESk81hhISIikhp3CWmMCQsREZHUOCWkMU4JERERkc5jhYWIiEhqnBLSGBMWIiIiqTFh0RinhIiIiEjnscJCREQkMUHgi+M0xYSFiIhIapwS0hinhIiI
iKQmqLRzlFFSUhJ69OgBBwcHyGQyxMXFqZ2XyWQlHvPmzRNjnJycip2fO3euWj9nzpxBhw4dYGRkhDp16iAqKqrYWLZu3YpGjRrByMgIbm5u2L17d5mehQkLERFRJZWbmwt3d3csW7asxPO3bt1SO1avXg2ZTIY+ffqoxc2aNUstbsyYMeK57Oxs+Pr6wtHRESkpKZg3bx4iIiKwcuVKMebw4cMYMGAAgoKCcOrUKQQEBCAgIADnzp0r9bNwSoiIiEhqFTQl1LVrV3Tt2vWF5+3s7NQ+//LLL/D29ka9evXU2s3MzIrFFlm/fj3y8vKwevVqGBoaokmTJkhNTcX8+fMxYsQIAMCiRYvQpUsXTJo0CQAwe/ZsxMfHY+nSpYiJiSnVs7DCQkREJLUKmhIqi8zMTOzatQtBQUHFzs2dOxc1a9ZEixYtMG/ePBQUFIjnkpOT4enpCUNDQ7HNz88PaWlpuH//vhjj4+Oj1qefnx+Sk5NLPT5WWIiIiN4QSqUSSqVSrU0ul0Mul2vc99q1a2FmZobevXurtY8dOxYtW7aElZUVDh8+jPDwcNy6dQvz588HACgUCjg7O6tdY2trK56rUaMGFAqF2PbvGIVCUerxscJCREQkNZVKK0dkZCQsLCzUjsjISK0McfXq1fjoo49gZGSk1h4aGoqOHTuiWbNmGDlyJKKjo7FkyZJiiZPUWGEhIiKSmpamc8LDwxEaGqrWpo3qyh9//IG0tDRs3rz5lbFt27ZFQUEBMjIy4OLiAjs7O2RmZqrFFH0uWvfyopgXrYspCSssREREbwi5XA5zc3O1QxsJy6pVq9CqVSu4u7u/MjY1NRV6enqwsbEBAHh4eCApKQn5+fliTHx8PFxcXFCjRg0xJiEhQa2f+Ph4eHh4lHqMrLAQERFJrYJ2CeXk5ODy5cvi5/T0dKSmpsLKygp169YF8Gxb8tatWxEdHV3s+uTkZBw9ehTe3t4wMzNDcnIyQkJC8PHHH4vJyMCBAzFz5kwEBQUhLCwM586dw6JFi7BgwQKxn3HjxsHLywvR0dHw9/fHpk2bcOLECbWtz6/ChIWIiEhqFZSwnDhxAt7e3uLnoumkwMBAxMbGAgA2bdoEQRAwYMCAYtfL5XJs2rQJERERUCqVcHZ2RkhIiNq0lIWFBfbt24fg4GC0atUK1tbWmD59urilGQDeffddbNiwAVOnTsWUKVPQsGFDxMXFoWnTpqV+FpkgCEJZfwC6Lv/u1YoeApFOMnboUNFDINI5BXn/SH6PJ7sWaqUfY//xWunnTcQKCxERkdQkfodKVcCEhYiISGr88kONMWEhIiKSGissGuO2ZiIiItJ5rLAQERFJjVNCGmPCQkREJDVOCWmMU0JERESk81hhISIikhqnhDTGhIWIiEhqTFg0xikhIiIi0nmssBAREUmt8n0LTrljwkJERCQ1TglpjFNCREREpPNYYSEiIpIaKywaY8JCREQkNb44TmNMWIiIiKTGCovGuIaFiIiIdB4rLERERFLjtmaNMWEhIiKSGqeENMYpISIiItJ5rLAQERFJjRUWjTFhISIikhq3NWuMU0JERESk81hhISIikpig4i4hTTFhISIikhrXsGiMU0JERESk81hhISIikhoX3WqMCQsREZHUuIZFY0xYiIiIpMY1LBrjGhYiIiLSeaywEBERSY0VFo0xYSEiIpIav61ZY5wSIiIiIp3HhIVE3/2wGf2CxqKNT294+vfH2M9nIf3aDbUYpTIPX0YvQ/uuffGOTy+Mn/Il7mbdV4u5pbiNUROno/X7AfD0749vln6PgoJCtZi8vDws+jYWnXsHokXHHvDtE4htO/eK5y9fvYbxU76Eb59ANG3fFes2b5fuwYm0IGzyaCQf3oX799Jw88Zp/PzTKrz9dn21mOXLvkbahT/x6OFl3PrnDLb9vBouLuox73u/hz8O/oL799Jw4/opRM6ZAn19/fJ8FJKCSqWdowpjwkKiE6lnMaB3D2xY
uQArF85BfkEBRoR8gcdPnooxXy/+Fol/HsX8L6cgdmkU7ty9h/FTvhTPFxYW4rNJM5CfX4AfY6Lx1dQJ+OW3eCz9fp3avSZMi8TRE6mYFT4eOzd+j6iZn8Opbm3x/BPlU9R2sMP4UUNhXbOG9A9PpCHPDu2wYsVatO/QA126DYBBNQP8tmsDTEyMxZiTJ89g+CehaNqsI7r5D4RMJsNvuzZCT+/Zf4qbNXPFjl9/wN59v6N1Gz8M/GgUunf3ReRXUyrqsUhbVIJ2jipMJgiVb2It/+7Vih5CpZB1/wE8uw9A7LIotG7uhkc5uejg3x9REZPh690BAHD12t/4YOAIrP92PtybNsYfyccRPDkCB375EdZWzxKNzdt3YcGK1fhj1yYYGBjg0JETmDRjLvZsXQMLc7NXjsO3TyAG9Q3AoH69JH3eqsDYoUNFD6HKsLa2guLmWXi/3xt/HDpaYoybW2OcStmPtxu9i6tXr+HL2Z+jU6cO8HjXX4zp7t8ZGzesgP1b7sjJyS2v4VcpBXn/SH6Px98M10o/JhO/10o/b6IKrbDcvXsXUVFR6NWrFzw8PODh4YFevXph3rx5uHPnTkUOjQDk5D4GADGp+CvtEgoKCtCudQsxpp5jHdjb2uD0uYsAgNPnLqBhPScxWQGA9m1bISf3MS6nXwMA/H7oCJo0aojV67fi/Z4fw7//cMxb+h2eKpXl9WhEkrOwMAfwLPEviYmJMYYM7oerV6/h779vAgDkhoZQPlX/ffDkyVMYGxujVctmko6XJCaotHOUUVJSEnr06AEHBwfIZDLExcWpnR8yZAhkMpna0aVLF7WYrKwsfPTRRzA3N4elpSWCgoKQk5OjFnPmzBl06NABRkZGqFOnDqKiooqNZevWrWjUqBGMjIzg5uaG3bt3l+lZKixhOX78ON5++20sXrwYFhYW8PT0hKenJywsLLB48WI0atQIJ06cqKjhVXkqlQpzF32LFs1c0bCeEwDg7r37MDCoBnOz6mqxNa0scTcr61lM1n3UtLIsdr7oegC4cVOBk2fO49LVa1gUOQ1hY0cg/vdD+PKbZZI+E1F5kclkmP/NTPz55zGcP5+mdm7kp4F4kPVfZD+4DL8u3ujSbQDy8/MBAPviE+Hh0Rr9+vWEnp4eHBzsMPWL8QAAO3ub8n4M0qYKmhLKzc2Fu7s7li178X9fu3Tpglu3bonHxo0b1c5/9NFHOH/+POLj47Fz504kJSVhxIgR4vns7Gz4+vrC0dERKSkpmDdvHiIiIrBy5Uox5vDhwxgwYACCgoJw6tQpBAQEICAgAOfOnSv1s1TYtuYxY8bgP//5D2JiYiCTydTOCYKAkSNHYsyYMUhOTn5pP0qlEsrn/maup1RCLpdrfcxVyZfRy3D5agZ+WPGN1vtWqVSQQYavZ0yGWXVTAMCkvHyETv0KUycGw4j/7OgNt2TxHDRp4gIv7+LTmBs2bsP+hCTY29kgNHQkNm6IgadXAJRKJeL3JyHs8y+xfOlcrF2zGEplHr6asxAdOrSDqoqvX6DX07VrV3Tt2vWlMXK5HHZ2diWeu3DhAvbs2YPjx4+jdevWAIAlS5agW7du+Oabb+Dg4ID169cjLy8Pq1evhqGhIZo0aYLU1FTMnz9fTGwWLVqELl26YNKkSQCA2bNnIz4+HkuXLkVMTEypnqXCKiynT59GSEhIsWQFePa3k5CQEKSmpr6yn8jISFhYWKgdXy8q3cNTyb6KXo6Dh49h9ZKvYWdTS2y3rlkD+fkFyH6kXgq8l/UA1lZWz2KsauBe1oNi54uuB4BaNa1gU6ummKwAQD2nOhAEAZm370rwRETlZ9HCL+HfzQc+vv/BP//cKnY+O/sRLl9Oxx+HjqJvvxFo5NIAAQH/K8EvXLQSNWs1hnP9NrC1d8OvO57tnku/eq3cnoG0T1CptHIolUpkZ2erHc//pb2sEhMTYWNjAxcXF4waNQr37t0TzyUnJ8PS0lJMVgDAx8cHenp6
OHr0qBjj6ekJQ0NDMcbPzw9paWm4f/++GOPj46N2Xz8/v1cWJf6twhIWOzs7HDt27IXnjx07Bltb21f2Ex4ejocPH6odYeNGanOoVYYgCPgqejkSkg5j9eK5qO2gnnG7ujREtWrVcPREqtiWfu0GbmXehnvTRgAA96aNcelqBu79a94++fhJVDc1QX2nugCAFs1cceduFh4/fiLGXPv7H+jp6cHWxlq6BySS2KKFXyKgZxd09uuLjIy/XxlftGZAbli8qnjrViaePn2K/v0CcP36Pzh56qwUQ6byoqUpoZL+kh4ZGfnaw+rSpQt++OEHJCQk4Ouvv8bBgwfRtWtXFBY+exWFQqGAjY36dGS1atVgZWUFhUIhxjz/53XR51fFFJ0vjQqbEpo4cSJGjBiBlJQUdOrUSXyQzMxMJCQk4LvvvsM337x6OkIulxeb/snP49/SX8eX0cuwOz4Ri+dOh6mJMe7ee7YupXp1UxjJ5TCrbore3X0RteQ7WJibwdTUBHMWrIB708Zwb9oYAPBum5ao71QX4bPmIfSzINzLuo8lK39A/949xOzbv7M3YmI3Yuqc+QgO+hj3H2Yjetkq9PL3FaeD8vPzcSX9+v//ugCZd+7h4n+vwMTEGHVrO1TAT4fo5ZYsnoMB/QPQu88wPHqUA1vbZ9XJhw8f4enTp3B2rou+//kA8fEHcefuPdR+ywGTJwfjyZOn+G1PgtjPhNCR2LsvESqVCr0CumHypGD0HzgSqir+Do433mssmC1JeHg4QkND1do0WQLRv39/8ddubm5o1qwZ6tevj8TERHTq1Om1+5VChSUswcHBsLa2xoIFC7B8+XIxm9PX10erVq0QGxuLvn37VtTwqqTN23cBAIaODlNr/3JKKAL8OwMAwsZ+Cj09PYz/4kvk5+fj3TatMG1isBirr6+PZfMiMHveUnz8aSiMjeX4oKsPRg8fJMaYmBjju4VzMGf+CvQLGgcLCzN0ed8TY0YMFmNu383Ch0NHi59jN/6M2I0/o3ULN8QuLb76nKiijRoZCAA4kPCzWvuwoBD8sG4Lnj5V4r32bTB2zHDUqGGBzMy7+OPQEXTw6ok7d/5Xgu/i9z7CPx8LudwQZ85cQO8+w7Bn7+/l+iyku0r6S7o21atXD9bW1rh8+TI6deoEOzs73L59Wy2moKAAWVlZ4roXOzs7ZGZmqsUUfX5VzIvWzpREJ97Dkp+fj7t3n1VFrK2tYWBgoFl/fA8LUYn4Hhai4srjPSy5sz7SSj+m09e/9rUymQzbt29HQEDAC2Nu3LiBunXrIi4uDh988AEuXLgAV1dXnDhxAq1atQIA7Nu3D126dMGNGzfg4OCAFStW4IsvvkBmZqb45/eUKVOwbds2XLz47JUX/fr1w+PHj7Fjxw7xXu+++y6aNWum+4tu/83AwAD29vawt7fXOFkhIiLSORX0av6cnBykpqaKm1jS09ORmpqK69evIycnB5MmTcKRI0eQkZGBhIQE9OzZEw0aNICfnx8AoHHjxujSpQs++eQTHDt2DH/++SdGjx6N/v37w8Hh2fT8wIEDYWhoiKCgIJw/fx6bN2/GokWL1Kauxo0bhz179iA6OhoXL15EREQETpw4gdGjRxcb84voRIVF21hhISoZKyxExZVLhSVigFb6MY3Y+Oqgf0lMTIS3t3ex9sDAQKxYsQIBAQE4deoUHjx4AAcHB/j6+mL27NlqC2SzsrIwevRo7NixA3p6eujTpw8WL16M6tX/906uM2fOIDg4GMePH4e1tTXGjBmDsDD15QVbt27F1KlTkZGRgYYNGyIqKgrdunUr9bMwYSGqQpiwEBVXLgnL9P6vDioF01mbtNLPm6jCFt0SERFVGVraJVSV6cQaFiIiIqKXYYWFiIhIavxqBY0xYSEiIpKYwBf/aYxTQkRERKTzWGEhIiKSGqeENMaEhYiISGpMWDTGhIWIiEhq3NasMa5hISIiIp3HCgsREZHUOCWkMSYsRERE
EhOYsGiMU0JERESk81hhISIikhorLBpjwkJERCQ1vulWY5wSIiIiIp3HCgsREZHUOCWkMSYsREREUmPCojFOCREREZHOY4WFiIhIYoLACoummLAQERFJjVNCGmPCQkREJDUmLBrjGhYiIiLSeaywEBERSYzfJaQ5JixERERSY8KiMU4JERERkc5jhYWIiEhq/CohjTFhISIikhjXsGiOU0JERESk81hhISIikhorLBpjwkJERCQ1rmHRGKeEiIiISOexwkJERCQxLrrVHBMWIiIiqXFKSGNMWIiIiCTGCovmuIaFiIiIdB4TFiIiIqmptHSUUVJSEnr06AEHBwfIZDLExcWJ5/Lz8xEWFgY3NzeYmprCwcEBgwcPxs2bN9X6cHJygkwmUzvmzp2rFnPmzBl06NABRkZGqFOnDqKiooqNZevWrWjUqBGMjIzg5uaG3bt3l+lZmLAQERFJTFBp5yir3NxcuLu7Y9myZcXOPX78GCdPnsS0adNw8uRJbNu2DWlpafjggw+Kxc6aNQu3bt0SjzFjxojnsrOz4evrC0dHR6SkpGDevHmIiIjAypUrxZjDhw9jwIABCAoKwqlTpxAQEICAgACcO3eu1M8iEwSh0k2s5d+9WtFDINJJxg4dKnoIRDqnIO8fye9xr4eXVvqpuePga18rk8mwfft2BAQEvDDm+PHjaNOmDa5du4a6desCeFZhGT9+PMaPH1/iNStWrMAXX3wBhUIBQ0NDAMDnn3+OuLg4XLx4EQDQr18/5ObmYufOneJ17dq1Q/PmzRETE1Oq8bPCQkREJLUKmhIqq4cPH0Imk8HS0lKtfe7cuahZsyZatGiBefPmoaCgQDyXnJwMT09PMVkBAD8/P6SlpeH+/ftijI+Pj1qffn5+SE5OLvXYuEuIiIhIYq8znVMSpVIJpVKp1iaXyyGXyzXu++nTpwgLC8OAAQNgbm4uto8dOxYtW7aElZUVDh8+jPDwcNy6dQvz588HACgUCjg7O6v1ZWtrK56rUaMGFAqF2PbvGIVCUerxscJCRET0hoiMjISFhYXaERkZqXG/+fn56Nu3LwRBwIoVK9TOhYaGomPHjmjWrBlGjhyJ6OhoLFmypFjiJDVWWIiIiKSmpQpLeHg4QkND1do0ra4UJSvXrl3DgQMH1KorJWnbti0KCgqQkZEBFxcX2NnZITMzUy2m6LOdnZ34/yXFFJ0vDVZYiIiIJKatXUJyuRzm5uZqhyYJS1GycunSJezfvx81a9Z85TWpqanQ09ODjY0NAMDDwwNJSUnIz88XY+Lj4+Hi4oIaNWqIMQkJCWr9xMfHw8PDo9RjZYWFiIhIYtpaw1JWOTk5uHz5svg5PT0dqampsLKygr29PT788EOcPHkSO3fuRGFhobimxMrKCoaGhkhOTsbRo0fh7e0NMzMzJCcnIyQkBB9//LGYjAwcOBAzZ85EUFAQwsLCcO7cOSxatAgLFiwQ7ztu3Dh4eXkhOjoa/v7+2LRpE06cOKG29flVuK2ZqArhtmai4spjW/PtTtrZ1myTULZtzYmJifD29i7WHhgYiIiIiGKLZYv8/vvv6NixI06ePInPPvsMFy9ehFKphLOzMwYNGoTQ0FC1ys6ZM2cQHByM48ePw9raGmPGjEFYWJhan1u3bsXUqVORkZGBhg0bIioqCt26dSv1szBhIapCmLAQFVceCUumt3YSFtvfX/89LG86TgkRERFJTZBV9AjeeFx0S0RERDqPFRYiIiKJVdSi28qECQsREZHEBBWnhDTFKSEiIiLSeaywEBERSYxTQppjwkJERCQxgbuENMYpISIiItJ5rLAQERFJjFNCmmPCQkREJDHuEtIcExYiIiKJVb4vwSl/XMNCREREOo8VFiIiIolxSkhzTFiIiIgkxoRFc5wSIiIiIp3HCgsREZHEuOhWc0xYiIiIJMYpIc1xSoiIiIh0HissREREEuN3CWmuVAnLr7/+WuoOP/jgg9ceDBERUWXEV/Nr
rlQJS0BAQKk6k8lkKCws1GQ8RERERMWUKmFRqZgaEhERvS4Vp4Q0xjUsREREEuMaFs29VsKSm5uLgwcP4vr168jLy1M7N3bsWK0MjIiIqLLgtmbNlTlhOXXqFLp164bHjx8jNzcXVlZWuHv3LkxMTGBjY8OEhYiIiLSuzO9hCQkJQY8ePXD//n0YGxvjyJEjuHbtGlq1aoVvvvlGijESERG90QRBO0dVVuaEJTU1FRMmTICenh709fWhVCpRp04dREVFYcqUKVKMkYiI6I0mqGRaOaqyMicsBgYG0NN7dpmNjQ2uX78OALCwsMDff/+t3dERERER4TXWsLRo0QLHjx9Hw4YN4eXlhenTp+Pu3btYt24dmjZtKsUYiYiI3mjc1qy5MldY5syZA3t7ewDAV199hRo1amDUqFG4c+cOVq5cqfUBEhERvekEQaaVoyorc4WldevW4q9tbGywZ88erQ6IiIiI6Hl8cRwREZHEqvoOH20oc8Li7OwMmezFZamrV69qNCAiIqLKhmtYNFfmhGX8+PFqn/Pz83Hq1Cns2bMHkyZN0ta4iIiIiERlTljGjRtXYvuyZctw4sQJjQdERERU2VT1BbPaUOZdQi/StWtX/Pzzz9rqjoiIqNLgm241p7WE5aeffoKVlZW2uiMiIqo0VIJMK0dZJSUloUePHnBwcIBMJkNcXJzaeUEQMH36dNjb28PY2Bg+Pj64dOmSWkxWVhY++ugjmJubw9LSEkFBQcjJyVGLOXPmDDp06AAjIyPx7ffP27p1Kxo1agQjIyO4ublh9+7dZXqWMicsLVq0QMuWLcWjRYsWsLe3x5QpU/hqfiIiIh2Sm5sLd3d3LFu2rMTzUVFRWLx4MWJiYnD06FGYmprCz88PT58+FWM++ugjnD9/HvHx8di5cyeSkpIwYsQI8Xx2djZ8fX3h6OiIlJQUzJs3DxEREWrvZjt8+DAGDBiAoKAgnDp1CgEBAQgICMC5c+dK/SwyQShbkSkiIkJtl5Cenh5q1aqFjh07olGjRmXpSjK1LFwqeghEOun+k5xXBxFVMQV5/0h+j+Nv9dJKP+/8s/21r5XJZNi+fTsCAgIAPKuuODg4YMKECZg4cSIA4OHDh7C1tUVsbCz69++PCxcuwNXVFcePHxffw7Znzx5069YNN27cgIODA1asWIEvvvgCCoUChoaGAIDPP/8ccXFxuHjxIgCgX79+yM3Nxc6dO8XxtGvXDs2bN0dMTEypxl/mRbcRERFlvYSIiKhK09a2ZqVSCaVSqdYml8shl8vL3Fd6ejoUCgV8fHzENgsLC7Rt2xbJycno378/kpOTYWlpqfbSWB8fH+jp6eHo0aPo1asXkpOT4enpKSYrAODn54evv/4a9+/fR40aNZCcnIzQ0FC1+/v5+RWbonqZMk8J6evr4/bt28Xa7927B319/bJ2R0RERKUUGRkJCwsLtSMyMvK1+lIoFAAAW1tbtXZbW1vxnEKhgI2Njdr5atWqwcrKSi2mpD7+fY8XxRSdL40yV1heNIOkVCrVsisiIiJ6RlsbfMLDw4tVKl6nuvImKnXCsnjxYgDP5sC+//57VK9eXTxXWFiIpKQknVnDQkREpEu0NSX0utM/JbGzswMAZGZmil9qXPS5efPmYszzsyoFBQXIysoSr7ezs0NmZqZaTNHnV8UUnS+NUicsCxYsAPCswhITE6M2/WNoaAgnJ6dSL5whIiKiiuXs7Aw7OzskJCSICUp2djaOHj2KUaNGAQA8PDzw4MEDpKSkoFWrVgCAAwcOQKVSoW3btmLMF198gfz8fBgYGAAA4uPj4eLigho1aogxCQkJam/Lj4+Ph4eHR6nHW+qEJT09HQDg7e2Nbdu2iYMgIiKil6uoN93m5OTg8uXL4uf09HSkpqbCysoKdevWxfjx4/Hll1+iYcOGcHZ2xrRp0+Dg4CDuJGrcuDG6dOmCTz75BDExMcjPz8fo0aPRv39/ODg4AAAGDhyImTNnIigo
CGFhYTh37hwWLVokFjqAZ2/J9/LyQnR0NPz9/bFp0yacOHFCbevzq5R5W/ObgNuaiUrGbc1ExZXHtuY/7D7USj8dFD+VKT4xMRHe3t7F2gMDAxEbGwtBEDBjxgysXLkSDx48wHvvvYfly5fj7bffFmOzsrIwevRo7NixA3p6eujTpw8WL16stjTkzJkzCA4OxvHjx2FtbY0xY8YgLCxM7Z5bt27F1KlTkZGRgYYNGyIqKgrdunUr9bOUOWHp06cP2rRpU2wgUVFROH78OLZu3VqW7iTBhIWoZExYiIqrzAlLZVLmbc1JSUklZkRdu3ZFUlKSVgZFRERUmQiQaeWoysq8rTknJ6fE7csGBgbIzs7WyqCIiIgqE1WlW3xR/spcYXFzc8PmzZuLtW/atAmurq5aGRQREVFlooJMK0dVVuYKy7Rp09C7d29cuXIF77//PgAgISEBGzZswE8/Vd25NSIiIpJOmROWHj16IC4uDnPmzMFPP/0EY2NjuLu748CBA7CyspJijERERG+0qr7+RBvKnLAAgL+/P/z9/QE8e8nMxo0bMXHiRKSkpKCwsFCrAyQiInrTqSp6AJVAmdewFElKSkJgYCAcHBwQHR2N999/H0eOHNHm2IiIiIgAlLHColAoEBsbi1WrViE7Oxt9+/aFUqlEXFwcF9wSERG9AKeENFfqCkuPHj3g4uKCM2fOYOHChbh58yaWLFki5diIiIgqBZWWjqqs1BWW3377DWPHjsWoUaPQsGFDKcdEREREpKbUFZZDhw7h0aNHaNWqFdq2bYulS5fi7t27Uo6NiIioUmCFRXOlTljatWuH7777Drdu3cKnn36KTZs2wcHBASqVCvHx8Xj06JGU4yQiInpj8dX8mivzLiFTU1MMGzYMhw4dwtmzZzFhwgTMnTsXNjY2+OCDD6QYIxEREVVxr72tGQBcXFwQFRWFGzduYOPGjdoaExERUaWikmnnqMpe68Vxz9PX10dAQAACAgK00R0REVGlUtW/B0gbtJKwEBER0Yvxy5o1p9GUEBEREVF5YIWFiIhIYlV9S7I2MGEhIiKSmErGNSya4pQQERER6TxWWIiIiCTGRbeaY8JCREQkMa5h0RynhIiIiEjnscJCREQksar+llptYMJCREQkMb7pVnOcEiIiIiKdxwoLERGRxLhLSHNMWIiIiCTGNSyaY8JCREQkMW5r1hzXsBAREZHOY4WFiIhIYlzDojkmLERERBLjGhbNcUqIiIiIdB4rLERERBLjolvNMWEhIiKSGBMWzXFKiIiIqBJycnKCTCYrdgQHBwMAOnbsWOzcyJEj1fq4fv06/P39YWJiAhsbG0yaNAkFBQVqMYmJiWjZsiXkcjkaNGiA2NhYSZ6HFRYiIiKJCRWw6Pb48eMoLCwUP587dw6dO3fGf/7zH7Htk08+waxZs8TPJiYm4q8LCwvh7+8POzs7HD58GLdu3cLgwYNhYGCAOXPmAADS09Ph7++PkSNHYv369UhISMDw4cNhb28PPz8/rT4PExYiIiKJVcSUUK1atdQ+z507F/Xr14eXl5fYZmJiAjs7uxKv37dvH/766y/s378ftra2aN68OWbPno2wsDBERETA0NAQMTExcHZ2RnR0NACgcePGOHToEBYsWKD1hIVTQkRERJVcXl4efvzxRwwbNgwy2f/KPevXr4e1tTWaNm2K8PBwPH78WDyXnJwMNzc32Nraim1+fn7Izs7G+fPnxRgfHx+1e/n5+SE5OVnrz8AKCxERkcS0VWFRKpVQKpVqbXK5HHK5/KXXxcXF4cGDBxgyZIjYNnDgQDg6OsLBwQFnzpxBWFgY0tLSsG3bNgCAQqFQS1YAiJ8VCsVLY7Kzs/HkyRMYGxu/1nOWhAkLERGRxLT1ptvIyEjMnDlTrW3GjBmIiIh46XWrVq1C165d4eDgILaNGDFC/LWbmxvs7e3RqVMnXLlyBfXr19fSiLWHCQsREZHEtPWm2/DwcISGhqq1vaq6
cu3aNezfv1+snLxI27ZtAQCXL19G/fr1YWdnh2PHjqnFZGZmAoC47sXOzk5s+3eMubm5VqsrANewEBERvTHkcjnMzc3VjlclLGvWrIGNjQ38/f1fGpeamgoAsLe3BwB4eHjg7NmzuH37thgTHx8Pc3NzuLq6ijEJCQlq/cTHx8PDw6Osj/ZKTFiIiIgkptLSUeb7qlRYs2YNAgMDUa3a/yZVrly5gtmzZyMlJQUZGRn49ddfMXjwYHh6eqJZs2YAAF9fX7i6umLQoEE4ffo09u7di6lTpyI4OFhMkkaOHImrV69i8uTJuHjxIpYvX44tW7YgJCTkNUb7ckxYiIiIJFZRCcv+/ftx/fp1DBs2TK3d0NAQ+/fvh6+vLxo1aoQJEyagT58+2LFjhxijr6+PnTt3Ql9fHx4eHvj4448xePBgtfe2ODs7Y9euXYiPj4e7uzuio6Px/fffa31LMwDIBEGodN96XcvCpaKHQKST7j/JqeghEOmcgrx/JL9HdN2PtdLPhOs/aqWfNxEX3RIREUms0lUGKgATFiIiIolpa5dQVcY1LERERKTzWGEhIiKSWEV8l1Blw4SFiIhIYlzDojlOCREREZHOY4WFiIhIYirWWDTGhIWIiEhiXMOiOSYsREREEmN9RXNcw0JEREQ6jxUWIiIiiXFKSHNMWIiIiCTGN91qjlNCREREpPNYYSEiIpIYtzVrjgkLERGRxJiuaI5TQkRERKTzWGEhIiKSGHcJaY4JCxERkcS4hkVznBIiIiIinccKCxERkcRYX9EcExYiIiKJcQ2L5piwEBERSYxrWDTHNSxERESk81hhISIikhjrK5pjwkJERCQxrmHRHKeEiIiISOexwkJERCQxgZNCGmPCQkREJDFOCWmOU0JERESk81hhISIikhjfw6I5JixEREQSY7qiOU4JERERkc5jwkIv5fFua/y4aQXOXvwDdx6moat/J/FctWrVMG3mRBw8/Csybp7C2Yt/YGnM17C1s1HrY93GFTh17nf8nXkG59L+wLJvo4rF9OzVFb//EYdrt1Jx8uwBBI8NKpfnI9KWDu+1Rdz2WFzPSEFB3j/44AM/tfM2NtZY9f0CXM9IQfaDy9i140c0aOCsFjM86CMkxG9F1t2LKMj7BxYW5uX5CCQhFQStHFUZExZ6KRMTE5w/l4awiTOLnTM2MUIzd1fMn7cCnTx7Y8jHo9GgoTN+3LRCLe7QH0cwfMh4eLTugqGDxsLJuQ5W/7BIPN/JxxMrvpuH2DWb4OnRHWETZmLkZ0MQ9MlHkj8fkbaYmprgzJm/MGbcFyWe3/bTatRzrovefYahdRs/XLv+D/b+tgkmJsZijImJMfbuS8Tcr5eU17CpnKi0dFRlMkEQKl3KVsvCpaKHUCndeZiGwQM/w2+7El4Y07ylG+J//wnNm3TEPzdulRjj1/V9/LBhGd6q5YaCggLEfP8NDAwMEBQ4TowZPuJjjB43HM2bdNT2Y1Rp95/kVPQQqoSCvH/Q+8Nh+PXXvQCAhg3r4cL5P9CsuTf++uu/AACZTIZ//k7F1GlzsXrNRrXrvTw9kLD/J9Ss1RgPH2aX+/irmoK8fyS/x3CnD7XSz/cZP2mlnzcRKyykVebm1aFSqV74H1nLGhb4sG8PHD96CgUFBQAAudwQT58q1eKePn2Kt2rbo07dtyQfM5HU5HJDAFD791wQBCiVeWjfvk1FDYvojaLTCcvff/+NYcOGvTRGqVQiOztb7RCEql44qxhyuSGmz5yIbT/tQs6jXLVz02ZORMbNU7iUcQxv1bbHoAGfiecOJByCf4/O6ODVDjKZDPXqO2HU6Gf/3G1ta5XrMxBJ4eLFy7h27Qa++jIclpYWMDAwwKSJn6FOHQfYP7eeiyqnipgSioiIgEwmUzsaNWoknn/69CmCg4NRs2ZNVK9eHX369EFmZqZaH9evX4e/vz9MTExgY2ODSZMmiX/ZLJKYmIiWLVtCLpejQYMGiI2NLeNIS0en
E5asrCysXbv2pTGRkZGwsLBQOx4rs8pphFSkWrVq+D52EWQyGSaFzih2ftmiVXi/Qy98GDAUqkIVln37tXhuXewWrPpuPdZv/hY3757DnoTN2P7zLgCASsXkk958BQUF+E/f4WjYsB7u3v4Ljx5eRkevd/Hbbwn8d7yKELT0v7Jq0qQJbt26JR6HDh0Sz4WEhGDHjh3YunUrDh48iJs3b6J3797i+cLCQvj7+yMvLw+HDx/G2rVrERsbi+nTp4sx6enp8Pf3h7e3N1JTUzF+/HgMHz4ce/fu1ewHVoIKfQ/Lr7/++tLzV69efWUf4eHhCA0NVWurV7uVRuOisnmWrCxE7ToO6N0jsFh1BQCysu4jK+s+rl7JwH/TruDMhSS0fqc5ThxPBQDMnvENvpo5Hza21rh39z48vTwAANcy/i7PRyGSzMlTZ9H6HV+Ym5vB0NAAd+9m4fChHTiRcqaih0aVWLVq1WBnZ1es/eHDh1i1ahU2bNiA999/HwCwZs0aNG7cGEeOHEG7du2wb98+/PXXX9i/fz9sbW3RvHlzzJ49G2FhYYiIiIChoSFiYmLg7OyM6OhoAEDjxo1x6NAhLFiwAH5+fsXuq9GzaLW3MgoICIBMJsPL1v3KZLKX9iGXyyGXy5+7RqcLR5VKUbJSr74jenUfjPv3H7zyGj29Z/98DP9/Xr+ISqWC4tZtAECvD/1x7OhJ3Lt3X+tjJqpI2dmPAAANGjijVSt3zIiYV8EjovKgrTqaUqmEUqm+5q+kPweLXLp0CQ4ODjAyMoKHhwciIyNRt25dpKSkID8/Hz4+PmJso0aNULduXSQnJ6Ndu3ZITk6Gm5sbbG1txRg/Pz+MGjUK58+fR4sWLZCcnKzWR1HM+PHjtfTE/1Ohf7Lb29tj27ZtUKlUJR4nT56syOERnm3VbOrWCE3dns171nWsjaZujfBWbXtUq1YNq39YjOYtmmLUJxOhr68PGxtr2NhYw8DAAADQslUzBH3yEZq6NULtOg54z7MdVq6aj/Sr13Di2CkAgJVVDQQO648GDeuhqVsjfDX3C3wQ0AVTP59TYc9NVFampiZwd28Cd/cmAABnp7pwd2+COnUcAAB9+nSHl6cHnJ3rokcPX+zZvRG//LoH8fuTxD5sbWvB3b0J6td3AgC4NW0Ed/cmqFHDsrwfh7RMJQhaOUpaBhEZGVniPdu2bYvY2Fjs2bMHK1asQHp6Ojp06IBHjx5BoVDA0NAQlpaWatfY2tpCoVAAABQKhVqyUnS+6NzLYrKzs/HkyRNt/OhEFVphadWqFVJSUtCzZ88Sz7+q+kLSc2/RFL/sWid+/jJyCgBg0/ptiJq7VHyRXOKf6tN7Pf0H4fChY3jy5Cn8P/DF5CljYGJigszMOziw/w/MH7IceXn5Yny/AQGYOXsyIJPhxPFUBPgPwqmTZ8vhCYm0o3UrdyTs/9+W0+hvIgAAa3/YgqDhIbC3s8E3UTNga2uNW7du48f1P+HLrxaq9fHpiEGYPm2C+Dnx9+0AgGFBIfhh3RbJn4F0X0nLIF5UXenatav462bNmqFt27ZwdHTEli1bYGxsXOI1uqxCE5ZJkyYhN7f4eociDRo0wO+//16OI6LnHT507KXvtXnVO28u/PVf9O4R+NKYrKz76Na5/2uNj0hXHExKRjXDF2/DX7psNZYuW/3SPmbNno9Zs+dre2ikA7T1V++XTf+8iqWlJd5++21cvnwZnTt3Rl5eHh48eKBWZcnMzBTXvNjZ2eHYsWNqfRTtIvp3zPM7izIzM2Fubq71pKhCp4Q6dOiALl26vPC8qakpvLy8ynFERERE2qcLr+bPycnBlStXYG9vj1atWsHAwAAJCf97EWhaWhquX78OD49nmx48PDxw9uxZ3L59W4yJj4+Hubk5XF1dxZh/91EUU9SHNnF1KhERUSU0ceJEHDx4EBkZGTh8+DB69eoFfX19DBgwABYWFggKCkJoaCh+//13pKSkYOjQofDw8EC7
du0AAL6+vnB1dcWgQYNw+vRp7N27F1OnTkVwcLBY5Rk5ciSuXr2KyZMn4+LFi1i+fDm2bNmCkJAQrT9PhU4JERERVQWv8w4VTd24cQMDBgzAvXv3UKtWLbz33ns4cuQIatV69kLOBQsWQE9PD3369IFSqYSfnx+WL18uXq+vr4+dO3di1KhR8PDwgKmpKQIDAzFr1iwxxtnZGbt27UJISAgWLVqE2rVr4/vvv9f6lmaA3yVEVKXwu4SIiiuP7xLq5xiglX42X4vTSj9vIlZYiIiIJKbp+hPiGhYiIiJ6A7DCQkREJLGKWMNS2TBhISIikhi/4lJznBIiIiIinccKCxERkcQq4YbccseEhYiISGLcJaQ5TgkRERGRzmOFhYiISGJcdKs5JixEREQS47ZmzXFKiIiIiHQeKyxEREQS46JbzTFhISIikhi3NWuOCQsREZHEuOhWc1zDQkRERDqPFRYiIiKJcZeQ5piwEBERSYyLbjXHKSEiIiLSeaywEBERSYy7hDTHhIWIiEhinBLSHKeEiIiISOexwkJERCQx7hLSHBMWIiIiiam4hkVjnBIiIiIinccKCxERkcRYX9EcExYiIiKJcZeQ5piwEBERSYwJi+a4hoWIiIh0HissREREEuObbjXHhIWIiEhinBLSHKeEiIiISOexwkJERCQxvulWc0xYiIiIJMY1LJrjlBARERHpPFZYiIiIJMZFt5pjhYWIiEhigiBo5SiLyMhIvPPOOzAzM4ONjQ0CAgKQlpamFtOxY0fIZDK1Y+TIkWox169fh7+/P0xMTGBjY4NJkyahoKBALSYxMREtW7aEXC5HgwYNEBsb+1o/p5dhwkJERFQJHTx4EMHBwThy5Aji4+ORn58PX19f5ObmqsV98sknuHXrlnhERUWJ5woLC+Hv74+8vDwcPnwYa9euRWxsLKZPny7GpKenw9/fH97e3khNTcX48eMxfPhw7N27V6vPIxMq4UqgWhYuFT0EIp10/0lORQ+BSOcU5P0j+T3c7d7VSj+nFYdf+9o7d+7AxsYGBw8ehKenJ4BnFZbmzZtj4cKFJV7z22+/oXv37rh58yZsbW0BADExMQgLC8OdO3dgaGiIsLAw7Nq1C+fOnROv69+/Px48eIA9e/a89nifxwoLERGRxAQt/U8TDx8+BABYWVmpta9fvx7W1tZo2rQpwsPD8fjxY/FccnIy3NzcxGQFAPz8/JCdnY3z58+LMT4+Pmp9+vn5ITk5WaPxPo+LbomIiCSm0tJkhlKphFKpVGuTy+WQy+Uvv79KhfHjx6N9+/Zo2rSp2D5w4EA4OjrCwcEBZ86cQVhYGNLS0rBt2zYAgEKhUEtWAIifFQrFS2Oys7Px5MkTGBsbv97DPocJCxER0RsiMjISM2fOVGubMWMGIiIiXnpdcHAwzp07h0OHDqm1jxgxQvy1m5sb7O3t0alTJ1y5cgX169fX2ri1gQkLERGRxLT1ptvw8HCEhoaqtb2qujJ69Gjs3LkTSUlJqF279ktj27ZtCwC4fPky6tevDzs7Oxw7dkwtJjMzEwBgZ2cn/n9R279jzM3NtVZdAbiGhYiISHIqQdDKIZfLYW5urna8KGERBAGjR4/G9u3bceDAATg7O79ynKmpqQAAe3t7AICHhwfOnj2L27dvizHx8fEwNzeHq6urGJOQkKDWT3x8PDw8PF7nR/VCTFiIiIgqoeDgYPz444/YsGEDzMzMoFAooFAo8OTJEwDAlStXMHv2bKSkpCAjIwO//vorBg8eDE9PTzRr1gwA4OvrC1dXVwwaNAinT5/G3r17MXXqVAQHB4uJ0siRI3H16lVMnjwZFy9exPLly7FlyxaEhIRo9Xm4rZmoCuG2ZqLiymNbcyObd7TSz8Xbx0sdK5PJSmxfs2YNhgwZgr///hsff/wxzp07h9zcXNSpUwe9evXC1KlTYW5uLsZfu3YNo0aNQmJiIkxNTREYGIi5c+eiWrX/rSpJTExESEgI/vrr
L9SuXRvTpk3DkCFDXvs5S3weJixEVQcTFqLiyiNhebtWa6308987J7TSz5uIU0JERESk87hLiIiISGLa2iVUlTFhISIikpi2XhxXlXFKiIiIiHQeKyxEREQS45SQ5piwEBERSUwQVBU9hDceExYiIiKJqVhh0RjXsBAREZHOY4WFiIhIYpXwHa3ljgkLERGRxDglpDlOCREREZHOY4WFiIhIYpwS0hwTFiIiIonxTbea45QQERER6TxWWIiIiCTGN91qjgkLERGRxLiGRXOcEiIiIiKdxwoLERGRxPgeFs0xYSEiIpIYp4Q0x4SFiIhIYtzWrDmuYSEiIiKdxwoLERGRxDglpDkmLERERBLjolvNcUqIiIiIdB4rLERERBLjlJDmmLAQERFJjLuENMcpISIiItJ5rLAQERFJjF9+qDkmLERERBLjlJDmOCVEREREOo8VFiIiIolxl5DmmLAQERFJjGtYNMeEhYiISGKssGiOa1iIiIhI57HCQkREJDFWWDTHhIWIiEhiTFc0xykhIiIi0nkygXUqkohSqURkZCTCw8Mhl8srejhEOoO/N4jKjgkLSSY7OxsWFhZ4+PAhzM3NK3o4RDqDvzeIyo5TQkRERKTzmLAQERGRzmPCQkRERDqPCQtJRi6XY8aMGVxUSPQc/t4gKjsuuiUiIiKdxwoLERER6TwmLERERKTzmLAQERGRzmPCQkRERDqPCQtJZtmyZXBycoKRkRHatm2LY8eOVfSQiCpUUlISevToAQcHB8hkMsTFxVX0kIjeGExYSBKbN29GaGgoZsyYgZMnT8Ld3R1+fn64fft2RQ+NqMLk5ubC3d0dy5Ytq+ihEL1xuK2ZJNG2bVu88847WLp0KQBApVKhTp06GDNmDD7//PMKHh1RxZPJZNi+fTsCAgIqeihEbwRWWEjr8vLykJKSAh8fH7FNT08PPj4+SE5OrsCRERHRm4oJC2nd3bt3UVhYCFtbW7V2W1tbKBSKChoVERG9yZiwEBERkc5jwkJaZ21tDX19fWRmZqq1Z2Zmws7OroJGRUREbzImLKR1hoaGaNWqFRISEsQ2lUqFhIQEeHh4VODIiIjoTVWtogdAlVNoaCgCAwPRunVrtGnTBgsXLkRubi6GDh1a0UMjqjA5OTm4fPmy+Dk9PR2pqamwsrJC3bp1K3BkRLqP25pJMkuXLsW8efOgUCjQvHlzLF68GG3btq3oYRFVmMTERHh7exdrDwwMRGxsbPkPiOgNwoSFiIiIdB7XsBAREZHOY8JCREREOo8JCxEREek8JixERESk85iwEBERkc5jwkJEREQ6jwkLERER6TwmLESV0JAhQxAQECB+7tixI8aPH1/u40hMTIRMJsODBw/K/d5EVLkwYSEqR0OGDIFMJoNMJoOhoSEaNGiAWbNmoaCgQNL7btu2DbNnzy5VLJMMItJF/C4honLWpUsXrFmzBkqlErt370ZwcDAMDAwQHh6uFpeXlwdDQ0Ot3NPKykor/RARVRRWWIjKmVwuh52dHRwdHTFq1Cj4+Pjg119/FadxvvrqKzg4OMDFxQUA8Pfff6Nv376wtLSElZUVevbsiYyMDLG/wsJChIaGwtLSEjVr1sTkyZPx/DduPD8lpFQqERYWhjp16kAul6NBgwZYtWoVMjIyxO+6qVGjBmQyGYYMGQLg2TduR0ZGwtnZGcbGxnB3d8dPP/2kdp/du3fj7bffhrGxMby9vdXGSUSkCSYsRBXM2NgYeXl5AICEhASkpaUhPj4eO3fuRH5+Pvz8/GBmZoY//vgDf/75J6pXr44uXbqI10RHRyM2NharV6/GoUOHkJWVhe3bt7/0noMHD8bGjRuxePFiXLhwAd9++y2qV6+OOnXq4OeffwYApKWl4datW1i0aBEAIDIyEj/88ANiYmJw/vx5hISE4OOPP8bBgwcBPEusevfujR49eiA1NRXDhw/H559/LtWPjYiqGoGIyk1gYKDQs2dPQRAEQaVSCfHx
8YJcLhcmTpwoBAYGCra2toJSqRTj161bJ7i4uAgqlUpsUyqVgrGxsbB3715BEATB3t5eiIqKEs/n5+cLtWvXFu8jCILg5eUljBs3ThAEQUhLSxMACPHx8SWO8ffffxcACPfv3xfbnj59KpiYmAiHDx9Wiw0KChIGDBggCIIghIeHC66urmrnw8LCivVFRPQ6uIaFqJzt3LkT1atXR35+PlQqFQYOHIiIiAgEBwfDzc1Nbd3K6dOncfnyZZiZman18fTpU1y5cgUPHz7ErVu30LZtW/FctWrV0Lp162LTQkVSU1Ohr68PLy+vUo/58uXLePz4MTp37qzWnpeXhxYtWgAALly4oDYOAPDw8Cj1PYiIXoYJC1E58/b2xooVK2BoaAgHBwdUq/a/34ampqZqsTk5OWjVqhXWr19frJ9atWq91v2NjY3LfE1OTg4AYNeuXXjrrbfUzsnl8tcaBxFRWTBhISpnpqamaNCgQaliW7Zsic2bN8PGxgbm5uYlxtjb2+Po0aPw9PQEABQUFCAlJQUtW7YsMd7NzQ0qlQoHDx6Ej49PsfNFFZ7CwkKxzdXVFXK5HNevX39hZaZx48b49ddf1dqOHDny6ockIioFLrol0mEfffQRrK2t0bNnT/zxxx9IT09HYmIixo4dixs3bgAAxo0bh7lz5yIuLg4XL17EZ5999tJ3qDg5OSEwMBDDhg1DXFyc2OeWLVsAAI6OjpDJZNi5cyfu3LmDnJwcmJmZYeLEiQgJCcHatWtx5coVnDx5EkuWLMHatWsBACNHjsSlS5cwadIkpKWlYcOGDYiNjZX6R0REVQQTFiIdZmJigqSkJNStWxe9e/dG48aNERQUhKdPn4oVlwkTJmDQoEEIDAyEh4cHzMzM0KtXr5f2u2LFCnz44Yf47LPP0KhRI3zyySfIzc0FALz11luYOXMmPv/8c9ja2mL06NEAgNmzZ2PatGmIjIxE48aN0aVLF+zatQvOzs4AgLp16+Lnn39GXFwc3N3dERMTgzlz5kj40yGiqkQmvGhlHhEREZGOYIWFiIiIdB4TFiIiItJ5TFiIiIhI5zFhISIiIp3HhIWIiIh0HhMWIiIi0nlMWIiIiEjnMWEhIiIinceEhYiIiHQeExYiIiLSeUxYiIiISOcxYSEiIiKd93+RCc/IigAfVAAAAABJRU5ErkJggg==",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "forest_cm = confusion_matrix(y_test, y_pred)\n",
+ "print(forest_cm)\n",
+ "\n",
+ "sns.heatmap(forest_cm, annot = True, fmt = 'd')# annot = True: show the numbers in each heatmap cell\n",
+ " # fmt = 'd': show numbers as integers. \n",
+ "plt.xlabel('Predicted')\n",
+ "plt.ylabel('Actual')\n",
+ "plt.title('Confusion Matrix')\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### SVM\n",
+    "- Hint: `from sklearn.svm import SVC`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[0 0 0 ... 0 0 0]\n",
+      "The score: 0.9341923607915325\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.svm import SVC\n",
+ "from sklearn.metrics import confusion_matrix\n",
+ "\n",
+ "svm = SVC()\n",
+ "svm.fit(X_train, y_train)\n",
+ "\n",
+ "y_pred = svm.predict(X_test)\n",
+    "sore_svm = svm.score(X_test, y_test)  # mean accuracy on the given test data and labels\n",
+    "print(y_pred)  # predictions of this SVM model\n",
+    "print('The score:', sore_svm)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[20300 0]\n",
+ " [ 1430 0]]\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "svm_cm = confusion_matrix(y_test, y_pred)\n",
+ "print(svm_cm)\n",
+ "\n",
+ "sns.heatmap(svm_cm, annot = True, fmt = 'd')# annot = True: show the numbers in each heatmap cell\n",
+ " # fmt = 'd': show numbers as integers. \n",
+ "plt.xlabel('Predicted')\n",
+ "plt.ylabel('Actual')\n",
+ "plt.title('Confusion Matrix')\n",
+ "plt.show()"
+ ]
+ },
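+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The confusion matrix above shows that the default `SVC` predicts class 0 for every test sample, so its accuracy (20300 / 21730, about 0.934) is simply the share of negatives. Two common remedies are to standardize the features and to re-weight the classes. The cell below is only a sketch of both ideas on synthetic data: the data from `make_classification` and every name ending in `_demo` or starting with `demo_` are assumptions made for illustration, and it makes no claim about what these changes achieve on the credit data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.datasets import make_classification\n",
+    "from sklearn.metrics import confusion_matrix\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "from sklearn.pipeline import make_pipeline\n",
+    "from sklearn.preprocessing import StandardScaler\n",
+    "from sklearn.svm import SVC\n",
+    "\n",
+    "# Synthetic, imbalanced stand-in data (an assumption; not the credit dataset).\n",
+    "X_demo, y_demo = make_classification(n_samples=2000, n_features=10,\n",
+    "                                     weights=[0.93, 0.07], random_state=0)\n",
+    "X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(\n",
+    "    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0)\n",
+    "\n",
+    "# Standardize the features, then fit an SVC that up-weights the minority class.\n",
+    "demo_svm = make_pipeline(StandardScaler(), SVC(class_weight='balanced'))\n",
+    "demo_svm.fit(X_train_demo, y_train_demo)\n",
+    "\n",
+    "demo_cm = confusion_matrix(y_test_demo, demo_svm.predict(X_test_demo))\n",
+    "print(demo_cm)"
+   ]
+  },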
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### KNN\n",
+    "- Hint: `from sklearn.neighbors import KNeighborsClassifier`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+      "The score: 0.9332719742291763\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "\n",
+ "knn = KNeighborsClassifier()\n",
+ "knn.fit(X_train, y_train)\n",
+ "\n",
+ "y_pred = knn.predict(X_test)\n",
+    "sore_knn = knn.score(X_test, y_test)  # mean accuracy on the given test data and labels\n",
+    "print('The score:', sore_knn)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[20251 49]\n",
+ " [ 1401 29]]\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "knn_cm = confusion_matrix(y_test, y_pred)\n",
+ "print(knn_cm)\n",
+ "\n",
+ "sns.heatmap(knn_cm, annot = True, fmt = 'd')# annot = True: show the numbers in each heatmap cell\n",
+ " # fmt = 'd': show numbers as integers. \n",
+ "plt.xlabel('Predicted')\n",
+ "plt.ylabel('Actual')\n",
+ "plt.title('Confusion Matrix')\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+    "## Exercise 3: Predict on the test set and compute the accuracy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Logistic regression\n",
+    "- Hint: `y_pred_LR = clf_LR.predict(x_test)`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 42,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.935020708697653"
+ ]
+ },
+ "execution_count": 42,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "accuracy_score(y_test, model.predict(X_test))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Decision Tree\n",
+    "- Hint: `y_pred_tree = tree.predict(x_test)`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 43,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9353428439944776"
+ ]
+ },
+ "execution_count": 43,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "accuracy_score(y_test, D_tree.predict(X_test))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Random Forest\n",
+    "- Hint: `y_pred_forest = forest.predict(x_test)`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 44,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9319834330418776"
+ ]
+ },
+ "execution_count": 44,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "accuracy_score(y_test, forest.predict(X_test))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### SVM\n",
+    "- Hint: `y_pred_SVC = clf_svc.predict(x_test)`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9341923607915325"
+ ]
+ },
+ "execution_count": 46,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "accuracy_score(y_test, svm.predict(X_test))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### KNN\n",
+    "- Hint: `y_pred_KNN = neigh.predict(x_test)`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 47,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.9332719742291763"
+ ]
+ },
+ "execution_count": 47,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "accuracy_score(y_test, knn.predict(X_test))"
+ ]
+ },
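+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "All five accuracies fall between 0.93 and 0.94, which is hard to interpret without a reference point. A useful one is the accuracy of always predicting class 0. The sketch below computes that baseline from the class counts visible in the confusion matrices (20300 negatives, 1430 positives) and compares it with the accuracies reported above; the names `accuracies` and `baseline` are introduced here for illustration only."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Test-set accuracies copied from the outputs above.\n",
+    "accuracies = {\n",
+    "    'Logistic regression': 0.935020708697653,\n",
+    "    'Decision Tree': 0.9353428439944776,\n",
+    "    'Random Forest': 0.9319834330418776,\n",
+    "    'SVM': 0.9341923607915325,\n",
+    "    'KNN': 0.9332719742291763,\n",
+    "}\n",
+    "\n",
+    "# Class counts in the test set, read off the confusion matrices.\n",
+    "n_neg, n_pos = 20300, 1430\n",
+    "baseline = n_neg / (n_neg + n_pos)  # accuracy of always predicting class 0\n",
+    "\n",
+    "print(f'Majority-class baseline: {baseline:.4f}')\n",
+    "for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):\n",
+    "    print(f'{name}: {acc:.4f} ({acc - baseline:+.4f} vs. baseline)')"
+   ]
+  },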
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+    "## Exercise 4: Read the official sklearn documentation on evaluation metrics for classification, and evaluate this example"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "**Confusion matrix: learning resources**\n",
+ "\n",
+ "- Blog: \n",
+ "http://blog.csdn.net/vesper305/article/details/44927047 \n",
+ "- WiKi: \n",
+ "http://en.wikipedia.org/wiki/Confusion_matrix \n",
+ "- sklearn doc: \n",
+ "http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 50,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Confusion Matrix of logistic regression:\n",
+ "[[20247 53]\n",
+ " [ 1359 71]]\n",
+ "Confusion Matrix of Decision Tree:\n",
+ "[[20091 209]\n",
+ " [ 1196 234]]\n",
+ "Confusion Matrix of Random Forest:\n",
+ "[[20061 239]\n",
+ " [ 1239 191]]\n",
+ "Confusion Matrix of SVM:\n",
+ "[[20300 0]\n",
+ " [ 1430 0]]\n",
+ "Confusion Matrix of KNN:\n",
+ "[[20251 49]\n",
+ " [ 1401 29]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+    "## The confusion matrices were already computed in Exercise 2; print them together here.\n",
+    "\n",
+ "print('Confusion Matrix of logistic regression:')\n",
+ "print(cm)\n",
+ "\n",
+ "print('Confusion Matrix of Decision Tree:')\n",
+ "print(D_tree_cm)\n",
+ "\n",
+ "print('Confusion Matrix of Random Forest:')\n",
+ "print(forest_cm)\n",
+ "\n",
+ "print('Confusion Matrix of SVM:')\n",
+ "print(svm_cm)\n",
+ "\n",
+ "print('Confusion Matrix of KNN:')\n",
+ "print(knn_cm)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
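+    "## Exercise 5: Adjust the model's decision threshold\n",
+    "\n",
+    "Banks usually have stricter requirements, because the consequences of fraud are typically severe, so the model's decision criterion is often adjusted.  \n",
+    "\n",
+    "For example, in logistic regression the decision boundary on the predicted probability is normally 0.5, but a lower threshold can be used to make the model more sensitive. Try a threshold of 0.3 and look at the evaluation metrics again (mainly accuracy and recall).\n",
+    "\n",
+    "- Hint: many sklearn classifiers provide `predict_proba`, which returns the predicted probabilities; compare these with the chosen threshold to decide the final result (the predicted class)."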
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "Precision:\n",
+    "The precision is the ratio tp / (tp + fp), where tp is the number of true positives and fp the number of false positives. Intuitively, precision is the ability of the classifier not to label a negative sample as positive.\n",
+    "\n",
+    "Accuracy:\n",
+    "The accuracy is the number of correct predictions divided by the total number of predictions, (tp + tn) / (tp + tn + fp + fn).\n",
+    "\n",
+    "Recall:\n",
+    "The recall measures how well the model identifies the actual positives. Here, of all the borrowers who actually became seriously delinquent, recall tells us what fraction the model flagged. Mathematically, recall = tp / (tp + fn), where fn is the number of false negatives."
+ ]
+ },
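+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a check on these definitions, the three metrics can be recomputed by hand from a 2x2 confusion matrix, which sklearn lays out as `[[tn, fp], [fn, tp]]`. The cell below is an illustrative sketch rather than part of the original solution: `metrics_from_cm` is a hypothetical helper introduced here, and the matrices are copied from the Exercise 4 output. When a model predicts no positives at all (as the SVM does), precision is undefined; reporting it as 0.0 is an assumption of this sketch."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def metrics_from_cm(matrix):\n",
+    "    # Return (accuracy, precision, recall) for a 2x2 matrix [[tn, fp], [fn, tp]].\n",
+    "    (tn, fp), (fn, tp) = matrix\n",
+    "    accuracy = (tp + tn) / (tp + tn + fp + fn)\n",
+    "    # Undefined ratios (zero denominator) are reported as 0.0 by assumption.\n",
+    "    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0\n",
+    "    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0\n",
+    "    return accuracy, precision, recall\n",
+    "\n",
+    "# Confusion matrices copied from the Exercise 4 output above.\n",
+    "matrices = {\n",
+    "    'Logistic regression': [[20247, 53], [1359, 71]],\n",
+    "    'Decision Tree': [[20091, 209], [1196, 234]],\n",
+    "    'Random Forest': [[20061, 239], [1239, 191]],\n",
+    "    'SVM': [[20300, 0], [1430, 0]],\n",
+    "    'KNN': [[20251, 49], [1401, 29]],\n",
+    "}\n",
+    "\n",
+    "for name, matrix in matrices.items():\n",
+    "    acc, prec, rec = metrics_from_cm(matrix)\n",
+    "    print(f'{name}: accuracy={acc:.4f}, precision={prec:.4f}, recall={rec:.4f}')"
+   ]
+  },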
+ {
+ "cell_type": "code",
+ "execution_count": 62,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Default values\n",
+ "0.935020708697653\n",
+ "0.5725806451612904\n",
+ "0.04965034965034965\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.metrics import accuracy_score, recall_score, precision_score\n",
+ "\n",
+ "# The default max_iter (100) is too small for the solver to converge on this data.\n",
+ "lr = LogisticRegression(max_iter=1000)\n",
+ "lr.fit(X_train, y_train)\n",
+ "\n",
+ "# Baseline: predict() corresponds to a 0.5 probability cutoff.\n",
+ "y_pred = lr.predict(X_test)\n",
+ "accuracy = accuracy_score(y_test, y_pred)\n",
+ "precision = precision_score(y_test, y_pred)\n",
+ "recall = recall_score(y_test, y_pred)\n",
+ "print('Default values')\n",
+ "print(accuracy)\n",
+ "print(precision)\n",
+ "print(recall)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 63,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[0 0 0 ... 0 0 0]\n"
+ ]
+ }
+ ],
+ "source": [
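+ "threshold = 0.3\n",
+ "# Label a sample positive when its predicted class-1 probability exceeds the threshold.\n",
+ "y_pred_threshold = (lr.predict_proba(X_test)[:, 1] > threshold).astype(int)\n",
+ "print(y_pred_threshold)"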
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 64,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Values corresponding to threshold = 0.3\n",
+ "0.934054302807179\n",
+ "0.4954128440366973\n",
+ "0.11328671328671329\n"
+ ]
+ }
+ ],
+ "source": [
+ "accuracy = accuracy_score(y_test, y_pred_threshold)\n",
+ "precision = precision_score(y_test, y_pred_threshold)\n",
+ "recall = recall_score(y_test, y_pred_threshold)\n",
+ "\n",
+ "print('Values corresponding to threshold = 0.3')\n",
+ "print(accuracy)\n",
+ "print(precision)\n",
+ "print(recall)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Conclusion:\n",
+ "Lowering the threshold to 0.3 more than doubles recall (from about 0.050 to 0.113), leaves accuracy essentially unchanged (0.935 vs. 0.934), and lowers precision from about 0.573 to 0.495.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "ictp-ap",
+ "language": "python",
+ "name": "ictp-ap"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/2023/homework/Zun_Wang/homework_credit_scoring.ipynb b/2023/homework/Zun_Wang/homework_credit_scoring.ipynb
new file mode 100644
index 00000000..d79910f7
--- /dev/null
+++ b/2023/homework/Zun_Wang/homework_credit_scoring.ipynb
@@ -0,0 +1,1291 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Let's Fight Monsters Together: Credit Scoring Exercise"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
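+ "---\n",
+ "## Assignment instructions\n",
+ "\n",
+ "- Answering:\n",
+ " - **Keep every step** of your work when answering; do not give only the final answer\n",
+ " - Make a habit of commenting your code\n",
+ "\n",
+ "- Solution hints:\n",
+ " - To help everyone understand the questions accurately and get the most out of the exercises, this document provides solution hints\n",
+ " - The hints are **for reference only**; original approaches are encouraged\n",
+ " - To encourage independent thinking, the hints are set in **white** text; if needed, drag the mouse from just after the colon to reveal them\n",
+ "\n",
+ "- Data used\n",
+ " - After loading the data, first **inspect and understand its basic properties**; later questions will not repeat this reminder"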
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## machine learning for credit scoring\n",
+ "\n",
+ "\n",
+ "Banks play a crucial role in market economies. They decide who can get finance and on what terms and can make or break investment decisions. For markets and society to function, individuals and companies need access to credit. \n",
+ "\n",
+ "Credit scoring algorithms, which make a guess at the probability of default, are the method banks use to determine whether or not a loan should be granted. This competition requires participants to improve on the state of the art in credit scoring, by predicting the probability that somebody will experience financial distress in the next two years. [Dataset](https://www.kaggle.com/c/GiveMeSomeCredit)\n",
+ "\n",
+ "Attribute Information:\n",
+ "\n",
+ "|Variable Name\t|\tDescription\t|\tType|\n",
+ "|----|----|----|\n",
+ "|SeriousDlqin2yrs\t|\tPerson experienced 90 days past due delinquency or worse \t|\tY/N|\n",
+ "|RevolvingUtilizationOfUnsecuredLines\t|\tTotal balance on credit divided by the sum of credit limits\t|\tpercentage|\n",
+ "|age\t|\tAge of borrower in years\t|\tinteger|\n",
+ "|NumberOfTime30-59DaysPastDueNotWorse\t|\tNumber of times borrower has been 30-59 days past due |\tinteger|\n",
+ "|DebtRatio\t|\tMonthly debt payments, alimony, and living costs divided by monthly gross income\t|\tpercentage|\n",
+ "|MonthlyIncome\t|\tMonthly income\t|\treal|\n",
+ "|NumberOfOpenCreditLinesAndLoans\t|\tNumber of Open loans |\tinteger|\n",
+ "|NumberOfTimes90DaysLate\t|\tNumber of times borrower has been 90 days or more past due.\t|\tinteger|\n",
+ "|NumberRealEstateLoansOrLines\t|\tNumber of mortgage and real estate loans\t|\tinteger|\n",
+ "|NumberOfTime60-89DaysPastDueNotWorse\t|\tNumber of times borrower has been 60-89 days past due |integer|\n",
+ "|NumberOfDependents\t|\tNumber of dependents in family\t|\tinteger|\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----------\n",
+ "## Read the data into Pandas "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table>\n",
+ " <thead>\n",
+ " <tr><th></th><th>SeriousDlqin2yrs</th><th>RevolvingUtilizationOfUnsecuredLines</th><th>age</th><th>NumberOfTime30-59DaysPastDueNotWorse</th><th>DebtRatio</th><th>MonthlyIncome</th><th>NumberOfOpenCreditLinesAndLoans</th><th>NumberOfTimes90DaysLate</th><th>NumberRealEstateLoansOrLines</th><th>NumberOfTime60-89DaysPastDueNotWorse</th><th>NumberOfDependents</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>0</th><td>1</td><td>0.766127</td><td>45.0</td><td>2.0</td><td>0.802982</td><td>9120.0</td><td>13.0</td><td>0.0</td><td>6.0</td><td>0.0</td><td>2.0</td></tr>\n",
+ " <tr><th>1</th><td>0</td><td>0.957151</td><td>40.0</td><td>0.0</td><td>0.121876</td><td>2600.0</td><td>4.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td></tr>\n",
+ " <tr><th>2</th><td>0</td><td>0.658180</td><td>38.0</td><td>1.0</td><td>0.085113</td><td>3042.0</td><td>2.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
+ " <tr><th>3</th><td>0</td><td>0.233810</td><td>30.0</td><td>0.0</td><td>0.036050</td><td>3300.0</td><td>5.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr>\n",
+ " <tr><th>4</th><td>0</td><td>0.907239</td><td>49.0</td><td>1.0</td><td>0.024926</td><td>63588.0</td><td>7.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 1 0.766127 45.0 \n",
+ "1 0 0.957151 40.0 \n",
+ "2 0 0.658180 38.0 \n",
+ "3 0 0.233810 30.0 \n",
+ "4 0 0.907239 49.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 "
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "pd.set_option('display.max_columns', 500)\n",
+ "import zipfile\n",
+ "with zipfile.ZipFile('KaggleCredit2.csv.zip', 'r') as z:\n",
+ " f = z.open('KaggleCredit2.csv')\n",
+ " data = pd.read_csv(f, index_col=0)\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(112915, 11)"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "------------\n",
+ "## Drop na"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "SeriousDlqin2yrs 0\n",
+ "RevolvingUtilizationOfUnsecuredLines 0\n",
+ "age 4267\n",
+ "NumberOfTime30-59DaysPastDueNotWorse 0\n",
+ "DebtRatio 0\n",
+ "MonthlyIncome 0\n",
+ "NumberOfOpenCreditLinesAndLoans 0\n",
+ "NumberOfTimes90DaysLate 0\n",
+ "NumberRealEstateLoansOrLines 0\n",
+ "NumberOfTime60-89DaysPastDueNotWorse 0\n",
+ "NumberOfDependents 4267\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.isnull().sum(axis=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(108648, 11)"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "data.dropna(inplace=True)\n",
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---------\n",
+ "## Create X and y"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [],
+ "source": [
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.06742876076872101"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "y.mean()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Exercise 1: Split the data into training and test sets\n",
+ "- Hint: from sklearn.model_selection import train_test_split"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "((76053, 10), (32595, 10), (76053,), (32595,))"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True, random_state=0)\n",
+ "\n",
+ "# Check the shapes of the resulting splits\n",
+ "X_train.shape, X_test.shape, y_train.shape, y_test.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "107294 0\n",
+ "39317 0\n",
+ "40606 0\n",
+ "68931 0\n",
+ "20745 0\n",
+ " ..\n",
+ "65166 0\n",
+ "109192 0\n",
+ "85812 0\n",
+ "50213 0\n",
+ "23557 0\n",
+ "Name: SeriousDlqin2yrs, Length: 32595, dtype: int64"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "y_test"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----\n",
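+ "## Exercise 2: Classify with sklearn algorithms such as logistic regression, decision tree, SVM, and KNN\n",
+ "Consult the sklearn API to learn what the model parameters mean, and try different settings"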
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Logistic regression\n",
+ "- Hint: from sklearn import linear_model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/dk/anaconda3/envs/igwn-py39/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
+ "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
+ "\n",
+ "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
+ " https://scikit-learn.org/stable/modules/preprocessing.html\n",
+ "Please also refer to the documentation for alternative solver options:\n",
+ " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
+ " n_iter_i = _check_optimize_result(\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "LogisticRegression()"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.metrics import accuracy_score, classification_report\n",
+ "\n",
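+ "# Create the LogisticRegression model\n",
+ "lr_model = LogisticRegression(penalty='l2', # regularization type\n",
+ " dual=False, # do not use the dual formulation\n",
+ " tol=0.0001, # convergence tolerance\n",
+ " C=1.0, # inverse of the regularization strength\n",
+ " fit_intercept=True, # fit an intercept\n",
+ " class_weight=None, # class weights\n",
+ " random_state=None, # random seed\n",
+ " solver='lbfgs', # optimization algorithm\n",
+ " max_iter=100, # maximum number of iterations\n",
+ " multi_class='auto', # multiclass strategy\n",
+ " verbose=0, # no progress output\n",
+ " warm_start=False, # do not reuse the previous solution as initialization\n",
+ " n_jobs=None) # number of CPU cores\n",
+ "\n",
+ "# Train the model\n",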
+ "lr_model.fit(X_train, y_train)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Decision Tree\n",
+ "- Hint: from sklearn.tree import DecisionTreeClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "DecisionTreeClassifier(random_state=42)"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "from sklearn.metrics import accuracy_score, classification_report\n",
+ "\n",
+ "# Create the decision tree model\n",
+ "dt_model = DecisionTreeClassifier(random_state=42)\n",
+ "\n",
+ "# Train the model\n",
+ "dt_model.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Random Forest\n",
+ "- Hint: from sklearn.ensemble import RandomForestClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "RandomForestClassifier(random_state=42)"
+ ]
+ },
+ "execution_count": 12,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "\n",
+ "# Create the random forest model\n",
+ "rf_model = RandomForestClassifier(n_estimators=100, random_state=42)\n",
+ "\n",
+ "# Train the model\n",
+ "rf_model.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### SVM\n",
+ "- Hint: from sklearn.svm import SVC"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "SVC(gamma='auto', random_state=42)"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.svm import SVC\n",
+ "\n",
+ "# Create the SVM model\n",
+ "svm_model = SVC(gamma='auto', random_state=42)\n",
+ "\n",
+ "# Train the model\n",
+ "svm_model.fit(X_train, y_train)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### KNN\n",
+ "- Hint: from sklearn.neighbors import KNeighborsClassifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "KNeighborsClassifier()"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "\n",
+ "# Create the KNN model; here we use 5 neighbors\n",
+ "knn_model = KNeighborsClassifier(n_neighbors=5)\n",
+ "\n",
+ "# Train the model\n",
+ "knn_model.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "## Exercise 3: Predict on the test set and compute the accuracy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Logistic regression\n",
+ "- Hint: y_pred_LR = clf_LR.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Accuracy: 0.9333640128854118\n",
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.93 1.00 0.97 30424\n",
+ " 1 0.49 0.02 0.04 2171\n",
+ "\n",
+ " accuracy 0.93 32595\n",
+ " macro avg 0.71 0.51 0.50 32595\n",
+ "weighted avg 0.91 0.93 0.90 32595\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "\n",
+ "# Predict on the test set\n",
+ "y_pred_lr = lr_model.predict(X_test)\n",
+ "\n",
+ "# Compute the accuracy\n",
+ "accuracy_lr = accuracy_score(y_test, y_pred_lr)\n",
+ "\n",
+ "# Print the accuracy and the classification report\n",
+ "print(f'Accuracy: {accuracy_lr}')\n",
+ "print(classification_report(y_test, y_pred_lr))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Decision Tree\n",
+ "- Hint: y_pred_tree = tree.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Decision Tree Accuracy: 0.8932658383187605\n",
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.95 0.94 0.94 30424\n",
+ " 1 0.24 0.27 0.25 2171\n",
+ "\n",
+ " accuracy 0.89 32595\n",
+ " macro avg 0.59 0.60 0.60 32595\n",
+ "weighted avg 0.90 0.89 0.90 32595\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "# Predict on the test set\n",
+ "y_pred_dt = dt_model.predict(X_test)\n",
+ "\n",
+ "# Compute the accuracy\n",
+ "accuracy_dt = accuracy_score(y_test, y_pred_dt)\n",
+ "\n",
+ "# Print the accuracy and the classification report\n",
+ "print(f'Decision Tree Accuracy: {accuracy_dt}')\n",
+ "print(classification_report(y_test, y_pred_dt))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Random Forest\n",
+ "- Hint: y_pred_forest = forest.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Random Forest Accuracy: 0.9352047860101242\n",
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.94 0.99 0.97 30424\n",
+ " 1 0.54 0.17 0.26 2171\n",
+ "\n",
+ " accuracy 0.94 32595\n",
+ " macro avg 0.74 0.58 0.61 32595\n",
+ "weighted avg 0.92 0.94 0.92 32595\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "# Predict on the test set\n",
+ "y_pred_rf = rf_model.predict(X_test)\n",
+ "\n",
+ "# Compute the accuracy\n",
+ "accuracy_rf = accuracy_score(y_test, y_pred_rf)\n",
+ "\n",
+ "# Print the accuracy and the classification report\n",
+ "print(f'Random Forest Accuracy: {accuracy_rf}')\n",
+ "print(classification_report(y_test, y_pred_rf))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### SVM\n",
+ "- Hint: y_pred_SVC = clf_svc.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "SVM Accuracy: 0.9333333333333333\n",
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.93 1.00 0.97 30424\n",
+ " 1 0.38 0.00 0.00 2171\n",
+ "\n",
+ " accuracy 0.93 32595\n",
+ " macro avg 0.65 0.50 0.48 32595\n",
+ "weighted avg 0.90 0.93 0.90 32595\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "# Predict on the test set\n",
+ "y_pred_svm = svm_model.predict(X_test)\n",
+ "\n",
+ "# Compute the accuracy\n",
+ "accuracy_svm = accuracy_score(y_test, y_pred_svm)\n",
+ "\n",
+ "# Print the accuracy and the classification report\n",
+ "print(f'SVM Accuracy: {accuracy_svm}')\n",
+ "print(classification_report(y_test, y_pred_svm))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### KNN\n",
+ "- Hint: y_pred_KNN = neigh.predict(x_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/dk/anaconda3/envs/igwn-py39/lib/python3.9/site-packages/sklearn/neighbors/_classification.py:211: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning.\n",
+ " mode, _ = stats.mode(_y[neigh_ind, k], axis=1)\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "KNN Accuracy: 0.9321061512501917\n",
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.93 1.00 0.96 30424\n",
+ " 1 0.30 0.01 0.03 2171\n",
+ "\n",
+ " accuracy 0.93 32595\n",
+ " macro avg 0.62 0.51 0.50 32595\n",
+ "weighted avg 0.89 0.93 0.90 32595\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "# Predict on the test set\n",
+ "y_pred_knn = knn_model.predict(X_test)\n",
+ "\n",
+ "# Compute the accuracy\n",
+ "accuracy_knn = accuracy_score(y_test, y_pred_knn)\n",
+ "\n",
+ "# Print the accuracy and the classification report\n",
+ "print(f'KNN Accuracy: {accuracy_knn}')\n",
+ "print(classification_report(y_test, y_pred_knn))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Exercise 4: Read the official sklearn documentation on evaluation metrics for classification, and evaluate this example"
+ ]
+ },
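+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Before reading the metrics, it helps to have a baseline. As a sketch that uses only the class support shown in the classification reports above (30424 negatives and 2171 positives in the test set), a hypothetical classifier that always predicts class 0 would reach:\n",
+ "\n",
+ "$$\\text{accuracy}_{\\text{all-zero}} = \\frac{30424}{30424 + 2171} = \\frac{30424}{32595} \\approx 0.9334$$\n",
+ "\n",
+ "Most of the accuracies reported above (about 0.93) are therefore close to what this trivial rule achieves, which is why precision and recall on class 1 are the more informative numbers for this dataset."
+ ]
+ },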
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Learning links on the confusion matrix**\n",
+ "\n",
+ "- Blog: \n",
+ "http://blog.csdn.net/vesper305/article/details/44927047 \n",
+ "- WiKi: \n",
+ "http://en.wikipedia.org/wiki/Confusion_matrix \n",
+ "- sklearn doc: \n",
+ "http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.metrics import confusion_matrix\n",
+ "import seaborn as sns\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "# y_test holds the true test labels; y_pred_lr holds the logistic regression predictions\n",
+ "\n",
+ "# Build the confusion matrix\n",
+ "conf_matrix = confusion_matrix(y_test, y_pred_lr)\n",
+ "\n",
+ "# Visualize the confusion matrix with Seaborn\n",
+ "plt.figure(figsize=(10, 7))\n",
+ "sns.heatmap(conf_matrix, annot=True, fmt='d', cmap=\"Blues\")\n",
+ "plt.title('Confusion Matrix')\n",
+ "plt.ylabel('Actual Label')\n",
+ "plt.xlabel('Predicted Label')\n",
+ "plt.show()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.metrics import confusion_matrix\n",
+ "import seaborn as sns\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "# y_test holds the true test labels; y_pred_dt holds the decision tree predictions\n",
+ "\n",
+ "# Build the confusion matrix\n",
+ "conf_matrix = confusion_matrix(y_test, y_pred_dt)\n",
+ "\n",
+ "# Visualize the confusion matrix with Seaborn\n",
+ "plt.figure(figsize=(10, 7))\n",
+ "sns.heatmap(conf_matrix, annot=True, fmt='d', cmap=\"Blues\")\n",
+ "plt.title('Confusion Matrix')\n",
+ "plt.ylabel('Actual Label')\n",
+ "plt.xlabel('Predicted Label')\n",
+ "plt.show()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.metrics import confusion_matrix\n",
+ "import seaborn as sns\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "# y_test holds the true test labels; y_pred_rf holds the random forest predictions\n",
+ "\n",
+ "# Build the confusion matrix\n",
+ "conf_matrix = confusion_matrix(y_test, y_pred_rf)\n",
+ "\n",
+ "# Visualize the confusion matrix with Seaborn\n",
+ "plt.figure(figsize=(10, 7))\n",
+ "sns.heatmap(conf_matrix, annot=True, fmt='d', cmap=\"Blues\")\n",
+ "plt.title('Confusion Matrix')\n",
+ "plt.ylabel('Actual Label')\n",
+ "plt.xlabel('Predicted Label')\n",
+ "plt.show()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.metrics import confusion_matrix\n",
+ "import seaborn as sns\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "# y_test holds the true test labels; y_pred_svm holds the SVM predictions\n",
+ "\n",
+ "# Build the confusion matrix\n",
+ "conf_matrix = confusion_matrix(y_test, y_pred_svm)\n",
+ "\n",
+ "# Visualize the confusion matrix with Seaborn\n",
+ "plt.figure(figsize=(10, 7))\n",
+ "sns.heatmap(conf_matrix, annot=True, fmt='d', cmap=\"Blues\")\n",
+ "plt.title('Confusion Matrix')\n",
+ "plt.ylabel('Actual Label')\n",
+ "plt.xlabel('Predicted Label')\n",
+ "plt.show()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.metrics import confusion_matrix\n",
+ "import seaborn as sns\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "# y_test holds the true test labels; y_pred_knn holds the KNN predictions\n",
+ "\n",
+ "# Build the confusion matrix\n",
+ "conf_matrix = confusion_matrix(y_test, y_pred_knn)\n",
+ "\n",
+ "# Visualize the confusion matrix with Seaborn\n",
+ "plt.figure(figsize=(10, 7))\n",
+ "sns.heatmap(conf_matrix, annot=True, fmt='d', cmap=\"Blues\")\n",
+ "plt.title('Confusion Matrix')\n",
+ "plt.ylabel('Actual Label')\n",
+ "plt.xlabel('Predicted Label')\n",
+ "plt.show()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
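+ "## Exercise 5: Adjusting the model's decision criterion\n",
+ "\n",
+ "Banks usually apply stricter requirements, because the consequences of fraud (here, a serious delinquency) are typically severe, so the model's decision criterion is often adjusted.\n",
+ "\n",
+ "For example, logistic regression normally uses a probability cutoff of 0.5, but the threshold can be lowered to make the model more sensitive. Try setting the threshold to 0.3 and look at the evaluation metrics again (mainly accuracy and recall).\n",
+ "\n",
+ "- Hint: many sklearn classifiers expose `predict_proba`, which returns the estimated class probabilities; compare them with the chosen threshold to decide the final class."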
+ ]
+ },
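+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The thresholding described above amounts to the decision rule below. This is a sketch in notation introduced here for illustration (it is not part of the exercise text): $p(x)$ is the class-1 probability returned by `predict_proba`, $t$ is the chosen threshold, and $\\mathbb{1}[\\cdot]$ is 1 when the condition holds and 0 otherwise.\n",
+ "\n",
+ "$$\\hat{y}(x) = \\mathbb{1}\\left[\\, p(x) \\ge t \\,\\right]$$\n",
+ "\n",
+ "Every sample labelled positive at $t = 0.5$ is still labelled positive at $t = 0.3$, so lowering the threshold can only add positive predictions. Recall therefore cannot decrease, while precision and accuracy may move in either direction."
+ ]
+ },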
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Custom Threshold Accuracy: 0.9312778033440712\n",
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.94 0.99 0.96 30424\n",
+ " 1 0.40 0.07 0.12 2171\n",
+ "\n",
+ " accuracy 0.93 32595\n",
+ " macro avg 0.67 0.53 0.54 32595\n",
+ "weighted avg 0.90 0.93 0.91 32595\n",
+ "\n",
+ "Confusion Matrix:\n",
+ "[[30208 216]\n",
+ " [ 2024 147]]\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/dk/anaconda3/envs/igwn-py39/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
+ "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
+ "\n",
+ "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
+ " https://scikit-learn.org/stable/modules/preprocessing.html\n",
+ "Please also refer to the documentation for alternative solver options:\n",
+ " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
+ " n_iter_i = _check_optimize_result(\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.metrics import accuracy_score, classification_report, confusion_matrix\n",
+ "import numpy as np\n",
+ "\n",
+ "# Assumes X_train, X_test, y_train, y_test are already defined above\n",
+ "\n",
+ "# Create the LogisticRegression model\n",
+ "lr_model = LogisticRegression(penalty='l2', # regularization type\n",
+ " dual=False, # do not use the dual formulation\n",
+ " tol=0.0001, # convergence tolerance\n",
+ " C=1.0, # inverse of the regularization strength\n",
+ " fit_intercept=True, # fit an intercept\n",
+ " class_weight=None, # class weights\n",
+ " random_state=None, # random seed\n",
+ " solver='lbfgs', # optimization algorithm\n",
+ " max_iter=100, # maximum number of iterations\n",
+ " multi_class='auto', # multiclass strategy\n",
+ " verbose=0, # no progress output\n",
+ " warm_start=False, # do not reuse the previous solution as initialization\n",
+ " n_jobs=None) # number of CPU cores\n",
+ "\n",
+ "# Train the model\n",
+ "lr_model.fit(X_train, y_train)\n",
+ "\n",
+ "# Predicted probability of class 1 on the test set\n",
+ "y_proba = lr_model.predict_proba(X_test)[:, 1]\n",
+ "\n",
+ "# Decision threshold\n",
+ "threshold = 0.3\n",
+ "\n",
+ "# Custom predictions: class 1 when the probability reaches the threshold\n",
+ "y_pred_custom = np.where(y_proba >= threshold, 1, 0)\n",
+ "\n",
+ "# Compute the accuracy\n",
+ "accuracy_custom = accuracy_score(y_test, y_pred_custom)\n",
+ "\n",
+ "# Compute the confusion matrix\n",
+ "conf_matrix_custom = confusion_matrix(y_test, y_pred_custom)\n",
+ "\n",
+ "# Print the accuracy and the classification report\n",
+ "print(f'Custom Threshold Accuracy: {accuracy_custom}')\n",
+ "print(classification_report(y_test, y_pred_custom))\n",
+ "\n",
+ "# Print the confusion matrix\n",
+ "print('Confusion Matrix:')\n",
+ "print(conf_matrix_custom)\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "base",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/2023/homework/shanleilei/homework_credit_scoring .ipynb b/2023/homework/shanleilei/homework_credit_scoring .ipynb
new file mode 100644
index 00000000..8064e864
--- /dev/null
+++ b/2023/homework/shanleilei/homework_credit_scoring .ipynb
@@ -0,0 +1,2706 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QATykvBElCro"
+ },
+ "source": [
+ "# Let's Fight Monsters Together: Credit Scoring Exercise"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from google.colab import drive\n",
+ "drive.mount('/content/drive')"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "oAnvRrtMyTiF",
+ "outputId": "b5d533bd-f9ed-4e40-b1cf-8d63b679571d"
+ },
+ "execution_count": 2,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HF0X_fp8lCrq"
+ },
+ "source": [
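+ "---\n",
+ "## Assignment instructions\n",
+ "\n",
+ "- Answering:\n",
+ " - **Keep every step** of your work when answering; do not give only the final answer\n",
+ " - Make a habit of commenting your code\n",
+ "\n",
+ "- Solution hints:\n",
+ " - To help everyone understand the questions accurately and get the most out of the exercises, this document provides solution hints\n",
+ " - The hints are **for reference only**; original approaches are encouraged\n",
+ " - To encourage independent thinking, the hints are set in **white** text; if needed, drag the mouse from just after the colon to reveal them\n",
+ "\n",
+ "- Data used\n",
+ " - After loading the data, first **inspect and understand its basic properties**; later questions will not repeat this reminder"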
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "UZJUuzR9lCrr"
+ },
+ "source": [
+ "## machine learning for credit scoring\n",
+ "\n",
+ "\n",
+ "Banks play a crucial role in market economies. They decide who can get finance and on what terms and can make or break investment decisions. For markets and society to function, individuals and companies need access to credit.\n",
+ "\n",
+ "Credit scoring algorithms, which make a guess at the probability of default, are the method banks use to determine whether or not a loan should be granted. This competition requires participants to improve on the state of the art in credit scoring, by predicting the probability that somebody will experience financial distress in the next two years. [Dataset](https://www.kaggle.com/c/GiveMeSomeCredit)\n",
+ "\n",
+ "Attribute Information:\n",
+ "\n",
+ "|Variable Name\t|\tDescription\t|\tType|\n",
+ "|----|----|----|\n",
+ "|SeriousDlqin2yrs\t|\tPerson experienced 90 days past due delinquency or worse \t|\tY/N|\n",
+ "|RevolvingUtilizationOfUnsecuredLines\t|\tTotal balance on credit divided by the sum of credit limits\t|\tpercentage|\n",
+ "|age\t|\tAge of borrower in years\t|\tinteger|\n",
+ "|NumberOfTime30-59DaysPastDueNotWorse\t|\tNumber of times borrower has been 30-59 days past due |\tinteger|\n",
+ "|DebtRatio\t|\tMonthly debt payments, alimony, and living costs divided by monthly gross income\t|\tpercentage|\n",
+ "|MonthlyIncome\t|\tMonthly income\t|\treal|\n",
+ "|NumberOfOpenCreditLinesAndLoans\t|\tNumber of open loans (installment loans such as a car loan or mortgage) and lines of credit (e.g. credit cards)\t|\tinteger|\n",
+ "|NumberOfTimes90DaysLate\t|\tNumber of times borrower has been 90 days or more past due\t|\tinteger|\n",
+ "|NumberRealEstateLoansOrLines\t|\tNumber of mortgage and real estate loans\t|\tinteger|\n",
+ "|NumberOfTime60-89DaysPastDueNotWorse\t|\tNumber of times borrower has been 60-89 days past due\t|\tinteger|\n",
+ "|NumberOfDependents\t|\tNumber of dependents in family\t|\tinteger|\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "%cd /home\n",
+ "!pwd"
+ ],
+ "metadata": {
+ "id": "d8z8eYJrpM3V",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "1b0c4c22-feab-49d6-ecdb-9f273425d9c2"
+ },
+ "execution_count": 3,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "/home\n",
+ "/home\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "UDRHplYVlCrr"
+ },
+ "source": [
+ "----------\n",
+ "## Read the data into Pandas"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 244
+ },
+ "id": "otHEeK6JlCrs",
+ "outputId": "1487d03b-6577-4522-e25b-a461586d0276"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 1 0.766127 45.0 \n",
+ "1 0 0.957151 40.0 \n",
+ "2 0 0.658180 38.0 \n",
+ "3 0 0.233810 30.0 \n",
+ "4 0 0.907239 49.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 "
+ ]
+ },
+ "metadata": {},
+ "execution_count": 4
+ }
+ ],
+ "source": [
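+ "import zipfile\n",
+ "\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Show every column when displaying wide DataFrames\n",
+ "pd.set_option('display.max_columns', 500)\n",
+ "\n",
+ "# Read the CSV directly from the zip archive; the first column is the row index\n",
+ "with zipfile.ZipFile('KaggleCredit2.csv.zip', 'r') as z:\n",
+ "    with z.open('KaggleCredit2.csv') as f:\n",
+ "        data = pd.read_csv(f, index_col=0)\n",
+ "data.head()"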
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "rpSdRHj5lCrt",
+ "outputId": "ad14f6a3-aea4-4dee-9a15-7aefdce291a8"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(112915, 11)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 5
+ }
+ ],
+ "source": [
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Ey4XR8julCrt"
+ },
+ "source": [
+ "------------\n",
+ "## Drop na"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "OGuu1V7plCru",
+ "outputId": "a6c978f7-0977-49a0-dbe7-f7be99578440"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "SeriousDlqin2yrs 0\n",
+ "RevolvingUtilizationOfUnsecuredLines 0\n",
+ "age 4267\n",
+ "NumberOfTime30-59DaysPastDueNotWorse 0\n",
+ "DebtRatio 0\n",
+ "MonthlyIncome 0\n",
+ "NumberOfOpenCreditLinesAndLoans 0\n",
+ "NumberOfTimes90DaysLate 0\n",
+ "NumberRealEstateLoansOrLines 0\n",
+ "NumberOfTime60-89DaysPastDueNotWorse 0\n",
+ "NumberOfDependents 4267\n",
+ "dtype: int64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 6
+ }
+ ],
+ "source": [
+ "data.isnull().sum(axis=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "r9VgpoValCru",
+ "outputId": "1ddba7eb-5409-46f3-bed5-78e966ad8432"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(108648, 11)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 7
+ }
+ ],
+ "source": [
+ "data.dropna(inplace=True)\n",
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "6IjRfh0llCru"
+ },
+ "source": [
+ "---------\n",
+ "## Create X and y"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "id": "Lj82L8m7lCrv"
+ },
+ "outputs": [],
+ "source": [
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "y\n",
+ "y.mean()\n",
+ "X"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 461
+ },
+ "id": "wvZqZ50CWwyw",
+ "outputId": "2b3718d5-05cd-4030-dc14-368b447146ee"
+ },
+ "execution_count": 15,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 0.766127 45.0 \n",
+ "1 0.957151 40.0 \n",
+ "2 0.658180 38.0 \n",
+ "3 0.233810 30.0 \n",
+ "4 0.907239 49.0 \n",
+ "... ... ... \n",
+ "112910 0.385742 50.0 \n",
+ "112911 0.040674 74.0 \n",
+ "112912 0.299745 44.0 \n",
+ "112913 0.000000 30.0 \n",
+ "112914 0.850283 64.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "... ... ... ... \n",
+ "112910 0.0 0.404293 3400.0 \n",
+ "112911 0.0 0.225131 2100.0 \n",
+ "112912 0.0 0.716562 5584.0 \n",
+ "112913 0.0 0.000000 5716.0 \n",
+ "112914 0.0 0.249908 8158.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "... ... ... \n",
+ "112910 7.0 0.0 \n",
+ "112911 4.0 0.0 \n",
+ "112912 4.0 0.0 \n",
+ "112913 4.0 0.0 \n",
+ "112914 8.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "... ... ... \n",
+ "112910 0.0 0.0 \n",
+ "112911 1.0 0.0 \n",
+ "112912 1.0 0.0 \n",
+ "112913 0.0 0.0 \n",
+ "112914 2.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 \n",
+ "... ... \n",
+ "112910 0.0 \n",
+ "112911 0.0 \n",
+ "112912 2.0 \n",
+ "112913 0.0 \n",
+ "112914 0.0 \n",
+ "\n",
+ "[108648 rows x 10 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 15
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ibsxca-zlCrv"
+ },
+ "source": [
+ "---\n",
+ "## Exercise 1: Split the data into training and test sets\n",
+ "- Hint: `from sklearn.model_selection import train_test_split`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "id": "l68XolUflCrv",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "f4cf6577-fd69-4af7-ecb0-4dfed6bc538d"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "((76053, 10), (32595, 10))"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 21
+ }
+ ],
+ "source": [
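+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "# Hold out 30% of the rows as a test set; a fixed random_state makes the split reproducible\n",
+ "X_train, X_test, y_train, y_test = train_test_split(\n",
+ "    X, y, test_size=0.3, random_state=0, shuffle=True\n",
+ ")\n",
+ "X_train.shape, X_test.shape"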
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from sklearn.preprocessing import StandardScaler\n",
+ "from sklearn.metrics import classification_report\n",
+ "\n",
+ "# Fit the scaler on the training set only, then apply the same transform to both sets\n",
+ "scaler = StandardScaler()\n",
+ "scaler.fit(X_train)\n",
+ "X_train_std = scaler.transform(X_train)\n",
+ "X_test_std = scaler.transform(X_test)"
+ ],
+ "metadata": {
+ "id": "9v3D-D3Y07Ul"
+ },
+ "execution_count": 26,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Q87tJ0J2lCrw"
+ },
+ "source": [
+ "----\n",
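+ "## Exercise 2: Classify with sklearn algorithms such as logistic regression, decision tree, SVM, KNN, ...\n",
+ "Look up the sklearn API to learn what each model's parameters mean, and try adjusting them"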
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Ca37sZB8lCrx"
+ },
+ "source": [
+ "### Logistic regression\n",
+ "- Hint: `from sklearn import linear_model`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "id": "XTm5aVdilCrx",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 75
+ },
+ "outputId": "36fe07bc-abc9-4231-bad6-d5689888800d"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "LogisticRegression()"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 34
+ }
+ ],
+ "source": [
+ "from sklearn.linear_model import LogisticRegression\n",
+ "\n",
+ "# Fit a logistic regression with default parameters on the standardized training data\n",
+ "clf_LR = LogisticRegression()\n",
+ "clf_LR.fit(X_train_std, y_train)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-dAewE6flCrx"
+ },
+ "source": [
+ "### Decision Tree\n",
+ "- Hint: `from sklearn.tree import DecisionTreeClassifier`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "id": "IZTMFpZIlCrx",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 75
+ },
+ "outputId": "252e385d-c39a-4ca1-ec30-3893dddcf18e"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "DecisionTreeClassifier()"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 35
+ }
+ ],
+ "source": [
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "\n",
+ "# Fit a decision tree with default parameters (no depth limit)\n",
+ "tree = DecisionTreeClassifier()\n",
+ "tree.fit(X_train_std, y_train)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "hKGUQCl3lCrx"
+ },
+ "source": [
+ "### Random Forest\n",
+ "- Hint: `from sklearn.ensemble import RandomForestClassifier`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "id": "LOEnf6iflCrx",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 75
+ },
+ "outputId": "23ff6f70-b172-4ad8-a9c0-b0de4aa4cfa9"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "RandomForestClassifier()"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 36
+ }
+ ],
+ "source": [
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "\n",
+ "# Fit a random forest with default parameters\n",
+ "forest = RandomForestClassifier()\n",
+ "forest.fit(X_train_std, y_train)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "oRasKTa7lCrx"
+ },
+ "source": [
+ "### SVM\n",
+ "- Hint: `from sklearn.svm import SVC`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "id": "jqs7danTlCrx",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 75
+ },
+ "outputId": "6fdade0e-3f83-4f24-913f-feca5471cc5a"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "RandomForestClassifier()"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 37
+ }
+ ],
+ "source": [
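+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "\n",
+ "# NOTE: despite the section title and the name clf_svc, this cell fits a random forest,\n",
+ "# not an SVM; an actual SVM classifier would be sklearn.svm.SVC\n",
+ "clf_svc = RandomForestClassifier()\n",
+ "clf_svc.fit(X_train_std, y_train)"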
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HA9CxHGOlCrx"
+ },
+ "source": [
+ "### KNN\n",
+ "- Hint: `from sklearn.neighbors import KNeighborsClassifier`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "id": "d7MZ88l2lCry",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 75
+ },
+ "outputId": "35fad5aa-06b4-4944-ee71-b8f07fa56bf7"
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "RandomForestClassifier()"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 38
+ }
+ ],
+ "source": [
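+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "\n",
+ "# NOTE: despite the section title and the name neigh, this cell fits a random forest,\n",
+ "# not KNN; an actual KNN classifier would be sklearn.neighbors.KNeighborsClassifier\n",
+ "neigh = RandomForestClassifier()\n",
+ "neigh.fit(X_train_std, y_train)"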
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "rN1I4x_1lCry"
+ },
+ "source": [
+ "---\n",
+ "\n",
+ "## Exercise 3: Predict on the test set and compute accuracy"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from sklearn.metrics import classification_report"
+ ],
+ "metadata": {
+ "id": "VIOTodNWAMf5"
+ },
+ "execution_count": 63,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "bRGwY3NJlCry"
+ },
+ "source": [
+ "### Logistic regression\n",
+ "- Hint: `y_pred_LR = clf_LR.predict(x_test)`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 64,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "id": "Au1nyS-AlCry",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "f23b7c51-9de0-440f-914e-b6be3ae45dbd"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.94 1.00 0.97 30424\n",
+ " 1 0.54 0.04 0.08 2171\n",
+ "\n",
+ " accuracy 0.93 32595\n",
+ " macro avg 0.74 0.52 0.52 32595\n",
+ "weighted avg 0.91 0.93 0.91 32595\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "# Model evaluation: predict on the standardized test set and print per-class metrics\n",
+ "clf_LR_predictions = clf_LR.predict(X_test_std)\n",
+ "print(classification_report(y_test, clf_LR_predictions))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "UQIT323FlCry"
+ },
+ "source": [
+ "### Decision Tree\n",
+ "- Hint: `y_pred_tree = tree.predict(x_test)`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "id": "EPRAFTr-lCry",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "7755340b-f70b-430d-959e-f3e3f2f7d8c8"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.95 0.94 0.94 30424\n",
+ " 1 0.25 0.29 0.27 2171\n",
+ "\n",
+ " accuracy 0.89 32595\n",
+ " macro avg 0.60 0.62 0.61 32595\n",
+ "weighted avg 0.90 0.89 0.90 32595\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "# Model evaluation: predict on the standardized test set and print per-class metrics\n",
+ "tree_predictions = tree.predict(X_test_std)\n",
+ "\n",
+ "print(classification_report(y_test, tree_predictions))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "MNlenJcJlCry"
+ },
+ "source": [
+ "### Random Forest\n",
+ "- Hint: `y_pred_forest = forest.predict(x_test)`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "id": "jEKV2DnQlCry",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "9e5a144c-10bb-44fe-ba8d-f02d256b2420"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.94 0.99 0.97 30424\n",
+ " 1 0.54 0.16 0.25 2171\n",
+ "\n",
+ " accuracy 0.93 32595\n",
+ " macro avg 0.74 0.57 0.61 32595\n",
+ "weighted avg 0.92 0.93 0.92 32595\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "# Model evaluation: predict on the standardized test set and print per-class metrics\n",
+ "forest_predictions = forest.predict(X_test_std)\n",
+ "\n",
+ "print(classification_report(y_test, forest_predictions))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mbLLBUbylCry"
+ },
+ "source": [
+ "### SVM\n",
+ "- Hint: `y_pred_SVC = clf_svc.predict(x_test)`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 42,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "id": "_3my9tQZlCry",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "83eddee4-474f-4ff5-c214-2b9f8286067a"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.94 0.99 0.97 30424\n",
+ " 1 0.56 0.17 0.26 2171\n",
+ "\n",
+ " accuracy 0.94 32595\n",
+ " macro avg 0.75 0.58 0.61 32595\n",
+ "weighted avg 0.92 0.94 0.92 32595\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "# Model evaluation: predict on the standardized test set and print per-class metrics\n",
+ "clf_svc_predictions = clf_svc.predict(X_test_std)\n",
+ "\n",
+ "print(classification_report(y_test, clf_svc_predictions))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "s7FRhxBplCrz"
+ },
+ "source": [
+ "### KNN\n",
+ "- Hint: `y_pred_KNN = neigh.predict(x_test)`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 43,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "id": "rJ-6hv40lCrz",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "f26c9c36-cfeb-4f1b-e4c1-b1f82c44a285"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.94 0.99 0.97 30424\n",
+ " 1 0.53 0.17 0.25 2171\n",
+ "\n",
+ " accuracy 0.93 32595\n",
+ " macro avg 0.73 0.58 0.61 32595\n",
+ "weighted avg 0.92 0.93 0.92 32595\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "# Model evaluation: predict on the standardized test set and print per-class metrics\n",
+ "neigh_predictions = neigh.predict(X_test_std)\n",
+ "\n",
+ "print(classification_report(y_test, neigh_predictions))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jvp_0-3wlCrz"
+ },
+ "source": [
+ "---\n",
+ "## Exercise 4: Read the official sklearn documentation to learn the evaluation metrics for classification problems, and evaluate this example"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "9UhwdZ79lCrz"
+ },
+ "source": [
+ "**Learning resources on the confusion matrix**\n",
+ "\n",
+ "- Blog: \n",
+ "http://blog.csdn.net/vesper305/article/details/44927047 \n",
+ "- WiKi: \n",
+ "http://en.wikipedia.org/wiki/Confusion_matrix \n",
+ "- sklearn doc: \n",
+ "http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 56,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "id": "w6QhQynrlCrz",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 1000
+ },
+ "outputId": "474fef12-eb12-4722-fb03-3361f3fc659e"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "/usr/local/lib/python3.10/dist-packages/seaborn/utils.py:80: UserWarning: Glyph 23454 (\\N{CJK UNIFIED IDEOGRAPH-5B9E}) missing from current font.\n",
+ " fig.canvas.draw()\n",
+ "/usr/local/lib/python3.10/dist-packages/seaborn/utils.py:80: UserWarning: Glyph 38469 (\\N{CJK UNIFIED IDEOGRAPH-9645}) missing from current font.\n",
+ " fig.canvas.draw()\n",
+ "/usr/local/lib/python3.10/dist-packages/seaborn/utils.py:80: UserWarning: Glyph 20540 (\\N{CJK UNIFIED IDEOGRAPH-503C}) missing from current font.\n",
+ " fig.canvas.draw()\n",
+ "/usr/local/lib/python3.10/dist-packages/IPython/core/pylabtools.py:151: UserWarning: Glyph 27979 (\\N{CJK UNIFIED IDEOGRAPH-6D4B}) missing from current font.\n",
+ " fig.canvas.print_figure(bytes_io, **kw)\n"
+ ]
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "image/png": "\n"
+ },
+ "metadata": {}
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "from sklearn.metrics import confusion_matrix, classification_report\n",
+ "import seaborn as sns\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "# Confusion matrix of each model's test-set predictions\n",
+ "clf_LR_cm = confusion_matrix(y_test, clf_LR_predictions)\n",
+ "tree_cm = confusion_matrix(y_test, tree_predictions)\n",
+ "forest_cm = confusion_matrix(y_test, forest_predictions)\n",
+ "clf_svc_cm = confusion_matrix(y_test, clf_svc_predictions)\n",
+ "neigh_cm = confusion_matrix(y_test, neigh_predictions)\n",
+ "cm_list = [clf_LR_cm, tree_cm, forest_cm, clf_svc_cm, neigh_cm]\n",
+ "model_title = ['Logistic regression', 'Decision Tree', 'Random Forest', 'SVM', 'KNN']\n",
+ "\n",
+ "# Draw the five heatmaps on a 3x2 grid; English labels avoid the missing-CJK-glyph font warnings\n",
+ "plt.figure(figsize=(10, 10))\n",
+ "for n, (cm, title) in enumerate(zip(cm_list, model_title), start=1):\n",
+ "    plt.subplot(3, 2, n)\n",
+ "    sns.heatmap(cm, annot=True, fmt=\"d\")\n",
+ "    plt.title(title + \" confusion matrix\")\n",
+ "    plt.ylabel('Actual')\n",
+ "    plt.xlabel('Predicted')\n",
+ "plt.tight_layout()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5MMIBxlClCr8"
+ },
+ "source": [
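+ "## Exercise 5: Adjusting the model's decision threshold\n",
+ "\n",
+ "Banks usually apply stricter criteria, because the consequences of fraud tend to be severe, so the model's decision threshold is often adjusted.\n",
+ "\n",
+ "For example, logistic regression normally uses 0.5 as the probability cutoff, but the threshold can be set lower to make the model more sensitive. Try a threshold of 0.3 and look at the evaluation metrics again (mainly precision and recall).\n",
+ "\n",
+ "- Hint: many sklearn classifiers provide `predict_proba`, which returns the predicted class probabilities; compare them with the chosen threshold to decide the final class."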
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 67,
+ "metadata": {
+ "collapsed": true,
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "id": "r8awlD5ilCr8",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 1000
+ },
+ "outputId": "c8050295-e767-446b-9874-e195ea987242"
+ },
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "/usr/local/lib/python3.10/dist-packages/sklearn/base.py:432: UserWarning: X has feature names, but LogisticRegression was fitted without feature names\n",
+ " warnings.warn(\n",
+ "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n",
+ " _warn_prf(average, modifier, msg_start, len(result))\n",
+ "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n",
+ " _warn_prf(average, modifier, msg_start, len(result))\n",
+ "/usr/local/lib/python3.10/dist-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n",
+ " _warn_prf(average, modifier, msg_start, len(result))\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.93 1.00 0.97 30424\n",
+ " 1 0.00 0.00 0.00 2171\n",
+ "\n",
+ " accuracy 0.93 32595\n",
+ " macro avg 0.47 0.50 0.48 32595\n",
+ "weighted avg 0.87 0.93 0.90 32595\n",
+ "\n"
+ ]
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "image/png": "\n"
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "/usr/local/lib/python3.10/dist-packages/sklearn/base.py:432: UserWarning: X has feature names, but DecisionTreeClassifier was fitted without feature names\n",
+ " warnings.warn(\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.94 0.90 0.92 30424\n",
+ " 1 0.09 0.13 0.11 2171\n",
+ "\n",
+ " accuracy 0.85 32595\n",
+ " macro avg 0.51 0.52 0.51 32595\n",
+ "weighted avg 0.88 0.85 0.86 32595\n",
+ "\n"
+ ]
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "image/png": "\n"
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "/usr/local/lib/python3.10/dist-packages/sklearn/base.py:432: UserWarning: X has feature names, but RandomForestClassifier was fitted without feature names\n",
+ " warnings.warn(\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.97 0.84 0.90 30424\n",
+ " 1 0.23 0.65 0.34 2171\n",
+ "\n",
+ " accuracy 0.83 32595\n",
+ " macro avg 0.60 0.75 0.62 32595\n",
+ "weighted avg 0.92 0.83 0.86 32595\n",
+ "\n"
+ ]
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "image/png": "\n"
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "/usr/local/lib/python3.10/dist-packages/sklearn/base.py:432: UserWarning: X has feature names, but RandomForestClassifier was fitted without feature names\n",
+ " warnings.warn(\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.97 0.85 0.91 30424\n",
+ " 1 0.23 0.64 0.34 2171\n",
+ "\n",
+ " accuracy 0.84 32595\n",
+ " macro avg 0.60 0.75 0.62 32595\n",
+ "weighted avg 0.92 0.84 0.87 32595\n",
+ "\n"
+ ]
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "image/png": "\n"
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "/usr/local/lib/python3.10/dist-packages/sklearn/base.py:432: UserWarning: X has feature names, but RandomForestClassifier was fitted without feature names\n",
+ " warnings.warn(\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.97 0.83 0.89 30424\n",
+ " 1 0.22 0.67 0.33 2171\n",
+ "\n",
+ " accuracy 0.82 32595\n",
+ " macro avg 0.59 0.75 0.61 32595\n",
+ "weighted avg 0.92 0.82 0.86 32595\n",
+ "\n"
+ ]
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "image/png": "\n"
+ },
+ "metadata": {}
+ }
+ ],
+ "source": [
+ "## your code here\n",
+ "import numpy as np\n",
+ "\n",
+ "def PredictProb_Func(model, strs, threshold=0.3):\n",
+ "    # Predicted class probabilities on the test set.\n",
+ "    # NOTE: X_test must be preprocessed exactly like the data passed to fit();\n",
+ "    # the recorded 'fitted without feature names' warnings suggest this may not hold here.\n",
+ "    probabilities = model.predict_proba(X_test)\n",
+ "\n",
+ "    # Turn probabilities into labels: predict 1 when P(class 1) exceeds the threshold\n",
+ "    predictions = np.where(probabilities[:, 1] > threshold, 1, 0)\n",
+ "\n",
+ "    cm = confusion_matrix(y_test, predictions)\n",
+ "    print(classification_report(y_test, predictions))\n",
+ "\n",
+ "    # Plot the confusion matrix as a Seaborn heatmap (English labels avoid missing-glyph warnings)\n",
+ "    plt.figure(figsize=(10, 7))\n",
+ "    sns.heatmap(cm, annot=True, fmt=\"d\")\n",
+ "    plt.title(strs + \" confusion matrix\")\n",
+ "    plt.ylabel('Actual')\n",
+ "    plt.xlabel('Predicted')\n",
+ "    plt.show()\n",
+ "\n",
+ "models = [clf_LR, tree, forest, clf_svc, neigh]\n",
+ "model_title = ['Logistic regression', 'Decision Tree', 'Random Forest', 'SVM', 'KNN']\n",
+ "for model, title in zip(models, model_title):\n",
+ "    PredictProb_Func(model, strs=title)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "ICTP-AP",
+ "language": "python",
+ "name": "ictp-ap"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.13"
+ },
+ "colab": {
+ "provenance": []
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
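Exercise 5 in the notebook above turns predicted probabilities into class labels with a cutoff lower than 0.5. A minimal, dependency-free sketch of that idea follows. The labels and probabilities are made-up illustrative values (not results from the notebook), and `apply_threshold` and `precision_recall` are hypothetical helper names. In the notebook, the probabilities would come from `model.predict_proba(X_test)[:, 1]`.

```python
def apply_threshold(positive_probs, threshold=0.5):
    """Label a sample 1 when its positive-class probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in positive_probs]


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class; 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative values only (assumed for this sketch, not taken from the dataset).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
probs = [0.05, 0.10, 0.20, 0.35, 0.40, 0.15, 0.32, 0.45, 0.60, 0.25]

for threshold in (0.5, 0.3):
    y_pred = apply_threshold(probs, threshold)
    precision, recall = precision_recall(y_true, y_pred)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

With these illustrative values, lowering the threshold from 0.5 to 0.3 raises recall and lowers precision, which is the trade-off the exercise asks you to observe.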
diff --git a/2023/homework/shanleilei/homework_credit_scoring_finetune_ensemble (1).ipynb b/2023/homework/shanleilei/homework_credit_scoring_finetune_ensemble (1).ipynb
new file mode 100644
index 00000000..f525356c
--- /dev/null
+++ b/2023/homework/shanleilei/homework_credit_scoring_finetune_ensemble (1).ipynb
@@ -0,0 +1,4564 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "k_CLEMjrMAqJ"
+ },
+ "source": [
+ "# Leveling Up Together: Credit Scoring Exercise"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Z0OW7RM-MFT8",
+ "outputId": "72d80441-4c4f-4c58-ee99-c7dfdc090ba7"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Mounted at /content/drive\n"
+ ]
+ }
+ ],
+ "source": [
+ "from google.colab import drive\n",
+ "drive.mount('/content/drive')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "5V6KTIVDMvOq",
+ "outputId": "3e1e6d3c-6445-400e-b244-1c914ba74d61"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home\n",
+ "/home\n"
+ ]
+ }
+ ],
+ "source": [
+ "%cd /home\n",
+ "!pwd"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mG4nJrowMAqN"
+ },
+ "source": [
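+ "-------\n",
+ "## >>> Instructions:\n",
+ "### 1. Answering:\n",
+ "- **Keep every step** of your work when answering; do not give only the final answer\n",
+ "- Make a habit of commenting your code\n",
+ "\n",
+ "### 2. Solution hints:\n",
+ "- To help you understand each problem accurately and get the most out of the exercises, this document provides solution hints\n",
+ "- The hints are **for reference only**; original approaches are encouraged\n",
+ "- To encourage you to think for yourself, the hints are written as **comments**; look out for them\n",
+ "\n",
+ "### 3. Data:\n",
+ "- The problems use several datasets; after loading each one, first **inspect and understand its basic properties**. Later problems will not repeat this reminder"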
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-pqLruT0MAqO"
+ },
+ "source": [
+ "--------\n",
+ "## Hands-on exercises"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "LqukVKStMAqO"
+ },
+ "source": [
+ "### Credit card fraud project"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "wi25E5aoMAqO"
+ },
+ "source": [
+ "#### Initial data import, preview, and processing (do not modify this part; the data files involved do not need to be copied or moved)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 224
+ },
+ "id": "lgp-DAsbMAqP",
+ "outputId": "058ae1bc-ed01-4dae-bdd2-e13c8cca3df7",
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "0 1 0.766127 45.0 \n",
+ "1 0 0.957151 40.0 \n",
+ "2 0 0.658180 38.0 \n",
+ "3 0 0.233810 30.0 \n",
+ "4 0 0.907239 49.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "0 2.0 0.802982 9120.0 \n",
+ "1 0.0 0.121876 2600.0 \n",
+ "2 1.0 0.085113 3042.0 \n",
+ "3 0.0 0.036050 3300.0 \n",
+ "4 1.0 0.024926 63588.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "0 13.0 0.0 \n",
+ "1 4.0 0.0 \n",
+ "2 2.0 1.0 \n",
+ "3 5.0 0.0 \n",
+ "4 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "0 6.0 0.0 \n",
+ "1 0.0 0.0 \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 0.0 \n",
+ "4 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents \n",
+ "0 2.0 \n",
+ "1 1.0 \n",
+ "2 0.0 \n",
+ "3 0.0 \n",
+ "4 0.0 "
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "pd.set_option('display.max_columns', 500)\n",
+ "import zipfile\n",
+ "with zipfile.ZipFile('KaggleCredit2.csv.zip', 'r') as z:\n",
+ " f = z.open('KaggleCredit2.csv')\n",
+ " data = pd.read_csv(f, index_col=0)\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "Mc4-fzrqMAqR",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "c2b92b9e-08e5-419a-f9bf-52fb4749efc6"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(112915, 11)"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Check the data dimensions\n",
+ "data.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "IERweTeOMAqR",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "23a41874-77fe-4826-91be-23fd91235033"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "SeriousDlqin2yrs 0\n",
+ "RevolvingUtilizationOfUnsecuredLines 0\n",
+ "age 4267\n",
+ "NumberOfTime30-59DaysPastDueNotWorse 0\n",
+ "DebtRatio 0\n",
+ "MonthlyIncome 0\n",
+ "NumberOfOpenCreditLinesAndLoans 0\n",
+ "NumberOfTimes90DaysLate 0\n",
+ "NumberRealEstateLoansOrLines 0\n",
+ "NumberOfTime60-89DaysPastDueNotWorse 0\n",
+ "NumberOfDependents 4267\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Count the missing values in each column\n",
+ "data.isnull().sum(axis=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "4ar24HP0MAqR",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "2360108c-1766-4834-d100-b95f7f5ff050"
+ },
+ "outputs": [],
+ "source": [
+ "# Drop rows with missing values\n",
+ "data.dropna(inplace=True)\n",
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "VzIwPYH4MAqS",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "b2d92fb4-ab15-4885-9722-25143a37e5eb"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.06742876076872101"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
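+ "# Extract the features X and the target y\n",
+ "y = data['SeriousDlqin2yrs']\n",
+ "X = data.drop('SeriousDlqin2yrs', axis=1)\n",
+ "# Check the average fraud rate (the share of samples with SeriousDlqin2yrs == 1)\n",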
+ "y.mean()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dFhSUyJhMAqS"
+ },
+ "source": [
+ "### Exercises"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "j7Tr10JPMAqS"
+ },
+ "source": [
+ "#### 1. Split the data into training and test sets"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "gI6ZAu6iMAqT",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "a64f0047-5c0a-4fd3-8703-6671037750cc"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "((76053, 10), (32595, 10))"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Hint: see the train_test_split function\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)\n",
+ "\n",
+ "X_train.shape, X_test.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 498
+ },
+ "collapsed": true,
+ "id": "N10VpYshMAqT",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "371d1bff-3d3d-4537-f771-9bc7a0656037"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "0 101322\n",
+ "1 7326\n",
+ "Name: SeriousDlqin2yrs, dtype: int64\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
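+ "# Inspect the distribution of positive and negative samples via the SeriousDlqin2yrs field\n",
+ "# Hint: value_counts\n",
+ "print(y.value_counts())\n",
+ "\n",
+ "# Plot a bar chart of the two classes\n",
+ "# Hint: the result of value_counts() can call plot(kind='bar') directly\n",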
+ "y.value_counts().plot(kind='bar')\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "73ubVTnSMAqU"
+ },
+ "source": [
+ "#### 2. Data preprocessing: discretization"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 615
+ },
+ "collapsed": true,
+ "id": "KM7rwQjOMAqU",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "6db15cd8-7764-4f9d-aa12-6eef880d6289"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "87116 0.408449 54.0 \n",
+ "60949 0.236068 72.0 \n",
+ "81875 0.029731 65.0 \n",
+ "48666 0.021520 49.0 \n",
+ "56435 0.450862 24.0 \n",
+ "... ... ... \n",
+ "52062 0.015245 57.0 \n",
+ "101903 0.026483 42.0 \n",
+ "5396 0.094593 49.0 \n",
+ "80798 0.133383 55.0 \n",
+ "102407 0.480536 49.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "87116 0.0 0.206660 6696.0 \n",
+ "60949 0.0 0.383361 13365.0 \n",
+ "81875 1.0 0.260219 7950.0 \n",
+ "48666 0.0 0.548327 7500.0 \n",
+ "56435 0.0 0.941176 900.0 \n",
+ "... ... ... ... \n",
+ "52062 0.0 0.101078 6400.0 \n",
+ "101903 0.0 0.336710 6711.0 \n",
+ "5396 0.0 0.098861 5269.0 \n",
+ "80798 0.0 0.477288 10500.0 \n",
+ "102407 0.0 0.636136 10000.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "87116 20.0 0.0 \n",
+ "60949 10.0 0.0 \n",
+ "81875 8.0 0.0 \n",
+ "48666 12.0 0.0 \n",
+ "56435 3.0 0.0 \n",
+ "... ... ... \n",
+ "52062 9.0 0.0 \n",
+ "101903 20.0 0.0 \n",
+ "5396 8.0 0.0 \n",
+ "80798 19.0 0.0 \n",
+ "102407 8.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "87116 0.0 0.0 \n",
+ "60949 3.0 0.0 \n",
+ "81875 2.0 0.0 \n",
+ "48666 2.0 0.0 \n",
+ "56435 0.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 0.0 \n",
+ "101903 2.0 0.0 \n",
+ "5396 0.0 0.0 \n",
+ "80798 3.0 0.0 \n",
+ "102407 2.0 0.0 \n",
+ "\n",
+ " NumberOfDependents age_class \n",
+ "87116 2.0 (51.0, 54.0] \n",
+ "60949 0.0 (69.0, 72.0] \n",
+ "81875 2.0 (63.0, 66.0] \n",
+ "48666 0.0 (48.0, 51.0] \n",
+ "56435 0.0 (21.0, 24.0] \n",
+ "... ... ... \n",
+ "52062 1.0 (54.0, 57.0] \n",
+ "101903 2.0 (39.0, 42.0] \n",
+ "5396 0.0 (48.0, 51.0] \n",
+ "80798 0.0 (54.0, 57.0] \n",
+ "102407 0.0 (48.0, 51.0] \n",
+ "\n",
+ "[76053 rows x 11 columns]"
+ ]
+ },
+ "execution_count": 12,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
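+ "# Discretize age into 3-year-wide bins.\n",
+ "# Hint: compute the bin edges first, then bin with pandas' cut function.\n",
+ "# Find the minimum and maximum age in the training set\n",
+ "age_min = int(X_train['age'].min())\n",
+ "age_max = int(X_train['age'].max())\n",
+ "\n",
+ "# Build the bin edges; the stop value is padded by 3 so that age_max falls inside the last bin\n",
+ "bins = list(range(age_min, age_max + 3, 3))\n",
+ "\n",
+ "# Bin with cut; include_lowest=True keeps age_min inside the first interval\n",
+ "X_train['age_class'] = pd.cut(X_train.age, bins=bins, include_lowest=True)\n",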
+ "X_train"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 615
+ },
+ "id": "ZC-ztuOkXjyT",
+ "outputId": "c3d292aa-ba62-4e4c-98b1-0ba0cf75978e"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " RevolvingUtilizationOfUnsecuredLines age \\\n",
+ "94876 0.110716 71.0 \n",
+ "74492 0.022211 58.0 \n",
+ "48991 0.995334 54.0 \n",
+ "109765 0.012314 80.0 \n",
+ "16286 0.671940 50.0 \n",
+ "... ... ... \n",
+ "105990 0.432024 74.0 \n",
+ "31564 1.000000 51.0 \n",
+ "47890 0.728543 60.0 \n",
+ "102980 0.056936 62.0 \n",
+ "81410 0.319929 72.0 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "94876 0.0 0.379946 7100.0 \n",
+ "74492 0.0 0.266376 11250.0 \n",
+ "48991 0.0 0.229102 2583.0 \n",
+ "109765 0.0 0.002997 1000.0 \n",
+ "16286 0.0 0.316112 15000.0 \n",
+ "... ... ... ... \n",
+ "105990 0.0 0.732508 2100.0 \n",
+ "31564 0.0 0.293369 4086.0 \n",
+ "47890 1.0 0.305331 3094.0 \n",
+ "102980 0.0 0.173340 22700.0 \n",
+ "81410 0.0 0.397048 4200.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "94876 19.0 0.0 \n",
+ "74492 10.0 0.0 \n",
+ "48991 3.0 0.0 \n",
+ "109765 2.0 0.0 \n",
+ "16286 15.0 0.0 \n",
+ "... ... ... \n",
+ "105990 10.0 0.0 \n",
+ "31564 7.0 2.0 \n",
+ "47890 5.0 0.0 \n",
+ "102980 7.0 0.0 \n",
+ "81410 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "94876 1.0 0.0 \n",
+ "74492 1.0 0.0 \n",
+ "48991 1.0 0.0 \n",
+ "109765 0.0 0.0 \n",
+ "16286 1.0 0.0 \n",
+ "... ... ... \n",
+ "105990 2.0 0.0 \n",
+ "31564 0.0 0.0 \n",
+ "47890 1.0 0.0 \n",
+ "102980 2.0 0.0 \n",
+ "81410 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents age_class \n",
+ "94876 0.0 (69.0, 72.0] \n",
+ "74492 2.0 (57.0, 60.0] \n",
+ "48991 3.0 (51.0, 54.0] \n",
+ "109765 1.0 (78.0, 81.0] \n",
+ "16286 2.0 (48.0, 51.0] \n",
+ "... ... ... \n",
+ "105990 0.0 (72.0, 75.0] \n",
+ "31564 0.0 (48.0, 51.0] \n",
+ "47890 0.0 (57.0, 60.0] \n",
+ "102980 2.0 (60.0, 63.0] \n",
+ "81410 0.0 (69.0, 72.0] \n",
+ "\n",
+ "[32595 rows x 11 columns]"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
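+ "# Apply the bin edges computed on the training set to the test set; ages outside those edges become NaN\n",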
+ "X_test['age_class'] = pd.cut(X_test.age, bins=bins, include_lowest=True)\n",
+ "X_test"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "g4DQ_hVfMAqU"
+ },
+ "source": [
+ "#### 3. Data preprocessing: one-hot encoding"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 461
+ },
+ "collapsed": true,
+ "id": "pmcPB4drMAqU",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "dbaf9e7e-9947-4c04-b20e-de00b66cea9e"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " RevolvingUtilizationOfUnsecuredLines \\\n",
+ "87116 0.408449 \n",
+ "60949 0.236068 \n",
+ "81875 0.029731 \n",
+ "48666 0.021520 \n",
+ "56435 0.450862 \n",
+ "... ... \n",
+ "52062 0.015245 \n",
+ "101903 0.026483 \n",
+ "5396 0.094593 \n",
+ "80798 0.133383 \n",
+ "102407 0.480536 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "87116 0.0 0.206660 6696.0 \n",
+ "60949 0.0 0.383361 13365.0 \n",
+ "81875 1.0 0.260219 7950.0 \n",
+ "48666 0.0 0.548327 7500.0 \n",
+ "56435 0.0 0.941176 900.0 \n",
+ "... ... ... ... \n",
+ "52062 0.0 0.101078 6400.0 \n",
+ "101903 0.0 0.336710 6711.0 \n",
+ "5396 0.0 0.098861 5269.0 \n",
+ "80798 0.0 0.477288 10500.0 \n",
+ "102407 0.0 0.636136 10000.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "87116 20.0 0.0 \n",
+ "60949 10.0 0.0 \n",
+ "81875 8.0 0.0 \n",
+ "48666 12.0 0.0 \n",
+ "56435 3.0 0.0 \n",
+ "... ... ... \n",
+ "52062 9.0 0.0 \n",
+ "101903 20.0 0.0 \n",
+ "5396 8.0 0.0 \n",
+ "80798 19.0 0.0 \n",
+ "102407 8.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "87116 0.0 0.0 \n",
+ "60949 3.0 0.0 \n",
+ "81875 2.0 0.0 \n",
+ "48666 2.0 0.0 \n",
+ "56435 0.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 0.0 \n",
+ "101903 2.0 0.0 \n",
+ "5396 0.0 0.0 \n",
+ "80798 3.0 0.0 \n",
+ "102407 2.0 0.0 \n",
+ "\n",
+ " NumberOfDependents age_class_(-0.001, 3.0] age_class_(3.0, 6.0] \\\n",
+ "87116 2.0 0.0 0.0 \n",
+ "60949 0.0 0.0 0.0 \n",
+ "81875 2.0 0.0 0.0 \n",
+ "48666 0.0 0.0 0.0 \n",
+ "56435 0.0 0.0 0.0 \n",
+ "... ... ... ... \n",
+ "52062 1.0 0.0 0.0 \n",
+ "101903 2.0 0.0 0.0 \n",
+ "5396 0.0 0.0 0.0 \n",
+ "80798 0.0 0.0 0.0 \n",
+ "102407 0.0 0.0 0.0 \n",
+ "\n",
+ " age_class_(6.0, 9.0] age_class_(9.0, 12.0] age_class_(12.0, 15.0] \\\n",
+ "87116 0.0 0.0 0.0 \n",
+ "60949 0.0 0.0 0.0 \n",
+ "81875 0.0 0.0 0.0 \n",
+ "48666 0.0 0.0 0.0 \n",
+ "56435 0.0 0.0 0.0 \n",
+ "... ... ... ... \n",
+ "52062 0.0 0.0 0.0 \n",
+ "101903 0.0 0.0 0.0 \n",
+ "5396 0.0 0.0 0.0 \n",
+ "80798 0.0 0.0 0.0 \n",
+ "102407 0.0 0.0 0.0 \n",
+ "\n",
+ " age_class_(15.0, 18.0] age_class_(18.0, 21.0] \\\n",
+ "87116 0.0 0.0 \n",
+ "60949 0.0 0.0 \n",
+ "81875 0.0 0.0 \n",
+ "48666 0.0 0.0 \n",
+ "56435 0.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 0.0 \n",
+ "101903 0.0 0.0 \n",
+ "5396 0.0 0.0 \n",
+ "80798 0.0 0.0 \n",
+ "102407 0.0 0.0 \n",
+ "\n",
+ " age_class_(21.0, 24.0] age_class_(24.0, 27.0] \\\n",
+ "87116 0.0 0.0 \n",
+ "60949 0.0 0.0 \n",
+ "81875 0.0 0.0 \n",
+ "48666 0.0 0.0 \n",
+ "56435 1.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 0.0 \n",
+ "101903 0.0 0.0 \n",
+ "5396 0.0 0.0 \n",
+ "80798 0.0 0.0 \n",
+ "102407 0.0 0.0 \n",
+ "\n",
+ " age_class_(27.0, 30.0] age_class_(30.0, 33.0] \\\n",
+ "87116 0.0 0.0 \n",
+ "60949 0.0 0.0 \n",
+ "81875 0.0 0.0 \n",
+ "48666 0.0 0.0 \n",
+ "56435 0.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 0.0 \n",
+ "101903 0.0 0.0 \n",
+ "5396 0.0 0.0 \n",
+ "80798 0.0 0.0 \n",
+ "102407 0.0 0.0 \n",
+ "\n",
+ " age_class_(33.0, 36.0] age_class_(36.0, 39.0] \\\n",
+ "87116 0.0 0.0 \n",
+ "60949 0.0 0.0 \n",
+ "81875 0.0 0.0 \n",
+ "48666 0.0 0.0 \n",
+ "56435 0.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 0.0 \n",
+ "101903 0.0 0.0 \n",
+ "5396 0.0 0.0 \n",
+ "80798 0.0 0.0 \n",
+ "102407 0.0 0.0 \n",
+ "\n",
+ " age_class_(39.0, 42.0] age_class_(42.0, 45.0] \\\n",
+ "87116 0.0 0.0 \n",
+ "60949 0.0 0.0 \n",
+ "81875 0.0 0.0 \n",
+ "48666 0.0 0.0 \n",
+ "56435 0.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 0.0 \n",
+ "101903 1.0 0.0 \n",
+ "5396 0.0 0.0 \n",
+ "80798 0.0 0.0 \n",
+ "102407 0.0 0.0 \n",
+ "\n",
+ " age_class_(45.0, 48.0] age_class_(48.0, 51.0] \\\n",
+ "87116 0.0 0.0 \n",
+ "60949 0.0 0.0 \n",
+ "81875 0.0 0.0 \n",
+ "48666 0.0 1.0 \n",
+ "56435 0.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 0.0 \n",
+ "101903 0.0 0.0 \n",
+ "5396 0.0 1.0 \n",
+ "80798 0.0 0.0 \n",
+ "102407 0.0 1.0 \n",
+ "\n",
+ " age_class_(51.0, 54.0] age_class_(54.0, 57.0] \\\n",
+ "87116 1.0 0.0 \n",
+ "60949 0.0 0.0 \n",
+ "81875 0.0 0.0 \n",
+ "48666 0.0 0.0 \n",
+ "56435 0.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 1.0 \n",
+ "101903 0.0 0.0 \n",
+ "5396 0.0 0.0 \n",
+ "80798 0.0 1.0 \n",
+ "102407 0.0 0.0 \n",
+ "\n",
+ " age_class_(57.0, 60.0] age_class_(60.0, 63.0] \\\n",
+ "87116 0.0 0.0 \n",
+ "60949 0.0 0.0 \n",
+ "81875 0.0 0.0 \n",
+ "48666 0.0 0.0 \n",
+ "56435 0.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 0.0 \n",
+ "101903 0.0 0.0 \n",
+ "5396 0.0 0.0 \n",
+ "80798 0.0 0.0 \n",
+ "102407 0.0 0.0 \n",
+ "\n",
+ " age_class_(63.0, 66.0] age_class_(66.0, 69.0] \\\n",
+ "87116 0.0 0.0 \n",
+ "60949 0.0 0.0 \n",
+ "81875 1.0 0.0 \n",
+ "48666 0.0 0.0 \n",
+ "56435 0.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 0.0 \n",
+ "101903 0.0 0.0 \n",
+ "5396 0.0 0.0 \n",
+ "80798 0.0 0.0 \n",
+ "102407 0.0 0.0 \n",
+ "\n",
+ " age_class_(69.0, 72.0] age_class_(72.0, 75.0] \\\n",
+ "87116 0.0 0.0 \n",
+ "60949 1.0 0.0 \n",
+ "81875 0.0 0.0 \n",
+ "48666 0.0 0.0 \n",
+ "56435 0.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 0.0 \n",
+ "101903 0.0 0.0 \n",
+ "5396 0.0 0.0 \n",
+ "80798 0.0 0.0 \n",
+ "102407 0.0 0.0 \n",
+ "\n",
+ " age_class_(75.0, 78.0] age_class_(78.0, 81.0] \\\n",
+ "87116 0.0 0.0 \n",
+ "60949 0.0 0.0 \n",
+ "81875 0.0 0.0 \n",
+ "48666 0.0 0.0 \n",
+ "56435 0.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 0.0 \n",
+ "101903 0.0 0.0 \n",
+ "5396 0.0 0.0 \n",
+ "80798 0.0 0.0 \n",
+ "102407 0.0 0.0 \n",
+ "\n",
+ " age_class_(81.0, 84.0] age_class_(84.0, 87.0] \\\n",
+ "87116 0.0 0.0 \n",
+ "60949 0.0 0.0 \n",
+ "81875 0.0 0.0 \n",
+ "48666 0.0 0.0 \n",
+ "56435 0.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 0.0 \n",
+ "101903 0.0 0.0 \n",
+ "5396 0.0 0.0 \n",
+ "80798 0.0 0.0 \n",
+ "102407 0.0 0.0 \n",
+ "\n",
+ " age_class_(87.0, 90.0] age_class_(90.0, 93.0] \\\n",
+ "87116 0.0 0.0 \n",
+ "60949 0.0 0.0 \n",
+ "81875 0.0 0.0 \n",
+ "48666 0.0 0.0 \n",
+ "56435 0.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 0.0 \n",
+ "101903 0.0 0.0 \n",
+ "5396 0.0 0.0 \n",
+ "80798 0.0 0.0 \n",
+ "102407 0.0 0.0 \n",
+ "\n",
+ " age_class_(93.0, 96.0] age_class_(96.0, 99.0] \\\n",
+ "87116 0.0 0.0 \n",
+ "60949 0.0 0.0 \n",
+ "81875 0.0 0.0 \n",
+ "48666 0.0 0.0 \n",
+ "56435 0.0 0.0 \n",
+ "... ... ... \n",
+ "52062 0.0 0.0 \n",
+ "101903 0.0 0.0 \n",
+ "5396 0.0 0.0 \n",
+ "80798 0.0 0.0 \n",
+ "102407 0.0 0.0 \n",
+ "\n",
+ " age_class_(99.0, 102.0] \n",
+ "87116 0.0 \n",
+ "60949 0.0 \n",
+ "81875 0.0 \n",
+ "48666 0.0 \n",
+ "56435 0.0 \n",
+ "... ... \n",
+ "52062 0.0 \n",
+ "101903 0.0 \n",
+ "5396 0.0 \n",
+ "80798 0.0 \n",
+ "102407 0.0 \n",
+ "\n",
+ "[76053 rows x 43 columns]"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# One-hot encode the binned age ranges produced above.\n",
+ "# Hint: use pandas' get_dummies.\n",
+ "X_train = pd.get_dummies(X_train, dtype='float')\n",
+ "X_train = X_train.drop('age', axis=1)\n",
+ "X_train"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 461
+ },
+ "id": "vX-1iSNdWsQZ",
+ "outputId": "f771950f-0e56-4d3a-9ab9-2f9b410e59b7"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " RevolvingUtilizationOfUnsecuredLines \n",
+ " NumberOfTime30-59DaysPastDueNotWorse \n",
+ " DebtRatio \n",
+ " MonthlyIncome \n",
+ " NumberOfOpenCreditLinesAndLoans \n",
+ " NumberOfTimes90DaysLate \n",
+ " NumberRealEstateLoansOrLines \n",
+ " NumberOfTime60-89DaysPastDueNotWorse \n",
+ " NumberOfDependents \n",
+ " age_class_(-0.001, 3.0] \n",
+ " age_class_(3.0, 6.0] \n",
+ " age_class_(6.0, 9.0] \n",
+ " age_class_(9.0, 12.0] \n",
+ " age_class_(12.0, 15.0] \n",
+ " age_class_(15.0, 18.0] \n",
+ " age_class_(18.0, 21.0] \n",
+ " age_class_(21.0, 24.0] \n",
+ " age_class_(24.0, 27.0] \n",
+ " age_class_(27.0, 30.0] \n",
+ " age_class_(30.0, 33.0] \n",
+ " age_class_(33.0, 36.0] \n",
+ " age_class_(36.0, 39.0] \n",
+ " age_class_(39.0, 42.0] \n",
+ " age_class_(42.0, 45.0] \n",
+ " age_class_(45.0, 48.0] \n",
+ " age_class_(48.0, 51.0] \n",
+ " age_class_(51.0, 54.0] \n",
+ " age_class_(54.0, 57.0] \n",
+ " age_class_(57.0, 60.0] \n",
+ " age_class_(60.0, 63.0] \n",
+ " age_class_(63.0, 66.0] \n",
+ " age_class_(66.0, 69.0] \n",
+ " age_class_(69.0, 72.0] \n",
+ " age_class_(72.0, 75.0] \n",
+ " age_class_(75.0, 78.0] \n",
+ " age_class_(78.0, 81.0] \n",
+ " age_class_(81.0, 84.0] \n",
+ " age_class_(84.0, 87.0] \n",
+ " age_class_(87.0, 90.0] \n",
+ " age_class_(90.0, 93.0] \n",
+ " age_class_(93.0, 96.0] \n",
+ " age_class_(96.0, 99.0] \n",
+ " age_class_(99.0, 102.0] \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 94876 \n",
+ " 0.110716 \n",
+ " 0.0 \n",
+ " 0.379946 \n",
+ " 7100.0 \n",
+ " 19.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " \n",
+ " \n",
+ " 74492 \n",
+ " 0.022211 \n",
+ " 0.0 \n",
+ " 0.266376 \n",
+ " 11250.0 \n",
+ " 10.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 2.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " \n",
+ " \n",
+ " 48991 \n",
+ " 0.995334 \n",
+ " 0.0 \n",
+ " 0.229102 \n",
+ " 2583.0 \n",
+ " 3.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 3.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " \n",
+ " \n",
+ " 109765 \n",
+ " 0.012314 \n",
+ " 0.0 \n",
+ " 0.002997 \n",
+ " 1000.0 \n",
+ " 2.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " \n",
+ " \n",
+ " 16286 \n",
+ " 0.671940 \n",
+ " 0.0 \n",
+ " 0.316112 \n",
+ " 15000.0 \n",
+ " 15.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 2.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " \n",
+ " \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " \n",
+ " \n",
+ " 105990 \n",
+ " 0.432024 \n",
+ " 0.0 \n",
+ " 0.732508 \n",
+ " 2100.0 \n",
+ " 10.0 \n",
+ " 0.0 \n",
+ " 2.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " \n",
+ " \n",
+ " 31564 \n",
+ " 1.000000 \n",
+ " 0.0 \n",
+ " 0.293369 \n",
+ " 4086.0 \n",
+ " 7.0 \n",
+ " 2.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " \n",
+ " \n",
+ " 47890 \n",
+ " 0.728543 \n",
+ " 1.0 \n",
+ " 0.305331 \n",
+ " 3094.0 \n",
+ " 5.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " \n",
+ " \n",
+ " 102980 \n",
+ " 0.056936 \n",
+ " 0.0 \n",
+ " 0.173340 \n",
+ " 22700.0 \n",
+ " 7.0 \n",
+ " 0.0 \n",
+ " 2.0 \n",
+ " 0.0 \n",
+ " 2.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " \n",
+ " \n",
+ " 81410 \n",
+ " 0.319929 \n",
+ " 0.0 \n",
+ " 0.397048 \n",
+ " 4200.0 \n",
+ " 7.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " \n",
+ " \n",
+ "32595 rows × 43 columns\n"
+ ],
+ "text/plain": [
+ " RevolvingUtilizationOfUnsecuredLines \\\n",
+ "94876 0.110716 \n",
+ "74492 0.022211 \n",
+ "48991 0.995334 \n",
+ "109765 0.012314 \n",
+ "16286 0.671940 \n",
+ "... ... \n",
+ "105990 0.432024 \n",
+ "31564 1.000000 \n",
+ "47890 0.728543 \n",
+ "102980 0.056936 \n",
+ "81410 0.319929 \n",
+ "\n",
+ " NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome \\\n",
+ "94876 0.0 0.379946 7100.0 \n",
+ "74492 0.0 0.266376 11250.0 \n",
+ "48991 0.0 0.229102 2583.0 \n",
+ "109765 0.0 0.002997 1000.0 \n",
+ "16286 0.0 0.316112 15000.0 \n",
+ "... ... ... ... \n",
+ "105990 0.0 0.732508 2100.0 \n",
+ "31564 0.0 0.293369 4086.0 \n",
+ "47890 1.0 0.305331 3094.0 \n",
+ "102980 0.0 0.173340 22700.0 \n",
+ "81410 0.0 0.397048 4200.0 \n",
+ "\n",
+ " NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate \\\n",
+ "94876 19.0 0.0 \n",
+ "74492 10.0 0.0 \n",
+ "48991 3.0 0.0 \n",
+ "109765 2.0 0.0 \n",
+ "16286 15.0 0.0 \n",
+ "... ... ... \n",
+ "105990 10.0 0.0 \n",
+ "31564 7.0 2.0 \n",
+ "47890 5.0 0.0 \n",
+ "102980 7.0 0.0 \n",
+ "81410 7.0 0.0 \n",
+ "\n",
+ " NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse \\\n",
+ "94876 1.0 0.0 \n",
+ "74492 1.0 0.0 \n",
+ "48991 1.0 0.0 \n",
+ "109765 0.0 0.0 \n",
+ "16286 1.0 0.0 \n",
+ "... ... ... \n",
+ "105990 2.0 0.0 \n",
+ "31564 0.0 0.0 \n",
+ "47890 1.0 0.0 \n",
+ "102980 2.0 0.0 \n",
+ "81410 1.0 0.0 \n",
+ "\n",
+ " NumberOfDependents age_class_(-0.001, 3.0] age_class_(3.0, 6.0] \\\n",
+ "94876 0.0 0.0 0.0 \n",
+ "74492 2.0 0.0 0.0 \n",
+ "48991 3.0 0.0 0.0 \n",
+ "109765 1.0 0.0 0.0 \n",
+ "16286 2.0 0.0 0.0 \n",
+ "... ... ... ... \n",
+ "105990 0.0 0.0 0.0 \n",
+ "31564 0.0 0.0 0.0 \n",
+ "47890 0.0 0.0 0.0 \n",
+ "102980 2.0 0.0 0.0 \n",
+ "81410 0.0 0.0 0.0 \n",
+ "\n",
+ " age_class_(6.0, 9.0] age_class_(9.0, 12.0] age_class_(12.0, 15.0] \\\n",
+ "94876 0.0 0.0 0.0 \n",
+ "74492 0.0 0.0 0.0 \n",
+ "48991 0.0 0.0 0.0 \n",
+ "109765 0.0 0.0 0.0 \n",
+ "16286 0.0 0.0 0.0 \n",
+ "... ... ... ... \n",
+ "105990 0.0 0.0 0.0 \n",
+ "31564 0.0 0.0 0.0 \n",
+ "47890 0.0 0.0 0.0 \n",
+ "102980 0.0 0.0 0.0 \n",
+ "81410 0.0 0.0 0.0 \n",
+ "\n",
+ " age_class_(15.0, 18.0] age_class_(18.0, 21.0] \\\n",
+ "94876 0.0 0.0 \n",
+ "74492 0.0 0.0 \n",
+ "48991 0.0 0.0 \n",
+ "109765 0.0 0.0 \n",
+ "16286 0.0 0.0 \n",
+ "... ... ... \n",
+ "105990 0.0 0.0 \n",
+ "31564 0.0 0.0 \n",
+ "47890 0.0 0.0 \n",
+ "102980 0.0 0.0 \n",
+ "81410 0.0 0.0 \n",
+ "\n",
+ " age_class_(21.0, 24.0] age_class_(24.0, 27.0] \\\n",
+ "94876 0.0 0.0 \n",
+ "74492 0.0 0.0 \n",
+ "48991 0.0 0.0 \n",
+ "109765 0.0 0.0 \n",
+ "16286 0.0 0.0 \n",
+ "... ... ... \n",
+ "105990 0.0 0.0 \n",
+ "31564 0.0 0.0 \n",
+ "47890 0.0 0.0 \n",
+ "102980 0.0 0.0 \n",
+ "81410 0.0 0.0 \n",
+ "\n",
+ " age_class_(27.0, 30.0] age_class_(30.0, 33.0] \\\n",
+ "94876 0.0 0.0 \n",
+ "74492 0.0 0.0 \n",
+ "48991 0.0 0.0 \n",
+ "109765 0.0 0.0 \n",
+ "16286 0.0 0.0 \n",
+ "... ... ... \n",
+ "105990 0.0 0.0 \n",
+ "31564 0.0 0.0 \n",
+ "47890 0.0 0.0 \n",
+ "102980 0.0 0.0 \n",
+ "81410 0.0 0.0 \n",
+ "\n",
+ " age_class_(33.0, 36.0] age_class_(36.0, 39.0] \\\n",
+ "94876 0.0 0.0 \n",
+ "74492 0.0 0.0 \n",
+ "48991 0.0 0.0 \n",
+ "109765 0.0 0.0 \n",
+ "16286 0.0 0.0 \n",
+ "... ... ... \n",
+ "105990 0.0 0.0 \n",
+ "31564 0.0 0.0 \n",
+ "47890 0.0 0.0 \n",
+ "102980 0.0 0.0 \n",
+ "81410 0.0 0.0 \n",
+ "\n",
+ " age_class_(39.0, 42.0] age_class_(42.0, 45.0] \\\n",
+ "94876 0.0 0.0 \n",
+ "74492 0.0 0.0 \n",
+ "48991 0.0 0.0 \n",
+ "109765 0.0 0.0 \n",
+ "16286 0.0 0.0 \n",
+ "... ... ... \n",
+ "105990 0.0 0.0 \n",
+ "31564 0.0 0.0 \n",
+ "47890 0.0 0.0 \n",
+ "102980 0.0 0.0 \n",
+ "81410 0.0 0.0 \n",
+ "\n",
+ " age_class_(45.0, 48.0] age_class_(48.0, 51.0] \\\n",
+ "94876 0.0 0.0 \n",
+ "74492 0.0 0.0 \n",
+ "48991 0.0 0.0 \n",
+ "109765 0.0 0.0 \n",
+ "16286 0.0 1.0 \n",
+ "... ... ... \n",
+ "105990 0.0 0.0 \n",
+ "31564 0.0 1.0 \n",
+ "47890 0.0 0.0 \n",
+ "102980 0.0 0.0 \n",
+ "81410 0.0 0.0 \n",
+ "\n",
+ " age_class_(51.0, 54.0] age_class_(54.0, 57.0] \\\n",
+ "94876 0.0 0.0 \n",
+ "74492 0.0 0.0 \n",
+ "48991 1.0 0.0 \n",
+ "109765 0.0 0.0 \n",
+ "16286 0.0 0.0 \n",
+ "... ... ... \n",
+ "105990 0.0 0.0 \n",
+ "31564 0.0 0.0 \n",
+ "47890 0.0 0.0 \n",
+ "102980 0.0 0.0 \n",
+ "81410 0.0 0.0 \n",
+ "\n",
+ " age_class_(57.0, 60.0] age_class_(60.0, 63.0] \\\n",
+ "94876 0.0 0.0 \n",
+ "74492 1.0 0.0 \n",
+ "48991 0.0 0.0 \n",
+ "109765 0.0 0.0 \n",
+ "16286 0.0 0.0 \n",
+ "... ... ... \n",
+ "105990 0.0 0.0 \n",
+ "31564 0.0 0.0 \n",
+ "47890 1.0 0.0 \n",
+ "102980 0.0 1.0 \n",
+ "81410 0.0 0.0 \n",
+ "\n",
+ " age_class_(63.0, 66.0] age_class_(66.0, 69.0] \\\n",
+ "94876 0.0 0.0 \n",
+ "74492 0.0 0.0 \n",
+ "48991 0.0 0.0 \n",
+ "109765 0.0 0.0 \n",
+ "16286 0.0 0.0 \n",
+ "... ... ... \n",
+ "105990 0.0 0.0 \n",
+ "31564 0.0 0.0 \n",
+ "47890 0.0 0.0 \n",
+ "102980 0.0 0.0 \n",
+ "81410 0.0 0.0 \n",
+ "\n",
+ " age_class_(69.0, 72.0] age_class_(72.0, 75.0] \\\n",
+ "94876 1.0 0.0 \n",
+ "74492 0.0 0.0 \n",
+ "48991 0.0 0.0 \n",
+ "109765 0.0 0.0 \n",
+ "16286 0.0 0.0 \n",
+ "... ... ... \n",
+ "105990 0.0 1.0 \n",
+ "31564 0.0 0.0 \n",
+ "47890 0.0 0.0 \n",
+ "102980 0.0 0.0 \n",
+ "81410 1.0 0.0 \n",
+ "\n",
+ " age_class_(75.0, 78.0] age_class_(78.0, 81.0] \\\n",
+ "94876 0.0 0.0 \n",
+ "74492 0.0 0.0 \n",
+ "48991 0.0 0.0 \n",
+ "109765 0.0 1.0 \n",
+ "16286 0.0 0.0 \n",
+ "... ... ... \n",
+ "105990 0.0 0.0 \n",
+ "31564 0.0 0.0 \n",
+ "47890 0.0 0.0 \n",
+ "102980 0.0 0.0 \n",
+ "81410 0.0 0.0 \n",
+ "\n",
+ " age_class_(81.0, 84.0] age_class_(84.0, 87.0] \\\n",
+ "94876 0.0 0.0 \n",
+ "74492 0.0 0.0 \n",
+ "48991 0.0 0.0 \n",
+ "109765 0.0 0.0 \n",
+ "16286 0.0 0.0 \n",
+ "... ... ... \n",
+ "105990 0.0 0.0 \n",
+ "31564 0.0 0.0 \n",
+ "47890 0.0 0.0 \n",
+ "102980 0.0 0.0 \n",
+ "81410 0.0 0.0 \n",
+ "\n",
+ " age_class_(87.0, 90.0] age_class_(90.0, 93.0] \\\n",
+ "94876 0.0 0.0 \n",
+ "74492 0.0 0.0 \n",
+ "48991 0.0 0.0 \n",
+ "109765 0.0 0.0 \n",
+ "16286 0.0 0.0 \n",
+ "... ... ... \n",
+ "105990 0.0 0.0 \n",
+ "31564 0.0 0.0 \n",
+ "47890 0.0 0.0 \n",
+ "102980 0.0 0.0 \n",
+ "81410 0.0 0.0 \n",
+ "\n",
+ " age_class_(93.0, 96.0] age_class_(96.0, 99.0] \\\n",
+ "94876 0.0 0.0 \n",
+ "74492 0.0 0.0 \n",
+ "48991 0.0 0.0 \n",
+ "109765 0.0 0.0 \n",
+ "16286 0.0 0.0 \n",
+ "... ... ... \n",
+ "105990 0.0 0.0 \n",
+ "31564 0.0 0.0 \n",
+ "47890 0.0 0.0 \n",
+ "102980 0.0 0.0 \n",
+ "81410 0.0 0.0 \n",
+ "\n",
+ " age_class_(99.0, 102.0] \n",
+ "94876 0.0 \n",
+ "74492 0.0 \n",
+ "48991 0.0 \n",
+ "109765 0.0 \n",
+ "16286 0.0 \n",
+ "... ... \n",
+ "105990 0.0 \n",
+ "31564 0.0 \n",
+ "47890 0.0 \n",
+ "102980 0.0 \n",
+ "81410 0.0 \n",
+ "\n",
+ "[32595 rows x 43 columns]"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "X_test = pd.get_dummies(X_test, dtype='float')\n",
+ "X_test = X_test.drop('age', axis=1)\n",
+ "X_test"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Q8MhdlfDMAqV"
+ },
+ "source": [
+ "#### 4. Data preprocessing: feature scaling"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "_XAKQa86MAqV",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "7a0985ec-c685-44ee-8e3c-e76f05ce0be1"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[-0.02280078, -0.10952419, -0.44571797, ..., -0.0183736 ,\n",
+ " -0.01356876, -0.005539 ],\n",
+ " [-0.02352386, -0.10952419, 0.34755716, ..., -0.0183736 ,\n",
+ " -0.01356876, -0.005539 ],\n",
+ " [-0.02438938, 0.18379115, -0.20527158, ..., -0.0183736 ,\n",
+ " -0.01356876, -0.005539 ],\n",
+ " ...,\n",
+ " [-0.02411731, -0.10952419, -0.92966324, ..., -0.0183736 ,\n",
+ " -0.01356876, -0.005539 ],\n",
+ " [-0.0239546 , -0.10952419, 0.76923 , ..., -0.0183736 ,\n",
+ " -0.01356876, -0.005539 ],\n",
+ " [-0.02249839, -0.10952419, 1.48235852, ..., -0.0183736 ,\n",
+ " -0.01356876, -0.005539 ]])"
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Scale the continuous features\n",
+ "# Hint: a scaler such as StandardScaler can be used\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "import numpy as np\n",
+ "\n",
+ "scaler = StandardScaler()\n",
+ "# Fit on the training set only, so no test-set statistics leak into the scaler\n",
+ "scaler.fit(X_train)\n",
+ "X_train_std = scaler.transform(X_train)\n",
+ "X_test_std = scaler.transform(X_test)\n",
+ "X_train_std"
+ ]
+ },
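+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A minimal sketch of leakage-free scaling, assuming synthetic data in place of the credit dataset: the scaler learns its mean and standard deviation from the training split only, then applies them to both splits. The names `X_demo_train`, `X_demo_test`, and `demo_scaler` are hypothetical and not part of the homework.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "\n",
+ "rng = np.random.default_rng(0)\n",
+ "X_demo_train = rng.normal(loc=5.0, scale=2.0, size=(200, 3))  # hypothetical training split\n",
+ "X_demo_test = rng.normal(loc=5.0, scale=2.0, size=(50, 3))  # hypothetical test split\n",
+ "\n",
+ "demo_scaler = StandardScaler()\n",
+ "X_demo_train_std = demo_scaler.fit_transform(X_demo_train)  # statistics come from the training data only\n",
+ "X_demo_test_std = demo_scaler.transform(X_demo_test)  # reuse the training statistics on the test data\n",
+ "\n",
+ "print(X_demo_train_std.mean(axis=0).round(6))  # training columns are centered near 0\n",
+ "```\n",
+ "\n",
+ "The test split is only ever passed to `transform`, so its own mean and variance never influence the model."
+ ]
+ },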
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "yIa7KMepMAqW"
+ },
+ "source": [
+ "#### 5. Fit a logistic regression model, print its coefficients, and analyze feature importance"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "6o74poK3MAqW",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "c6512799-a9ab-4735-e856-0a1fd4ae51e3"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Coefficients: [[-2.81589427e-02 1.63969606e+00 2.96352536e-01 -1.28711091e-01\n",
+ " -1.17800741e-01 1.44089745e+00 -1.53793032e-01 -2.91889200e+00\n",
+ " 8.30549763e-02 -2.69194891e-03 0.00000000e+00 0.00000000e+00\n",
+ " 0.00000000e+00 0.00000000e+00 0.00000000e+00 -2.23147494e-02\n",
+ " 6.22851242e-02 1.05842914e-01 1.23723820e-01 1.00275470e-01\n",
+ " 8.62764684e-02 6.00809812e-02 6.98201692e-02 4.27174214e-02\n",
+ " 6.21330528e-02 4.78632927e-02 4.40763401e-02 -1.73953563e-02\n",
+ " -3.55745157e-02 -5.91254236e-02 -1.22688949e-01 -1.37964712e-01\n",
+ " -1.05043205e-01 -9.35789660e-02 -1.15241058e-01 -1.44065903e-01\n",
+ " -7.62206144e-02 -5.44624724e-02 -1.12937889e-02 -2.40835057e-02\n",
+ " -2.02254765e-01 1.45156952e-02 -5.22338763e-02]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Hint: call fit to build the model, then read the coef_ attribute\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "\n",
+ "lr = LogisticRegression()\n",
+ "lr.fit(X_train_std, y_train)\n",
+ "coefficients = lr.coef_\n",
+ "print(\"Coefficients:\", coefficients)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "OIw0HvmhMAqW"
+ },
+ "source": [
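+ "#### 6. Tune hyperparameters with grid search cross-validation\n",
+ "Tune the penalty and C parameters: the candidates for penalty are \"l1\" and \"l2\", and the candidates for C are [1, 10, 100, 500]."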
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 118
+ },
+ "collapsed": true,
+ "id": "GQzqlSMDMAqW",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "0ece21fd-3382-4b8a-e211-4945291b503f"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "GridSearchCV(cv=5, estimator=LogisticRegression(solver='liblinear'),\n",
+ " param_grid={'C': [1, 10, 100, 500], 'penalty': ['l1', 'l2']},\n",
+ " scoring='accuracy')"
+ ],
+ "text/plain": [
+ "GridSearchCV(cv=5, estimator=LogisticRegression(solver='liblinear'),\n",
+ " param_grid={'C': [1, 10, 100, 500], 'penalty': ['l1', 'l2']},\n",
+ " scoring='accuracy')"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Hint: build the parameter grid dictionary as specified above, then tune with GridSearchCV\n",
+ "\n",
+ "from sklearn.model_selection import GridSearchCV\n",
+ "\n",
+ "\n",
+ "logreg = LogisticRegression(solver='liblinear')\n",
+ "\n",
+ "param_grid = {\n",
+ " 'penalty': ['l1', 'l2'],\n",
+ " 'C': [1, 10, 100, 500]\n",
+ "}\n",
+ "\n",
+ "grid_search = GridSearchCV(logreg, param_grid, cv=5, scoring='accuracy')\n",
+ "grid_search.fit(X_train_std, y_train)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "3p6zYd0uMAqX",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "7701011e-8441-4a79-e8ee-205c7dd3cf99"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Best parameters: {'C': 1, 'penalty': 'l2'}\n",
+ "Best cross-validation score: 0.93\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Print the best hyperparameters\n",
+ "# Output the best model\n",
+ "\n",
+ "print(\"Best parameters:\", grid_search.best_params_)\n",
+ "print(\"Best cross-validation score: {:.2f}\".format(grid_search.best_score_))\n",
+ "\n",
+ "\n",
+ "best_model = grid_search.best_estimator_\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "BZ7sobk-MAqX"
+ },
+ "source": [
+ "#### 7. Predict on the test set and compute precision, recall, AUC, the confusion matrix, F1, and other test metrics"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "PMNRRrFUMAqX",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "bd10842a-c1d9-447f-e501-fbd141cd431c"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Precision: 0.5808823529411765\n",
+ "Recall: 0.03552158273381295\n",
+ "AUC: 0.7077820547881606\n",
+ "Confusion Matrix:\n",
+ " [[30314 57]\n",
+ " [ 2145 79]]\n",
+ "F1 Score: 0.06694915254237287\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Hint: use predict to get predictions on the test set\n",
+ "# Hint: the metrics are available in sklearn.metrics: precision_score, recall_score, roc_auc_score, confusion_matrix, f1_score\n",
+ "from sklearn.metrics import precision_score, recall_score, roc_auc_score, confusion_matrix, f1_score\n",
+ "\n",
+ "\n",
+ "y_pred = best_model.predict(X_test_std)\n",
+ "\n",
+ "# Precision\n",
+ "precision = precision_score(y_test, y_pred)\n",
+ "\n",
+ "# Recall\n",
+ "recall = recall_score(y_test, y_pred)\n",
+ "\n",
+ "# AUC (computed from predicted probabilities, not hard labels)\n",
+ "y_pred_prob = best_model.predict_proba(X_test_std)[:, 1]\n",
+ "auc = roc_auc_score(y_test, y_pred_prob)\n",
+ "\n",
+ "# Confusion matrix\n",
+ "conf_matrix = confusion_matrix(y_test, y_pred)\n",
+ "\n",
+ "# F1 score\n",
+ "f1 = f1_score(y_test, y_pred)\n",
+ "\n",
+ "\n",
+ "print(\"Precision:\", precision)\n",
+ "print(\"Recall:\", recall)\n",
+ "print(\"AUC:\", auc)\n",
+ "print(\"Confusion Matrix:\\n\", conf_matrix)\n",
+ "print(\"F1 Score:\", f1)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "gW2BPfwYMAqX"
+ },
+ "source": [
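+ "#### 8. Further optimization\n",
+ "Banks usually have stricter requirements because the consequences of fraud tend to be severe, so the decision criterion of the model is often adjusted. \n",
+ "\n",
+ "For example, logistic regression normally uses 0.5 as the probability decision boundary, but the threshold can be set lower to increase the model's sensitivity. \n",
+ "Try setting the threshold to 0.3, then look at the confusion matrix and the other evaluation metrics again."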
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "GXrNzRoXMAqY",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "76943b8f-7bb3-4e21-a217-18ada2361859"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Threshold: 0.1\n",
+ "Precision: 0.20557645496791324\n",
+ "Recall: 0.4177158273381295\n",
+ "F1 Score: 0.2755450096396263\n",
+ "Confusion Matrix:\n",
+ "[[26781 3590]\n",
+ " [ 1295 929]]\n",
+ "\n",
+ "Threshold: 0.2\n",
+ "Precision: 0.4172972972972973\n",
+ "Recall: 0.1735611510791367\n",
+ "F1 Score: 0.24515719275960624\n",
+ "Confusion Matrix:\n",
+ "[[29832 539]\n",
+ " [ 1838 386]]\n",
+ "\n",
+ "Threshold: 0.3\n",
+ "Precision: 0.5319693094629157\n",
+ "Recall: 0.09352517985611511\n",
+ "F1 Score: 0.15908221797323135\n",
+ "Confusion Matrix:\n",
+ "[[30188 183]\n",
+ " [ 2016 208]]\n",
+ "\n",
+ "Threshold: 0.4\n",
+ "Precision: 0.5739910313901345\n",
+ "Recall: 0.05755395683453238\n",
+ "F1 Score: 0.10461789946873724\n",
+ "Confusion Matrix:\n",
+ "[[30276 95]\n",
+ " [ 2096 128]]\n",
+ "\n",
+ "Threshold: 0.5\n",
+ "Precision: 0.5808823529411765\n",
+ "Recall: 0.03552158273381295\n",
+ "F1 Score: 0.06694915254237287\n",
+ "Confusion Matrix:\n",
+ "[[30314 57]\n",
+ " [ 2145 79]]\n",
+ "\n",
+ "Threshold: 0.6\n",
+ "Precision: 0.5957446808510638\n",
+ "Recall: 0.025179856115107913\n",
+ "F1 Score: 0.04831751509922347\n",
+ "Confusion Matrix:\n",
+ "[[30333 38]\n",
+ " [ 2168 56]]\n",
+ "\n",
+ "Threshold: 0.7\n",
+ "Precision: 0.5172413793103449\n",
+ "Recall: 0.013489208633093525\n",
+ "F1 Score: 0.026292725679228742\n",
+ "Confusion Matrix:\n",
+ "[[30343 28]\n",
+ " [ 2194 30]]\n",
+ "\n",
+ "Threshold: 0.8\n",
+ "Precision: 0.5652173913043478\n",
+ "Recall: 0.005845323741007194\n",
+ "F1 Score: 0.011570983533600357\n",
+ "Confusion Matrix:\n",
+ "[[30361 10]\n",
+ " [ 2211 13]]\n",
+ "\n",
+ "Threshold: 0.9\n",
+ "Precision: 0.0\n",
+ "Recall: 0.0\n",
+ "F1 Score: 0.0\n",
+ "Confusion Matrix:\n",
+ "[[30366 5]\n",
+ " [ 2224 0]]\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Hint: thresholds = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]\n",
+ "# Compare the predict_proba output against each threshold to get the predicted labels, then compute the evaluation metrics\n",
+ "from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix\n",
+ "\n",
+ "thresholds = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]\n",
+ "\n",
+ "\n",
+ "y_pred_probs = best_model.predict_proba(X_test_std)[:, 1]\n",
+ "\n",
+ "for threshold in thresholds:\n",
+ "\n",
+ " y_pred = (y_pred_probs >= threshold).astype(int)\n",
+ "\n",
+ " \n",
+ " precision = precision_score(y_test, y_pred)\n",
+ " recall = recall_score(y_test, y_pred)\n",
+ " f1 = f1_score(y_test, y_pred)\n",
+ " conf_matrix = confusion_matrix(y_test, y_pred)\n",
+ "\n",
+ " \n",
+ " print(f\"Threshold: {threshold}\")\n",
+ " print(f\"Precision: {precision}\")\n",
+ " print(f\"Recall: {recall}\")\n",
+ " print(f\"F1 Score: {f1}\")\n",
+ " print(f\"Confusion Matrix:\\n{conf_matrix}\\n\")\n"
+ ]
+ },
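+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The threshold scan above can also be sketched with `precision_recall_curve`, which evaluates every distinct predicted probability as a candidate cut-off instead of a fixed list. This is a minimal sketch on synthetic, imbalanced data; `X_demo`, `y_demo`, and `demo_clf` are hypothetical names, and choosing the cut-off by F1 is an assumption rather than a requirement of the homework.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.metrics import precision_recall_curve\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "# Hypothetical imbalanced data: roughly 10% positives\n",
+ "X_demo, y_demo = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)\n",
+ "X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, stratify=y_demo, random_state=0)\n",
+ "\n",
+ "demo_clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)\n",
+ "probs = demo_clf.predict_proba(X_te)[:, 1]  # probability of the positive class\n",
+ "\n",
+ "# precisions and recalls have one more entry than cut_offs, so drop the last entry\n",
+ "precisions, recalls, cut_offs = precision_recall_curve(y_te, probs)\n",
+ "f1_values = 2 * precisions[:-1] * recalls[:-1] / (precisions[:-1] + recalls[:-1] + 1e-12)\n",
+ "best_cut_off = cut_offs[np.argmax(f1_values)]\n",
+ "print('threshold with the highest F1:', best_cut_off)\n",
+ "```\n",
+ "\n",
+ "In a real workflow the cut-off would be chosen on a validation split rather than the test split, so that the reported test metrics stay unbiased."
+ ]
+ },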
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "aFV9vlyGMAqY"
+ },
+ "source": [
+ "#### 9. Rank the features by importance, filter them through feature selection, then refit the model and examine its accuracy and the other evaluation metrics."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "yEumBzTqMAqY",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "921af1aa-7567-4f2a-9f37-4012923a0805"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Feature: NumberOfTime60-89DaysPastDueNotWorse, Importance: 2.9174653938974706\n",
+ "Feature: NumberOfTime30-59DaysPastDueNotWorse, Importance: 1.6388460770487727\n",
+ "Feature: NumberOfTimes90DaysLate, Importance: 1.4403444959388176\n",
+ "Feature: DebtRatio, Importance: 0.2963457929465204\n",
+ "Feature: NumberRealEstateLoansOrLines, Importance: 0.15390142709018867\n",
+ "Feature: age_class_(27.0, 30.0], Importance: 0.1381495247113989\n",
+ "Feature: age_class_(78.0, 81.0], Importance: 0.1340374302506106\n",
+ "Feature: MonthlyIncome, Importance: 0.12802437048535784\n",
+ "Feature: age_class_(66.0, 69.0], Importance: 0.1228107647427749\n",
+ "Feature: NumberOfOpenCreditLinesAndLoans, Importance: 0.11780801501606504\n",
+ "Feature: age_class_(24.0, 27.0], Importance: 0.11737185444635168\n",
+ "Feature: age_class_(30.0, 33.0], Importance: 0.11627012277656434\n",
+ "Feature: age_class_(63.0, 66.0], Importance: 0.10610256001455878\n",
+ "Feature: age_class_(93.0, 96.0], Importance: 0.10529771171509311\n",
+ "Feature: age_class_(75.0, 78.0], Importance: 0.10472219206856115\n",
+ "Feature: age_class_(33.0, 36.0], Importance: 0.10266920409572494\n",
+ "Feature: age_class_(69.0, 72.0], Importance: 0.09173620740925557\n",
+ "Feature: age_class_(39.0, 42.0], Importance: 0.0886172285193445\n",
+ "Feature: NumberOfDependents, Importance: 0.08299353430525591\n",
+ "Feature: age_class_(45.0, 48.0], Importance: 0.08257977126074513\n",
+ "Feature: age_class_(72.0, 75.0], Importance: 0.0813493953813997\n",
+ "Feature: age_class_(36.0, 39.0], Importance: 0.07778977221362726\n",
+ "Feature: age_class_(-0.001, 3.0], Importance: 0.07636563563099678\n",
+ "Feature: age_class_(21.0, 24.0], Importance: 0.07062583494783624\n",
+ "Feature: age_class_(81.0, 84.0], Importance: 0.06897853067843891\n",
+ "Feature: age_class_(48.0, 51.0], Importance: 0.06812909366192271\n",
+ "Feature: age_class_(51.0, 54.0], Importance: 0.06376091352564103\n",
+ "Feature: age_class_(42.0, 45.0], Importance: 0.06229064909059505\n",
+ "Feature: age_class_(84.0, 87.0], Importance: 0.0482630874634455\n",
+ "Feature: age_class_(60.0, 63.0], Importance: 0.039924248168778015\n",
+ "Feature: RevolvingUtilizationOfUnsecuredLines, Importance: 0.02822625554947292\n",
+ "Feature: age_class_(99.0, 102.0], Importance: 0.02751711504765149\n",
+ "Feature: age_class_(90.0, 93.0], Importance: 0.02079225073730944\n",
+ "Feature: age_class_(18.0, 21.0], Importance: 0.020027256001566615\n",
+ "Feature: age_class_(57.0, 60.0], Importance: 0.016868344113720946\n",
+ "Feature: age_class_(96.0, 99.0], Importance: 0.015543600698082265\n",
+ "Feature: age_class_(87.0, 90.0], Importance: 0.00617061635496714\n",
+ "Feature: age_class_(54.0, 57.0], Importance: 0.0016995187975081712\n",
+ "Feature: age_class_(3.0, 6.0], Importance: 0.0\n",
+ "Feature: age_class_(6.0, 9.0], Importance: 0.0\n",
+ "Feature: age_class_(9.0, 12.0], Importance: 0.0\n",
+ "Feature: age_class_(12.0, 15.0], Importance: 0.0\n",
+ "Feature: age_class_(15.0, 18.0], Importance: 0.0\n",
+ "Precision: 0.5808823529411765\n",
+ "Recall: 0.03552158273381295\n",
+ "F1 Score: 0.06694915254237287\n"
+ ]
+ }
+ ],
+ "source": [
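+ "# Features can be ranked by the absolute value of the logistic regression coefficients, or by the feature importances of a tree-based model\n",
+ "# Feature selection can use RFE or SelectFromModel\n",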
+ "\n",
+ "import numpy as np\n",
+ "\n",
+ "\n",
+ "feature_names = X_train.columns\n",
+ "coefficients = best_model.coef_[0]\n",
+ "features_importance = zip(feature_names, np.abs(coefficients))\n",
+ "\n",
+ "\n",
+ "sorted_features = sorted(features_importance, key=lambda x: x[1], reverse=True)\n",
+ "for feature, importance in sorted_features:\n",
+ " print(f\"Feature: {feature}, Importance: {importance}\")\n",
+ "\n",
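+ "# Note: no cutoff is applied here, so every feature is kept (only reordered by importance);\n",
+ "# slice sorted_features (for example sorted_features[:10]) to actually drop low-importance features\n",
+ "top_features = [feature for feature, importance in sorted_features]\n",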
+ "feature_names_list = feature_names.tolist()\n",
+ "\n",
+ "top_features_indices = [feature_names_list.index(feature) for feature in top_features]\n",
+ "\n",
+ "\n",
+ "X_train_selected = X_train_std[:, top_features_indices]\n",
+ "X_test_selected = X_test_std[:, top_features_indices]\n",
+ "\n",
+ "def new_model_func(model):\n",
+ "\n",
+ " model.fit(X_train_selected, y_train)\n",
+ "\n",
+ " y_pred = model.predict(X_test_selected)\n",
+ " print(\"Precision:\", precision_score(y_test, y_pred))\n",
+ " print(\"Recall:\", recall_score(y_test, y_pred))\n",
+ " print(\"F1 Score:\", f1_score(y_test, y_pred))\n",
+ "new_model = LogisticRegression()\n",
+ "new_model_func(new_model)"
+ ]
+ },
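+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A minimal sketch of coefficient-based feature selection with `SelectFromModel`, assuming synthetic data and an arbitrary choice of 5 features to keep. Setting `threshold=-np.inf` together with `max_features=5` keeps the 5 features with the largest absolute coefficients. `X_demo` and `y_demo` are hypothetical stand-ins for the scaled training data.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from sklearn.datasets import make_classification\n",
+ "from sklearn.feature_selection import SelectFromModel\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "\n",
+ "# Hypothetical data: 20 features, of which 4 are informative\n",
+ "X_demo, y_demo = make_classification(n_samples=500, n_features=20, n_informative=4, random_state=0)\n",
+ "\n",
+ "# Rank features by absolute coefficient and keep the top 5\n",
+ "selector = SelectFromModel(LogisticRegression(max_iter=1000), threshold=-np.inf, max_features=5)\n",
+ "X_demo_selected = selector.fit_transform(X_demo, y_demo)\n",
+ "\n",
+ "print('kept feature indices:', np.flatnonzero(selector.get_support()))\n",
+ "print(X_demo.shape, '->', X_demo_selected.shape)\n",
+ "```\n",
+ "\n",
+ "Refitting a model on the reduced matrix and comparing its metrics with the full model shows whether the dropped features carried any signal."
+ ]
+ },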
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "d_Qm2FTHMAqZ"
+ },
+ "source": [
+ "#### 10. Try other model algorithms\n",
+ "Classify with sklearn classifiers such as RandomForestClassifier, SVM, and KNN, and try the hyperparameter tuning procedure described above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "collapsed": true,
+ "id": "bXhgfqWxMAqZ",
+ "jupyter": {
+ "outputs_hidden": true
+ },
+ "outputId": "eda6882f-a900-4c58-e532-59f15d9ed978"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "随机森林\n",
+ "Precision: 0.5507745266781411\n",
+ "Recall: 0.14388489208633093\n",
+ "F1 Score: 0.22816399286987524\n",
+ "支持向量机\n",
+ "Precision: 0.6\n",
+ "Recall: 0.009442446043165468\n",
+ "F1 Score: 0.01859229747675963\n",
+ "K最近邻\n",
+ "Precision: 0.46301369863013697\n",
+ "Recall: 0.07598920863309352\n",
+ "F1 Score: 0.13055233680957898\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Random forest\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "print(\"随机森林\")\n",
+ "new_model = RandomForestClassifier()\n",
+ "new_model_func(new_model)\n",
+ "# Support vector machine\n",
+ "from sklearn.svm import SVC\n",
+ "print(\"支持向量机\")\n",
+ "new_model = SVC()\n",
+ "new_model_func(new_model)\n",
+ "# K-nearest neighbors\n",
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "print(\"K最近邻\")\n",
+ "new_model = KNeighborsClassifier()\n",
+ "new_model_func(new_model)\n",
+ "\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}