PART I: German Credit Score Classification Model EDA ¶

By: Krishna J¶

Importing necessary libraries ¶

import pandas as pd
import numpy as np
import seaborn               as sns
import matplotlib.pyplot     as plt
import shap
import eli5
from sklearn.model_selection import train_test_split
#from sklearn.ensemble        import RandomForestClassifier
#from sklearn.linear_model    import LogisticRegression
from sklearn.preprocessing   import MinMaxScaler, StandardScaler
from sklearn.base            import TransformerMixin
from sklearn.pipeline        import Pipeline, FeatureUnion
from typing                  import List, Union, Dict
# Warnings will be used to silence various model warnings for tidier output
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline 
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
np.random.seed(0)

Importing the source dataset ¶

Source:

https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29

Professor Dr. Hans Hofmann, Institut für Statistik und Ökonometrie, Universität Hamburg, FB Wirtschaftswissenschaften, Von-Melle-Park 5, 2000 Hamburg 13

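The repository also provides a numeric version of the data (german.data-numeric), which has been edited and had several indicator variables added to make it suitable for algorithms that cannot cope with categorical variables; in that version, several ordered categorical attributes (such as attribute 17) are coded as integers. This notebook loads the original file, german.data, in which categorical values are stored as codes such as A11.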

feature_list = ['CurrentAcc', 'NumMonths', 'CreditHistory', 'Purpose', 'CreditAmount', 
         'Savings', 'EmployDuration', 'PayBackPercent', 'Gender', 'Debtors', 
         'ResidenceDuration', 'Collateral', 'Age', 'OtherPayBackPlan', 'Property', 
         'ExistingCredit', 'Job', 'Dependents', 'Telephone', 'Foreignworker', 'CreditStatus']

german_xai = pd.read_csv('C:/Users/krish/Downloads/german.data.txt',names = feature_list, delimiter=' ')

german_xai.head()

german_xai.shape

(1000, 21)

The dataset has 1000 entries with 21 fields.

type(german_xai)

pandas.core.frame.DataFrame

german_xai.head(10)

german_xai.columns

Index(['CurrentAcc', 'NumMonths', 'CreditHistory', 'Purpose', 'CreditAmount',
       'Savings', 'EmployDuration', 'PayBackPercent', 'Gender', 'Debtors',
       'ResidenceDuration', 'Collateral', 'Age', 'OtherPayBackPlan',
       'Property', 'ExistingCredit', 'Job', 'Dependents', 'Telephone',
       'Foreignworker', 'CreditStatus'],
      dtype='object')

The fields in the source dataset are listed above.

german_xai.dtypes

CurrentAcc           object
NumMonths             int64
CreditHistory        object
Purpose              object
CreditAmount          int64
Savings              object
EmployDuration       object
PayBackPercent        int64
Gender               object
Debtors              object
ResidenceDuration     int64
Collateral           object
Age                   int64
OtherPayBackPlan     object
Property             object
ExistingCredit        int64
Job                  object
Dependents            int64
Telephone            object
Foreignworker        object
CreditStatus          int64
dtype: object

The data type of each field is displayed above.

Missing Value Check ¶

import klib
klib.missingval_plot(german_xai)

No missing values found in the dataset.
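As a cross-check that does not depend on klib, the same result can be confirmed with plain pandas. The following is a minimal sketch on a small hypothetical frame (the `toy` data is illustrative and not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for german_xai
toy = pd.DataFrame({
    "NumMonths": [6, 48, 12],
    "CreditAmount": [1169, np.nan, 2096],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# True only when no cell in the frame is missing
is_complete = not toy.isna().any().any()
print(is_complete)
```

On the real data, `german_xai.isna().sum()` should report zero for every field if the plot above is correct.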

Feature Engineering ¶

Encoding categorical fields ¶

1. Mapping to actual description¶

Here, we first map the coded domain values of each field to their actual descriptions, as given in the UCI Machine Learning Repository.

Gender field desc:

A91 : male : divorced/separated;
A92 : female : divorced/separated/married;
A93 : male : single;
A94 : male : married/widowed;
A95 : female : single.

Male is encoded as 1 and female as 0.

Recoding Gender as a binary protected attribute (the marital-status mapping is left commented out)¶

german_xai['Gender'].value_counts()
#german_xai.replace({'Marital_Status':{'A93':'Single','A91':'divorced/married/widowed','A92':'divorced/married/widowed','A94':'divorced/married/widowed'},'Gender':{'A91':'1','A93':'1','A94':'1','A92':'0'}},inplace=True)
german_xai.replace({'Gender':{'A91':'1','A93':'1','A94':'1','A92':'0'}},inplace=True)
german_xai['Gender'].value_counts()

A93    548
A92    310
A94     92
A91     50
Name: Gender, dtype: int64

1    690
0    310
Name: Gender, dtype: int64
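The commented-out line above hints at a separate marital-status field. One way to derive both fields from the raw codes is sketched below. The column names `PersonalStatus` and `MaritalStatus` are hypothetical, and the marital labels simply repeat the UCI descriptions; how to group them further is left open.

```python
import pandas as pd

# Toy sample of raw attribute-9 codes (illustrative rows only)
raw = pd.DataFrame({"PersonalStatus": ["A93", "A92", "A94", "A91", "A93"]})

# Code -> (gender flag, marital status); labels follow the UCI description.
# A95 (female, single) is documented but does not occur in the counts above.
status_map = {
    "A91": (1, "divorced/separated"),
    "A92": (0, "divorced/separated/married"),
    "A93": (1, "single"),
    "A94": (1, "married/widowed"),
    "A95": (0, "single"),
}

# Derive both fields from the untouched raw codes
raw["Gender"] = raw["PersonalStatus"].map(lambda code: status_map[code][0])
raw["MaritalStatus"] = raw["PersonalStatus"].map(lambda code: status_map[code][1])
print(raw)
```

Keeping the raw code column until both fields are derived avoids the problem of overwriting Gender before the marital status can be read from it.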

#german_xai['Age'].value_counts()
# np.int was removed from recent NumPy releases; the built-in int does the same job
german_xai['Age']=german_xai['Age'].apply(lambda x: int(x >= 26))
german_xai['Age'].value_counts()

1    810
0    190
Name: Age, dtype: int64

Entries with age greater than or equal to 26 years are encoded as 1; all others are encoded as 0.
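The threshold can be checked at its boundary with a vectorised comparison. This is a small sketch on illustrative ages, not on the dataset itself:

```python
import pandas as pd

# Illustrative ages on both sides of the cut-off
ages = pd.Series([19, 25, 26, 40], name="Age")

# 1 when age >= 26, otherwise 0
age_flag = (ages >= 26).astype(int)
print(age_flag.tolist())
```

An applicant aged exactly 26 falls in group 1, which matches the rule stated above.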

#Encoding target field
german_xai.CreditStatus.value_counts()
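# Assign the result back; in-place replace on a selected column may not update the frame in newer pandas
german_xai['CreditStatus'] = german_xai['CreditStatus'].replace({1: 1, 2: 0})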
german_xai.CreditStatus.value_counts()

1    700
2    300
Name: CreditStatus, dtype: int64

1    700
0    300
Name: CreditStatus, dtype: int64

The target field CreditStatus is recoded as 1 = Good, 0 = Bad (positive class); in the raw data, 1 = Good and 2 = Bad. See https://aif360.readthedocs.io/en/latest/modules/generated/aif360.datasets.GermanDataset.html#aif360.datasets.GermanDataset

Status of checking account desc:

A11 : <0;
A12 : 0 to 200;
A13 : >=200;
A14 : no checking account.

german_xai['CurrentAcc'] = german_xai['CurrentAcc'].replace({'A11':'LT200' , 'A12': 'LT200','A13': 'GE200','A14': 'None'})
german_xai.CurrentAcc.value_counts()

LT200    543
None     394
GE200     63
Name: CurrentAcc, dtype: int64

Employment duration desc:

A71 : unemployed;
A72 : ... < 1 year;
A73 : 1 <= ... < 4 years;
A74 : 4 <= ... < 7 years;
A75 : .. >= 7 years.

german_xai['EmployDuration'] = german_xai['EmployDuration'].replace({'A71':'unemployed' , 'A72': 'LT1','A73': '1-4','A74': '4-7', 'A75': 'GE7'})
german_xai.EmployDuration.value_counts()

1-4           339
GE7           253
4-7           174
LT1           172
unemployed     62
Name: EmployDuration, dtype: int64

Credit History desc:

A30 : no credits taken/ all credits paid back duly,
A31 : all credits at this bank paid back duly,
A32 : existing credits paid back duly till now,
A33 : delay in paying off in the past,
A34 : critical account/ other credits existing (not at this bank).

german_xai['CreditHistory'] = german_xai['CreditHistory'].replace({'A30':'none/paid' , 'A31': 'none/paid','A32': 'none/paid','A33': 'Delay', 'A34': 'other'})
german_xai['CreditHistory'].value_counts()

none/paid    619
other        293
Delay         88
Name: CreditHistory, dtype: int64

Savings Desc:

A61 : ... < 100 DM
A62 : 100 <= ... < 500 DM
A63 : 500 <= ... < 1000 DM
A64 : .. >= 1000 DM
A65 : unknown/ no savings account

german_xai['Savings'] = german_xai['Savings'].replace({'A61':'LT500' , 'A62': 'LT500','A63': 'GT500','A64': 'GT500', 'A65': 'none'})
german_xai['Savings'].value_counts()

LT500    706
none     183
GT500    111
Name: Savings, dtype: int64

Debtors desc: Other debtors / guarantors

A101 : none
A102 : co-applicant
A103 : guarantor

german_xai['Debtors'] = german_xai['Debtors'].replace({'A101':'none' , 'A102': 'co-applicant','A103': 'guarantor'})
german_xai['Debtors'].value_counts()

none            907
guarantor        52
co-applicant     41
Name: Debtors, dtype: int64

Collateral desc:

A121 : real estate
A122 : if not A121 : building society savings agreement/ life insurance
A123 : if not A121/A122 : car or other, not in attribute 6
A124 : unknown / no property

german_xai['Collateral'] = german_xai['Collateral'].replace({'A121':'real_estate' , 'A122': 'savings/life_insurance','A123': 'car/other', 'A124':'unknown/none'})
german_xai['Collateral'].value_counts()

car/other                 332
real_estate               282
savings/life_insurance    232
unknown/none              154
Name: Collateral, dtype: int64

Property: Housing

A151 : rent
A152 : own
A153 : for free

german_xai['Property'] = german_xai['Property'].replace({'A151':'rent' , 'A152': 'own','A153': 'free'})
german_xai['Property'].value_counts()

own     713
rent    179
free    108
Name: Property, dtype: int64

Telephone desc:

A191 : none
A192 : yes, registered under the customer's name

Foreign worker

A201 : yes
A202 : no

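# map() returns integer columns directly and avoids in-place replace on a selected column
german_xai['Foreignworker'] = german_xai['Foreignworker'].map({'A201':1 , 'A202': 0})
german_xai['Telephone'] = german_xai['Telephone'].map({'A191':0 , 'A192': 1})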
german_xai['Telephone'].value_counts()
german_xai['Foreignworker'].value_counts()

0    596
1    404
Name: Telephone, dtype: int64

1    963
0     37
Name: Foreignworker, dtype: int64

Purpose desc:

A40 : car (new)
A41 : car (used)
A42 : furniture/equipment
A43 : radio/television
A44 : domestic appliances
A45 : repairs
A46 : education
A47 : (vacation - does not exist?)
A48 : retraining
A49 : business
A410 : others

german_xai['Purpose'] = german_xai['Purpose'].replace({'A40':'CarNew' , 'A41': 'CarUsed' , 'A42': 'furniture/equip','A43':'radio/tv','A44':'domestic app','A45':'repairs','A46':'education','A47':'vacation','A48':'retraining','A49':'biz','A410':'others'})
german_xai['Purpose'].value_counts()

radio/tv           280
CarNew             234
furniture/equip    181
CarUsed            103
biz                 97
education           50
repairs             22
others              12
domestic app        12
retraining           9
Name: Purpose, dtype: int64

Job desc:

A171 : unemployed/ unskilled - non-resident
A172 : unskilled - resident
A173 : skilled employee / official
A174 : management/ self-employed/highly qualified employee/ officer

german_xai['Job'] = german_xai['Job'].replace({'A171':'unemp/unskilled-non_resident' , 'A172': 'unskilled-resident','A173': 'skilled_employee','A174':'management/self-emp/officer/highly_qualif_emp'})
german_xai['Job'].value_counts()

skilled_employee                                 630
unskilled-resident                               200
management/self-emp/officer/highly_qualif_emp    148
unemp/unskilled-non_resident                      22
Name: Job, dtype: int64

Other installment plans desc

A141 : bank
A142 : stores
A143 : none

german_xai['OtherPayBackPlan'] = german_xai['OtherPayBackPlan'].replace({'A141':'bank' , 'A142': 'stores','A143': 'none'})
german_xai['OtherPayBackPlan'].value_counts()

none      814
bank      139
stores     47
Name: OtherPayBackPlan, dtype: int64
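The field-by-field mappings above could also be collected in one nested dictionary and applied in a single pass, followed by a check that no raw code was missed. The sketch below uses a toy frame with two of the fields; `code_maps` and `leftover` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Toy sample with raw UCI codes (illustrative rows only)
toy = pd.DataFrame({
    "Debtors": ["A101", "A103", "A102"],
    "Property": ["A152", "A151", "A153"],
})

# Column -> {raw code -> label}, mirroring the cell-by-cell mappings above
code_maps = {
    "Debtors": {"A101": "none", "A102": "co-applicant", "A103": "guarantor"},
    "Property": {"A151": "rent", "A152": "own", "A153": "free"},
}

# DataFrame.replace accepts a nested dict and returns a new frame
mapped = toy.replace(code_maps)

# Per column: does any value still look like a raw code ('A' followed by digits)?
leftover = mapped.apply(lambda col: col.astype(str).str.match(r"^A\d+$")).any()
print(mapped)
print(leftover)
```

A `True` entry in `leftover` would point to a field whose dictionary is missing a code.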

german_xai.head()

german_xai = german_xai.reindex(columns=['CurrentAcc','NumMonths', 'CreditHistory', 'Purpose', 'CreditAmount', 
         'Savings', 'EmployDuration', 'PayBackPercent', 'Gender', 'Debtors', 
         'ResidenceDuration', 'Collateral', 'Age', 'OtherPayBackPlan', 'Property', 
         'ExistingCredit', 'Job', 'Dependents', 'Telephone', 'Foreignworker', 'CreditStatus'])
##german_xai.head()

Writing data to csv file for re-usability¶

german_xai.to_csv('C:/Users/krish/Downloads/German-mapped_upd.csv', index=False)

German_df = pd.read_csv('C:/Users/krish/Downloads/German-mapped_upd.csv')
print(German_df.shape)
print (German_df.columns)

(1000, 21)
Index(['CurrentAcc', 'NumMonths', 'CreditHistory', 'Purpose', 'CreditAmount',
       'Savings', 'EmployDuration', 'PayBackPercent', 'Gender', 'Debtors',
       'ResidenceDuration', 'Collateral', 'Age', 'OtherPayBackPlan',
       'Property', 'ExistingCredit', 'Job', 'Dependents', 'Telephone',
       'Foreignworker', 'CreditStatus'],
      dtype='object')
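One caveat with this round trip: depending on the pandas version, `read_csv` may interpret NA-like strings, possibly including the literal label 'None' used for CurrentAcc, as missing values. A hedged sketch of a safer read, using an in-memory buffer instead of the file path above:

```python
import io
import pandas as pd

# Toy column that reuses the 'None' label from the CurrentAcc mapping
toy = pd.DataFrame({"CurrentAcc": ["LT200", "None", "GE200"]})

# Round-trip through an in-memory CSV instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# keep_default_na=False stops pandas from converting NA-like strings to NaN
restored = pd.read_csv(buffer, keep_default_na=False)
print(restored["CurrentAcc"].tolist())
```

Comparing `value_counts()` before writing and after reading is a quick way to confirm that no category was lost.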

Data Analysis¶

Correlation Analysis ¶

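# Restrict to numeric fields explicitly; newer pandas raises an error on text columns
corrMatrix = German_df.select_dtypes(include='number').corr().round(1)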
corrMatrix

plt.figure(figsize=(15,15))
sns.heatmap(corrMatrix, annot=True,cmap="Blues")
plt.show()

klib.corr_plot(German_df,annot=False)

Observation: Credit amount and number of months are noticeably correlated ¶
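Reading pairs off a heatmap is error-prone, so the strongest correlations can also be listed programmatically. The sketch below runs on synthetic stand-in data (columns `a`, `b`, `c` are hypothetical); the same steps should apply to the numeric fields of German_df.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: 'b' is built from 'a', 'c' is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.1, size=200),
    "c": rng.normal(size=200),
})

corr = toy.corr().abs()

# Keep the strict upper triangle so each pair appears once and the diagonal is dropped
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)
print(pairs.head())
```

The first entries of `pairs` are the candidates to inspect for redundancy.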

Correlation w.r.to target field¶

klib.corr_plot(German_df,target='CreditStatus')

klib.corr_mat(German_df)

klib.cat_plot(German_df)

klib.dist_plot(German_df)

import matplotlib.pyplot as plt 

import numpy as np 

age_count=German_df.Age.value_counts(sort=True)

print(age_count)

plt.figure(figsize=(10,5))

age_count.plot(kind='bar', color='skyblue', rot=0) 

plt.ylabel('Frequency',fontsize=12,color='green')

plt.xlabel('Age',fontsize=12,color='green')

plt.suptitle('Distribution of Age field',fontsize=15,color='orange',fontweight='bold')

plt.annotate(age_count[1],xy=(0,300),verticalalignment="top",horizontalalignment="center")
plt.annotate(age_count[0],xy=(1,100),verticalalignment="top",horizontalalignment="center")

LABELS=["1:Age>=26","0:Age<26"]
plt.xticks(range(2),LABELS)

1    810
0    190
Name: Age, dtype: int64

Observation: Most entries (810 of 1000) have age 26 or above ¶

plt.figure(figsize=(10,5))

plt.hist(German_df.CreditAmount, color='tomato') 

plt.ylabel('Frequency')

plt.xlabel('Credit Amount')

plt.suptitle('Distribution of Credit Amount field',fontsize=15,color='slategrey',fontweight='bold')

(array([445., 293.,  97.,  80.,  38.,  19.,  14.,   8.,   5.,   1.]),
 array([  250. ,  2067.4,  3884.8,  5702.2,  7519.6,  9337. , 11154.4,
        12971.8, 14789.2, 16606.6, 18424. ]),
 <a list of 10 Patch objects>)

Observation: There are more entries with a lower credit amount than with a higher credit amount ¶

plt.figure(figsize=(10,5))

plt.hist(German_df.NumMonths, color='tan') 

plt.ylabel('Frequency')

plt.xlabel('Number of Months')

plt.suptitle('Distribution of NumMonths field',fontsize=15,color='teal',fontweight='bold')

(array([171., 262., 337.,  57.,  86.,  17.,  54.,   2.,  13.,   1.]),
 array([ 4. , 10.8, 17.6, 24.4, 31.2, 38. , 44.8, 51.6, 58.4, 65.2, 72. ]),
 <a list of 10 Patch objects>)

Observation: There are more entries with a shorter duration than with a longer duration in months ¶

target_count=German_df.CreditStatus.value_counts(sort=True)

print(target_count)

plt.figure(figsize=(10,5))

target_count.plot(kind='bar', color='gold', rot=0) 

plt.ylabel('Frequency',fontsize=12,color='green')

plt.xlabel('Credit Status',fontsize=12,color='green')

plt.suptitle('Distribution of Credit Status field',fontsize=15,color='red',fontweight='bold')

plt.annotate(target_count[1],xy=(0,300),verticalalignment="top",horizontalalignment="center")
plt.annotate(target_count[0],xy=(1,200),verticalalignment="top",horizontalalignment="center")

LABELS=["1:Good credit score","0:Bad credit score"]
plt.xticks(range(2),LABELS)

1    700
0    300
Name: CreditStatus, dtype: int64

Observation: There are more entries with a good credit score (700) than with a bad credit score (300) ¶

German_df['Age'].describe()

count    1000.000000
mean        0.810000
std         0.392497
min         0.000000
25%         1.000000
50%         1.000000
75%         1.000000
max         1.000000
Name: Age, dtype: float64

German_df.Gender.unique()

array([1, 0], dtype=int64)

Gender_count=German_df.Gender.value_counts()
print(Gender_count)

plt.figure(figsize=(10,5))

Gender_count.plot(kind='bar', color='pink', rot=0) 

plt.ylabel('Frequency',fontsize=12,color='blue')

plt.xlabel('Gender',fontsize=12,color='blue')

plt.suptitle('Distribution of Gender field',fontsize=15,color='Green',fontweight='bold')

plt.annotate(Gender_count[1],xy=(0,300),verticalalignment="top",horizontalalignment="center")
plt.annotate(Gender_count[0],xy=(1,200),verticalalignment="top",horizontalalignment="center")

LABELS=["1:Male","0:Female"]
plt.xticks(range(2),LABELS)

1    690
0    310
Name: Gender, dtype: int64

Observation: There are more male entries (690) than female entries (310) ¶

colour=['blue','pink','orange','green','tan','violet','olive','gold','tomato','skyblue']
for i,j in zip(German_df.columns,colour):
    field_count=German_df[i].value_counts()
    #print(field_count)

    plt.figure(figsize=(10,5))

    field_count.plot(kind='bar', color=j, rot=0) 

    plt.ylabel('Frequency',fontsize=12,color='black')

    plt.xlabel(i,fontsize=12,color='black')

    plt.suptitle('Distribution of '+ i,fontsize=15,color='Green',fontweight='bold')

Converting categorical fields to numerical fields¶

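# dtype='uint8' keeps the indicator columns numeric (0/1) regardless of the pandas default
german_xai=pd.get_dummies(German_df,columns=['CurrentAcc','CreditHistory','Purpose','Savings','EmployDuration','Debtors','Collateral','OtherPayBackPlan','Property','Job'],dtype='uint8')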
german_xai.head()

german_xai.columns

Index(['NumMonths', 'CreditAmount', 'PayBackPercent', 'Gender',
       'ResidenceDuration', 'Age', 'ExistingCredit', 'Dependents', 'Telephone',
       'Foreignworker', 'CreditStatus', 'CurrentAcc_GE200', 'CurrentAcc_LT200',
       'CurrentAcc_None', 'CreditHistory_Delay', 'CreditHistory_none/paid',
       'CreditHistory_other', 'Purpose_CarNew', 'Purpose_CarUsed',
       'Purpose_biz', 'Purpose_domestic app', 'Purpose_education',
       'Purpose_furniture/equip', 'Purpose_others', 'Purpose_radio/tv',
       'Purpose_repairs', 'Purpose_retraining', 'Savings_GT500',
       'Savings_LT500', 'Savings_none', 'EmployDuration_1-4',
       'EmployDuration_4-7', 'EmployDuration_GE7', 'EmployDuration_LT1',
       'EmployDuration_unemployed', 'Debtors_co-applicant',
       'Debtors_guarantor', 'Debtors_none', 'Collateral_car/other',
       'Collateral_real_estate', 'Collateral_savings/life_insurance',
       'Collateral_unknown/none', 'OtherPayBackPlan_bank',
       'OtherPayBackPlan_none', 'OtherPayBackPlan_stores', 'Property_free',
       'Property_own', 'Property_rent',
       'Job_management/self-emp/officer/highly_qualif_emp',
       'Job_skilled_employee', 'Job_unemp/unskilled-non_resident',
       'Job_unskilled-resident'],
      dtype='object')
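To make the effect of the encoding concrete, here is a minimal sketch on a toy frame with one categorical and one numeric field (the rows are illustrative):

```python
import pandas as pd

# Toy frame: one categorical field and one numeric field
toy = pd.DataFrame({
    "Property": ["own", "rent", "free", "own"],
    "NumMonths": [6, 48, 12, 24],
})

# Each category becomes its own 0/1 indicator column named <field>_<category>
encoded = pd.get_dummies(toy, columns=["Property"], dtype="uint8")
print(encoded.columns.tolist())
print(encoded)
```

Untouched fields keep their position at the front and the indicator columns are appended after them, which is why the target field ends up in the middle of the column list above.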

Reordering columns (moving the target CreditStatus to the end)¶

german_xai = german_xai.reindex(columns=['NumMonths', 'CreditAmount', 'PayBackPercent', 'Gender',
       'ResidenceDuration', 'Age', 'ExistingCredit', 'Dependents', 'Telephone',
       'Foreignworker', 'CurrentAcc_GE200',
       'CurrentAcc_LT200', 'CurrentAcc_None', 'CreditHistory_Delay',
       'CreditHistory_none/paid', 'CreditHistory_other', 'Purpose_CarNew',
       'Purpose_CarUsed', 'Purpose_biz', 'Purpose_domestic app',
       'Purpose_education', 'Purpose_furniture/equip', 'Purpose_others',
       'Purpose_radio/tv', 'Purpose_repairs', 'Purpose_retraining',
       'Savings_GT500', 'Savings_LT500', 'Savings_none', 'EmployDuration_1-4',
       'EmployDuration_4-7', 'EmployDuration_GE7', 'EmployDuration_LT1',
       'EmployDuration_unemployed', 'Debtors_co-applicant',
       'Debtors_guarantor', 'Debtors_none', 'Collateral_car/other',
       'Collateral_real_estate', 'Collateral_savings/life_insurance',
       'Collateral_unknown/none', 'OtherPayBackPlan_bank',
       'OtherPayBackPlan_none', 'OtherPayBackPlan_stores', 'Property_free',
       'Property_own', 'Property_rent',
       'Job_management/self-emp/officer/highly_qualif_emp',
       'Job_skilled_employee', 'Job_unemp/unskilled-non_resident',
       'Job_unskilled-resident','CreditStatus'])
german_xai.head()

Scaling Credit Amount¶

from sklearn.preprocessing import MinMaxScaler #since the field is not normally distributed
scaler = MinMaxScaler()
german_xai[['CreditAmount']]=scaler.fit_transform(german_xai[['CreditAmount']])

german_xai.head()
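Min-max scaling maps each value x to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. A small sketch comparing the scaler with the formula applied by hand, on three illustrative amounts:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative credit amounts as a single-column 2D array
amounts = np.array([[250.0], [1000.0], [18424.0]])

# Scaling with scikit-learn
scaled = MinMaxScaler().fit_transform(amounts)

# The same transformation written out explicitly
by_hand = (amounts - amounts.min()) / (amounts.max() - amounts.min())
print(scaled.ravel())
```

Note that the scaler in this notebook is fitted on all 1000 rows before the train/test split, so the test rows influence the minimum and maximum. Fitting on the training rows only and then transforming the test rows would avoid that; whether it matters here is a judgment call.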

Writing data to csv file¶

german_xai.to_csv('C:/Users/krish/Downloads/German-encoded_upd.csv', index=False)

Splitting into train and test data ¶

It is desirable to split the dataset into train and test sets in a way that preserves the same proportion of examples in each class as in the original dataset. This is called a stratified train-test split. We can achieve it by passing the y component of the original dataset to the “stratify” argument. https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-algorithms/

X = german_xai.iloc[:, :-1]
y = german_xai['CreditStatus']
X.head()
y.head()
X_train,X_test,y_train, y_test = train_test_split(X,y, test_size=0.2, random_state=40,stratify=y)

0    1
1    0
2    1
3    1
4    0
Name: CreditStatus, dtype: int64
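Whether stratification worked can be verified by comparing class proportions in the two splits. The sketch below uses synthetic labels with the same 700/300 balance as CreditStatus; `X_toy` and `y_toy` are placeholders, not the notebook's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the same 700/300 balance as CreditStatus
y_toy = np.array([1] * 700 + [0] * 300)
X_toy = np.arange(1000).reshape(-1, 1)  # placeholder feature column

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=40, stratify=y_toy
)

# Share of the positive label in each split; both should be close to 0.7
print(y_tr.mean(), y_te.mean())
```

On the real split, `y_train.mean()` and `y_test.mean()` give the same check.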

german_xai.dtypes
german_xai.shape

NumMonths                                              int64
CreditAmount                                         float64
PayBackPercent                                         int64
Gender                                                 int64
ResidenceDuration                                      int64
Age                                                    int64
ExistingCredit                                         int64
Dependents                                             int64
Telephone                                              int64
Foreignworker                                          int64
CurrentAcc_GE200                                       uint8
CurrentAcc_LT200                                       uint8
CurrentAcc_None                                        uint8
CreditHistory_Delay                                    uint8
CreditHistory_none/paid                                uint8
CreditHistory_other                                    uint8
Purpose_CarNew                                         uint8
Purpose_CarUsed                                        uint8
Purpose_biz                                            uint8
Purpose_domestic app                                   uint8
Purpose_education                                      uint8
Purpose_furniture/equip                                uint8
Purpose_others                                         uint8
Purpose_radio/tv                                       uint8
Purpose_repairs                                        uint8
Purpose_retraining                                     uint8
Savings_GT500                                          uint8
Savings_LT500                                          uint8
Savings_none                                           uint8
EmployDuration_1-4                                     uint8
EmployDuration_4-7                                     uint8
EmployDuration_GE7                                     uint8
EmployDuration_LT1                                     uint8
EmployDuration_unemployed                              uint8
Debtors_co-applicant                                   uint8
Debtors_guarantor                                      uint8
Debtors_none                                           uint8
Collateral_car/other                                   uint8
Collateral_real_estate                                 uint8
Collateral_savings/life_insurance                      uint8
Collateral_unknown/none                                uint8
OtherPayBackPlan_bank                                  uint8
OtherPayBackPlan_none                                  uint8
OtherPayBackPlan_stores                                uint8
Property_free                                          uint8
Property_own                                           uint8
Property_rent                                          uint8
Job_management/self-emp/officer/highly_qualif_emp      uint8
Job_skilled_employee                                   uint8
Job_unemp/unskilled-non_resident                       uint8
Job_unskilled-resident                                 uint8
CreditStatus                                           int64
dtype: object

(1000, 52)

import klib
klib.missingval_plot(X)
klib.missingval_plot(y)

No missing values found in the dataset.
No missing values found in the dataset.

Feature Selection ¶

1. Using Mutual info classif ¶

from sklearn.feature_selection import mutual_info_classif
mutual_info=mutual_info_classif(X_train, y_train,random_state=40)
mutual_info

array([0.05678877, 0.02318715, 0.        , 0.00573952, 0.        ,
       0.01872571, 0.0136521 , 0.        , 0.        , 0.0095328 ,
       0.02398797, 0.05296151, 0.06220834, 0.03330972, 0.03190313,
       0.00553199, 0.00157823, 0.01061717, 0.        , 0.        ,
       0.01199827, 0.01585764, 0.01401993, 0.02156122, 0.02378362,
       0.        , 0.0200516 , 0.01483981, 0.0201077 , 0.02075406,
       0.0081356 , 0.00738015, 0.01843426, 0.01352031, 0.01477018,
       0.00101237, 0.00829283, 0.        , 0.00020473, 0.02448042,
       0.        , 0.        , 0.02107073, 0.        , 0.        ,
       0.0046772 , 0.02035891, 0.02148182, 0.        , 0.        ,
       0.        ])

mutual_info_classif estimates the mutual information between each feature and a discrete target variable.

Mutual information (MI) between two random variables is a non-negative value that measures the dependency between them. It is zero if and only if the two variables are independent, and higher values mean stronger dependency.
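The two extremes of this definition can be illustrated on toy discrete variables. The sketch below uses `mutual_info_score` from scikit-learn, which computes MI (in nats) from the observed counts; this is a different estimator from the one behind `mutual_info_classif`, so it illustrates the concept rather than reproducing the scores above.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

x = np.array([0, 0, 1, 1] * 25)

# Every (x, y) combination is equally frequent, so x tells nothing about y
y_indep = np.array([0, 1, 0, 1] * 25)

# y is a copy of x, so knowing x determines y completely
y_same = x.copy()

mi_indep = mutual_info_score(x, y_indep)
mi_same = mutual_info_score(x, y_same)
print(mi_indep, mi_same)
```

For a balanced binary variable the maximum is ln 2, roughly 0.693. The scores reported by `mutual_info_classif` are estimates, so values at or near zero are best read as "no detectable dependency" rather than proof of independence.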

X.columns

Index(['NumMonths', 'CreditAmount', 'PayBackPercent', 'Gender',
       'ResidenceDuration', 'Age', 'ExistingCredit', 'Dependents', 'Telephone',
       'Foreignworker', 'CurrentAcc_GE200', 'CurrentAcc_LT200',
       'CurrentAcc_None', 'CreditHistory_Delay', 'CreditHistory_none/paid',
       'CreditHistory_other', 'Purpose_CarNew', 'Purpose_CarUsed',
       'Purpose_biz', 'Purpose_domestic app', 'Purpose_education',
       'Purpose_furniture/equip', 'Purpose_others', 'Purpose_radio/tv',
       'Purpose_repairs', 'Purpose_retraining', 'Savings_GT500',
       'Savings_LT500', 'Savings_none', 'EmployDuration_1-4',
       'EmployDuration_4-7', 'EmployDuration_GE7', 'EmployDuration_LT1',
       'EmployDuration_unemployed', 'Debtors_co-applicant',
       'Debtors_guarantor', 'Debtors_none', 'Collateral_car/other',
       'Collateral_real_estate', 'Collateral_savings/life_insurance',
       'Collateral_unknown/none', 'OtherPayBackPlan_bank',
       'OtherPayBackPlan_none', 'OtherPayBackPlan_stores', 'Property_free',
       'Property_own', 'Property_rent',
       'Job_management/self-emp/officer/highly_qualif_emp',
       'Job_skilled_employee', 'Job_unemp/unskilled-non_resident',
       'Job_unskilled-resident'],
      dtype='object')

mutual_info=pd.Series(mutual_info)
mutual_info.index=X_train.columns
mutual_info.sort_values(ascending=False)

CurrentAcc_None                                      0.062208
NumMonths                                            0.056789
CurrentAcc_LT200                                     0.052962
CreditHistory_Delay                                  0.033310
CreditHistory_none/paid                              0.031903
Collateral_savings/life_insurance                    0.024480
CurrentAcc_GE200                                     0.023988
Purpose_repairs                                      0.023784
CreditAmount                                         0.023187
Purpose_radio/tv                                     0.021561
Job_management/self-emp/officer/highly_qualif_emp    0.021482
OtherPayBackPlan_none                                0.021071
EmployDuration_1-4                                   0.020754
Property_rent                                        0.020359
Savings_none                                         0.020108
Savings_GT500                                        0.020052
Age                                                  0.018726
EmployDuration_LT1                                   0.018434
Purpose_furniture/equip                              0.015858
Savings_LT500                                        0.014840
Debtors_co-applicant                                 0.014770
Purpose_others                                       0.014020
ExistingCredit                                       0.013652
EmployDuration_unemployed                            0.013520
Purpose_education                                    0.011998
Purpose_CarUsed                                      0.010617
Foreignworker                                        0.009533
Debtors_none                                         0.008293
EmployDuration_4-7                                   0.008136
EmployDuration_GE7                                   0.007380
Gender                                               0.005740
CreditHistory_other                                  0.005532
Property_own                                         0.004677
Purpose_CarNew                                       0.001578
Debtors_guarantor                                    0.001012
Collateral_real_estate                               0.000205
Job_skilled_employee                                 0.000000
OtherPayBackPlan_bank                                0.000000
Property_free                                        0.000000
OtherPayBackPlan_stores                              0.000000
Job_unemp/unskilled-non_resident                     0.000000
Purpose_retraining                                   0.000000
Collateral_unknown/none                              0.000000
Collateral_car/other                                 0.000000
Purpose_domestic app                                 0.000000
Purpose_biz                                          0.000000
Telephone                                            0.000000
Dependents                                           0.000000
ResidenceDuration                                    0.000000
PayBackPercent                                       0.000000
Job_unskilled-resident                               0.000000
dtype: float64

mutual_info.sort_values(ascending=False).plot.bar(figsize=(15,5))

Selecting the 10 features with the highest dependency on the target variable CreditStatus (roughly the top 20% of the 51 inputs), along with the protected variables under consideration, Gender and Age.

mutual_info.sort_values(ascending=False)[0:10]

CurrentAcc_None                      0.062208
NumMonths                            0.056789
CurrentAcc_LT200                     0.052962
CreditHistory_Delay                  0.033310
CreditHistory_none/paid              0.031903
Collateral_savings/life_insurance    0.024480
CurrentAcc_GE200                     0.023988
Purpose_repairs                      0.023784
CreditAmount                         0.023187
Purpose_radio/tv                     0.021561
dtype: float64
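Instead of typing the selected names by hand, the list can be built from the scores. A minimal sketch with toy scores standing in for the `mutual_info` Series; `protected`, `top_k`, and `selected` are illustrative names:

```python
import pandas as pd

# Toy scores standing in for the mutual_info Series
scores = pd.Series({
    "CurrentAcc_None": 0.062,
    "NumMonths": 0.057,
    "Gender": 0.006,
    "Telephone": 0.0,
    "CreditAmount": 0.023,
})

protected = ["Gender", "Age"]

# Names of the k highest-scoring features (k = 3 for the toy data)
top_k = scores.nlargest(3).index.tolist()

# Append protected fields and the target, dropping duplicates but keeping order
selected = list(dict.fromkeys(top_k + protected + ["CreditStatus"]))
print(selected)
```

With the real scores, `mutual_info.nlargest(10)` would replace the toy call, and `german_xai[selected]` would give the reduced frame.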

german_xai_imp=german_xai[['CurrentAcc_None',
'NumMonths',
'CurrentAcc_LT200',
'CreditHistory_Delay',
'CreditHistory_none/paid',
'Collateral_savings/life_insurance',
'CurrentAcc_GE200',
'Purpose_repairs',
'CreditAmount',
'Purpose_radio/tv',
'Gender','Age','CreditStatus']]
german_xai_imp.head()

german_xai_imp.dtypes

CurrentAcc_None                        uint8
NumMonths                              int64
CurrentAcc_LT200                       uint8
CreditHistory_Delay                    uint8
CreditHistory_none/paid                uint8
Collateral_savings/life_insurance      uint8
CurrentAcc_GE200                       uint8
Purpose_repairs                        uint8
CreditAmount                         float64
Purpose_radio/tv                       uint8
Gender                                 int64
Age                                    int64
CreditStatus                           int64
dtype: object
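The column list above is typed by hand from the ranking. As a minimal sketch, assuming the scores are held in a pandas Series like `mutual_info`, the same selection could be derived from the ranking itself. The helper `select_columns` is hypothetical and not part of the original notebook; the scores are copied from the printed output (the top 10 plus two zero-score features).

```python
import pandas as pd

# Scores copied from the printed ranking above (top 10 plus two zero-score features).
mutual_info = pd.Series({
    "CurrentAcc_None": 0.062208,
    "NumMonths": 0.056789,
    "CurrentAcc_LT200": 0.052962,
    "CreditHistory_Delay": 0.033310,
    "CreditHistory_none/paid": 0.031903,
    "Collateral_savings/life_insurance": 0.024480,
    "CurrentAcc_GE200": 0.023988,
    "Purpose_repairs": 0.023784,
    "CreditAmount": 0.023187,
    "Purpose_radio/tv": 0.021561,
    "Telephone": 0.0,
    "Dependents": 0.0,
})


def select_columns(scores, k, protected, target):
    """Hypothetical helper: top-k features by score, then protected columns, then target."""
    top = scores.sort_values(ascending=False).head(k).index.tolist()
    # Append protected attributes only if they were not already selected.
    extra = [col for col in protected if col not in top]
    return top + extra + [target]


cols = select_columns(mutual_info, k=10,
                      protected=["Gender", "Age"], target="CreditStatus")
print(cols)
```

With the full frame available, `german_xai[cols]` would then give the same columns as the hand-written list.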

2. Using correlation ¶

corrMatrix = round(german_xai_imp.corr(),1)
corrMatrix

klib.corr_plot(german_xai_imp,annot=False)


corrMatrix1 = round(german_xai_imp.corr(),1)
corrMatrix1
plt.figure(figsize=(15,15))
sns.heatmap(corrMatrix1, annot=True,cmap="Blues")
plt.show()


german_upd=german_xai_imp.drop(['CurrentAcc_LT200','CreditAmount'],axis=1)
german_upd

corrMatrix2 = round(german_upd.corr(),1)
corrMatrix2
plt.figure(figsize=(15,15))
sns.heatmap(corrMatrix2, annot=True,cmap="Blues")
plt.show()


No high correlation is observed among the input variables (except gender and marital status (0.7), and credit amount and number of months (0.6)), or between the target variable and the input variables. Since we are trying to understand the impact of the protected variables, let us retain them rather than dropping them.
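Reading correlated pairs off a heatmap is easy to get wrong, so a small helper can list them explicitly. This is a sketch only: `high_corr_pairs` and the 0.5 threshold are assumptions for illustration, and the toy frame stands in for `german_xai_imp`.

```python
import pandas as pd


def high_corr_pairs(df, threshold=0.5):
    """Hypothetical helper: list column pairs whose |Pearson r| >= threshold."""
    corr = df.corr()
    cols = list(corr.columns)
    pairs = []
    for i, first in enumerate(cols):
        # Only look at columns after `first`, so each pair is reported once.
        for second in cols[i + 1:]:
            r = corr.loc[first, second]
            if abs(r) >= threshold:
                pairs.append((first, second, round(float(r), 2)))
    # Strongest relationships first.
    return sorted(pairs, key=lambda p: -abs(p[2]))


# Toy data: b is exactly 2 * a, while c is uncorrelated with both.
demo = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [1, -1, 1, -1, 1],
})
print(high_corr_pairs(demo, threshold=0.5))
```

Calling it on the notebook's frame, for example `high_corr_pairs(german_xai_imp)`, would give a list of candidate columns to review before dropping any.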

Writing data to CSV file ¶

german_upd.to_csv('C:/Users/krish/Downloads/German-reduced_upd.csv', index=False)

List of protected attributes ¶

(https://arxiv.org/pdf/1811.11154.pdf)¶

from IPython.display  import Image
Image(filename='C:/Users/krish/Desktop/MAIN PJT/list of protected variables.png',width=500,height=30)

From the above, we have the following protected fields in our dataset:

1. Gender
2. Age

Now, let us identify the privileged class in each protected attribute.

1.Gender¶

print(german_upd['Gender'].value_counts())
german_upd.groupby(['Gender'])['CreditStatus'].mean()
#https://arxiv.org/pdf/1810.01943.pdf, https://arxiv.org/pdf/2005.12379.pdf

1    690
0    310
Name: Gender, dtype: int64

Gender
0    0.648387
1    0.723188
Name: CreditStatus, dtype: float64

Males (1) outnumber females (0), 690 to 310, and the mean of the target variable CreditStatus is higher for males (0.72) than for females (0.65), so the outcome is more favorable for males. Hence male (1) is the privileged class.
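The gap between the two group means can be summarised with two common group-fairness measures. The notebook does not compute these at this point, so the function name and the choice of metrics are assumptions; the two rates are copied from the groupby output above.

```python
def group_disparity(rate_unprivileged, rate_privileged):
    """Hypothetical helper returning (statistical parity difference, disparate impact)."""
    # Difference in favorable-outcome rates; 0 means parity.
    spd = rate_unprivileged - rate_privileged
    # Ratio of favorable-outcome rates; 1 means parity.
    di = rate_unprivileged / rate_privileged
    return spd, di


# Mean CreditStatus per Gender group, taken from the printed output above.
female_rate = 0.648387  # Gender == 0
male_rate = 0.723188    # Gender == 1

gender_spd, gender_di = group_disparity(female_rate, male_rate)
print(f"statistical parity difference: {gender_spd:.4f}")
print(f"disparate impact ratio:        {gender_di:.4f}")
```

A negative difference and a ratio below 1 both point the same way as the text: the favorable outcome is less frequent in the Gender = 0 group.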

2.Age¶

print(german_upd['Age'].value_counts())
german_upd.groupby(['Age'])['CreditStatus'].mean()

1    810
0    190
Name: Age, dtype: int64

Age
0    0.578947
1    0.728395
Name: CreditStatus, dtype: float64

Age is encoded as 1 if age > 26 and 0 otherwise. People above 26 are the larger group (810 vs. 190), and their mean CreditStatus (0.73) is higher than that of the group aged 26 or younger (0.58), so Age = 1 is the privileged group.
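As a minimal sketch of the encoding described here, the threshold rule can be applied to the five raw ages shown in the first `german_xai.head()` output. The variable names are illustrative, and the notebook's own encoding step may be written differently.

```python
import pandas as pd

# First five raw ages from the unencoded dataset preview.
raw_age = pd.Series([67, 22, 49, 45, 53], name="Age")

# 1 = older than 26 (privileged group), 0 = 26 or younger.
AGE_THRESHOLD = 26
age_flag = (raw_age > AGE_THRESHOLD).astype(int)

print(age_flag.tolist())
```

The resulting flags agree with the Age column in the encoded preview of the same five rows.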

german_upd.columns

Index(['CurrentAcc_None', 'NumMonths', 'CreditHistory_Delay',
       'CreditHistory_none/paid', 'Collateral_savings/life_insurance',
       'CurrentAcc_GE200', 'Purpose_repairs', 'Purpose_radio/tv', 'Gender',
       'Age', 'CreditStatus'],
      dtype='object')

	NumMonths	CreditAmount	PayBackPercent	Gender	ResidenceDuration	Age	ExistingCredit	Dependents	Telephone	Foreignworker	CreditStatus
NumMonths	1.00	0.62	0.07	0.08	0.03	0.01	-0.01	-0.02	0.16	0.14	-0.21
CreditAmount	0.62	1.00	-0.27	0.09	0.03	0.05	0.02	0.02	0.28	0.05	-0.15
PayBackPercent	0.07	-0.27	1.00	0.09	0.05	0.06	0.02	-0.07	0.01	0.09	-0.07
Gender	0.08	0.09	0.09	1.00	-0.01	0.25	0.09	0.20	0.08	-0.05	0.08
ResidenceDuration	0.03	0.03	0.05	-0.01	1.00	0.01	0.09	0.04	0.10	0.05	-0.00
Age	0.01	0.05	0.06	0.25	0.01	1.00	0.14	0.17	0.16	-0.05	0.13
ExistingCredit	-0.01	0.02	0.02	0.09	0.09	0.14	1.00	0.11	0.07	0.01	0.05
Dependents	-0.02	0.02	-0.07	0.20	0.04	0.17	0.11	1.00	-0.01	-0.08	0.00
Telephone	0.16	0.28	0.01	0.08	0.10	0.16	0.07	-0.01	1.00	0.11	0.04
Foreignworker	0.14	0.05	0.09	-0.05	0.05	-0.05	0.01	-0.08	0.11	1.00	-0.08
CreditStatus	-0.21	-0.15	-0.07	0.08	-0.00	0.13	0.05	0.00	0.04	-0.08	1.00

	CurrentAcc_None	NumMonths	CurrentAcc_LT200	CreditHistory_Delay	CreditHistory_none/paid	Collateral_savings/life_insurance	CurrentAcc_GE200	Purpose_repairs	CreditAmount	Purpose_radio/tv	Gender	Age	CreditStatus
CurrentAcc_None	1.0	-0.1	-0.9	0.0	-0.2	-0.0	-0.2	-0.0	-0.0	0.1	0.0	0.1	0.3
NumMonths	-0.1	1.0	0.1	0.1	-0.0	-0.1	-0.1	-0.0	0.6	-0.0	0.1	0.0	-0.2
CurrentAcc_LT200	-0.9	0.1	1.0	-0.0	0.2	0.0	-0.3	0.0	0.1	-0.1	-0.0	-0.1	-0.3
CreditHistory_Delay	0.0	0.1	-0.0	1.0	-0.4	0.0	-0.0	0.0	0.1	-0.0	0.1	0.1	-0.0
CreditHistory_none/paid	-0.2	-0.0	0.2	-0.4	1.0	-0.0	0.0	-0.0	-0.0	0.0	-0.1	-0.2	-0.2
Collateral_savings/life_insurance	-0.0	-0.1	0.0	0.0	-0.0	1.0	-0.0	-0.0	-0.0	-0.1	-0.0	-0.0	-0.0
CurrentAcc_GE200	-0.2	-0.1	-0.3	-0.0	0.0	-0.0	1.0	-0.0	-0.1	0.1	-0.0	0.0	0.0
Purpose_repairs	-0.0	-0.0	0.0	0.0	-0.0	-0.0	-0.0	1.0	-0.0	-0.1	0.0	-0.0	-0.0
CreditAmount	-0.0	0.6	0.1	0.1	-0.0	-0.0	-0.1	-0.0	1.0	-0.2	0.1	0.0	-0.2
Purpose_radio/tv	0.1	-0.0	-0.1	-0.0	0.0	-0.1	0.1	-0.1	-0.2	1.0	0.0	-0.1	0.1
Gender	0.0	0.1	-0.0	0.1	-0.1	-0.0	-0.0	0.0	0.1	0.0	1.0	0.3	0.1
Age	0.1	0.0	-0.1	0.1	-0.2	-0.0	0.0	-0.0	0.0	-0.1	0.3	1.0	0.1
CreditStatus	0.3	-0.2	-0.3	-0.0	-0.2	-0.0	0.0	-0.0	-0.2	0.1	0.1	0.1	1.0

	CurrentAcc_None	NumMonths	CreditHistory_Delay	CreditHistory_none/paid	Collateral_savings/life_insurance	CurrentAcc_GE200	Purpose_repairs	Purpose_radio/tv	Gender	Age	CreditStatus
CurrentAcc_None	1.0	-0.1	0.0	-0.2	-0.0	-0.2	-0.0	0.1	0.0	0.1	0.3
NumMonths	-0.1	1.0	0.1	-0.0	-0.1	-0.1	-0.0	-0.0	0.1	0.0	-0.2
CreditHistory_Delay	0.0	0.1	1.0	-0.4	0.0	-0.0	0.0	-0.0	0.1	0.1	-0.0
CreditHistory_none/paid	-0.2	-0.0	-0.4	1.0	-0.0	0.0	-0.0	0.0	-0.1	-0.2	-0.2
Collateral_savings/life_insurance	-0.0	-0.1	0.0	-0.0	1.0	-0.0	-0.0	-0.1	-0.0	-0.0	-0.0
CurrentAcc_GE200	-0.2	-0.1	-0.0	0.0	-0.0	1.0	-0.0	0.1	-0.0	0.0	0.0
Purpose_repairs	-0.0	-0.0	0.0	-0.0	-0.0	-0.0	1.0	-0.1	0.0	-0.0	-0.0
Purpose_radio/tv	0.1	-0.0	-0.0	0.0	-0.1	0.1	-0.1	1.0	0.0	-0.1	0.1
Gender	0.0	0.1	0.1	-0.1	-0.0	-0.0	0.0	0.0	1.0	0.3	0.1
Age	0.1	0.0	0.1	-0.2	-0.0	0.0	-0.0	-0.1	0.3	1.0	0.1
CreditStatus	0.3	-0.2	-0.0	-0.2	-0.0	0.0	-0.0	0.1	0.1	0.1	1.0

	CurrentAcc	NumMonths	CreditHistory	Purpose	CreditAmount	Savings	EmployDuration	PayBackPercent	Gender	Debtors	...	Collateral	Age	OtherPayBackPlan	Property	ExistingCredit	Job	Dependents	Telephone	Foreignworker	CreditStatus
0	A11	6	A34	A43	1169	A65	A75	4	A93	A101	...	A121	67	A143	A152	2	A173	1	A192	A201	1
1	A12	48	A32	A43	5951	A61	A73	2	A92	A101	...	A121	22	A143	A152	1	A173	1	A191	A201	2
2	A14	12	A34	A46	2096	A61	A74	2	A93	A101	...	A121	49	A143	A152	1	A172	2	A191	A201	1
3	A11	42	A32	A42	7882	A61	A74	2	A93	A103	...	A122	45	A143	A153	1	A173	2	A191	A201	1
4	A11	24	A33	A40	4870	A61	A73	3	A93	A101	...	A124	53	A143	A153	2	A173	2	A191	A201	2

	CurrentAcc	NumMonths	CreditHistory	Purpose	CreditAmount	Savings	EmployDuration	PayBackPercent	Gender	Debtors	...	Collateral	Age	OtherPayBackPlan	Property	ExistingCredit	Job	Dependents	Telephone	Foreignworker	CreditStatus
0	LT200	6	other	radio/tv	1169	none	GE7	4	1	none	...	real_estate	1	none	own	2	skilled_employee	1	1	1	1
1	LT200	48	none/paid	radio/tv	5951	LT500	1-4	2	0	none	...	real_estate	0	none	own	1	skilled_employee	1	0	1	0
2	None	12	other	education	2096	LT500	4-7	2	1	none	...	real_estate	1	none	own	1	unskilled-resident	2	0	1	1
3	LT200	42	none/paid	furniture/equip	7882	LT500	4-7	2	1	guarantor	...	savings/life_insurance	1	none	free	1	skilled_employee	2	0	1	1
4	LT200	24	Delay	CarNew	4870	LT500	1-4	3	1	none	...	unknown/none	1	none	free	2	skilled_employee	2	0	1	0

	NumMonths	CreditAmount	PayBackPercent	Gender	ResidenceDuration	Age	ExistingCredit	Dependents	Telephone	Foreignworker	...	OtherPayBackPlan_none	Property_free	Property_own	Job_skilled_employee	Job_unskilled-resident	CreditStatus
0	6	0.050567	4	1	4	1	2	1	1	1	...	1	0	1	1	0	1
1	48	0.313690	2	0	2	0	1	1	0	1	...	1	0	1	1	0	0
2	12	0.101574	2	1	3	1	1	2	0	1	...	1	0	1	0	1	1
3	42	0.419941	2	1	4	1	1	2	0	1	...	1	1	0	1	0	1
4	24	0.254209	3	1	4	1	2	2	0	1	...	1	1	0	1	0	0

	CurrentAcc_None	NumMonths	CreditHistory_Delay	CreditHistory_none/paid	Collateral_savings/life_insurance	CurrentAcc_GE200	Purpose_repairs	Purpose_radio/tv	Gender	Age	CreditStatus
0	0	6	0	0	0	0	0	1	1	1	1
1	0	48	0	1	0	0	0	1	0	0	0
2	1	12	0	0	0	0	0	0	1	1	1
3	0	42	0	1	1	0	0	0	1	1	1
4	0	24	1	0	0	0	0	0	1	1	0
...	...	...	...	...	...	...	...	...	...	...	...
995	1	12	0	1	0	0	0	0	0	1	1
996	0	30	0	1	1	0	0	0	1	1	1
997	1	12	0	1	0	0	0	1	1	1	1
998	0	45	0	1	0	0	0	1	1	0	0
999	0	45	0	0	0	0	0	0	1	1	1
