2021-04-20 Pandas one-hot encoding practice

One-hot encoding = converting categorical data into numeric (0/1) columns so that a model can recognize it
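As a quick illustration on a made-up toy Series (hypothetical values, not the Titanic data), `pd.get_dummies` creates one column per distinct category, with a 1 in the column matching each row's value and 0 elsewhere:

```python
import pandas as pd

# hypothetical toy data: one categorical column of port codes
ports = pd.Series(['S', 'C', 'Q', 'S'])

# one column per distinct value (sorted); dtype=int gives 0/1 instead of True/False
dummies = pd.get_dummies(ports, prefix='town', dtype=int)
print(dummies)
```

Each row ends up with exactly one 1, which is why the encoding is called "one-hot".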
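import pandas as pd

# ndf: Titanic DataFrame with survived, pclass, sex, age, sibsp, parch, embarked
# dtype=int keeps 0/1 columns (pandas >= 2.0 would otherwise return True/False)
onehot_sex = pd.get_dummies(ndf['sex'], dtype=int)   # -> female, male
ndf = pd.concat([ndf, onehot_sex], axis=1)

onehot_embarked = pd.get_dummies(ndf['embarked'], prefix='town', dtype=int)   # -> town_C, town_Q, town_S
ndf = pd.concat([ndf, onehot_embarked], axis=1)

# drop the original categorical columns now that they are encoded
ndf.drop(['sex', 'embarked'], axis=1, inplace=True)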
print(ndf.head())
   survived  pclass   age  sibsp  parch  female  male  town_C  town_Q  town_S
0         0       3  22.0      1      0       0     1       0       0       1
1         1       1  38.0      1      0       1     0       1       0       0
2         1       3  26.0      0      0       1     0       0       0       1
3         1       1  35.0      1      0       1     0       0       0       1
4         0       3  35.0      0      0       0     1       0       0       1
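The separate get_dummies / concat / drop steps above can also be done in one call by passing the whole DataFrame with a `columns=` list. A minimal sketch on a hypothetical three-row frame; note that in this mode each new column gets a prefix (here mapped explicitly), so the sex columns come out as `sex_female`/`sex_male` rather than plain `female`/`male`:

```python
import pandas as pd

# hypothetical rows with the same categorical columns as ndf
df = pd.DataFrame({
    'survived': [0, 1, 1],
    'sex': ['male', 'female', 'female'],
    'embarked': ['S', 'C', 'S'],
})

# encode both columns at once: the originals are dropped automatically
# and the dummy columns are appended after the remaining columns
encoded = pd.get_dummies(
    df,
    columns=['sex', 'embarked'],
    prefix={'sex': 'sex', 'embarked': 'town'},
    dtype=int,
)
print(encoded)
```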
Splitting the dataset: training and test sets
# features (X) and target (y)
X = ndf[['pclass', 'age', 'sibsp', 'parch', 'female', 'male', 'town_C', 'town_Q', 'town_S']]
y = ndf['survived']

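from sklearn import preprocessing

# standardize every feature to mean 0 and standard deviation 1
# (the scaler is fitted on all of X here, i.e. before the train/test split)
X = preprocessing.StandardScaler().fit(X).transform(X)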
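StandardScaler rescales each column to z = (x - mean) / std, using the population standard deviation (ddof=0). A small check on the first four age/pclass rows from the head() output, comparing the scaler with the same formula computed by hand in NumPy:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# age and pclass of rows 0-3 from the head() output
X = np.array([[22.0, 3.0],
              [38.0, 1.0],
              [26.0, 3.0],
              [35.0, 1.0]])

scaled = StandardScaler().fit_transform(X)

# the same transform by hand: per-column mean and population std (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))  # True
```

Standardizing matters for KNN in particular, because distances would otherwise be dominated by the feature with the largest range (here, age).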

# split into train data and test data (30% held out for testing)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3, random_state=10)
print('train data shape:', X_train.shape)
print('test data shape:', X_test.shape)
train data shape: (499, 9)
test data shape: (215, 9)
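Because the scaler above is fitted on all of X before splitting, the test rows influence the mean and std used for training. A common alternative is to split first and fit the scaler on the training part only. A sketch with random stand-in data of the same shape as this post's (714 rows, 9 features); the data is synthetic, so only the shapes carry over, not any scores:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# synthetic stand-in with the post's shape: 714 rows x 9 features
rng = np.random.default_rng(10)
X_raw = rng.normal(size=(714, 9))
y = rng.integers(0, 2, size=714)

# split first (30% test), as in the post
X_train, X_test, y_train, y_test = train_test_split(
    X_raw, y, test_size=0.3, random_state=10)

# fit mean/std on the training rows only, then apply them to both sets
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

print('train data shape:', X_train.shape)  # (499, 9)
print('test data shape:', X_test.shape)    # (215, 9)
```

The test size is ceil(0.3 * 714) = 215 rows, which matches the counts printed above.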
Importing the KNN classification model
from sklearn.neighbors import KNeighborsClassifier

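# k = 5: each test sample gets the majority class of its 5 nearest training samples
knn = KNeighborsClassifier(n_neighbors=5)

# train on the training set, then predict the test set
knn.fit(X_train, y_train)
y_hat = knn.predict(X_test)

# compare the first 10 predictions with the actual labels
print(y_hat[0:10])
print(y_test.values[0:10])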
[0 0 1 0 0 1 1 1 0 0]
[0 0 1 0 0 1 1 1 0 0]
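With its defaults (uniform weights, Euclidean distance), KNeighborsClassifier predicts the majority class among the k closest training points. A minimal from-scratch sketch of that rule, using a hypothetical helper `knn_predict` on random 2-D data (not the Titanic features), checked against scikit-learn:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_tr, y_tr, X_te, k=5):
    """Hypothetical helper: majority vote among the k nearest points (Euclidean)."""
    preds = []
    for x in X_te:
        dists = np.sqrt(((X_tr - x) ** 2).sum(axis=1))  # distance to every training point
        nearest = np.argsort(dists)[:k]                 # indices of the k closest
        votes = np.bincount(y_tr[nearest], minlength=2) # count of labels 0 and 1
        preds.append(votes.argmax())                    # most frequent label wins
    return np.array(preds)


# random data with a simple linear class boundary
rng = np.random.default_rng(0)
X_tr = rng.normal(size=(100, 2))
y_tr = (X_tr[:, 0] + X_tr[:, 1] > 0).astype(int)
X_te = rng.normal(size=(20, 2))

manual = knn_predict(X_tr, y_tr, X_te, k=5)
sk = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr).predict(X_te)
print(np.array_equal(manual, sk))  # True
```

With two classes and an odd k such as 5, the vote can never tie, so no tie-breaking rule is needed here.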
from sklearn import metrics

# confusion matrix: rows = actual class (0, 1), columns = predicted class (0, 1)
knn_matrix = metrics.confusion_matrix(y_test, y_hat)
print(knn_matrix)
[[109  16]
 [ 25  65]]
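In scikit-learn's confusion_matrix, rows are the actual classes and columns the predicted ones, so for 0/1 labels the layout is [[TN, FP], [FN, TP]]. A tiny hand-checkable example with made-up labels:

```python
from sklearn import metrics

# hypothetical labels for 5 samples
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 0, 1]

cm = metrics.confusion_matrix(y_true, y_pred)
print(cm)

# ravel() flattens row by row: TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 1 1 1 2
```

Read this way, the KNN matrix above says 109 non-survivors and 65 survivors were classified correctly, while 16 non-survivors were predicted as survivors and 25 survivors were missed.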
# per-class precision, recall and F1, plus overall accuracy and averages
knn_report = metrics.classification_report(y_test, y_hat)
print(knn_report)
              precision    recall  f1-score   support

           0       0.81      0.87      0.84       125
           1       0.80      0.72      0.76        90

    accuracy                           0.81       215
   macro avg       0.81      0.80      0.80       215
weighted avg       0.81      0.81      0.81       215
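Every number in the report can be recomputed from the four counts of the confusion matrix above (TN=109, FP=16, FN=25, TP=65). A short check:

```python
# counts taken from the confusion matrix printed above
tn, fp, fn, tp = 109, 16, 25, 65

# class 1 (survived): precision = TP / predicted 1, recall = TP / actual 1
precision_1 = tp / (tp + fp)   # 65 / 81
recall_1 = tp / (tp + fn)      # 65 / 90
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# class 0 (did not survive): the same formulas with the roles of 0 and 1 swapped
precision_0 = tn / (tn + fn)   # 109 / 134
recall_0 = tn / (tn + fp)      # 109 / 125
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)

accuracy = (tn + tp) / (tn + fp + fn + tp)   # 174 / 215

# macro avg: plain mean over classes; weighted avg: weighted by support (125 and 90)
support_0, support_1 = tn + fp, fn + tp
macro_f1 = (f1_0 + f1_1) / 2
weighted_f1 = (f1_0 * support_0 + f1_1 * support_1) / (support_0 + support_1)

print(f'{precision_0:.2f} {recall_0:.2f} {f1_0:.2f}')      # 0.81 0.87 0.84
print(f'{precision_1:.2f} {recall_1:.2f} {f1_1:.2f}')      # 0.80 0.72 0.76
print(f'{accuracy:.2f} {macro_f1:.2f} {weighted_f1:.2f}')  # 0.81 0.80 0.81
```

The lower recall for class 1 (0.72) reflects the 25 survivors the model missed.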