from sklearn.datasets import make_classification, load_iris
from sklearn.cluster import KMeans
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
import mglearn
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
import warnings
warnings.simplefilter('ignore')
X,y = mglearn.datasets.make_forge()
x_train, x_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2) # default is a 75% / 25% split
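Note that `train_test_split` shuffles the data randomly, so each run above produces a different split. A minimal sketch (toy data, not from this post) showing how fixing `random_state` makes the split reproducible:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 5 + [1] * 5)

# Fixing random_state makes the split identical across runs
x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
print(len(x_tr), len(x_te))  # 8 2
```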
KNN Classification
mglearn.discrete_scatter( X[:,0],X[:,1],y)
plt.legend( ['0class','1class'])
plt.title('KNN Example')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
mglearn.plots.plot_knn_classification(n_neighbors=1)
mglearn.plots.plot_knn_classification(n_neighbors=3)
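The plots above illustrate the core of KNN: for a query point, find the k nearest training points and take a majority vote over their labels. A minimal sketch of that rule on toy data (the points here are made up for illustration):

```python
import numpy as np

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1])
query = np.array([0.5, 0.5])
k = 3

# Euclidean distance from the query to every training point
dists = np.linalg.norm(X_train - query, axis=1)
# Indices of the k closest points, then a majority vote over their labels
nearest = np.argsort(dists)[:k]
pred = np.bincount(y_train[nearest]).argmax()
print(pred)  # 0
```

With n_neighbors=1 the vote degenerates to "copy the label of the single closest point", which is why the k=1 model above can overfit.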
model_knn = KNeighborsClassifier(n_neighbors=1)
model_knn.fit(x_train,y_train)
[OUT]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=1, p=2,
weights='uniform')
x_test
[OUT]:
array([[ 8.7337095 , 2.49162431],
[ 8.68937095, 1.48709629],
[ 8.92229526, -0.63993225],
[ 8.69289001, 1.54322016],
[ 8.67494727, 4.47573059],
[ 9.15072323, 5.49832246]])
model_knn.predict(x_test)
[OUT]:
array([0, 0, 0, 0, 1, 1])
y_test
[OUT]:
array([0, 0, 0, 0, 1, 1])
model_knn.score(x_test,y_test)
[OUT]:
1.0
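`score` for a classifier returns mean accuracy, i.e. the fraction of test samples predicted correctly. A quick sketch of the same number computed by hand, using the arrays printed above:

```python
import numpy as np

pred = np.array([0, 0, 0, 0, 1, 1])  # model_knn.predict(x_test) from above
true = np.array([0, 0, 0, 0, 1, 1])  # y_test from above

# Accuracy = fraction of matching labels
acc = np.mean(pred == true)
print(acc)  # 1.0
```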
Finding the optimal n with GridSearch
param_value = {'n_neighbors':[1,2,3,4,5]}
gridSearch = GridSearchCV(KNeighborsClassifier(),param_grid=param_value)
gridSearch.fit(x_train,y_train)
[OUT]:
GridSearchCV(cv=None, error_score=nan,
estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30,
metric='minkowski',
metric_params=None, n_jobs=None,
n_neighbors=5, p=2,
weights='uniform'),
iid='deprecated', n_jobs=None,
param_grid={'n_neighbors': [1, 2, 3, 4, 5]},
pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
scoring=None, verbose=0)
gridSearch.best_params_
[OUT]:
{'n_neighbors': 2}
gridSearch.best_score_
[OUT]:
0.9
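Note that `best_score_` is the mean cross-validated accuracy of the best candidate over the 5 default folds of the training data, not a score on the held-out test set. A minimal sketch (run on iris rather than the forge data above) showing where that number comes from:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
gs = GridSearchCV(KNeighborsClassifier(), param_grid={'n_neighbors': [1, 3, 5]})
gs.fit(X, y)

# best_score_ = mean accuracy of the best setting across the CV folds
print(gs.best_params_, round(gs.best_score_, 3))
# cv_results_ holds the mean fold score for every candidate
print(gs.cv_results_['mean_test_score'])
```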
gridSearch.best_estimator_.predict(x_test)
[OUT]:
array([0, 0, 0, 0, 0, 1])
gridSearch.predict(x_test)
[OUT]:
array([0, 0, 0, 0, 0, 1])
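The two calls above give the same result because `refit=True` (the default) makes GridSearchCV retrain the best candidate on the whole training set and delegate `predict` to it. A sketch demonstrating the equivalence on iris:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
x_tr, x_te, y_tr, y_te = train_test_split(X, y, random_state=0)

gs = GridSearchCV(KNeighborsClassifier(), param_grid={'n_neighbors': [1, 3, 5]})
gs.fit(x_tr, y_tr)

# With refit=True, gs.predict is a shortcut for gs.best_estimator_.predict
print(np.array_equal(gs.predict(x_te), gs.best_estimator_.predict(x_te)))  # True
```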
Exercise
Classify the iris dataset using KNN.
Solution
iris = load_iris()
iris_df = pd.DataFrame(iris.data)
iris_df.columns = iris['feature_names']
iris_df['species'] = iris.target
iris_df
x_data = iris_df.iloc[:,:-1]
y_data = iris_df.iloc[:,-1]
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.2,
random_state=42,stratify=y_data)
param_value = {'n_neighbors':range(1,10)} # usually 3-5 candidates are enough; n_neighbors must be >= 1
knn_model = GridSearchCV(KNeighborsClassifier(),param_grid=param_value) # cv=None: 5-fold CV by default
knn_model.fit(x_train,y_train)
[OUT]:
GridSearchCV(cv=None, error_score=nan,
estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30,
metric='minkowski',
metric_params=None, n_jobs=None,
n_neighbors=5, p=2,
weights='uniform'),
iid='deprecated', n_jobs=None,
             param_grid={'n_neighbors': range(1, 10)}, pre_dispatch='2*n_jobs',
refit=True, return_train_score=False, scoring=None, verbose=0)
print("best parameter :",knn_model.best_params_)
print("best score :",knn_model.best_score_)
print("predict :",knn_model.predict(x_test).tolist())
print("y_test :",y_test.tolist())
[OUT]:
best parameter : {'n_neighbors': 6}
best score : 0.9833333333333334
predict : [0, 2, 1, 1, 0, 1, 0, 0, 2, 1, 2, 2, 2, 1, 0, 0, 0, 1, 1, 1, 0, 2, 1, 2, 2, 1, 1, 0, 2, 0]
y_test : [0, 2, 1, 1, 0, 1, 0, 0, 2, 1, 2, 2, 2, 1, 0, 0, 0, 1, 1, 2, 0, 2, 1, 2, 2, 1, 1, 0, 2, 0]
Bonus Exercise
Count how many predictions differ from the true labels.
Solution
x = knn_model.predict(x_test).tolist()
y = y_test.tolist()
cnt = 0
for i in range(len(x)):
    if x[i] != y[i]:
        cnt += 1
        print(i, x[i], y[i])
print('misclassified count :', cnt)
[OUT]:
19 1 2
misclassified count : 1
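The loop above can be condensed to a single vectorized comparison; the off-diagonal entries of a confusion matrix count the same mistakes, broken down by class. A sketch using the prediction and label lists printed earlier:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

pred = np.array([0, 2, 1, 1, 0, 1, 0, 0, 2, 1, 2, 2, 2, 1, 0,
                 0, 0, 1, 1, 1, 0, 2, 1, 2, 2, 1, 1, 0, 2, 0])
true = np.array([0, 2, 1, 1, 0, 1, 0, 0, 2, 1, 2, 2, 2, 1, 0,
                 0, 0, 1, 1, 2, 0, 2, 1, 2, 2, 1, 1, 0, 2, 0])

# Number of misclassified samples in one line
print((pred != true).sum())  # 1
# Each off-diagonal cell counts one kind of mistake (row = true, column = predicted)
print(confusion_matrix(true, pred))
```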
review
- The iris dataset is clean enough that normalization isn't strictly required here, but for KNN normalization is usually essential, since distance-based models are sensitive to feature scale.
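When normalization is needed, a Pipeline keeps it leak-free: the scaler is fit on the training split only and then applied to both splits. A sketch of the iris exercise above with `StandardScaler` added in front of KNN (n_neighbors=6 taken from the grid-search result above):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=42, stratify=y)

# Scaling statistics come from the training data only, avoiding test-set leakage
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=6))
model.fit(x_tr, y_tr)
print(round(model.score(x_te, y_te), 3))
```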