[Python] 23. PCA (Dimensionality Reduction), t-SNE

from sklearn.datasets import load_iris, load_wine
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection (needed on older Matplotlib)
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
import seaborn as sns

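PCA (Dimensionality Reduction)

Data can be plotted directly in up to three dimensions, but not in four or more, so we reduce the dimensionality first.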

iris = load_iris()
irisDF = pd.DataFrame(iris.data,columns=iris.feature_names)
irisDF

color=[]
for n in iris.target:
    if n==0:
        color.append('r')
    elif n==1:
        color.append('g')
    else:
        color.append('b')

fig = plt.figure(figsize=(8, 8))
# fig.gca(projection='3d') no longer works on recent Matplotlib; add_subplot does
ax = fig.add_subplot(projection='3d')
ax.scatter(irisDF['sepal length (cm)'], irisDF['sepal width (cm)'],
           irisDF['petal length (cm)'], alpha=0.5, c=color)
ax.set_xlabel('Sepal length')
ax.set_ylabel('Sepal width')
ax.set_zlabel('Petal length')
plt.show()

model_pipe = make_pipeline(StandardScaler(),PCA())
model_pipe.fit(irisDF)

[OUT] :

Pipeline(memory=None,
         steps=[('standardscaler',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('pca',
                 PCA(copy=True, iterated_power='auto', n_components=None,
                     random_state=None, svd_solver='auto', tol=0.0,
                     whiten=False))],
         verbose=False)
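
Fitting `PCA()` without `n_components` keeps every component, which is handy for checking how much variance each one explains before choosing how many to keep. The following is a minimal sketch of that check, assuming the same StandardScaler + PCA pipeline as above; the names `ratios` and `cumulative` are only illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
model_pipe = make_pipeline(StandardScaler(), PCA())
model_pipe.fit(iris.data)

# make_pipeline names each step after its lowercased class name
pca = model_pipe.named_steps['pca']

# fraction of total variance explained by each principal component
ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)

for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f'PC{i}: {r:.3f} (cumulative {c:.3f})')
```

If the cumulative ratio of the first two components is already high, a 2D plot loses little information.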

model = PCA(n_components=2) # by default, all components are kept (no reduction)
pcaf = model.fit_transform(irisDF)
pcaf.shape

[OUT] :

(150, 2)

# using plt.scatter
plt.scatter(pcaf[:,0],pcaf[:,1],c=color)
plt.show()

# build a DataFrame, then plot with seaborn
pcaDF = pd.DataFrame(pcaf)
pcaDF.columns = ['x','y']
pcaDF['target'] = iris.target
pcaDF

sns.lmplot(x='x',y='y',data=pcaDF,fit_reg=False,hue='target',scatter_kws={'s':150})
plt.show()

model = PCA(n_components=1)
pca1f = model.fit_transform(irisDF)
pca1f.shape # one dimension

[OUT] :

(150, 1)

xf = pca1f[:,0]
yf = len(xf)*[0]
plt.scatter( xf, yf , c=color)
plt.show()

Exercise

Reduce the wine dataset to two dimensions and check how well the classes separate.

Solution

wine = load_wine()

wineDF = pd.DataFrame(wine.data,columns=wine.feature_names)
wineDF

model_pipe = make_pipeline(StandardScaler(),PCA(n_components=2))
pcaf = model_pipe.fit_transform(wineDF)
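
The pipeline here standardizes before PCA, and for the wine data that step matters because the features are on very different scales. As a rough sketch of the effect (the variable names `raw_ratio` and `scaled_ratio` are only illustrative), compare the explained variance ratios with and without scaling:

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_wine().data

# PCA on the raw features: large-valued features dominate the variance
raw_ratio = PCA(n_components=2).fit(X).explained_variance_ratio_

# PCA on standardized features: each feature contributes on an equal footing
X_scaled = StandardScaler().fit_transform(X)
scaled_ratio = PCA(n_components=2).fit(X_scaled).explained_variance_ratio_

print('raw   :', raw_ratio)
print('scaled:', scaled_ratio)
```

Without scaling, the first component mostly tracks the single feature with the largest numeric range, so the 2D plot says little about the other twelve.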

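# using plt.scatter
# color the points by wine.target; the iris `color` list has the wrong length (150 vs 178)
plt.scatter(pcaf[:,0],pcaf[:,1],c=wine.target)
plt.show()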

# build a DataFrame, then plot with seaborn
pcaDF = pd.DataFrame(pcaf)
pcaDF.columns = ['x','y']
pcaDF['target'] = wine.target
pcaDF

sns.lmplot(x='x',y='y',data=pcaDF,fit_reg=False,hue='target',scatter_kws={'s':150})
plt.show()

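Extra

# to map the reduced data back to the original feature space (before dimensionality reduction)
# note: this is a reconstruction from 2 components, so it only approximates the original values
original_model = model_pipe.inverse_transform(pcaf)
original_model.shape # same shape as the original data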

[OUT] :

(178, 13)

pcaf.shape # shape of the reduced data

[OUT] :

(178, 2)

wineDF.values[0] # original values of the first sample

[OUT] :

array([1.423e+01, 1.710e+00, 2.430e+00, 1.560e+01, 1.270e+02, 2.800e+00,
       3.060e+00, 2.800e-01, 2.290e+00, 5.640e+00, 1.040e+00, 3.920e+00,
       1.065e+03])

model_pipe.transform([wineDF.values[0]]) # the same sample after dimensionality reduction

[OUT] :

array([[ 3.31675081, -1.44346263]])
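
`inverse_transform` returns an array with the original shape, but because only two components were kept, the values are an approximation rather than the original data. The sketch below, under the same StandardScaler + PCA(n_components=2) setup, measures how much is lost; `pipe`, `restored`, `err`, and `kept` are names introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = load_wine().data
pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
reduced = pipe.fit_transform(X)

# Pipeline.inverse_transform undoes the steps in reverse order: PCA first, then scaling
restored = pipe.inverse_transform(reduced)
print(restored.shape)             # same shape as X
print(np.allclose(restored, X))   # the values differ from the original

# mean squared reconstruction error, measured in standardized units
scaler = pipe.named_steps['standardscaler']
err = np.mean((scaler.transform(X) - scaler.transform(restored)) ** 2)

# share of variance retained by the two kept components
kept = pipe.named_steps['pca'].explained_variance_ratio_.sum()
print(err, 1 - kept)
```

In standardized units the mean squared error matches the share of variance that the dropped components carried, which is why the two printed numbers agree.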

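Conclusion

Dimensionality reduction makes high-dimensional data plottable. It also helps when there are too many features to train on comfortably: reduce the dimensionality first, then train on the reduced data.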
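
As a minimal sketch of training on reduced data, the pipeline below appends a classifier after PCA and scores it with cross-validation. The choice of `LogisticRegression`, `max_iter=1000`, and 5 folds are assumptions made for illustration, not something the post prescribes.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# scale -> reduce 13 features to 2 -> classify on the 2 components
clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),  # assumed classifier for this sketch
)

# the whole pipeline is refit inside each fold, so scaling and PCA never see the held-out data
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```

Keeping the scaler and PCA inside the pipeline avoids leaking information from the validation folds into the projection.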
