
Data Preprocessing

(13)
[Python] 23. PCA (dimensionality reduction), t-SNE from sklearn.datasets import load_iris, load_wine from mpl_toolkits.mplot3d import Axes3D # enables 3D plotting import matplotlib.pyplot as plt import pandas as pd import numpy as np from sklearn.decomposition import PCA from sklearn.preprocessing import StandardScaler from sklearn.pipeline import make_pipeline import seaborn as sns PCA (dimensionality reduction): data can be visualized in up to 3 dimensions, but not in 4 or more -> ..
[Python] 22. KMeans from sklearn.cluster import KMeans import numpy as np import pandas as pd import seaborn as sns import matplotlib.pyplot as plt import mglearn from sklearn.datasets import load_iris from sklearn.model_selection import GridSearchCV import warnings warnings.simplefilter('ignore') KMeans mglearn.plots.plot_kmeans_algorithm() # place centers at random, assign each point to its nearest center, move the centers, then repeat: assign, move, ... Load data (pd.read_csv) Python ..
[Python] 11. softmax import numpy as np def fn(x): print(x/x.sum()) a = np.array([2.0,1.0,0.1]) fn(a) [OUT]: [0.64516129 0.32258065 0.03225806] -> each value's share of the total. Softmax def softmax(x): e = np.exp(x) print(e) print(e/np.sum(e)) a = np.array([2.0,1.0,0.1]) softmax(a) [OUT]: [7.3890561 2.71828183 1.10517092] [0.65900114 0.24243297 0.09856589] -> applying e^x first gives the larger inputs an even larger share of the probability. review - softmax, used for multiclass classification
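The softmax excerpt above can overflow for large inputs, because np.exp(x) grows very quickly. A minimal sketch of one common workaround, subtracting the maximum before exponentiating, is shown below. It uses only the standard library instead of numpy, and the list-based signature is an assumption for illustration, not the post's own code.

```python
import math

def softmax(xs):
    """Return softmax probabilities for a list of floats."""
    # Shifting by the maximum leaves the result unchanged
    # but keeps every exponent <= 0, so exp() cannot overflow.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)

# Inputs this large would overflow without the shift.
print(softmax([1000.0, 1000.0]))
```

For the input [2.0, 1.0, 0.1] this should agree with the probabilities printed in the excerpt, up to rounding.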
[Python] 7. Sigmoid function import numpy as np import pandas as pd import matplotlib.pyplot as plt import math print(2**2) print(2**3) print(2**(-2)) print(2**(-3)) [OUT]: 4 8 0.25 0.125 math.e [OUT]: 2.718281828459045 Sigmoid: returns 0.5 at x = 0 (and at least 0.5 for any x >= 0) def sigmoid(z): return 1/(1+math.e**(-z)) print(sigmoid(-100)) print(sigmoid(-10)) print(sigmoid(-1)) print(sigmoid(0)) print(sigmoid(1)) print(sigmoid(10)) print(sigmoid..
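The sigmoid in the excerpt works for the inputs shown, but math.e**(-z) can raise OverflowError when z is a very large negative number. A hedged sketch of a branch-based variant that avoids this is below; the two-branch form is a common technique, not something taken from the post.

```python
import math

def sigmoid(z):
    """Logistic sigmoid that avoids overflow for large |z|."""
    if z >= 0:
        # exp(-z) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite as e^z / (1 + e^z); exp(z) is below 1.
    ez = math.exp(z)
    return ez / (1.0 + ez)

for z in (-1000, -1, 0, 1, 1000):
    print(z, sigmoid(z))
```

Both branches compute the same function, 1 / (1 + e^(-z)); only the arithmetic order differs.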
[Python] 5. String encoding: LabelEncoder, OneHotEncoder, get_dummies(), make_column_transformer examples import pandas as pd import numpy as np import seaborn as sns from sklearn.datasets import load_boston, load_iris from sklearn.linear_model import Ridge,Lasso,ElasticNet,LinearRegression from sklearn.preprocessing import PolynomialFeatures from sklearn.pipeline import make_pipeline from sklearn.preprocessing import StandardScaler from sklearn.neural_network import MLPRegressor from sklearn.model_..
[Python] 4. Multiple linear regression: Ridge (L2 regularization), Lasso (L1 regularization), ElasticNet import pandas as pd import numpy as np import seaborn as sns from sklearn.datasets import load_boston, load_iris from sklearn.linear_model import Ridge,Lasso,ElasticNet,LinearRegression from sklearn.preprocessing import PolynomialFeatures from sklearn.pipeline import make_pipeline from sklearn.preprocessing import StandardScaler from sklearn.neural_network import MLPRegressor from sklearn.model_..
[Python] 3. Multiple linear regression import pandas as pd import numpy as np import seaborn as sns from sklearn.datasets import load_boston, load_iris from sklearn.datasets import fetch_california_housing from sklearn.linear_model import LinearRegression,Ridge, SGDRegressor from sklearn.neural_network import MLPRegressor from sklearn.model_selection import train_test_split from sklearn.model_selection import GridSearchCV from sklear..
[Python] 2. Measures of central tendency, dispersion, normalization, frequency import numpy as np import pandas as pd import matplotlib.pyplot as plt from scipy.stats import mode Load data (pd.read_csv): create a data3 folder in the same directory as the Python file and put the ch2_scores_em.csv file in it. Measures of central tendency: mean, median, mode df = pd.read_csv('data3/ch2_scores_em.csv', index_col='student number') df.head() scores = df['english'].values scores [OUT] : array([42, 69, 56, 41, 57, 48, 65, 49, 65, 58, 70, 47, 51, 64, 62, 70, 71, ..
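As a rough illustration of the mean, median, mode and a dispersion measure named in the excerpt above, here is a small sketch using only the standard library's statistics module. It uses just the first ten scores visible in the excerpt (the full CSV is not reproduced here), so the numbers describe that subset only, and the use of statistics rather than numpy/scipy is an assumption made to keep the example self-contained.

```python
import statistics

# First ten English scores shown in the excerpt (a subset, not the full file).
scores = [42, 69, 56, 41, 57, 48, 65, 49, 65, 58]

mean = statistics.mean(scores)           # arithmetic mean
median = statistics.median(scores)       # middle value of the sorted data
mode = statistics.mode(scores)           # most frequent value
variance = statistics.pvariance(scores)  # population variance (a dispersion measure)

print(mean, median, mode, variance)
```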
[Python] 23. pandas DataFrame MultiIndex column: levels, get_level_values(), pd.MultiIndex.from_product() examples import pandas as pd import numpy as np import matplotlib.pyplot as plt from matplotlib import rc import matplotlib rc('font', family='AppleGothic') plt.rcParams['axes.unicode_minus'] = False data = np.random.randint(1,10, size=4) data array([2, 9, 5, 7]) data = np.random.randint(30,101, size=(4,6) ) data array([[94, 42, 91, 67, 54, 94], [87, 30, 83, 31, 92, 65], [83, 43, 99, 43, 37, 59], [82, 88..
[Python] 13. pandas DataFrame add, modify, delete, search, sort: dropna(subset=[]), fillna(), isna(), sort_values(), sort_index() examples import pandas as pd import numpy as np data = {'eng':[10,30,50,70], 'kor':[20,40,60,80], 'math':[90,50,20,70]} df = pd.DataFrame(data, index=['a','b','c','d'] ) df Adding: add a column df['my1'] =[1,2,3,4] # modifies the column if it exists, adds it otherwise df['my2'] = df['kor'] + df['eng'] df Add a row data = {'eng':[10,30,50,70], 'kor':[20,40,60,80], 'math':[90,50,20,70]} df = pd.DataFrame(data, index=['a','b','c','d'] ) df df.loc['e'] =[1,2..
[Python] 10. pandas DataFrame attributes: ndim, shape, len(), size, T, index, keys(), columns, values, dtypes, info() examples import pandas as pd import numpy as np data = {'eng':[10,30,50,70], 'kor':[20,40,60,80], 'math':[90,50,20,70]} df = pd.DataFrame(data, index=['a','b','c','d'] ) df df.ndim # number of dimensions [OUT] : 2 df.shape # (rows, columns) [OUT] : (4, 3) df.shape[0] # number of rows [OUT] : 4 len(df) # number of rows [OUT] : 4 df.size # number of elements (rows x columns) [OUT] : 12 df.T df.index [OUT] : Index(['a', 'b', 'c', 'd'], dtype='object') df.keys() [OUT] : Index(['eng', ..
[Python] 09. pandas DataFrame import pandas as pd import numpy as np d1 = [[1,2],[3,4],[5,6]] d2 = [(1,2),(3,4),(5,6)] d3 = [{'kor':1,'eng':2}, {'kor':3,'eng':4}, {'kor':5,'eng':6}, ] d4 = {'kor':[1,3,5],'eng':[2,4,6]} df1 = pd.DataFrame(d1, index=['a','b','c'],columns=['eng','kor']) df1 df2 = pd.DataFrame(d2) df2 df3 = pd.DataFrame(d3) df3 df4 = pd.DataFrame(d4) df4 review - creating a DataFrame
[Python] 02. numpy attributes and conversion functions: astype(), reshape(), dtype examples import numpy as np Core attributes of numpy.array: array() dtype size shape ndim .T Core attributes example #1 arr = np.array([11,22,33,44,55]) # create an array named arr arr [OUT] : array([11, 22, 33, 44, 55]) arr.dtype [OUT] : dtype('int64') arr.size [OUT] : 5 arr.shape [OUT] : (5,) arr.ndim [OUT] : 1 arr.T # arr == arr.T [OUT] : array([11, 22, 33, 44, 55]) Core attributes example #2 arr1 = np.array([[11,22],[33,44],[55,66]]) # create an array named arr1 ..
