[Python] 07. pandas Series statistics/visualization: idxmax(), idxmin(), nlargest(), nsmallest(), quantile(), cut(), to_csv() examples

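import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rc

# Use a font that can render Korean labels (AppleGothic on macOS; e.g. 'Malgun Gothic' on Windows)
rc('font', family='AppleGothic')
# Keep the minus sign rendering correctly once a non-default font is set
plt.rcParams['axes.unicode_minus'] = False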

# Five students' scores; the Series name '국어점수' means "Korean-language score"
data = {'aa':10,'bb':20,'cc':30,'dd':40,'ee':50}
sr = pd.Series(data, name='국어점수')
sr

[OUT] :

aa    10
bb    20
cc    30
dd    40
ee    50
Name: 국어점수, dtype: int64

sr.idxmax()
# NumPy's argmax() returns the position of the maximum; pandas' idxmax() returns its index label

[OUT] :

'ee'

sr.idxmin()

[OUT] :

'aa'
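The comment above contrasts NumPy's `argmax()` with pandas' `idxmax()`. A small sketch of the difference, rebuilding the same five-score Series so it runs on its own: `idxmax()` gives the index label, while `argmax()` (on the Series in pandas 1.0+, or on the underlying NumPy array) gives the integer position.

```python
import pandas as pd

# Same five scores as above, rebuilt so the snippet is self-contained
sr = pd.Series({'aa': 10, 'bb': 20, 'cc': 30, 'dd': 40, 'ee': 50})

label = sr.idxmax()                    # index label of the maximum
position = sr.argmax()                 # integer position of the maximum (pandas 1.0+)
np_position = sr.to_numpy().argmax()   # NumPy's argmax on the raw values

print(label, position, np_position)    # ee 4 4
print(sr.index[position] == label)     # a position can be mapped back to its label
```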

Exercise

# Largest value among the scores of 45 or less?

# solution

sr[sr<=45].max()

[OUT] :

40
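To also find *which* entry holds that value, the same boolean filter can be combined with `idxmax()` from above. A short sketch on the original Series:

```python
import pandas as pd

sr = pd.Series({'aa': 10, 'bb': 20, 'cc': 30, 'dd': 40, 'ee': 50})

below = sr[sr <= 45]           # keep only scores of 45 or less
best_value = below.max()       # largest remaining score
best_label = below.idxmax()    # label of that score
print(best_label, best_value)  # dd 40
```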

sr.head(2)

[OUT] :

aa    10
bb    20
Name: 국어점수, dtype: int64

sr.tail(2)

[OUT] :

dd    40
ee    50
Name: 국어점수, dtype: int64

sr.nlargest(2) # returns the top n values directly -> no need to sort in descending order and then take the first two

[OUT] :

ee    50
dd    40
Name: 국어점수, dtype: int64
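As a quick sanity check of that comment, a sketch comparing `nlargest` with the manual "sort descending, then take the first two" route; with all-distinct values the two results should be identical:

```python
import pandas as pd

sr = pd.Series({'aa': 10, 'bb': 20, 'cc': 30, 'dd': 40, 'ee': 50})

top2 = sr.nlargest(2)
# The manual route: sort in descending order, then keep the first two rows
manual = sr.sort_values(ascending=False).head(2)

print(top2.equals(manual))  # True here, because all values are distinct
```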

sr['cc'] = 40 # change cc to 40 so that cc and dd are tied

sr

[OUT] :

aa    10
bb    20
cc    40
dd    40
ee    50
Name: 국어점수, dtype: int64

sr.nlargest(2) # default keep='first': among tied values, the earlier entry (cc) is kept

[OUT] :

ee    50
cc    40
Name: 국어점수, dtype: int64

sr.nlargest(2,keep='last') # among tied values, the later entry (dd) is kept

[OUT] :

ee    50
dd    40
Name: 국어점수, dtype: int64

sr.nlargest(2,keep='all') # keep every tied entry, even if that returns more than n rows

[OUT] :

ee    50
cc    40
dd    40
Name: 국어점수, dtype: int64

sr.nsmallest(2)

[OUT] :

aa    10
bb    20
Name: 국어점수, dtype: int64

sr.sum()

[OUT] :

160

sr.mean()

[OUT] :

32.0

sr.std()

[OUT] :

16.431676725154983

sr.median()

[OUT] :

40.0
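`sr.std()` is the *sample* standard deviation: pandas defaults to `ddof=1` (divide by n-1), while `np.std` defaults to `ddof=0` (divide by n). For the current values 10, 20, 40, 40, 50 the squared deviations from the mean 32 sum to 1080, so the two conventions give √(1080/4) = √270 ≈ 16.43 and √(1080/5) = √216 ≈ 14.70. A quick check:

```python
import numpy as np
import pandas as pd

# Values after sr['cc'] = 40
sr = pd.Series({'aa': 10, 'bb': 20, 'cc': 40, 'dd': 40, 'ee': 50})

sample_std = sr.std()               # pandas default ddof=1 -> sqrt(1080 / 4)
population_std = sr.std(ddof=0)     # divide by n instead -> sqrt(1080 / 5)
numpy_std = np.std(sr.to_numpy())   # NumPy's default is ddof=0

print(sample_std, population_std, numpy_std)
```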

sr.quantile([0.25,0.5,0.75])

[OUT] :

0.25    20.0
0.50    40.0
0.75    40.0
Name: 국어점수, dtype: float64
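By default `quantile()` interpolates linearly over the sorted values: quantile q of n values sits at position q·(n-1). With n = 5, the quantiles 0.25, 0.5 and 0.75 fall on positions 1, 2 and 3, i.e. exactly on 20, 40 and 40. A fractional position is where the `interpolation` parameter matters; a sketch with q = 0.1 (position 0.4, between 10 and 20):

```python
import pandas as pd

sr = pd.Series({'aa': 10, 'bb': 20, 'cc': 40, 'dd': 40, 'ee': 50})

# q = 0.1 -> position 0.1 * (5 - 1) = 0.4, between the sorted values 10 and 20
methods = ['linear', 'lower', 'higher', 'midpoint', 'nearest']
results = {m: sr.quantile(0.1, interpolation=m) for m in methods}
print(results)  # linear ~14.0, lower 10, higher 20, midpoint 15.0, nearest 10
```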

sr.unique()

[OUT] :

array([10, 20, 40, 50])

sr.value_counts()

[OUT] :

40    2
20    1
10    1
50    1
Name: 국어점수, dtype: int64
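A few related counting options, sketched on the same data: `nunique()` counts distinct values, `normalize=True` turns counts into shares of the total, and `sort_index()` orders the result by score instead of by frequency:

```python
import pandas as pd

sr = pd.Series({'aa': 10, 'bb': 20, 'cc': 40, 'dd': 40, 'ee': 50})

n_distinct = sr.nunique()                   # number of distinct values
shares = sr.value_counts(normalize=True)    # relative frequencies instead of counts
by_value = sr.value_counts().sort_index()   # ordered by score rather than by count

print(n_distinct)  # 4
print(shares)      # 40 -> 0.4, every other score -> 0.2
print(by_value)
```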

def fn(v):
    print('v=',v)
    print('=========')
    if v>30:
        return v+1
    else:
        return v+2
sr.apply(fn)

[OUT] :

v= 10
=========
v= 20
=========
v= 40
=========
v= 40
=========
v= 50
=========
aa    12
bb    22
cc    41
dd    41
ee    51
Name: 국어점수, dtype: int64

sr.apply(lambda v:v+1 if v>30 else v+2)

[OUT] :

aa    12
bb    22
cc    41
dd    41
ee    51
Name: 국어점수, dtype: int64
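As the printed trace shows, `apply()` calls the Python function once per element. For a simple two-way condition like this one, `np.where` is a vectorized alternative that evaluates the condition on the whole array at once; a sketch that should reproduce the same result:

```python
import numpy as np
import pandas as pd

sr = pd.Series({'aa': 10, 'bb': 20, 'cc': 40, 'dd': 40, 'ee': 50})

# np.where yields 1 where the score is above 30 and 2 elsewhere;
# adding that array to the Series keeps the original index labels
vectorized = sr + np.where(sr > 30, 1, 2)
looped = sr.apply(lambda v: v + 1 if v > 30 else v + 2)

print(vectorized)
print(vectorized.tolist() == looped.tolist())  # True
```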

Exercise

# Pass ('합격') if the score is 40 or above, otherwise fail ('불합격')

# solution

sr.apply(lambda v:'합격' if v>=40 else '불합격')

[OUT] :

aa    불합격
bb    불합격
cc     합격
dd     합격
ee     합격
Name: 국어점수, dtype: object

pd.cut(sr,5) # split the value range into 5 equal-width bins
# 9.96 < aa <= 18.0
# 18.0 < bb <= 26.0 ...
# ( : excluded, ] : included

[OUT] :

aa    (9.96, 18.0]
bb    (18.0, 26.0]
cc    (34.0, 42.0]
dd    (34.0, 42.0]
ee    (42.0, 50.0]
Name: 국어점수, dtype: category
Categories (5, interval[float64]): [(9.96, 18.0] < (18.0, 26.0] < (26.0, 34.0] < (34.0, 42.0] < (42.0, 50.0]]
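With an integer `bins`, `pd.cut` divides the span from min to max into equal widths: (50 - 10) / 5 = 8, giving edges 10, 18, 26, 34, 42, 50. The first edge appears as 9.96 because pandas lowers it by 0.1% of the range (40 × 0.001 = 0.04) so that the minimum, 10, falls inside the first left-open interval. `retbins=True` returns the edges, so this can be checked:

```python
import pandas as pd

sr = pd.Series({'aa': 10, 'bb': 20, 'cc': 40, 'dd': 40, 'ee': 50})

cats, edges = pd.cut(sr, 5, retbins=True)  # retbins=True also returns the bin edges
print(edges)       # first edge ~9.96, then 18, 26, 34, 42, 50
print(cats['cc'])  # the interval that contains 40
```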

pd.cut(sr,5).value_counts()

[OUT] :

(34.0, 42.0]    2
(42.0, 50.0]    1
(18.0, 26.0]    1
(9.96, 18.0]    1
(26.0, 34.0]    0
Name: 국어점수, dtype: int64

pd.cut(sr,[0,20,40,60])

[OUT] :

aa     (0, 20]
bb     (0, 20]
cc    (20, 40]
dd    (20, 40]
ee    (40, 60]
Name: 국어점수, dtype: category
Categories (3, interval[int64]): [(0, 20] < (20, 40] < (40, 60]]
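Intervals are right-closed by default, which is why 20 lands in (0, 20] and 40 in (20, 40]. With `right=False` the intervals become left-closed, `[0, 20)`, `[20, 40)`, `[40, 60)`, so the boundary scores move up one bin. A sketch:

```python
import pandas as pd

sr = pd.Series({'aa': 10, 'bb': 20, 'cc': 40, 'dd': 40, 'ee': 50})

left_closed = pd.cut(sr, [0, 20, 40, 60], right=False)
counts = left_closed.value_counts().sort_index()
print(counts)  # 1 value in [0, 20), 1 in [20, 40), 3 in [40, 60)
```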

pd.cut(sr,[0,20,40,60]).value_counts().sort_index() # order by bin instead of by count

[OUT] :

(0, 20]     2
(20, 40]    2
(40, 60]    1
Name: 국어점수, dtype: int64
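Because `pd.cut` returns a categorical whose categories are the bins, `value_counts(sort=False)` should be an alternative to `.sort_index()`: it skips the sort by frequency and lists the counts in category (bin) order. A sketch:

```python
import pandas as pd

sr = pd.Series({'aa': 10, 'bb': 20, 'cc': 40, 'dd': 40, 'ee': 50})

binned = pd.cut(sr, [0, 20, 40, 60])
counts = binned.value_counts(sort=False)  # categories keep their bin order
print(counts)
```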

pd.cut(sr,[0,20,40,60],labels = ['C','B','A'])

[OUT] :

aa    C
bb    C
cc    B
dd    B
ee    A
Name: 국어점수, dtype: category
Categories (3, object): ['C' < 'B' < 'A']
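Since the first interval is open on the left, a score exactly equal to the lowest edge (here 0) falls into no bin and becomes NaN; `include_lowest=True` closes that first interval. A sketch with hypothetical scores 0, 20 and 55 (not from the data above):

```python
import pandas as pd

# Hypothetical scores; 0 sits exactly on the lowest bin edge
scores = pd.Series([0, 20, 55])
bins = [0, 20, 40, 60]

default = pd.cut(scores, bins, labels=['C', 'B', 'A'])
inclusive = pd.cut(scores, bins, labels=['C', 'B', 'A'], include_lowest=True)

print(default.tolist())    # 0 is outside (0, 20] -> NaN
print(inclusive.tolist())  # the first bin now includes 0 -> 'C'
```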

sr.to_csv('a.csv') # export to a CSV file
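To check the export, the file can be read back with `pd.read_csv`. `to_csv` writes the index as the first column, so `index_col=0` restores it, and `squeeze('columns')` turns the one-column DataFrame back into a Series. A sketch that writes to a temporary directory instead of `a.csv`, with the English name `score` used for illustration:

```python
import os
import tempfile

import pandas as pd

sr = pd.Series({'aa': 10, 'bb': 20, 'cc': 40, 'dd': 40, 'ee': 50}, name='score')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'a.csv')
    sr.to_csv(path)  # index in the first column, values in the second
    back = pd.read_csv(path, index_col=0).squeeze('columns')

print(back)
```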

for n in sr:
    print(n)
# iterating a Series yields its values only (the same as sr.values)

[OUT] :

10
20
40
40
50

for n in sr.index:
    print(n)
# iterating sr.index yields the index labels only

[OUT] :

aa
bb
cc
dd
ee

for n in sr.items():
    print(n)
# items() yields (index, value) tuples

[OUT] :

('aa', 10)
('bb', 20)
('cc', 40)
('dd', 40)
('ee', 50)

for i,v in sr.items():
    print(i,v)
# unpacking: each (index, value) tuple is split into i and v

[OUT] :

aa 10
bb 20
cc 40
dd 40
ee 50
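One asymmetry worth knowing: iterating a Series yields its values, but the `in` operator checks the index labels, just as it does for a dict. A short sketch:

```python
import pandas as pd

sr = pd.Series({'aa': 10, 'bb': 20, 'cc': 40, 'dd': 40, 'ee': 50})

print('aa' in sr)        # True: 'aa' is an index label
print(10 in sr)          # False: `in` does not look at the values
print(10 in sr.values)   # True: search the values explicitly
print(sr.to_dict() == dict(sr.items()))  # the items() pairs match to_dict()
```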

Visualization

sr.plot()
plt.show()

sr.plot(kind='bar',figsize=(8,6),title='성적데이터',legend=True,grid=True,ylim=(0,100),rot=45) # title means "grade data"; rot=45 rotates the x tick labels
plt.show()

sr.plot(kind='barh') # horizontal bar chart
plt.show()

sr.plot(kind='hist',bins=5)
plt.show()

sr.hist()
plt.show()

sr.hist(bins=[0,20,40,60])
plt.show()
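Histogram bins follow the NumPy/matplotlib convention rather than `pd.cut`'s: every bin is half-open `[a, b)` except the last, which is closed. With `bins=[0,20,40,60]` the score 20 therefore goes into the second bar and 40 into the third, so the bars should read 1, 1, 3 rather than the 2, 2, 1 counted by `pd.cut(sr,[0,20,40,60])` earlier. `np.histogram` computes the same counts the plot draws:

```python
import numpy as np
import pandas as pd

sr = pd.Series({'aa': 10, 'bb': 20, 'cc': 40, 'dd': 40, 'ee': 50})

# Bins [0, 20), [20, 40), [40, 60]: only the last bin includes its right edge
counts, edges = np.histogram(sr, bins=[0, 20, 40, 60])
print(counts)   # [1 1 3]

# With bins=5 the equal-width edges are 10, 18, ..., 50; for this data the
# counts happen to agree with pd.cut(sr, 5)
counts5, _ = np.histogram(sr, bins=5)
print(counts5)  # [1 1 0 2 1]
```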

sr.plot(kind='pie',autopct="%.2f") # autopct labels each wedge with its percentage, 2 decimals
plt.show()

sr.plot(kind='box')
plt.show()
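`plot()` returns a matplotlib `Axes`, so outside a notebook the chart can be saved to a file instead of shown. A sketch assuming a headless environment (the non-interactive `Agg` backend) and a hypothetical output file `scores.png`; the Korean font setup is left out, so the title is in English:

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import pandas as pd

sr = pd.Series({'aa': 10, 'bb': 20, 'cc': 40, 'dd': 40, 'ee': 50}, name='score')

ax = sr.plot(kind='bar', ylim=(0, 100), rot=45, title='Scores')  # returns an Axes
ax.figure.tight_layout()

out_path = os.path.join(tempfile.gettempdir(), 'scores.png')  # hypothetical file name
ax.figure.savefig(out_path, dpi=100)
plt.close(ax.figure)  # release the figure, useful when producing many charts
print(os.path.exists(out_path))
```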

review
- unpacking


Seize the Data
