[Python] 06. pandas Series String Data Search: str.contains(), str.replace(), and Regular Expression Metacharacter Examples

import pandas as pd
import numpy as np

sr = pd.Series(['홍길동','이순신','김철수','김순이',
               '이홍김'] )
sr.index=['aa','bb','cc','dd','ee']
sr

[OUT] :

aa    홍길동
bb    이순신
cc    김철수
dd    김순이
ee    이홍김
dtype: object

# When the data are strings, use the str accessor of the Series object.
# sr.index.str  # for a string index
sr.str  # for string data

String indexing and slicing

sr.str[0] # get only the family name (the first character)

[OUT] :

aa    홍
bb    이
cc    김
dd    김
ee    이
dtype: object

sr.str[1:]

[OUT] :

aa    길동
bb    순신
cc    철수
dd    순이
ee    홍김
dtype: object

sr.str[-1]

[OUT] :

aa    동
bb    신
cc    수
dd    이
ee    김
dtype: object
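
The bracket syntax above also has method equivalents, which can read more clearly inside a longer method chain. The following is a minimal sketch that rebuilds the same `sr` and uses `str.get()` and `str.slice()`; the variable names `first`, `rest`, and `last` are chosen here for illustration only.

```python
import pandas as pd

# Same sample data as in the post
sr = pd.Series(['홍길동', '이순신', '김철수', '김순이', '이홍김'],
               index=['aa', 'bb', 'cc', 'dd', 'ee'])

first = sr.str.get(0)    # same result as sr.str[0]
rest = sr.str.slice(1)   # same result as sr.str[1:]
last = sr.str.get(-1)    # same result as sr.str[-1]

print(first.tolist())
print(rest.tolist())
print(last.tolist())
```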

Searching string data

sr.str.contains('김')

[OUT] :

aa    False
bb    False
cc     True
dd     True
ee     True
dtype: bool
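
When a Series contains missing values, the result of `str.contains()` may hold a missing value instead of True/False for those rows (the exact behavior depends on the dtype and pandas version), and such a mask cannot always be used for boolean indexing. A minimal sketch, assuming a hypothetical Series `names` with one missing entry, that passes `na=False` so missing rows count as "no match":

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value (not part of the original post)
names = pd.Series(['홍길동', np.nan, '김철수'], index=['aa', 'bb', 'cc'])

# na=False fills missing rows with False, so the mask is purely boolean
mask = names.str.contains('김', na=False)

print(mask.tolist())
print(names[mask].tolist())
```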

sr[sr.str.contains('^김')] # starts with '김'

[OUT] :

cc    김철수
dd    김순이
dtype: object

sr[sr.str.contains('김$')] # ends with '김'

[OUT] :

ee    이홍김
dtype: object

sr[sr.str.contains('[홍이]')] # []: any one character from the set

[OUT] :

aa    홍길동
bb    이순신
dd    김순이
ee    이홍김
dtype: object

sr[sr.str.contains('[홍이]순')] # '홍순' or '이순'

[OUT] :

bb    이순신
dtype: object

sr[sr.str.contains('길동|순이')] # |: '길동' or '순이'

[OUT] :

aa    홍길동
dd    김순이
dtype: object
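
For the simple "starts with" and "ends with" cases, the `^` and `$` anchors are not strictly required. As a sketch of an alternative, `str.startswith()` and `str.endswith()` compare against a literal string with no regular expression involved; the names `starts` and `ends` are illustrative only.

```python
import pandas as pd

# Same sample data as in the post
sr = pd.Series(['홍길동', '이순신', '김철수', '김순이', '이홍김'],
               index=['aa', 'bb', 'cc', 'dd', 'ee'])

starts = sr[sr.str.startswith('김')]   # same rows as contains('^김')
ends = sr[sr.str.endswith('김')]       # same rows as contains('김$')

print(starts.tolist())
print(ends.tolist())
```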

sr.str.replace('김','황')

[OUT] :

aa    홍길동
bb    이순신
cc    황철수
dd    황순이
ee    이홍황
dtype: object

sr.str.replace('^김','황', regex=True) # regex=True: treat '^김' as a pattern ('김' at the start)

[OUT] :

aa    홍길동
bb    이순신
cc    황철수
dd    황순이
ee    이홍김
dtype: object

sr.str.replace('김$','황', regex=True) # regex=True: '김' at the end

[OUT] :

aa    홍길동
bb    이순신
cc    김철수
dd    김순이
ee    이홍황
dtype: object

sr.str.replace('김[철이]','황', regex=True) # regex=True: '김철' or '김이'

[OUT] :

aa    홍길동
bb    이순신
cc     황수
dd    김순이
ee    이홍김
dtype: object
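
Whether `str.replace()` treats the pattern as a regular expression is controlled by its `regex` argument, whose default changed to `False` in pandas 2.0. Passing it explicitly keeps the result the same across versions. A small sketch comparing the two modes on the same data (the names `literal` and `pattern` are illustrative only):

```python
import pandas as pd

# Same sample data as in the post
sr = pd.Series(['홍길동', '이순신', '김철수', '김순이', '이홍김'],
               index=['aa', 'bb', 'cc', 'dd', 'ee'])

# regex=False: looks for the literal two characters '^김', which never occur
literal = sr.str.replace('^김', '황', regex=False)

# regex=True: '^' anchors the match to the start of each string
pattern = sr.str.replace('^김', '황', regex=True)

print(literal.tolist())
print(pattern.tolist())
```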

sr.index.str.contains('a')

[OUT] :

array([ True, False, False, False, False])

sr[sr.index.str.contains('a')]

[OUT] :

aa    홍길동
dtype: object
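
Selecting by index label can also be written with `Series.filter()`, which matches against the labels rather than the values. This is a sketch of an alternative, not the approach used in the post; `like` does a substring test and `regex` applies a regular expression search to each label.

```python
import pandas as pd

# Same sample data as in the post
sr = pd.Series(['홍길동', '이순신', '김철수', '김순이', '이홍김'],
               index=['aa', 'bb', 'cc', 'dd', 'ee'])

by_like = sr.filter(like='a')        # labels containing the substring 'a'
by_regex = sr.filter(regex='^[ab]')  # labels starting with 'a' or 'b'

print(by_like.tolist())
print(by_regex.tolist())
```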

Review
- Regular expression metacharacters: . ^ $ * + ? { } [ ] \ | ( ). They are also used heavily in Spark, so make sure you know them.
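
A few of the metacharacters not exercised earlier can be tried on the same data. The sketch below uses `.` (any single character) and `( )` (capture groups, here with `str.extract()`), and shows `re.escape()` for the case where a metacharacter should be matched literally. The Series `codes` is hypothetical data made up for this illustration.

```python
import re
import pandas as pd

# Same sample data as in the post
sr = pd.Series(['홍길동', '이순신', '김철수', '김순이', '이홍김'],
               index=['aa', 'bb', 'cc', 'dd', 'ee'])

# '.' matches any single character: '김', any one character, then '수'
dot_match = sr[sr.str.contains('김.수')]

# '( )' captures groups: first character, then the rest of the string
parts = sr.str.extract(r'^(.)(.+)$')

# Hypothetical data containing metacharacters as ordinary text
codes = pd.Series(['a.b', 'axb', 'a+b'])
as_regex = codes.str.contains('a.b')               # '.' matches any character
as_literal = codes.str.contains(re.escape('a.b'))  # '.' matches only a dot

print(dot_match.tolist())
print(parts[0].tolist(), parts[1].tolist())
print(as_regex.tolist(), as_literal.tolist())
```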

Seize the Data