Data Visualization

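Data visualization is often described as the most basic level of analysis, but when used well it is said to be more effective than far more complex analyses.

Visualization is essential for big data analysis and for exploratory data analysis (EDA).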

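* Required concepts

1) Population and sample

- population : the "entire target" that the analysis is about

- sample : a subset drawn from the population

Statistical techniques are used to estimate the characteristics of the population from the sample.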
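The relationship between a population and a sample can be sketched with a small simulation. This is only an illustration under stated assumptions: the population below is synthetic (generated with `random.gauss`, not real measurements), and the sample size of 200 is an arbitrary choice.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 synthetic measurements (not real data).
population = [random.gauss(5.0, 0.5) for _ in range(100_000)]

# Sample: a random subset of the population, drawn without replacement.
sample = random.sample(population, 200)

pop_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

# Standard error of the sample mean, estimated from the sample alone.
std_err = statistics.stdev(sample) / len(sample) ** 0.5

print(f"population mean: {pop_mean:.3f}")
print(f"sample mean:     {sample_mean:.3f} (standard error ~ {std_err:.3f})")
```

In practice the population mean is unknown; the sample mean together with its standard error is what lets us say something about it.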

1) Checking the distribution of a single variable

- histogram: a visualization of a frequency distribution table

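# Histogram
# Setosa - petal length
# Assumes df is a pandas DataFrame of the iris data with the columns
# 'variety', 'petal.length', 'petal.width', 'sepal.length', 'sepal.width'.
import matplotlib.pyplot as plt

df[df['variety'] == 'Setosa']['petal.length'].plot.hist()
plt.show()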
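To make the "frequency distribution table" behind a histogram concrete, here is a minimal sketch in plain Python. The values are made-up petal lengths, and the bin range (1.15 to 1.75) and the helper name `frequency_table` are assumptions chosen for illustration; pandas and matplotlib pick their own bin edges.

```python
# Hypothetical petal lengths in cm (illustrative values, not the real data).
values = [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.6, 1.2]

def frequency_table(data, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width bins on [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for x in data:
        # A value equal to `high` belongs to the last bin, not to a new one.
        idx = min(int((x - low) / width), n_bins - 1)
        counts[idx] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i]) for i in range(n_bins)]

table = frequency_table(values, 1.15, 1.75, 3)
for left, right, count in table:
    # Each row of the table becomes one bar of the histogram.
    print(f"[{left:.2f}, {right:.2f}): {'#' * count} ({count})")
```

The bar heights of the histogram are exactly the counts in the last column of this table.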

- density plot (KDE: Kernel Density Estimation) : a non-parametric method that uses a kernel function to estimate the probability density function (PDF) of a random variable from the observed data

# Density plot (pandas uses a Gaussian kernel)
df[df['variety'] == 'Setosa']['petal.length'].plot.density()
plt.show()
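What the density plot computes can be sketched by hand. The estimate at a point x is f(x) = 1/(n*h) * sum of K((x - x_i)/h) over all data points, where K is the kernel and h is the bandwidth. The values below are made up, and the bandwidth of 0.1 is an assumption; pandas delegates to SciPy's gaussian_kde, which selects a bandwidth automatically (Scott's rule by default).

```python
import math

# Hypothetical petal lengths in cm (illustrative values, not the real data).
values = [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.6, 1.2]

def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, bandwidth):
    """Estimated density at x: the average of scaled kernels centred on each data point."""
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in data)
    return total / (len(data) * bandwidth)

bandwidth = 0.1  # assumed value for illustration
step = 0.01
grid = [0.8 + step * i for i in range(131)]  # 0.80 .. 2.10
density = [kde(x, values, bandwidth) for x in grid]

# A valid density integrates to about 1 (simple Riemann sum over the grid).
area = sum(density) * step
print(f"area under the estimated PDF: {area:.3f}")
```

A smaller bandwidth gives a bumpier curve that follows individual points; a larger one gives a smoother curve that can hide structure.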

>>> histogram + density plot

ax = df[df['variety'] == 'Setosa']['petal.length'].plot.hist(density=True)
df[df['variety'] == 'Setosa']['petal.length'].plot.density(ax=ax)
plt.show()

- box plot

# Box plot
# df.boxplot()

# For a single attribute
# df.boxplot(column=['sepal.length'], by='variety', figsize=(12, 8))
# plt.show()

df.boxplot(by='variety', figsize=(12, 8))
plt.show()

>>> box plot example figure
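The numbers a box plot draws can be computed directly, which helps when reading the figure. This is a sketch with made-up sepal lengths; it assumes the common convention (also matplotlib's default) that whiskers extend to the furthest data points within 1.5 * IQR of the box, and that quartiles are linearly interpolated.

```python
import statistics

# Hypothetical sepal lengths in cm (illustrative values, not the real data).
values = [4.3, 4.8, 5.0, 5.0, 5.1, 5.4, 5.8, 6.1, 6.4, 6.9, 7.9, 9.5]

# Quartiles with linear interpolation between data points.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # height of the box

# Fences: points beyond them are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points still inside the fences.
whisker_low = min(v for v in values if v >= lower_fence)
whisker_high = max(v for v in values if v <= upper_fence)
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(f"box: Q1={q1:.3f}, median={median:.3f}, Q3={q3:.3f}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```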

- violin plot

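# Violin plot (one subplot per attribute, grouped by variety)
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(2, 2, figsize=(12, 8))

for i, col in enumerate(['petal.length', 'petal.width', 'sepal.length', 'sepal.width']):
    sns.violinplot(data=df, x='variety', y=col, ax=ax[i//2][i%2])  # grid positions: (0,0), (0,1), (1,0), (1,1)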

plt.show()

>>> violin plot example figure
