디시전트리기반 코드 실습 ::: Sevity Blog

디시전트리기반 코드 실습

2023. 10. 16. 15:31

아래 코드는 여기서 확인가능하다.

data는 sklearn에서 제공하는 wine data를 사용(178개 밖에 안되긴 한다)

import pandas as pd
from sklearn.datasets import load_wine
# 와인 데이터셋 로드
wine = load_wine(as_frame=True)
df = wine.data
# data 첫 5행 출력
print(df.head())
df = wine.target
# 정답 레이블 첫 5행 출력
print(df.head())


//출력
   alcohol  malic_acid   ash  alcalinity_of_ash  magnesium  total_phenols  flavanoids  nonflavanoid_phenols  proanthocyanins  color_intensity   hue  od280/od315_of_diluted_wines  proline
0    14.23        1.71  2.43               15.6      127.0           2.80        3.06                  0.28             2.29             5.64  1.04                          3.92   1065.0
1    13.20        1.78  2.14               11.2      100.0           2.65        2.76                  0.26             1.28             4.38  1.05                          3.40   1050.0
2    13.16        2.36  2.67               18.6      101.0           2.80        3.24                  0.30             2.81             5.68  1.03                          3.17   1185.0
3    14.37        1.95  2.50               16.8      113.0           3.85        3.49                  0.24             2.18             7.80  0.86                          3.45   1480.0
4    13.24        2.59  2.87               21.0      118.0           2.80        2.69                  0.39             1.82             4.32  1.04                          2.93    735.0
0    0
1    0
2    0
3    0
4    0
Name: target, dtype: int64

정답레이블은 0, 1, 2로 서로다른 와인 재배자를 뜻함

다음 코드를 통해 간단히 디시전트리, 랜덤포레스트, xgboost의 성능을 비교(교차검증 사용)

from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb
import numpy as np

# 데이터 로딩
data = load_wine()
X, y = data.data, data.target

# 디시전 트리 모델 생성
dt = DecisionTreeClassifier()

# 랜덤 포레스트 모델 생성
rf = RandomForestClassifier()

# XGBoost 모델 생성
xg_cls = xgb.XGBClassifier()

# 교차 검증 수행 (5-fold CV)
cv_scores_dt = cross_val_score(dt, X, y, cv=5)
cv_scores_rf = cross_val_score(rf, X, y, cv=5)
cv_scores_xgb = cross_val_score(xg_cls, X, y, cv=5)

# 평균 정확도 출력
print(f'Decision Tree CV Accuracy: {np.mean(cv_scores_dt):.2f}')
print(f'Random Forest CV Accuracy: {np.mean(cv_scores_rf):.2f}')
print(f'XGBoost CV Accuracy: {np.mean(cv_scores_xgb):.2f}')

출력

$ python wine_test.py
Decision Tree CV Accuracy: 0.87
Random Forest CV Accuracy: 0.97
XGBoost CV Accuracy: 0.95

저작자표시

'AI, ML > ML' 카테고리의 다른 글

dense feature vs sparse feature (0)	2024.01.07
binning (0)	2023.12.28
그레디언트 부스팅 (Gradient Boosting) (0)	2023.10.15
랜덤 포레스트(random forest) (1)	2023.10.15
윈도우 환경에서 ML환경 구축 (0)	2022.03.09

Sevity Blog

디시전트리기반 코드 실습

'AI, ML > ML' 카테고리의 다른 글

+ Recent posts

티스토리툴바