KT Special Lecture, Day 2. Practice Problems: heatmap, StandardScaler, RandomForestClassifier, Deep Learning

2023. 7. 11. 11:05 · Certifications/KT-AICE Associate

Object Detection.

 

https://deepbaksuvision.github.io/Modu_ObjectDetection/posts/01_00_What_is_Object_Detection.html

 

01. What is Object Detection? · GitBook

deepbaksuvision.github.io

 

Computer vision datasets are available from this site.

 

https://public.roboflow.com/object-detection/hard-hat-workers

 

Hard Hat Workers Object Detection Dataset

Download 7035 free images labeled with bounding boxes for object detection.

public.roboflow.com

This is what does the learning (training) for us.

 

 

https://github.com/ultralytics/yolov5

 

GitHub - ultralytics/yolov5: YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

github.com

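# Imports used throughout these notes.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Read the file with pandas (the file has an .xls extension but is read as CSV with cp949 encoding).
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/인공지능 개발자 과정/kt특강/2일차/유사기출문제/국민건강보험공단_건강검진정보_20211229.xls', encoding='cp949')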

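# Draw a heatmap. corr() computes pairwise correlations; annot=True prints the value in each cell.
# numeric_only=True skips non-numeric columns, which pandas 2.0+ would otherwise raise on.
sns.heatmap(df.corr(numeric_only=True), annot=True)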
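
To see what the heatmap is actually coloring, here is a minimal sketch of the correlation matrix on a tiny made-up table. The column names and values are hypothetical and not taken from the health-screening dataset; the point is only that columns moving together score near +1 and columns moving in opposite directions score near -1.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
toy = pd.DataFrame({
    "height": [150, 160, 170, 180],
    "weight": [50, 60, 70, 80],      # rises in lockstep with height
    "score":  [4, 3, 2, 1],          # falls in lockstep as height rises
    "label":  ["a", "b", "c", "d"],  # non-numeric, skipped by numeric_only=True
})

# Pairwise Pearson correlation of the numeric columns.
corr = toy.corr(numeric_only=True)
print(corr)
```

Passing this `corr` table to `sns.heatmap(corr, annot=True)` would draw the same kind of plot as above, just on three columns.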

# Bar plot of height by age band.
sns.barplot(x="연령대 코드(5세단위)", y="신장(5Cm단위)", data=df)

# List the columns.
df.columns

# info() shows the non-null count and the data type of each column.
df.info()

# fillna(0) fills null values with 0.
df = df.fillna(0)
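
As a quick check of what `fillna(0)` does, the sketch below counts missing values before and after filling on a small made-up table (the columns `a` and `b` are hypothetical). Counting with `isna().sum()` gives the same information as the Non-Null column of `info()`, but as numbers you can inspect.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table with a few missing values.
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

# Missing values per column before filling.
missing_before = toy.isna().sum().to_dict()

# Replace every missing value with 0 (returns a new DataFrame).
filled = toy.fillna(0)

# Total missing values after filling.
missing_after = int(filled.isna().sum().sum())

print(missing_before, missing_after)
```

Filling with 0 is simple, but it also treats "not measured" as a real value of zero, which may distort columns where 0 is not a plausible measurement.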

# Drop the dental-exam columns and the data release date.
df = df.drop(['구강검진 수검여부', '치아우식증유무', '치석', '데이터 공개일자'], axis=1)

# To transpose the data, use this.
#df.T

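# One-hot encode. The columns= argument forces encoding even when the codes are stored as integers,
# which get_dummies would otherwise leave unchanged. Note that df_dummies is not merged back into df here.
df_dummies = pd.get_dummies(df[['성별코드', '연령대 코드(5세단위)']], columns=['성별코드', '연령대 코드(5세단위)'], drop_first=True)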
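
The following sketch shows how `get_dummies` behaves on integer-coded categories, using a made-up table (the names `sex_code` and `age_band` are hypothetical stand-ins). Without `columns=`, integer columns pass through untouched; with it, each code becomes its own indicator column, and `drop_first=True` drops the first level of each.

```python
import pandas as pd

# Hypothetical integer-coded categories.
toy = pd.DataFrame({"sex_code": [1, 2, 1], "age_band": [9, 10, 11]})

# Default call: integer columns are not treated as categorical.
unchanged = pd.get_dummies(toy)

# Explicit columns=: every listed column is one-hot encoded.
encoded = pd.get_dummies(toy, columns=["sex_code", "age_band"], drop_first=True)

print(list(unchanged.columns))
print(list(encoded.columns))
```

To use the encoded columns for training, they would still need to be joined back to the remaining features, for example with `pd.concat`.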



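# Prepare the data for training: separate the target from the features.

y = df['음주여부']  # target column (drinking status)

X = df.drop(['음주여부'], axis=1)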

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.3, random_state=10)
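
A small sketch of what `test_size=0.3` and `random_state` do, using ten made-up samples. The variable names here are hypothetical and chosen so they do not clash with the ones above.

```python
from sklearn.model_selection import train_test_split

# Ten hypothetical samples with alternating labels.
samples = list(range(10))
labels = [i % 2 for i in samples]

# 30% of the rows go to the test set; random_state fixes the shuffle.
s_train, s_test, l_train, l_test = train_test_split(
    samples, labels, test_size=0.3, random_state=10
)

# Repeating the call with the same random_state reproduces the same split.
s_train2, s_test2, _, _ = train_test_split(
    samples, labels, test_size=0.3, random_state=10
)

print(len(s_train), len(s_test), s_test == s_test2)
```

Fixing `random_state` makes scores comparable between runs, since every run trains and evaluates on the same rows.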


# Apply scaling. Fit on the training set only, then transform both sets with the same statistics.
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
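
To make the fit/transform split concrete, here is a minimal sketch on made-up numbers (the arrays `toy_train` and `toy_test` are hypothetical). The scaler learns the mean and standard deviation from the training rows only, so the scaled training data ends up with mean 0 and standard deviation 1, while the test row is shifted and scaled by those same training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature data.
toy_train = np.array([[1.0], [2.0], [3.0]])
toy_test = np.array([[4.0]])

# Learn mean and standard deviation from the training rows only.
toy_scaler = StandardScaler().fit(toy_train)

train_scaled = toy_scaler.transform(toy_train)
test_scaled = toy_scaler.transform(toy_test)

print(toy_scaler.mean_, toy_scaler.scale_)
print(train_scaled.ravel(), test_scaled.ravel())
```

Fitting on the test set as well would leak information about the held-out rows into preprocessing, which is why only `transform` is called on it.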


# Train a random forest.
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_estimators=30, max_features=3, max_depth=7, random_state=21)
rfc.fit(X_train, y_train)

# Compute the accuracy score (this is training accuracy; pass X_test, y_test for held-out accuracy).
rfc.score(X_train, y_train)


# Evaluate performance.

from sklearn.metrics import confusion_matrix, classification_report

y_pred = rfc.predict(X_test)
conf_mat = confusion_matrix(y_test, y_pred)

sns.heatmap(conf_mat, annot=True, fmt='d')  # fmt='d' prints the counts as integers


print(f"classification_report: {classification_report(y_test, y_pred)}")
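
As a way to read the confusion matrix and the report, the sketch below works through eight made-up binary predictions (the lists `toy_true` and `toy_pred` are hypothetical). For binary labels, scikit-learn lays the matrix out as `[[TN, FP], [FN, TP]]`, and precision and recall for the positive class follow directly from those four counts.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical ground truth and predictions.
toy_true = [0, 0, 0, 1, 1, 1, 1, 0]
toy_pred = [0, 1, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
toy_cm = confusion_matrix(toy_true, toy_pred)
tn, fp, fn, tp = (int(v) for v in toy_cm.ravel())

# Precision = TP / (TP + FP); recall = TP / (TP + FN).
precision = precision_score(toy_true, toy_pred)
recall = recall_score(toy_true, toy_pred)

print(toy_cm)
print(precision, recall)
```

`classification_report` prints these same per-class precision and recall values, along with the F1 score and the support (row count) of each class.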
# Build a deep learning model
# At least X hidden layers
# Dropout must be configured
# Apply EarlyStopping and ModelCheckpoint

import tensorflow as tf


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

model = Sequential()
model.add(Dense(128, activation = 'relu', input_shape=(X_train.shape[1],)))
model.add(Dropout(0.5))
model.add(Dense(128, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(32, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))


model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
mc = ModelCheckpoint(filepath='best.h5', verbose=1, monitor='val_loss', mode='auto', save_best_only=True)
es = EarlyStopping(monitor='val_loss', mode='auto', verbose=1, patience=5)
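
The idea behind `patience` and `save_best_only` can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: the function `early_stopping_epoch` and the loss values are hypothetical, and details such as `min_delta` are ignored. Training stops once the validation loss has failed to improve for `patience` consecutive epochs, and the checkpoint corresponds to the epoch with the lowest loss seen so far.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based, for patience-based stopping."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: this is where a best-only checkpoint would be saved.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: the best value is at epoch 2.
toy_losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.7]
stop_epoch, best_epoch = early_stopping_epoch(toy_losses, patience=3)
print(stop_epoch, best_epoch)
```

With `patience=5` as in the callbacks above, training would tolerate five non-improving epochs in a row before stopping, while `best.h5` keeps the weights from the best epoch.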


history = model.fit(X_train, y_train, epochs=50,
                    validation_data=(X_test, y_test),
                    # validation_split=0.1,
                    verbose=1,
                    callbacks=[es, mc])
                   
# The History object is not subscriptable; per-epoch metrics live in history.history.
print(history.history['acc'])


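plt.rc('font', family='Malgun Gothic')  # Korean-capable font; set it before drawing (only needed for Korean labels)
plt.figure(figsize=(10,5))
plt.plot(history.history['acc'], 'red', label="acc")
plt.plot(history.history['val_acc'], 'blue', label='val_acc')

plt.title("Training and validation accuracy")
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()  # show the acc / val_acc labels
plt.show()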