[Deep Learning] ex03_Activation Functions, Optimizers, and Callback Functions

DeseoDeSeo

deseodeseo 2023. 10. 16. 15:09

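➤ Gradient Descent

Searches for the value of w that minimizes the cost function (the point where the gradient reaches zero).
Updates using the entire dataset, so computing the error on every step costs a lot of time and compute.
-Stochastic Gradient Descent (SGD)
Updates using a randomly selected subset of the data.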
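To make the difference concrete, here is a minimal NumPy sketch on a hypothetical toy problem (y = 3x, so the ideal weight is w = 3). Full-batch gradient descent makes one update per epoch from all samples, while the SGD version makes one update per randomly drawn mini-batch, which is what Keras's `SGD` does when you pass a `batch_size`. The data, learning rate, and function names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical toy data: y = 3x exactly, so the ideal weight is w = 3.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=1000)
y = 3.0 * X

def grad(w, xb, yb):
    # Gradient of the mean squared error mean((w*x - y)^2) with respect to w.
    return 2.0 * np.mean((w * xb - yb) * xb)

def batch_gd(lr=0.5, epochs=50):
    w = 0.0
    for _ in range(epochs):
        w -= lr * grad(w, X, y)  # one update per epoch, using all 1000 samples
    return w

def sgd(lr=0.5, epochs=5, batch_size=32):
    w = 0.0
    for _ in range(epochs):
        idx = rng.permutation(len(X))  # shuffle, then walk through random subsets
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            w -= lr * grad(w, X[b], y[b])  # one update per mini-batch
    return w

w_full = batch_gd()
w_sgd = sgd()
print(round(w_full, 4), round(w_sgd, 4))
```

Both reach w ≈ 3 here; the SGD version gets there in 5 passes because each pass already contains 32 updates.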

➤ Batch_size

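-Because of PC memory limits and slowdowns, it is usually impractical to feed all the data into the model at once in a single epoch, so the data is split into batches.

○ Smaller batch_size

Uses less memory
Training is slower; accuracy tends to go up

○ Larger batch_size

Uses more memory
Training is faster; accuracy tends to go down
# The Keras default is 32; 32 and 64 are the most common choices.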
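One way to see the speed side of this trade-off is to count weight updates per epoch. A minimal sketch, assuming the MNIST setup used later in this post (60,000 training images with `validation_split=0.2`, leaving 48,000 for training); the helper name is hypothetical.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    # The last, smaller batch still counts as one update, hence the ceiling.
    return math.ceil(n_samples / batch_size)

# MNIST has 60,000 training images; validation_split=0.2 holds out 12,000,
# leaving 48,000 for training.
n_train = 48_000
for bs in (32, 64, 128):
    print(f"batch_size={bs}: {updates_per_epoch(n_train, bs)} weight updates per epoch")
```

This prints 1500, 750 and 375 updates respectively: a smaller batch means more (noisier) updates per epoch, while a larger batch holds more samples in memory per step.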

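➤ Momentum

Applies inertia to gradient descent: each update reflects not only the current batch but also the learning results of previous batches.
Characteristic: before modifying the weights, it refers to the previous direction of movement.
This reduces zigzag movement.
-Nesterov Accelerated Momentum (also called Nesterov Accelerated Gradient)
Assumes the weights first move in the direction given by standard momentum, computes the gradient at that look-ahead point, and applies it to the actual update (this cuts down on unnecessary movement).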
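The two update rules can be sketched in NumPy as follows. The test function (an elongated bowl where plain gradient descent tends to zigzag), the starting point, and the hyperparameters are all assumptions for illustration; the only difference between the two variants is where the gradient is evaluated.

```python
import numpy as np

# Hypothetical test function: f(w) = 0.5 * (w0^2 + 10 * w1^2), minimum at (0, 0).
def grad(w):
    return np.array([w[0], 10.0 * w[1]])

def momentum(lr=0.05, beta=0.9, steps=300, nesterov=False):
    w = np.array([5.0, 1.0])
    v = np.zeros_like(w)  # velocity: exponentially accumulated past updates
    for _ in range(steps):
        # Standard momentum uses the gradient at the current point;
        # Nesterov uses the gradient at the look-ahead point w + beta * v.
        g = grad(w + beta * v) if nesterov else grad(w)
        v = beta * v - lr * g
        w = w + v
    return w

print(momentum())
print(momentum(nesterov=True))
```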

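➤ AdaGrad (Adaptive Gradient)

Updates with learning-rate decay applied (large steps while far from the minimum, small fine-tuning steps as it gets close).
➜ Learns in large steps at first, then in progressively smaller ones.

⛤ Step size = learning rate

A method that gradually reduces the learning rate as training proceeds. AdaGrad does this separately for each parameter by dividing the learning rate by the square root of that parameter's accumulated squared gradients.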
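A minimal AdaGrad sketch in NumPy, assuming a hypothetical one-parameter problem f(w) = w² (gradient 2w). It records the effective per-parameter learning rate at each step so you can see it only ever shrinks; the names and hyperparameters are illustrative, not Keras's implementation.

```python
import numpy as np

def adagrad(grad_fn, w0, lr=0.5, steps=200, eps=1e-8):
    w = np.asarray(w0, dtype=float)
    g_sq_sum = np.zeros_like(w)  # running sum of squared gradients, per parameter
    effective_lrs = []
    for _ in range(steps):
        g = grad_fn(w)
        g_sq_sum += g * g
        step = lr / (np.sqrt(g_sq_sum) + eps)  # per-parameter learning rate; it can only shrink
        w = w - step * g
        effective_lrs.append(step.copy())
    return w, effective_lrs

# Hypothetical example: f(w) = w^2, gradient 2w, minimum at 0.
w, lrs = adagrad(lambda w: 2.0 * w, [3.0])
print(w, lrs[0], lrs[-1])
```

Because the squared-gradient sum never decreases, the step size is large at first and smaller later, matching the "big steps, then fine adjustment" description above.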

-Step (= learning rate)

-Adam: the optimizer most widely used today; it combines momentum with per-parameter adaptive learning rates (RMSProp-style).
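The Adam update can be sketched in NumPy as below: the first moment `m` plays the role of momentum and the second moment `v` provides the per-parameter scaling, with bias correction because both start at zero. The defaults for `beta1`, `beta2` and `eps` follow the commonly published values; the learning rate of 0.1 (instead of Keras's default 0.001) and the toy function f(w) = (w - 2)² are assumptions chosen so the sketch converges quickly.

```python
import numpy as np

def adam(grad_fn, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)  # 1st moment: momentum-like running mean of gradients
    v = np.zeros_like(w)  # 2nd moment: running mean of squared gradients (adaptive scale)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction, since m and v start at zero
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Hypothetical example: f(w) = (w - 2)^2, gradient 2(w - 2), minimum at 2.
w_final = adam(lambda w: 2.0 * (w - 2.0), [0.0])
print(w_final)
```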

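Goals

Find the best combination of activation function and gradient-descent optimizer.
Learn the callback functions that help with modeling (saving the model, stopping training early).

Data loading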

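from tensorflow.keras.datasets import mnist # load the handwritten digit data
# Split into training and test sets
(X_train, y_train),(X_test, y_test) = mnist.load_data()
# Check the shapes
(X_train.shape, y_train.shape),(X_test.shape, y_test.shape)
# → ((60000, 28, 28), (60000,)), ((10000, 28, 28), (10000,))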
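The models below start with `Flatten()`, which turns each 28x28 image into a 784-element vector, and then stack Dense layers of 64, 128, 256, 128, 64 and 10 units. This NumPy sketch uses a small array of zeros as a stand-in for `X_train` (not real data) to show the reshape, and counts the parameters of that Dense stack; the total should match what `model1.summary()` reports, though that is an expectation rather than something shown in this post.

```python
import numpy as np

# Stand-in batch with MNIST's per-image shape (28, 28); zeros, not real data.
X = np.zeros((5, 28, 28), dtype=np.uint8)
X_flat = X.reshape(len(X), -1)  # what Flatten() does to each sample: 28 * 28 -> 784
print(X_flat.shape)

def dense_params(n_in, n_out):
    # A Dense layer has an (n_in x n_out) weight matrix plus n_out biases.
    return n_in * n_out + n_out

units = [784, 64, 128, 256, 128, 64, 10]  # flattened input, 5 hidden layers, 10-class output
total = sum(dense_params(a, b) for a, b in zip(units[:-1], units[1:]))
print(total)
```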

➤ Comparing performance across activation function and optimizer combinations

sigmoid + SGD
relu + SGD
relu + Adam
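One common explanation for why sigmoid tends to do worse than relu in a deep stack is the size of their derivatives. A minimal NumPy sketch: the sigmoid derivative never exceeds 0.25, so the product of five such factors (one per hidden layer, ignoring the weights) is at most 0.25**5, while relu passes a derivative of 1 for every positive input. This is an illustration of the vanishing-gradient idea, not a measurement from the models below.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # largest at x = 0

def relu_grad(x):
    return (x > 0).astype(float)  # 1 for positive inputs, 0 otherwise

x = np.linspace(-6, 6, 1201)
peak = sigmoid_grad(x).max()
five_layer_bound = 0.25 ** 5  # upper bound on the product of 5 sigmoid derivatives
print(peak, five_layer_bound)
print(relu_grad(np.array([-1.0, 2.0])))
```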

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, InputLayer, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import SGD, Adam

1. sigmoid + SGD combination

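# 1) Design the network
# Skeleton
model1=Sequential()
# Input layer
# Image data (2D -> 1D): Flatten turns each 28x28 image into a 784-element vector
model1.add(Flatten())
# Hidden layers (5 layers: 64, 128, 256, 128, 64 units)
model1.add(Dense(units=64,activation='sigmoid'))
model1.add(Dense(units=128,activation='sigmoid'))
model1.add(Dense(units=256,activation='sigmoid'))
model1.add(Dense(units=128,activation='sigmoid'))
model1.add(Dense(units=64,activation='sigmoid'))

# Output layer: 10 classes (digits 0-9)
model1.add(Dense(units=10, activation='softmax'))

# 2) Set the training and evaluation methods
# sparse_categorical_crossentropy accepts integer labels, so y_train needs no one-hot encoding
model1.compile(loss = 'sparse_categorical_crossentropy',
               optimizer = SGD(learning_rate = 0.01), # SGD default learning rate: 0.01
               metrics = ['accuracy'])
# 3) Train for 20 epochs, holding out 20% of the training data for validation
h1=model1.fit(X_train,y_train, epochs=20, validation_split=0.2, batch_size=128)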
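All three models use `sparse_categorical_crossentropy`, which is why `y_train` can stay as integer labels instead of going through `to_categorical`. A minimal NumPy sketch of the idea, using hypothetical softmax outputs for two samples and three classes: the sparse form picks out the probability of the true class directly, and gives the same value as categorical cross-entropy on one-hot labels.

```python
import numpy as np

def sparse_ce(probs, labels):
    # labels are integer class ids (like y_train from mnist.load_data())
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def categorical_ce(probs, one_hot):
    # one_hot labels, e.g. what to_categorical(y_train) would produce
    return -np.mean(np.sum(one_hot * np.log(probs), axis=1))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])  # hypothetical softmax outputs
labels = np.array([0, 1])
one_hot = np.eye(3)[labels]
print(sparse_ce(probs, labels), categorical_ce(probs, one_hot))
```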

2. relu + SGD combination

# 1) Design the network
# Skeleton
model2=Sequential()
# Input layer
# Image data (2D -> 1D): Flatten turns each 28x28 image into a 784-element vector
model2.add(Flatten())
# Hidden layers (5 layers: 64, 128, 256, 128, 64 units)
model2.add(Dense(units=64,activation='relu'))
model2.add(Dense(units=128,activation='relu'))
model2.add(Dense(units=256,activation='relu'))
model2.add(Dense(units=128,activation='relu'))
model2.add(Dense(units=64,activation='relu'))

# Output layer: 10 classes (digits 0-9)
model2.add(Dense(units=10, activation='softmax'))

# 2) Set the training and evaluation methods
model2.compile(loss = 'sparse_categorical_crossentropy',
               optimizer = SGD(learning_rate = 0.01), # SGD default learning rate: 0.01
               metrics = ['accuracy'])

# 3) Train for 20 epochs
h2=model2.fit(X_train,y_train, epochs=20, validation_split=0.2, batch_size=128)
# Accuracy rises sharply compared with the first model.

3. relu + Adam combination

# 1) Design the network
# Skeleton
model3=Sequential()
# Input layer
# Image data (2D -> 1D): Flatten turns each 28x28 image into a 784-element vector
model3.add(Flatten())
# Hidden layers (5 layers: 64, 128, 256, 128, 64 units)
model3.add(Dense(units=64,activation='relu'))
model3.add(Dense(units=128,activation='relu'))
model3.add(Dense(units=256,activation='relu'))
model3.add(Dense(units=128,activation='relu'))
model3.add(Dense(units=64,activation='relu'))

# Output layer: 10 classes (digits 0-9)
model3.add(Dense(units=10, activation='softmax'))

# 2) Set the training and evaluation methods
model3.compile(loss = 'sparse_categorical_crossentropy',
               optimizer = Adam(learning_rate = 0.001), # Adam default learning rate: 0.001
               metrics = ['accuracy'])

# 3) Train for 20 epochs (stored as h3 so it can be plotted with h1 and h2)
h3=model3.fit(X_train,y_train, epochs=20, validation_split=0.2, batch_size=128)
# The second combination (relu + SGD) learns less efficiently, so Adam is the most suitable choice here.

import matplotlib.pyplot as plt
plt.figure(figsize=(15,5))
# sigmoid + SGD combination
plt.plot(h1.history['accuracy'], label="sigmoid+SGD train acc")
plt.plot(h1.history['val_accuracy'], label="sigmoid+SGD validation acc")
# relu + SGD combination
plt.plot(h2.history['accuracy'], label="relu+SGD train acc")
plt.plot(h2.history['val_accuracy'], label="relu+SGD validation acc")
# relu + Adam combination
plt.plot(h3.history['accuracy'], label="relu+Adam train acc")
plt.plot(h3.history['val_accuracy'], label="relu+Adam validation acc")

plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()

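Callback functions

Saving the model and stopping training early
Model saving (ModelCheckpoint)
- If a deep learning model runs through every specified epoch, it may end up overfitting.
- ModelCheckpoint saves a well-generalized model partway through training.
Early stopping (EarlyStopping)
- When epochs is set large, the model's performance often stops improving after a certain number of epochs, and the remaining epochs only waste time. When performance is no longer improving, training should be stopped early.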
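The patience idea can be sketched in plain Python. This is a simplified model of how `EarlyStopping(monitor='val_accuracy', patience=...)` decides when to stop (it ignores options such as `min_delta`), not Keras's actual source; the validation accuracies are hypothetical, not results from this post.

```python
def early_stop_epoch(val_scores, patience):
    """Return the 1-based epoch at which training would stop, or the last epoch
    if the patience limit is never reached (simplified EarlyStopping logic)."""
    best = float("-inf")
    wait = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best, wait = score, 0  # improvement: reset the counter
        else:
            wait += 1  # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_scores)  # never triggered: all epochs run

# Hypothetical validation accuracies, not results from this post.
scores = [0.90, 0.93, 0.95, 0.96, 0.955, 0.958, 0.957, 0.959]
print(early_stop_epoch(scores, patience=3))
print(early_stop_epoch(scores, patience=10))
```

With `patience=3`, the three epochs after the best score of 0.96 bring no improvement, so training stops at epoch 7; with `patience=10` every epoch runs.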

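from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
# ModelCheckpoint: save the model partway through training
# EarlyStopping: stop training partway through

# Model saving
# Path where the models will be saved; {epoch} and {val_accuracy} are filled in for each saved file
# (works with TF 2.x / Keras 2; Keras 3 may reject the .hdf5 extension and expect .keras)
model_path='/content/drive/MyDrive/Colab Notebooks/DeepLearning/data/digit_model/dm_{epoch:02d}_{val_accuracy:0.2f}.hdf5'
mckp=ModelCheckpoint(filepath=model_path, # save path
                verbose=1,      # logging: 1 prints a message whenever a model is saved, 0 prints nothing
                save_best_only=True, # save only when the monitored score reaches a new best
                monitor='val_accuracy') # metric that defines "best"

# Early stopping
early=EarlyStopping(monitor='val_accuracy', # metric to watch
                    verbose=1, # print a message when training stops
                    patience=10) # max number of epochs to wait for an improvement before stopping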
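To see which files `save_best_only=True` would produce, here is a plain-Python sketch that applies the same file name pattern as `model_path` with `str.format`, writing a file only when the score beats the previous best. It is a simplified model of ModelCheckpoint's behavior, and the scores are hypothetical; epochs are numbered from 1 in the file name.

```python
def checkpoint_saves(val_scores, template):
    """Simplified sketch of ModelCheckpoint(save_best_only=True, monitor='val_accuracy'):
    list the file names that would be written, one per new best score."""
    best = float("-inf")
    saved = []
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best = score
            saved.append(template.format(epoch=epoch, val_accuracy=score))
    return saved

template = "dm_{epoch:02d}_{val_accuracy:0.2f}.hdf5"  # same pattern as model_path
scores = [0.90, 0.93, 0.92, 0.96]  # hypothetical values
print(checkpoint_saves(scores, template))
```

Epoch 3 writes nothing because 0.92 does not beat 0.93, so the output is `['dm_01_0.90.hdf5', 'dm_02_0.93.hdf5', 'dm_04_0.96.hdf5']`.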

# Build the model with the third combination (relu + Adam)

# 1) Design the network
# Skeleton
model3=Sequential()
# Input layer
# Image data (2D -> 1D): Flatten turns each 28x28 image into a 784-element vector
model3.add(Flatten())
# Hidden layers (5 layers: 64, 128, 256, 128, 64 units)
model3.add(Dense(units=64,activation='relu'))
model3.add(Dense(units=128,activation='relu'))
model3.add(Dense(units=256,activation='relu'))
model3.add(Dense(units=128,activation='relu'))
model3.add(Dense(units=64,activation='relu'))

# Output layer: 10 classes (digits 0-9)
model3.add(Dense(units=10, activation='softmax'))

# 2) Set the training and evaluation methods
model3.compile(loss = 'sparse_categorical_crossentropy',
               optimizer = Adam(learning_rate = 0.001), # Adam default learning rate: 0.001
               metrics = ['accuracy'])

# 3) Train: epochs=1000 is deliberately large, because EarlyStopping ends training once val_accuracy stops improving
h=model3.fit(X_train,y_train, epochs=1000, validation_split=0.2, batch_size=128, callbacks=[mckp, early])

Loading the best model

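from tensorflow.keras.models import load_model

# Load an already trained model: the best checkpoint saved above
# (the file name depends on the epoch and val_accuracy of that training run)
best_model = load_model('/content/drive/MyDrive/Colab Notebooks/DeepLearning/data/digit_model/dm_15_0.97.hdf5')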
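Instead of typing the best file name by hand, you could pick it from the saved names. A minimal sketch, assuming the `dm_{epoch}_{val_accuracy}.hdf5` naming used above; the file list is hypothetical (in Colab you might build it with `glob.glob(...)` on the save folder), and the helper name is illustrative.

```python
import re

def best_checkpoint(filenames):
    """Pick the file with the highest val_accuracy encoded in names like
    dm_15_0.97.hdf5. Ties (names round to 2 decimals) go to the later epoch."""
    pattern = re.compile(r"dm_(\d+)_(\d+\.\d+)\.hdf5$")
    scored = []
    for name in filenames:
        m = pattern.search(name)
        if m:
            scored.append((float(m.group(2)), int(m.group(1)), name))
    return max(scored)[2] if scored else None

# Hypothetical directory listing.
files = ["dm_01_0.93.hdf5", "dm_04_0.96.hdf5", "dm_15_0.97.hdf5", "notes.txt"]
print(best_checkpoint(files))
```

Preferring the later epoch on ties is a reasonable default here: with `save_best_only=True`, a later file was only written because its unrounded score was strictly higher.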

License: Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND)
