
Grad School Diary


Model Saving and Callbacks

Graduate student (slave) 2023. 12. 6. 10:38

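5. Model Saving and Callbacks

Saving and managing trained models is the starting point of model management and, more broadly, of MLOps (the system that covers everything from data collection through model training to service deployment).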

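5-2 MNIST Deep Learning Model Example

MNIST is an image dataset of handwritten digits and one of the first datasets people meet when learning deep learning.
It is a long-standing classic, widely used for training and testing in machine learning.
It ships with Keras in keras.datasets.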

Loading and Preprocessing the Data

To load the MNIST dataset, import tensorflow.keras.datasets.mnist, which is bundled with Keras.

Use the train_test_split() function to split the training data x_train_full and y_train_full: 70% becomes the training set x_train and y_train, and the remaining 30% becomes the validation set x_val and y_val.

# Load the data and split it into training and validation sets
from tensorflow import keras
from tensorflow.keras import layers, models, utils
from tensorflow.keras.datasets import mnist
from sklearn.model_selection import train_test_split

(x_train_full, y_train_full), (x_test, y_test) = mnist.load_data(path='mnist.npz')

# Keep 70% for training and hold out 30% for validation
x_train, x_val, y_train, y_val = train_test_split(x_train_full, y_train_full,
                                                  test_size=0.3,
                                                  random_state=123)

print(f"Full training data: {x_train_full.shape}  Labels: {y_train_full.shape}") # Full training data: (60000, 28, 28)  Labels: (60000,)
print(f"Training data: {x_train.shape}  Labels: {y_train.shape}") # Training data: (42000, 28, 28)  Labels: (42000,)
print(f"Validation data: {x_val.shape}  Labels: {y_val.shape}") # Validation data: (18000, 28, 28)  Labels: (18000,)
print(f"Test data: {x_test.shape}  Labels: {y_test.shape}") # Test data: (10000, 28, 28)  Labels: (10000,)


# Look at a few actual samples
import numpy as np
import matplotlib.pyplot as plt
# "seaborn-white" was renamed "seaborn-v0_8-white" in Matplotlib 3.6; the old name fails in recent versions
plt.style.use("seaborn-v0_8-white")

num_sample = 6
random_idxs = np.random.randint(len(x_train_full), size=num_sample)

plt.figure(figsize=(15, 3))
for i, idx in enumerate(random_idxs):
  img = x_train_full[idx, :]
  label = y_train_full[idx]

  plt.subplot(1, len(random_idxs), i+1)
  plt.axis('off')
  plt.title(f'Index: {idx}, Label: {label}')
  plt.imshow(img, cmap='gray')
plt.show()


# Divide x_train, x_val, and x_test by 255.
# Pixel values range from 0 to 255, so dividing by the maximum value 255
# scales them into the 0 to 1 range, which makes training easier.
x_train = x_train / 255.
x_val = x_val / 255.
x_test = x_test / 255.
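As a small sketch of the scaling step (the pixel values below are made up for illustration): dividing uint8 pixels by 255 maps 0 to 0.0 and 255 to 1.0. Casting to float32 first is an optional choice that the original code does not make; without it NumPy produces float64, which uses twice the memory, while Keras layers compute in float32 by default.

```python
import numpy as np

# Hypothetical 1x3 "image" covering the darkest, a middle, and the brightest pixel value
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Cast to float32, then scale into the 0 to 1 range
scaled = pixels.astype("float32") / 255.0

print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())  # 0.0 1.0
```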


# y_train, y_val, and y_test are labels for the digits 0 through 9, i.e. a categorical variable with 10 possible values.
# So we one-hot encode them with utils.to_categorical:
# the correct class gets 1 and every other position gets 0.
y_train = utils.to_categorical(y_train)
y_val = utils.to_categorical(y_val)
y_test = utils.to_categorical(y_test)
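To make the one-hot encoding concrete, here is a NumPy-only sketch that builds the same kind of matrix by picking rows of an identity matrix. It is an illustration of the idea, not how to_categorical is implemented internally, and the labels are hypothetical.

```python
import numpy as np

labels = np.array([5, 0, 4])  # hypothetical digit labels
num_classes = 10

# Row i of the 10x10 identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot[0])                  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
print(np.argmax(one_hot, axis=1))  # [5 0 4]  (argmax undoes the encoding)
```

When called without num_classes, utils.to_categorical infers the number of classes as the largest label plus one, so passing num_classes=10 explicitly guards against a data split that happens to contain no 9s.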

Building the Model

To build the deep learning model, use Sequential() to stack the layers one after another.

model = models.Sequential()
model.add(keras.Input(shape=(28, 28), name='input'))
model.add(layers.Flatten(name='flatten'))  # the 28x28 input shape is already set by keras.Input above
model.add(layers.Dense(100, activation='relu', name='dense1'))
model.add(layers.Dense(64, activation='relu', name='dense2'))
model.add(layers.Dense(32, activation='relu', name='dense3'))
model.add(layers.Dense(10, activation='softmax', name='output'))
model.summary()

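# Visualize the structure of the model (requires the pydot package and Graphviz)
utils.plot_model(model)

# Draw the model structure including the input/output shapes of each layer
utils.plot_model(model, show_shapes=True)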
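As a cross-check on the parameter counts that model.summary() reports: each Dense layer has (inputs x units) weights plus one bias per unit, and Flatten has no parameters. For the layer sizes used above, the arithmetic works out as follows.

```python
# Layer widths from the model above: 28*28 flattened inputs, then 100, 64, 32, and 10 units
layer_sizes = [28 * 28, 100, 64, 32, 10]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params = n_in * n_out + n_out  # weight matrix + bias vector
    print(f"Dense {n_in} -> {n_out}: {params} params")
    total += params

print("Total trainable params:", total)  # 87374
```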

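Compiling and Training the Model

For the loss function, specify categorical_crossentropy, which is used for multi-class classification when the class labels are one-hot encoded.
For the optimizer, specify sgd, the most basic one.
For the metric used to check classification performance, specify accuracy.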

model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
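For intuition about categorical_crossentropy: with a one-hot target it reduces to minus the log of the probability the model assigns to the correct class. The NumPy function below is a simplified sketch of that formula on made-up numbers; it clips probabilities to avoid log(0), which mirrors the idea in Keras but is not Keras's actual code.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Simplified cross-entropy for one-hot targets: -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[0.0, 0.0, 1.0],   # correct class is 2 in both rows
                   [0.0, 0.0, 1.0]])
y_pred = np.array([[0.1, 0.2, 0.7],   # fairly confident and correct
                   [0.6, 0.3, 0.1]])  # confident but wrong

losses = categorical_crossentropy(y_true, y_pred)
print(losses)  # approximately [0.357 2.303]
```

The confidently wrong prediction costs far more than the correct one, which is what pushes training toward putting probability mass on the right class.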

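Run fit() to train the deep learning model.

Pass x_train and y_train as the training data.
Set the number of epochs (epochs), i.e. the number of passes over the training data, to 40.
Set the batch size (batch_size) to 128.
Pass x_val and y_val, the data set aside for validation, as validation_data.
As training proceeds, the metric values for each epoch are recorded in history.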

history = model.fit(x_train, y_train,
                    epochs=40,
                    batch_size=128,
                    validation_data=(x_val, y_val))
                    
# The recorded keys are loss, accuracy, val_loss, and val_accuracy.
history.history.keys()
# output: dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
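A quick sanity check on what epochs=40 and batch_size=128 mean for the 42,000 training samples: each epoch runs ceil(42000 / 128) mini-batch steps, with a smaller last batch, and that step count is what Keras shows in its progress bar.

```python
import math

n_train = 42000  # size of x_train after the 70/30 split
batch_size = 128
epochs = 40

steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)           # 329
print(last_batch)                # 16
print(steps_per_epoch * epochs)  # 13160 weight updates in total
```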

Visualizing the Training Process

The goal is to see how the values stored in history change as the epochs progress.
The first chart shows loss and val_loss together, and the second shows accuracy and val_accuracy together.

history_dict = history.history

loss = history_dict['loss']
val_loss = history_dict['val_loss']

epochs = range(1, len(loss) + 1)
fig = plt.figure(figsize=(12, 5))

ax1 = fig.add_subplot(1, 2, 1)
ax1.plot(epochs, loss, color='blue', label='train_loss')
ax1.plot(epochs, val_loss, color='red', label='val_loss')
ax1.set_title('Train and Validation Loss')
ax1.set_xlabel('Epochs')
ax1.set_ylabel('Loss')
ax1.grid()
ax1.legend()

accuracy = history_dict['accuracy']
val_accuracy = history_dict['val_accuracy']

ax2 = fig.add_subplot(1, 2, 2)
ax2.plot(epochs, accuracy, color='blue', label='train_accuracy')
ax2.plot(epochs, val_accuracy, color='red', label='val_accuracy')
ax2.set_title('Train and Validation Accuracy')
ax2.set_xlabel('Epochs')
ax2.set_ylabel('Accuracy')
ax2.grid()
ax2.legend()

plt.show()
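Beyond eyeballing the charts, the best epoch can be read directly from the history dictionary. This sketch uses a hypothetical history_dict with made-up values, since real values depend on the run; tracking the lowest val_loss is also the idea behind the save_best_only and EarlyStopping callbacks covered later.

```python
import numpy as np

# Hypothetical values standing in for history.history
history_dict = {
    "val_loss": [0.62, 0.41, 0.35, 0.37, 0.39],
    "val_accuracy": [0.83, 0.88, 0.90, 0.89, 0.89],
}

best_idx = int(np.argmin(history_dict["val_loss"]))  # 0-based index of the lowest val_loss
best_epoch = best_idx + 1                             # the charts number epochs from 1
print(best_epoch, history_dict["val_accuracy"][best_idx])  # 3 0.9
```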

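Evaluating the Model and Making Predictions

To evaluate the trained model, run evaluate() on x_test and y_test. It returns the test loss followed by the compiled metric (here, accuracy).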

model.evaluate(x_test, y_test)

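To make predictions, pass x_test to the trained model and look at only the 0th result: it is the output of the final layer, which has 10 units and the softmax activation function, so it is a vector of 10 class probabilities.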

pred_ys = model.predict(x_test)
print(pred_ys.shape) # (10000, 10)
print(pred_ys[0]) # [3.00171465e-04 2.77048821e-06 4.52486769e-04 4.60873212e-04 5.71262547e-07 9.82058964e-07 1.96415506e-09 9.98618841e-01 5.32661688e-05 1.10005836e-04]

Use np.argmax() to find the index of the largest value in each result, then display the predicted label arg_pred_y[0] together with the actual digit image x_test[0].

arg_pred_y = np.argmax(pred_ys, axis=1)

plt.title(f'Predicted label: {arg_pred_y[0]}')
plt.imshow(x_test[0])
plt.show()
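The softmax output printed above can be checked by hand: its ten probabilities sum to (almost exactly) 1, and np.argmax picks index 7, the class with a probability of about 0.9986. The values below are copied from the output shown earlier in this post.

```python
import numpy as np

# pred_ys[0] as printed earlier in this post
pred_row = np.array([3.00171465e-04, 2.77048821e-06, 4.52486769e-04, 4.60873212e-04,
                     5.71262547e-07, 9.82058964e-07, 1.96415506e-09, 9.98618841e-01,
                     5.32661688e-05, 1.10005836e-04])

print(round(float(pred_row.sum()), 4))  # 1.0  (softmax outputs sum to 1)
print(int(np.argmax(pred_row)))         # 7    (index of the largest probability)
```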

To check how well the model classifies, import classification_report from Scikit-Learn and review the main metrics (precision, recall, F1-score) at a glance.

from sklearn.metrics import classification_report
print(classification_report(np.argmax(y_test, axis=-1), np.argmax(pred_ys, axis=-1)))

To see at a glance how well each label is classified, visualize a confusion matrix.

from sklearn.metrics import confusion_matrix
import seaborn as sns
sns.set(style='white')

plt.figure(figsize=(8, 8))
cm = confusion_matrix(np.argmax(y_test, axis=1), np.argmax(pred_ys, axis=-1))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
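To see what confusion_matrix counts, here is a NumPy sketch on a few made-up labels. It follows the same convention as scikit-learn (rows are true labels, columns are predicted labels), and per-class recall and precision, two of the numbers in classification_report, come straight from its rows and columns.

```python
import numpy as np

y_true = np.array([0, 1, 2, 2, 1])  # hypothetical true labels
y_pred = np.array([0, 2, 2, 2, 1])  # hypothetical predictions
num_classes = 3

cm = np.zeros((num_classes, num_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1  # row = true label, column = predicted label

print(cm)
# [[1 0 0]
#  [0 1 1]
#  [0 0 2]]

recall = np.diag(cm) / cm.sum(axis=1)     # correct / all samples of that true class
precision = np.diag(cm) / cm.sum(axis=0)  # correct / all samples predicted as that class
print(recall)  # [1.  0.5 1. ]
print(precision)
```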

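5-3 Saving and Loading Models

If you train a model carefully and then don't save it, you have to train it again from scratch. So once a model is reasonably well trained, save it for later use. The save() method saves a model in one call, and the load_model() function loads it back.

Models built with the Sequential API or the Functional API can be saved and loaded as a whole.
Models built with the Subclassing API cannot be saved as a whole in the HDF5 (.h5) format used here, because their architecture lives in Python code rather than in a layer graph; save their weights instead.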

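Saving with the Sequential API or Functional API

model.save() -> saves the whole model (architecture, weights, and training configuration including optimizer state)
model.to_json() -> returns only the model architecture (no weights) as a JSON string
model.to_yaml() -> returned the architecture as YAML, but was disabled in TensorFlow 2.6 for security reasons and now raises an error

Saving with the Subclassing API: save the parameters (weights)

save_weights()
load_weights()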

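model.save() code

Save

model.save('mnist_model.h5')  # HDF5 format; recent Keras versions also support and recommend the native '.keras' format

Load

loaded_model = models.load_model('mnist_model.h5')
loaded_model.summary()

Predicting with the loaded model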

pred_ys2 = loaded_model.predict(x_test)
print(pred_ys2.shape)
print(pred_ys2[0])
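To confirm that model.save() and load_model() round-trip a model exactly, the sketch below saves a tiny, hypothetical stand-in model (not the MNIST model) to a temporary directory and compares predictions before and after loading. It assumes TensorFlow and h5py are installed; recent Keras versions log a warning that HDF5 is a legacy format.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, models

# Tiny stand-in model with the same kind of layers as the MNIST model
tiny_model = models.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
tiny_model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])

x = np.random.rand(5, 4).astype("float32")
path = os.path.join(tempfile.mkdtemp(), "tiny_model.h5")

tiny_model.save(path)  # architecture + weights + training configuration
restored = models.load_model(path)

same = np.allclose(tiny_model.predict(x, verbose=0), restored.predict(x, verbose=0))
print(same)  # True
```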

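model.to_json() code

model_json = model.to_json()  # architecture only, no weights
with open('mnist_model.json', 'w') as f:
    f.write(model_json)

# Read back the model architecture saved as JSON.
with open('mnist_model.json') as f:
    loaded_model = keras.models.model_from_json(f.read())

# JSON holds no weights: copy them over, otherwise the predictions come from random initial weights.
loaded_model.set_weights(model.get_weights())

predictions = loaded_model.predict(x_test)

print(predictions.shape)
print(predictions[0])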

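Loading only the weights

# Use a separate file name so the full model saved above is not overwritten
# (Keras 3 requires weight files to end in '.weights.h5').
model.save_weights('mnist_model.weights.h5')

# Weights can only be loaded into a model with the same architecture.
model.load_weights('mnist_model.weights.h5')

# Weight files do not carry the training configuration, so compile the model before calling fit() or evaluate().
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])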
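Because weights only fit a model with the same architecture, the usual pattern is to rebuild the model in code and then call load_weights(). The sketch below does this with a small, hypothetical build_model() helper and a temporary file; the '.weights.h5' suffix is what Keras 3 requires, and older tf.keras also treats it as HDF5.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, models

def build_model():
    """Hypothetical helper that rebuilds the same (small) architecture every time."""
    m = models.Sequential([
        keras.Input(shape=(4,)),
        layers.Dense(8, activation="relu"),
        layers.Dense(3, activation="softmax"),
    ])
    m.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return m

original = build_model()
weights_path = os.path.join(tempfile.mkdtemp(), "tiny.weights.h5")
original.save_weights(weights_path)

fresh = build_model()             # starts with new random weights
fresh.load_weights(weights_path)  # overwritten with the saved ones

x = np.random.rand(5, 4).astype("float32")
match = np.allclose(original.predict(x, verbose=0), fresh.predict(x, verbose=0))
print(match)  # True
```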

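5-4 Callbacks

When you train a model with fit(), the callbacks parameter takes a list of one or more objects that Keras calls at specific points during training, such as the start and end of training and of each epoch or batch.

Typical examples of callbacks are ModelCheckpoint, EarlyStopping, LearningRateScheduler, and TensorBoard.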

Import

from tensorflow.keras import callbacks
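To see when these hooks fire, here is a minimal custom callback, a hypothetical EventLogger that subclasses callbacks.Callback and records which methods Keras calls. It trains a tiny stand-in model on random data for two epochs, so only the call order is meaningful.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import callbacks, layers, models

class EventLogger(callbacks.Callback):
    """Hypothetical callback that records which hooks Keras calls."""
    def __init__(self):
        super().__init__()
        self.events = []

    def on_train_begin(self, logs=None):
        self.events.append("train_begin")

    def on_epoch_end(self, epoch, logs=None):
        self.events.append(f"epoch_end_{epoch}")  # epoch is 0-based

    def on_train_end(self, logs=None):
        self.events.append("train_end")

tiny_model = models.Sequential([keras.Input(shape=(4,)), layers.Dense(2, activation="softmax")])
tiny_model.compile(loss="categorical_crossentropy", optimizer="sgd")

x = np.random.rand(16, 4).astype("float32")
y = keras.utils.to_categorical(np.random.randint(2, size=16), num_classes=2)

logger = EventLogger()
tiny_model.fit(x, y, epochs=2, verbose=0, callbacks=[logger])
print(logger.events)  # ['train_begin', 'epoch_end_0', 'epoch_end_1', 'train_end']
```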

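ModelCheckpoint

Saves the model as a checkpoint at regular intervals (by default, at the end of every epoch) so that you can recover it if something goes wrong. With save_best_only=True, the file is overwritten only when the monitored value (val_loss by default) improves, so it always holds the best model so far.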

check_point_cb = callbacks.ModelCheckpoint('keras_mnist_model.h5', save_best_only=True)
history = model.fit(x_train, y_train, epochs=10,
                    validation_data=(x_val, y_val),
                    callbacks=[check_point_cb])
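The save_best_only rule can be illustrated without Keras: with the default monitor='val_loss' (lower is better for a loss), a checkpoint is written only in epochs where val_loss beats the best value seen so far. The function below is a simplified, hypothetical re-creation of that rule on made-up losses, not Keras's actual code.

```python
def checkpoint_epochs(val_losses):
    """Return the 1-based epochs in which a save_best_only checkpoint would be written."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # improvement over the best so far: overwrite the checkpoint
            best = loss
            saved.append(epoch)
    return saved

print(checkpoint_epochs([0.50, 0.42, 0.45, 0.39, 0.40]))  # [1, 2, 4]
```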

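EarlyStopping

Stops training when validation performance has not improved for a while.

If the monitored validation metric does not improve for patience consecutive epochs, training stops. Because training halts automatically once the model stops improving, it is safe to set a large epochs value. With restore_best_weights=True, the model's weights are reset to those from the best epoch, so you do not need to restore the model separately (depending on the Keras version, this restoration may happen only when early stopping is actually triggered).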

check_point_cb = callbacks.ModelCheckpoint('keras_mnist_model.h5', save_best_only=True)
early_stopping_cb = callbacks.EarlyStopping(patience=3, monitor='val_loss',
                                  restore_best_weights=True)
history = model.fit(x_train, y_train, epochs=10,
                    validation_data=(x_val, y_val),
                    callbacks=[check_point_cb, early_stopping_cb])
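The patience logic can be sketched in plain Python as well. The function below is a simplified, hypothetical re-creation of EarlyStopping(monitor='val_loss', patience=3) on made-up losses; it returns the epoch at which training would stop and the best epoch whose weights restore_best_weights would bring back.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:  # improvement: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:                        # no improvement: count toward patience
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # training would stop here
    return len(val_losses), best_epoch    # ran all epochs without stopping

print(early_stopping_epoch([0.50, 0.40, 0.41, 0.42, 0.43, 0.30], patience=3))  # (5, 2)
```

Note the trade-off: training stops at epoch 5, so the lower loss at epoch 6 is never reached. A larger patience tolerates longer plateaus at the cost of more wasted epochs.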

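LearningRateScheduler

Used to change the learning rate (learning_rate) dynamically during optimization.

import tensorflow as tf

def scheduler(epoch, learning_rate):
  # Keep the learning rate unchanged while epoch < 10; from then on, multiply it by
  # exp(-0.1) ≈ 0.905 each epoch, i.e. reduce it by about 9.5% per epoch.
  if epoch < 10:
    return learning_rate
  else:
    return float(learning_rate * tf.math.exp(-0.1))

# Learning rate before training
round(float(model.optimizer.learning_rate.numpy()), 5)

lr_scheduler_cb = callbacks.LearningRateScheduler(scheduler)
history = model.fit(x_train, y_train, epochs=15,
                    callbacks=[lr_scheduler_cb], verbose=0)
# Learning rate after 15 epochs: 5 decayed epochs, i.e. a factor of exp(-0.5) ≈ 0.61
round(float(model.optimizer.learning_rate.numpy()), 5)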
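To see what the schedule does without training anything, the loop below calls the scheduler the way LearningRateScheduler does: at the start of each epoch, with the 0-based epoch index and the current learning rate. It uses math.exp so it runs without TensorFlow, and it assumes a starting learning rate of 0.01, which is SGD's default in Keras.

```python
import math

def scheduler(epoch, learning_rate):
    # Same rule as above: constant for epochs 0-9, then multiply by exp(-0.1) each epoch
    if epoch < 10:
        return learning_rate
    return learning_rate * math.exp(-0.1)

lr = 0.01  # assumed starting learning rate (SGD default)
lrs = []
for epoch in range(15):  # epochs=15 means indices 0..14
    lr = scheduler(epoch, lr)
    lrs.append(lr)

print(round(lrs[9], 5))   # 0.01     (unchanged through epoch index 9)
print(round(lrs[-1], 5))  # 0.00607  (0.01 * exp(-0.5) after 5 decayed epochs)
```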

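TensorBoard

Used to monitor the progress of training.

To monitor training with TensorBoard, point the callback at a logs folder; log files are written there while training runs. TensorBoard offers several views for efficient monitoring, such as loss and metric curves, the model graph (write_graph=True), and weight histograms for every epoch (histogram_freq=1).

log_dir = './logs'
tensor_board_cb = [callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1, write_graph=True, write_images=True)]
model.fit(x_train, y_train, batch_size=32, validation_data=(x_val, y_val),
          epochs=30, callbacks=tensor_board_cb)

# In a Jupyter or Colab notebook, load the TensorBoard extension and launch it inline
%load_ext tensorboard

%tensorboard --logdir {log_dir}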
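Reusing a single './logs' folder for every run mixes the curves of different runs together. A common pattern, sketched below with a hypothetical make_log_dir() helper, is to give each run its own timestamped subfolder and pass that to callbacks.TensorBoard; pointed at the parent folder, TensorBoard then lists each subfolder as a separate run.

```python
import datetime
import os

def make_log_dir(root="./logs"):
    """Hypothetical helper: one subfolder per run, e.g. ./logs/20231206-103800."""
    run_name = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    return os.path.join(root, run_name)

log_dir = make_log_dir()
print(log_dir)

# Then, in the training code above:
#   tensor_board_cb = [callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)]
# and launch TensorBoard on the parent folder:
#   %tensorboard --logdir ./logs
```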

5-5 Wrap-up

Code that applies the callbacks

# Using the MNIST data from before, create a checkpoint for the model and set up EarlyStopping.
check_point_cb = callbacks.ModelCheckpoint('keras_mnist_model.h5', save_best_only=True)
early_stopping_cb = callbacks.EarlyStopping(patience=3, monitor='val_loss',
                                  restore_best_weights=True)

history = model.fit(x_train, y_train, epochs=10,
                    validation_data=(x_val, y_val),
                    callbacks=[check_point_cb, early_stopping_cb])
