Lec 09-3: Dropout

4 minute read

Deep_Learning study Lec09-3

Boostcourse의 ‘파이토치로 시작하는 딥러닝 기초’를 통한 공부와 정리 Post

Goal of Study

드롭아웃(Dropout) 에 대해 알아본다.

Keyword

과최적화(Overfitting)

드롭아웃(Dropout)

1. 강의 요약

Overfitting

계속해서 데이터를 잘 대변하는 모델을 찾는것이 우리의 목적이다.

사람에 따라 어떤게 잘 fitting 된 것인지는 주관적인 차이가 있겠지만, 기계가 이를 수치적으로 판단할 수 있는데,
이에 따라, Underfitting됬는지 Overfitting 됬는지를 판단할 수 있다.

Underfitting의 경우에는 당연하게 데이터를 잘 대변하기 위한 학습이 덜 되거나, 너무 낮은 차원의 모델을 사용한다는 문제가 있을 수 있다.

Overfitting의 경우에는 너무 학습 데이터에 대해서만 fitting이 되어있고, 너무 높은 차원의 모델을 사용했다는 문제가 있을 수 있다.

왜 Overfitting이 문젠지 알아보자.

만약 Classifiacation에 관한 문제에서 Overfitting을 할 경우 Train중에는 Accuracy가 100%에 도달하겠지만,
이를 Test할 적에는 오히려 오분류 되거나 정확하지 않는 것을 확인할 수 있다.

그 이유는 Train-set과 Test-Set이 완벽하게 일치하지 않기 때문이고, 이러한 경우가 일반적이다.

그렇기에 Overfitting, 과최적화된 모델은 오히려 Error율을 증가시킬 수 있다는 문제점을 지니고 있다.

Overfitting에 대한 해결책으로 Training data를 늘리거나, 특징을 줄이거나, 정규화를 한다 등 다양한 방법이 있다.

그 중, Dropout이라는 방식에 대해 알아보자.

Dropout

드롭아웃은 학습 과정에서 신경망의 일부를 사용하지 않는 방법이다.

드롭아웃의 비율을 정해주는만큼 학습과정 마다 랜덤으로 그 비율만큼의 뉴런을 사용하지 않고, 나머지 뉴런만을 사용한다.

만약 드롭아웃의 비율을 0.5로 한다면 절반은 사용하고, 나머지는 사용하지 않게 된다.
(학습 과정(step)마다 인 것에 유의한다.)

이를 사용하게 되면 실질적으로 Overfitting을 방지할 수 있는 효과를 얻게 되며 성능의 향상이 일어난다.

학습 시에 인공 신경망이 특정 뉴런 또는 특정 조합에 너무 의존적이게 되는 것을 방지해주며
매번 다른 형태의 Network 구조들이 다른 결과를 내기 때문에, 서로 다른 신경망들을 사용하는 것 같은 효과를 내어 Overfitting을 방지한다.

mnist_nn_dropout

이전 포스팅_Xavier,deep에서와의 차이점을 먼저 확인하고 코드를 보기를 권장한다.

Dropout을 쓰기 위해 아래와 같은 과정이 추가되었다.
특히, Dropout의 비율을 정해주었고, dropout은 train할 때만 사용되기 때문에

Train & eval 모드를 함수를 통해 train시에만 dropout이 True이도록 하였다.

#추가 과정(... 은 중략)
#dropout 비율
drop_prob = 0.3

...

relu = torch.nn.ReLU()
dropout = torch.nn.Dropout(p=drop_prob)

...

# model
model = torch.nn.Sequential(linear1, relu, dropout,
                            linear2, relu, dropout,
                            linear3, relu, dropout,
                            linear4, relu, dropout,
                            linear5).to(device)

...

total_batch = len(data_loader)
model.train()    # set the model to train mode (dropout=True)
for epoch in range(training_epochs):
    avg_cost = 0

...

with torch.no_grad():
    model.eval()    # set the model to evaluation mode (dropout=False)

Full code

import torch
import torchvision.datasets as dsets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import random

USE_CUDA = torch.cuda.is_available() # GPU를 사용가능하면 True, 아니라면 False를 리턴
device = torch.device("cuda" if USE_CUDA else "cpu") # GPU 사용 가능하면 사용하고 아니면 CPU 사용
print("다음 기기로 학습합니다:", device)

# for reproducibility
random.seed(777)
torch.manual_seed(777)
if device == 'cuda':
    torch.cuda.manual_seed_all(777)

다음 기기로 학습합니다: cpu

# parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
drop_prob = 0.3

# MNIST dataset
mnist_train = dsets.MNIST(root='E:\DeepLearningStudy',
                          train=True,
                          transform=transforms.ToTensor(),
                          download=True)

mnist_test = dsets.MNIST(root='E:\DeepLearningStudy',
                         train=False,
                         transform=transforms.ToTensor(),
                         download=True)

# dataset loader
data_loader = torch.utils.data.DataLoader(dataset=mnist_train,
                                          batch_size=batch_size,
                                          shuffle=True,
                                          drop_last=True)

C:\Users\LeeJaeWon\anaconda3\envs\DeepLearningStudy\lib\site-packages\torchvision\datasets\mnist.py:498: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  ..\torch\csrc\utils\tensor_numpy.cpp:180.)
  return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)

# nn layers
linear1 = torch.nn.Linear(784, 512, bias=True)
linear2 = torch.nn.Linear(512, 512, bias=True)
linear3 = torch.nn.Linear(512, 512, bias=True)
linear4 = torch.nn.Linear(512, 512, bias=True)
linear5 = torch.nn.Linear(512, 10, bias=True)
relu = torch.nn.ReLU()
dropout = torch.nn.Dropout(p = drop_prob)

# xavier initialization
torch.nn.init.xavier_uniform_(linear1.weight)
torch.nn.init.xavier_uniform_(linear2.weight)
torch.nn.init.xavier_uniform_(linear3.weight)
torch.nn.init.xavier_uniform_(linear4.weight)
torch.nn.init.xavier_uniform_(linear5.weight)

Parameter containing:
tensor([[-0.0565,  0.0423, -0.0155,  ...,  0.1012,  0.0459, -0.0191],
        [ 0.0772,  0.0452, -0.0638,  ...,  0.0476, -0.0638,  0.0528],
        [ 0.0311, -0.1023, -0.0701,  ...,  0.0412, -0.1004,  0.0738],
        ...,
        [ 0.0334,  0.0187, -0.1021,  ...,  0.0280, -0.0583, -0.1018],
        [-0.0506, -0.0939, -0.0467,  ..., -0.0554, -0.0325,  0.0640],
        [-0.0183, -0.0123,  0.1025,  ..., -0.0214,  0.0220, -0.0741]],
       requires_grad=True)

# model
model = torch.nn.Sequential(linear1, relu, dropout,
                            linear2, relu, dropout,
                            linear3, relu, dropout,
                            linear4, relu, dropout,
                            linear5).to(device)

# define cost/loss & optimizer
criterion = torch.nn.CrossEntropyLoss().to(device)    # Softmax is internally computed.
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

total_batch = len(data_loader)
model.train()    # set the model to train mode (dropout=True)
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = len(data_loader)

    for X, Y in data_loader:
        # reshape input image into [batch_size by 784]
        # label is not one-hot encoded
        X = X.view(-1, 28 * 28).to(device)
        Y = Y.to(device)

        optimizer.zero_grad()
        hypothesis = model(X)
        cost = criterion(hypothesis, Y)
        cost.backward()
        optimizer.step()

        avg_cost += cost / total_batch

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

print('Learning finished')

Epoch: 0001 cost = 0.311706781
Epoch: 0002 cost = 0.146726266
Epoch: 0003 cost = 0.112394512
Epoch: 0004 cost = 0.092713356
Epoch: 0005 cost = 0.084646218
Epoch: 0006 cost = 0.072477818
Epoch: 0007 cost = 0.067062818
Epoch: 0008 cost = 0.060241420
Epoch: 0009 cost = 0.060738977
Epoch: 0010 cost = 0.054180898
Epoch: 0011 cost = 0.050488908
Epoch: 0012 cost = 0.049583405
Epoch: 0013 cost = 0.047045611
Epoch: 0014 cost = 0.045497715
Epoch: 0015 cost = 0.045529798
Learning finished

# Test the model using test sets
with torch.no_grad():
    model.eval()    # set the model to evaluation mode (dropout=False)
    
    X_test = mnist_test.test_data.view(-1, 28 * 28).float().to(device)
    Y_test = mnist_test.test_labels.to(device)

    prediction = model(X_test)
    correct_prediction = torch.argmax(prediction, 1) == Y_test
    accuracy = correct_prediction.float().mean()
    print('Accuracy:', accuracy.item())

    # Get one and predict
    r = random.randint(0, len(mnist_test) - 1)
    X_single_data = mnist_test.test_data[r:r + 1].view(-1, 28 * 28).float().to(device)
    Y_single_data = mnist_test.test_labels[r:r + 1].to(device)

    print('Label: ', Y_single_data.item())
    single_prediction = model(X_single_data)
    print('Prediction: ', torch.argmax(single_prediction, 1).item())
    
    plt.imshow(mnist_test.test_data[r:r + 1].view(28, 28), cmap='Greys', interpolation='nearest')
    plt.show()

Accuracy: 0.9797999858856201
Label:  5
Prediction:  5

📣
본 포스팅의 학습 환경 : Anaconda, CPU, Pytorch, Jupyter Notebook
포스팅에서 오류나 궁금한 점은 Comments를 작성해주시면, 많은 도움이 됩니다.💡

Share on

Twitter Facebook LinkedIn

Lee Jaewon

Lec 09-3: Dropout

Boostcourse의 ‘파이토치로 시작하는 딥러닝 기초’를 통한 공부와 정리 Post

Goal of Study

Keyword

1. 강의 요약

Overfitting

Dropout

mnist_nn_dropout

Full code

Share on

Leave a comment

You may also enjoy

[Tips] Ubuntu(Linux) screen 사용법

[Tips] github.io에 여러 github.io theme 불러오기

[논문 리뷰] TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering

[논문 리뷰] STD: Stable Triangle Descriptor for 3D place recognition