Lec 10-5: Advanced CNN - VGG

8 minute read

Deep_Learning study Lec10-5

Boostcourse의 ‘파이토치로 시작하는 딥러닝 기초’를 통한 공부와 정리 Post

Goal of Study

유명한 CNN 모델들에 대해 알아본다.

Keyword

VGG

CNN

1. 강의 요약

VGG

이전 포스팅의 각종 Advanced CNN 소개에 이어서 VGG NetWork를 만들어보도록 하겠다.

VGGnet_layer

기본적인 형태는 다음과 같다.

모두 3x3 Conv, stride = 1, padding = 1로 이루어져 있어, 구조가 어려운 편은 아니다.(위의 사진은 VGG16)

기초적으로 다뤄야할 사항은 다음과 같다.

vgg11 ~ vgg19 까지 만들 수 있도록 되어있음

3x224x224 입력을 기준으로 만들도록 되어 있음(RGB)

input size가 다른 경우 VGG를 적용하려면 어떻게 해야 할까?

Layer를 먼저 다뤄보자.

import torch.nn as nn
import torch.utils.model_zoo as model_zoo

__all__ = [
    'VGG', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn',
    'vgg19_bn', 'vgg19',
]


model_urls = {
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
    'vgg13': 'https://download.pytorch.org/models/vgg13-c768596a.pth',
    'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
    'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth',
    'vgg11_bn': 'https://download.pytorch.org/models/vgg11_bn-6002323d.pth',
    'vgg13_bn': 'https://download.pytorch.org/models/vgg13_bn-abd245e5.pth',
    'vgg16_bn': 'https://download.pytorch.org/models/vgg16_bn-6c64b313.pth',
    'vgg19_bn': 'https://download.pytorch.org/models/vgg19_bn-c79401a0.pth',
}

위의 url들은 pytorch에서 VGG에 대해 이미 pre-training 시켜놓은 모델이다.

class VGG(nn.Module):
    def __init__(self, features, num_classes=1000, init_weights=True):
        super(VGG, self).__init__()
        
        self.features = features #convolution
        
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )#FC layer
        
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x) #Convolution 
        x = self.avgpool(x) # avgpool
        x = x.view(x.size(0), -1) #
        x = self.classifier(x) #FC layer
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

VGG 클래스이다.

먼저 input으로 받는값은 3개이며, feature에서 받아온 Convolution layer를 통해
forward과정에서 self.features에 input 값이 들어가서 나오게 되고,

Avg pooling 이후에 이를 펼쳐서, classifier에 들어가게 되어있다.

classifier는 코드에서도 볼 수 있듯이, nn.Sequential을 이용해 Fully Connected layer를 구성한 것이다.

이전 포스팅의 VGG Net에서 ConvNet에 대한 Configuration을 올려놓았다.

이를 살펴보면, 마지막에 Layer 하단에 FC Layer가 3개인데, nn.Sequential의 3개의 FC Layer가 이것과 같다는 것을 확인할 수 있다.

nn.Sequential에서 마지막으로 나오는 class의 개수(num_classes) 또한, 매개변수로써 빼먹지 말고 다뤄줘야한다.

Weight Initialization 과정을 살펴보기 위해,
self.modules에 대해 먼저 살펴보자.

“The self.modules() method returns an iterable to the many layers or “modules” defined in the model class.”
출처 : Pytorch discuss

즉, self.modules 메소드는 모델 클래스에 정의된 많은 레이어 또는 “모듈”에 대해 반복 가능한 항목을 반환한다고 설명하고 있는 것같다.

여기서는 features에 들어오는 Convolution Layer의 값을 하나씩 받아오면서, m이 어떤 것인지에 따라 Weight를 어떻게 초기화 시켜줄 것인지 나타낸 코드이다.

m에 Conv가 들어오면, kalming_normalization을 통해 weight를 초기화해주고,
bias는 0으로 초기화 해준다.

m이 BatchNorm2d이면, 이에 대해서도 Normalize를 잡아주고(전부다 1),
bias를 0으로 초기화 해준다.

Linear인 경우도 마찬가지로 진행됨을 알 수 있을 것이다.

그럼 이제, feature를 어떻게 넣어줄 지를 아래 make_layers와 cfg를 통해 알아보자.

def make_layers(cfg, batch_norm=False):
    layers = [] 
    in_channels = 3
    
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
                     
    return nn.Sequential(*layers)

cfg = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'], #8 + 3 =11 == vgg11
    'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'], # 10 + 3 = vgg 13
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'], #13 + 3 = vgg 16
    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'], # 16 +3 =vgg 19
    'custom' : [64,64,64,'M',128,128,128,'M',256,256,256,'M']
}

cfg는 make_layers의 매개변수로 들어가서 nn.Conv2d의 out_channel이 된다.

즉, 처음의 input_channel 3에서 우리가 넣어준 cfg를 통해 out_channel을 결정해주며, Layer를 쌓아주는 것이다.

리스트의 'M'은 Maxpooling을 수행한다.

반환은 layer들을 nn.Sequential에 넣어준 것을 반환한다.

VGG의 숫자는 FC layer 3개를 더한 값임을 확인할 수 있다.

conv = make_layers(cfg['custom'], batch_norm=True)

BatchNorm을 추가한 conv 형태는 다음과 같다

Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(12): ReLU(inplace=True)
(13): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(14): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(15): ReLU(inplace=True)
(16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(18): ReLU(inplace=True)
(19): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(20): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(21): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(22): ReLU(inplace=True)
(23): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(24): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(25): ReLU(inplace=True)
(26): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(27): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(28): ReLU(inplace=True)
(29): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
CNN = VGG(make_layers(cfg['custom']), num_classes=10, init_weights=True)
CNN
result

VGG(
(features): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU(inplace=True)
(6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(7): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace=True)
(9): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(10): ReLU(inplace=True)
(11): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(12): ReLU(inplace=True)
(13): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(14): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace=True)
(16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(17): ReLU(inplace=True)
(18): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(19): ReLU(inplace=True)
(20): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
(classifier): Sequential(
(0): Linear(in_features=25088, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=4096, out_features=4096, bias=True)
(4): ReLU(inplace=True)
(5): Dropout(p=0.5, inplace=False)
(6): Linear(in_features=4096, out_features=10, bias=True)
)
)

완성된 VGG net의 결과를 확인해보면, 위와 같다.

학습 코드

아래의 코드는 로컬에서 진행할 수 있는 코드이다.

본인은 로컬에서 진행하다 학습시간이 오래걸려 CoLab에서 다시 진행했다.
Colab에서도 상당히 오래걸리고, Colab에서는 반응이 없거나 일정시간이 지나면 중단되는 경우가 있어

결과는 잠시 미뤄두고 코드를 우선으로 보겠다.

기본설정

import torch
import torch.nn as nn

import torch.optim as optim

import torchvision
import torchvision.transforms as transforms

import visdom

vis = visdom.Visdom()
vis.close(env="main")

Setting up a new session…
‘’

visdom을 사용하기 위한 설정.

def loss_tracker(loss_plot, loss_value, num):
    '''num, loss_value, are Tensor'''
    vis.line(X=num,
             Y=loss_value,
             win = loss_plot,
             update='append'
             )

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

torch.manual_seed(777)
if device =='cuda':
    torch.cuda.manual_seed_all(777)

cpu

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./cifar10', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=512,
                                          shuffle=True, num_workers=4)

testset = torchvision.datasets.CIFAR10(root='./cifar10', train=False,
                                       download=True, transform=transform)

testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=4)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Files already downloaded and verified
Files already downloaded and verified

Colab에서는 num_workes가 2가 넘으면 안된다고 경고한다.

import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
# functions to show an image

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = dataiter.next()
vis.images(images/2 + 0.5)

# show images
#imshow(torchvision.utils.make_grid(images))

# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

truck car car cat

img show

위까지 해서 기본적인 설정들을 진행해준다.

VGG Net

import vgg
#import torchvision.models.vgg as vgg

주의점은 vgg.py를 같은 root에 생성해놓은 것이 아니라면 아래줄의 코드를 사용해야한다.
강의 참고

cfg = [32,32,'M', 64,64,128,128,128,'M',256,256,256,512,512,512,'M'] #13 + 3 =vgg16

cfg는 다음과 같다.

vgg16을 사용하게 된다.

class VGG(nn.Module):

    def __init__(self, features, num_classes=1000, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        #self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 4 * 4, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        #x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

본 포스팅위에서 Layer를 만드는 것과 같은 과정이다.

512 * 4 * 4 인 이유는 Max Pooling을 3번 진행하기 때문에 512채널의 4 by 4가 생성됨을 확인할 수 있을 것이다.

vgg16= VGG(vgg.make_layers(cfg),10,True).to(device)

num_classes는 10개로 받는다.

make_layers도 위에서 다루었다.

a=torch.Tensor(1,3,32,32).to(device)
out = vgg16(a)
print(out)

결과 확인

tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]],
grad_fn=)

criterion = nn.CrossEntropyLoss().to(device)
optimizer = torch.optim.SGD(vgg16.parameters(), lr = 0.005,momentum=0.9)

lr_sche = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.9)

이번 코드에서 새로 추가된 부분은 lr_sche이다.

학습 중에 learning rate를 고정시키는 것이 아닌 learning rate를 계속해서 수정하며, 조금 더 정밀한 학습을 할 수 있도록 해준다.

step_size=5는 sche.step을 5번씩 할 때마다 learning rate에 0.9(gamma=0.9)씩 곱하라라는 뜻이다.

lr_sche.step()은 아래 학습 코드에 있는 것을 확인할 수 있다.

loss_plt = vis.line(Y=torch.Tensor(1).zero_(),opts=dict(title='loss_tracker', legend=['loss'], showlegend=True))

print(len(trainloader))
epochs = 50

for epoch in range(epochs):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs = inputs.to(device)
        labels = labels.to(device)
    
        #zero the parameter gradients
        optimizer.zero_grad()
    
        #forward + backward + optimize
        outputs = vgg16(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        lr_sche.step()
    
        #print statistics
        running_loss += loss.item()
        if i % 30 == 29:
            loss_tracker(loss_plt, torch.Tensor([running_loss/30]), torch.Tensor([i + epoch*len(trainloader)]))
            print('[%d, %5d] loss: %.3f' %
                 (epoch + 1, i + 1, running_loss / 30))
            running_loss = 0.0
print('Finished Training')

dataiter = iter(testloader)
images, labels = dataiter.next()

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

outputs = vgg16(images.to(device))

_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

correct = 0
total = 0

with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = vgg16(images)
        
        _, predicted = torch.max(outputs.data, 1)
        
        total += labels.size(0)
        
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

마무리

VGG Net에서 Layer의 구성방법과 어떻게 학습하는지에 대해 다루어 보았다.

이미 다른 포스팅들에서 MNIST를 다양한 방식으로 다루면서 학습 코드를 다뤄본 상태이기에
바뀐 부분만 간단히 설명하였다.

코드는 이해가 안간다면 강의에서 더 잘 설명해주시니 그것을 참고하면 좋을 것 같다.

참고자료

pytorch 딥러닝 강의

📣
본 포스팅의 학습 환경 : Anaconda, CPU, Pytorch, Jupyter Notebook
포스팅에서 오류나 궁금한 점은 Comments를 작성해주시면, 많은 도움이 됩니다.💡

Share on

Twitter Facebook LinkedIn

Lee Jaewon

Lec 10-5: Advanced CNN - VGG

Boostcourse의 ‘파이토치로 시작하는 딥러닝 기초’를 통한 공부와 정리 Post

Goal of Study

Keyword

1. 강의 요약

VGG

VGGnet_layer

학습 코드

기본설정

VGG Net

마무리

참고자료

Share on

Leave a comment

You may also enjoy

[Tips] Ubuntu(Linux) screen 사용법

[Tips] github.io에 여러 github.io theme 불러오기

[논문 리뷰] TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering

[논문 리뷰] STD: Stable Triangle Descriptor for 3D place recognition