Neural Network

1. What is DQN

๊ฐ•ํ™”ํ•™์Šต์—์„œ agent๋Š” environment๋ฅผ MDP๋ฅผ ํ†ตํ•ด์„œ ์ดํ•ด๋ฅผ ํ•˜๋Š”๋ฐ table ํ˜•ํƒœ๋กœ ํ•™์Šต์„ ๋ชจ๋“  state์— ๋Œ€ํ•œ action-value function์˜ ๊ฐ’์„ ์ €์žฅํ•˜๊ณ  update์‹œ์ผœ๋‚˜๊ฐ€๋Š” ์‹์œผ๋กœ ํ•˜๋ฉด ํ•™์Šต์ด ์ƒ๋‹นํžˆ ๋А๋ ค์ง‘๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ approximation์„ ํ•˜๊ฒŒ๋˜๊ณ  ๊ทธ approximation๋ฐฉ๋ฒ• ์ค‘์—์„œ nonlinear function approximator๋กœ deep neural network๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ action-value function(q-value)๋ฅผ approximateํ•˜๋Š” ๋ฐฉ๋ฒ•์œผ๋กœ deep neural network๋ฅผ ํƒํ•œ reinforcement learning๋ฐฉ๋ฒ•์ด Deep Reinforcement Learning(deepRL)์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ action value function๋ฟ๋งŒ ์•„๋‹ˆ๋ผ policy ์ž์ฒด๋ฅผ approximateํ•  ์ˆ˜๋„ ์žˆ๋Š”๋ฐ ๊ทธ approximator๋กœ DNN์„ ์‚ฌ์šฉํ•ด๋„ DeepRL์ด ๋ฉ๋‹ˆ๋‹ค.

action value function์„ approximateํ•˜๋Š” deep neural networks๋ฅผ Deep Q-Networks(DQN)์ด๋ผ๊ณ  ํ•˜๋Š”๋ฐ ๊ทธ๋ ‡๋‹ค๋ฉด DQN์œผ๋กœ ์–ด๋–ป๊ฒŒ ํ•™์Šตํ• ๊นŒ์š”? DQN์ด๋ผ๋Š” ๊ฐœ๋…์€ DeepMind์˜ "Playing Atari with Deep Reinforcement Learning"๋ผ๋Š” ๋…ผ๋ฌธ์— ์†Œ๊ฐœ๋˜์–ด์žˆ์Šต๋‹ˆ๋‹ค. https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

2. Artificial Neural Networks (ANN)

http://sanghyukchun.github.io/74/ To do DeepRL we need to know the basic concepts of deep learning. The summary below is based on the post linked above. Deep learning is also new to me, so further study will be needed. If reinforcement learning mimics the way people behave, artificial neural networks (neural networks for short) mimic the structure of the human brain. Artificial intelligence came to mimic the brain because, while computers outperform humans at tasks like computation, they could not do things that any person does easily, such as telling dogs and cats apart. Since it was already known that the brain consists of huge numbers of neurons and synapses, the approach taken was to turn that structure into a mathematical model and apply it as a computer algorithm.

neural networks์˜ ์ˆ˜ํ•™์  ๋ชจ๋ธ์— ๋Œ€ํ•ด์„œ ๊ฐ„๋‹จํžˆ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ๊ทธ ์ „์— ์‚ฌ๋žŒ์˜ ๋‰ด๋Ÿฐ์˜ ๊ตฌ์กฐ๋ฅผ ๋ณด๋ฉด http://arxiv.org/pdf/cs/0308031.pdf

๊ฐ Neuron๋“ค์€ synapse๋ฅผ ํ†ตํ•ด์„œ signal์„ ๋ฐ›์Šต๋‹ˆ๋‹ค. ๋งŒ์•ฝ signal์ด ์–ด๋–ค ํŠน์ •ํ•œ threshlod๋ฅผ ๋„˜์–ด๊ฐ„๋‹ค๋ฉด neuron์ด activate๋˜๊ณ  ๊ทธ ๋‰ด๋Ÿฐ์€ axon์„ ํ†ตํ•ด์„œ signal์„ ๋‹ค๋ฅธ synapse๋กœ ๋ณด๋ƒ…๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๊ตฌ์กฐ๋ฅผ ๋‰ด๋Ÿฐ์˜ ๋ชจ์–‘์„ ๋นผ๊ณ  process์œ„์ฃผ๋กœ ๋‹ค์‹œ ํ‘œํ˜„์„ ํ•ด๋ณด๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. http://www.slideshare.net/imanog/artificial-neural-network-48027460

When input arrives through the synapses, the neuron runs a process and produces an output. The process is essentially a function from input to output. Here only a single input and a single output are drawn, but in fact multiple inputs come in and multiple outputs go out, and those inputs and outputs are passed along the connections between neurons. Drawing that structure again gives the following.

์ด ๊ทธ๋ฆผ์—์„œ์™€ ๊ฐ™์ด ๋‰ด๋Ÿฐ์˜ ์‹œ๋ƒ…์Šค๊ฐ€ 10๊ฐœ๋ผ๊ณ  ๊ฐ€์ •ํ•ด๋ณด๋ฉด ์ด ์‹œ๋ƒ…์Šค๋“ค์„ ํ†ตํ•ด์„œ 10๊ฐœ์˜ ๋‹ค๋ฅธ input๋“ค์ด ๋“ค์–ด์˜ค๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๋‰ด๋Ÿฐ์˜ process์— ๋“ค์–ด๊ฐ€๋Š” ๊ฐ’์€ ์ด 10๊ฐœ์˜ input๋“ค์˜ linear combination์ž…๋‹ˆ๋‹ค. ์ด process๋ฅผ ๊ฑฐ์นœ y๊ฐ’์€ ๋‹ค์‹œ ๋‹ค๋ฅธ ๋‰ด๋Ÿฐ๋“ค์˜ ์‹œ๋ƒ…์Šค๋กœ input์œผ๋กœ ๋“ค์–ด๊ฐ€๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์‚ฌ๋žŒ์˜ ๋‰ด๋Ÿฐ์˜ ๊ตฌ์กฐ๋ฅผ ๋ชจ๋ฐฉํ•ด์„œ ์ธ๊ณต์‹ ๊ฒฝ๋ง์„ ๊ตฌ์„ฑํ•˜๋ฉด, ๊ฐ neuron๋“ค์€ node๊ฐ€ ๋˜๊ณ  synapse๋ฅผ ํ†ตํ•ด์„œ ๋“ค์–ด์˜ค๋Š” signal์€ input์ด ๋˜๊ณ  ๊ฐ๊ฐ ๋‹ค๋ฅธ synapse๋ฅผ ํ†ตํ•ด์„œ ๋“ค์–ด์˜ค๋Š” signal๋“ค์˜ ์ค‘์š”๋„๊ฐ€ ๋‹ค๋ฅผ ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ weight๋ฅผ ๊ณฑํ•ด์ค˜์„œ ๋“ค์–ด์˜ค๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด signal๋“ค์ด weight์™€ ๊ณฑํ•ด์ง„ ๊ฒƒ์ด ์œ„์—์„œ ์–ธ๊ธ‰ํ–ˆ๋˜ net input signal์ž…๋‹ˆ๋‹ค. ๊ทธ net input signal์„ ์‹์œผ๋กœ ํ‘œํ˜„ํ•ด๋ณด์ž๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

Each input arriving through a synapse is expressed as a vector, the weights multiplied with those inputs are likewise collected into a vector, and multiplying the two vectors gives the linear combination of inputs and weights. Here a new concept appears: the bias, written b.

The reason a bias is added to the linear combination before it enters the node as the net input signal can be explained simply as follows. Suppose we want to separate the points (0,0) and (5,5) in the plane with a line (for example, think of it as a problem of distinguishing cats from dogs). A function without a bias, such as y = ax, always passes through the origin, so there is no way to separate the two points. A function of the form y = ax + b, however, can separate them.

๋˜ํ•œ ๋‹ค๋ฅธ ์‹์œผ๋กœ bias์˜ ํ•„์š”์„ฑ์„ ์„ค๋ช…ํ•˜์ž๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.http://stackoverflow.com/questions/2480650/role-of-bias-in-neural-networks

Modification of neuron WEIGHTS alone only serves to manipulate the shape/curvature of your transfer function, and not its equilibrium/zero crossing point. The introduction of BIAS neurons allows you to shift the transfer function curve horizontally (left/right) along the input axis while leaving the shape/curvature unaltered. This will allow the network to produce arbitrary outputs different from the defaults and hence you can customize/shift the input-to-output mapping to suit your particular needs.

In other words, changing only the weights multiplied with a node's inputs (the quantities we are trying to learn) changes the shape of the function, but it cannot shift the function left or right and move its zero-crossing point. Using a bias therefore lets us shift and reshape the graph more flexibly to fit what we want the network to learn.

The net input signal, formed by multiplying the input signals with the weights and adding the bias, activates the node, and that step can be defined as a function. Such a function is called an activation function. Putting all of these concepts together, an artificial neuron can be drawn as follows.

f๋ผ๊ณ  ํ‘œํ˜„๋˜์–ด ์žˆ๋Š” activation function์˜ ๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ํ˜•ํƒœ๋Š” ๋“ค์–ด์˜จ input๋“ค์˜ ํ•ฉ์ด ์–ด๋–ค Threshold๋ณด๋‹ค ๋†’์œผ๋ฉด 1์ด ๋‚˜์˜ค๊ณ  ๋‚ฎ์œผ๋ฉด 0์ด ๋‚˜์˜ค๋Š” ํ˜•ํƒœ์ผ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ์ด๋Ÿฐ ํ˜•ํƒœ์˜ activation function์˜ ๊ฒฝ์šฐ์—๋Š” ๋ฏธ๋ถ„์ด ๋ถˆ๊ฐ€๋Šฅํ•˜๊ณ  ๋”ฐ๋ผ์„œ gradient descent๋ฅผ ๋ชป ์“ฐ๊ธฐ ๋•Œ๋ฌธ์— ๊ทธ ์ด์™ธ์˜ ๋ฏธ๋ถ„๊ฐ€๋Šฅ ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. gradient descent์— ๋Œ€ํ•ด์„œ๋Š” ๋’ค์—์„œ ์„ค๋ช…ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ๋ฐ‘์˜ ์‚ฌ์ง„์€ activation function์˜ ์˜ˆ์‹œ์ž…๋‹ˆ๋‹ค.

์œ„์—์„œ ๋งํ•œ ๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ activation function์˜ ํ˜•ํƒœ๋Š” ์ฒซ๋ฒˆ์งธ ๊ทธ๋ฆผ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์œ„์—์„œ ์–ธ๊ธ‰ํ–ˆ๋“ฏ์ด ์ด ํ•จ์ˆ˜ ๋Œ€์‹ ์— ๋ฏธ๋ถ„๊ฐ€๋Šฅํ•œ ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๊ฒŒ ๋˜์—ˆ๊ณ  ๊ทธ ์ค‘์— ๋Œ€ํ‘œ์ ์ธ ํ•จ์ˆ˜๊ฐ€ ์„ธ๋ฒˆ์งธ ๊ทธ๋ž˜ํ”„์ธ sigmoid Function์ž…๋‹ˆ๋‹ค. sigmoid function์ด๋ž€ ๋ฌด์—‡์ผ๊นŒ์š”? ์‹์œผ๋กœ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ‘œํ˜„๋ฉ๋‹ˆ๋‹ค.

activation function์˜ ์˜ˆ์‹œ์—๋Š” sigmoid๋ง๊ณ ๋„ ์„ธ ๊ฐ€์ง€ ๋‹ค๋ฅธ ํ•จ์ˆ˜๋“ค์ด ์žˆ๋Š”๋ฐ ์ด ํ•จ์ˆ˜๋“ค์€ ๋‹ค non-linearํ•ฉ๋‹ˆ๋‹ค. ๊ทธ ์ด์œ ๋Š” activation function์ด linearํ•  ๊ฒฝ์šฐ์—๋Š” ์•„๋ฌด๋ฆฌ ๋งŽ์€ neuron layer๋ฅผ ์Œ“๋Š”๋‹ค ํ•˜๋”๋ผ๋„ ๊ทธ๊ฒƒ์ด ๊ฒฐ๊ตญ ํ•˜๋‚˜์˜ layer๋กœ ํ‘œํ˜„๋˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

  • sigmoid function

  • tanh function

  • absolute function

  • ReLU function
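
A minimal numpy sketch of the four activation functions in the list above; the function names and sample inputs are just for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def absolute(x):
    return np.abs(x)

def relu(x):
    # ReLU: 0 for x <= 0, x for x > 0
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, tanh, absolute, relu):
    print(f.__name__, f(x))
```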

๊ฐ€์žฅ ์‹ค์šฉ์ ์ธ activation function์€ ReLU function์ด๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ์ €ํฌ ๋˜ํ•œ ReLU function์„ activation function์œผ๋กœ ์‚ฌ์šฉํ–ˆ์Šต๋‹ˆ๋‹ค. ReLU๋ž€ ์–ด๋–ค ํ•จ์ˆ˜์ผ๊นŒ์š”? http://cs231n.github.io/neural-networks-1/

์œ„์˜ ์™ผ์ชฝ ๊ทธ๋ฆผ์ฒ˜๋Ÿผ x๊ฐ€ 0๋ณด๋‹ค ์ž‘๊ฑฐ๋‚˜ ๊ฐ™์„๋•Œ๋Š” y๊ฐ€ 0์ด ๋‚˜์˜ค๊ณ  x๊ฐ€ 0๋ณด๋‹ค ํด๋•Œ๋Š” x๊ฐ€ ๋‚˜์˜ค๋Š” ํ•จ์ˆ˜๋ฅผ ReLU(The Rectified Linear Unit)์ด๋ผ๋Š” ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค. ์œ„ ๊ธ€์— ์จ์ ธ์žˆ๋“ฏ์ด ์ตœ๊ทผ ๋ช‡ ๋…„๋™์•ˆ ์œ ๋ช…ํ•ด์ง€๊ณ  ์žˆ๋Š” ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค. ์‚ฌ์‹ค ๋”ฅ๋Ÿฌ๋‹์ด ์ตœ๊ทผ์— ๊ฐ‘์ž๊ธฐ ๊ธ‰๋ถ€์ƒํ•œ ์ด์œ ๋Š” ์—„์ฒญ ํ˜์‹ ์ ์ธ ๋ณ€ํ™”๊ฐ€ ์žˆ์—ˆ๋˜ ๊ฒƒ์ด ์•„๋‹ˆ๊ณ  activationํ•จ์ˆ˜๋ฅผ sigmoid์—์„œ ReLU๋กœ ๋ฐ”๊พธ๋Š” ๋“ฑ์˜ ์ž‘์€ ๋ณ€ํ™”๋“ค์˜ ์˜ํ–ฅ์ด ํฌ๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

sigmoidํ•จ์ˆ˜์— ๋น„ํ•ด์„œ ReLUํ•จ์ˆ˜๋Š” ์–ด๋– ํ•œ ์žฅ์ ์ด ์žˆ์„๊นŒ์š”? ์œ„ ๊ทธ๋ฆผ์—์„œ ๋ณด๋“ฏ์ด ReLU์˜ ์ง์„ ์ ์ธ ํ˜•ํƒœ์™€ sigmoidํ•จ์ˆ˜์ฒ˜๋Ÿผ ์ˆ˜๋ ดํ•˜๋Š” ํ˜•ํƒœ๊ฐ€ ์•„๋‹Œ ์ ์ด ReLU์˜ stochastic gradient descent ๊ฐ€ ๋” ์ž˜ ์ˆ˜๋ ดํ•˜๊ฒŒ ํ•ด์ค๋‹ˆ๋‹ค. ๋˜ํ•œ ์ƒ๋Œ€์ ์œผ๋กœ sigmoidํ•จ์ˆ˜์— ๋น„ํ•ด์„œ ๊ณ„์‚ฐ๋Ÿ‰์ด ์ค„ ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์žฅ์ ์ด ์žˆ์œผ๋ฉด ๋‹จ์ ๋„ ์žˆ๋Š” ๋ฒ•์ž…๋‹ˆ๋‹ค. ๋‹จ์ ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. Learning rate์— ๋”ฐ๋ผ์„œ ์ค‘๊ฐ„์— ์ตœ๋Œ€ 40%์ •๋„์˜ network๊ฐ€ "die"ํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ  ํ•ฉ๋‹ˆ๋‹ค. ๋‹จ, learning rate๋ฅผ ์ž˜ ์กฐ์ ˆํ•˜๋ฉด ์ด ๋ฌธ์ œ๋Š” ๊ทธ๋ ‡๊ฒŒ ํฌ์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

์•ž์—์„œ ์‚ดํŽด๋ณธ artificial neuron๋“ค์„ network๋กœ ํ‘œํ˜„ํ•˜๋ฉด ์œ„์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์‚ฌ์‹ค์€ ์‚ฌ๋žŒ ๋‡Œ์˜ ๋‰ด๋Ÿฐ๋“ค์€ ์ด๋ณด๋‹ค ์ƒ๋‹นํžˆ ๋” ๋ณต์žกํ•˜๊ฒŒ ์—ฐ๊ฒฐ๋˜์–ด ์žˆ์ง€๋งŒ ๋จธ์‹ ๋Ÿฌ๋‹์—์„œ ์‚ฌ์šฉํ•˜๋Š” neural network๋Š” ํ›จ์”ฌ ๊ฐ„๋‹จํ•œ ํ˜•ํƒœ์ž…๋‹ˆ๋‹ค. ์œ„์—์„œ ๋ณด์ด๋Š” ๋™๊ทธ๋ผ๋ฏธ๋“ค์€ ๋‰ด๋Ÿฐ์— ํ•ด๋‹นํ•˜๋Š” node๋“ค์ž…๋‹ˆ๋‹ค. ์ด node๋“ค์€ ๊ฐ ์ธต์œผ๋กœ ๋ถ„๋ฅ˜๋  ์ˆ˜ ์žˆ๊ณ  ๊ฐ™์€ ์ธต์•ˆ์—์„œ๋Š” ์—ฐ๊ฒฐ๋˜์–ด ์žˆ์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์ •๋ณด์˜ ๋ฐฉํ–ฅ์€ ์™ผ์ชฝ์—์„œ ์˜ค๋ฅธ์ชฝ์œผ๋กœ ํ˜๋Ÿฌ๊ฐ€๋Š”๋ฐ ๊ทธ๋ ‡์ง€ ์•Š์€ network๋„ ์žˆ์Šต๋‹ˆ๋‹ค(RNN). ๋ณดํ†ต์€ node๋“ค์ด fully-connected๋˜์–ด ์žˆ์–ด์„œ ํ•œ node์—์„œ ๋‚˜์˜จ output๋“ค์€ ๋‹ค์Œ ์ธต์˜ ๋ชจ๋“  node์— input์œผ๋กœ ๋“ค์–ด๊ฐ€๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์™ผ์ชฝ๊ณผ ์˜ค๋ฅธ์ชฝ์€ ๋‘˜ ๋‹ค neural network์ด์ง€๋งŒ ์ฐจ์ด๋Š” ์˜ค๋ฅธ์ชฝ์˜ network๋Š” hidden layer๊ฐ€ 2์ธต์ธ ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๊ณ  hidden layer๊ฐ€ 2์ธต ์ด์ƒ์ธ neural network๋ฅผ deep neural network๋ผ๊ณ  ๋ถ€๋ฆ…๋‹ˆ๋‹ค.

3. SGD (Stochastic Gradient Descent) and Back-Propagation

(1) SGD

So far we have looked at what a deep neural network is. Returning to the beginning of this post, a DQN is an action-value function approximated with a deep neural network. The goal of reinforcement learning is to find the optimal policy; if we know the optimal action-value function at each state, we simply take the action with the largest Q-value, so finding the Q-values solves the reinforcement learning problem. These Q-values come out of the DNN (deep neural network), so in the end the goal becomes training the DNN.

๋”ฐ๋ผ์„œ approximationํ•˜์ง€ ์•Š์•˜์„ ๋•Œ์™€ ๋‹ค๋ฅธ ๊ฒƒ์€ q-table์„ ๋งŒ๋“ค์–ด์„œ ๊ฐ๊ฐ์˜ q-value๋ฅผ updateํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๊ณ  DNN์•ˆ์˜ weight์™€ bias๋ฅผ updateํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๊ทธ๋ ‡๋‹ค๋ฉด ์–ด๋–ป๊ฒŒ updateํ• ๊นŒ์š”?

์ด ๋•Œ ์ด์ „์— ๋ฐฐ์› ๋˜ Stochastic Gradient Descent๊ฐ€ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. ์ •๋ฆฌํ•˜์ž๋ฉด gradient descent๋ผ๋Š” ๊ฒƒ์€ w๋ฅผ parameter๋กœ ๊ฐ€์ง€๋Š” J๋ผ๋Š” objective function์„ minimizeํ•˜๋Š” ๋ฐฉ๋ฒ•์ค‘์˜ ํ•˜๋‚˜๋กœ์„œ w์— ๋Œ€ํ•œ J์˜ gradient์˜ ๋ฐ˜๋Œ€๋ฐฉํ–ฅ์œผ๋กœ w๋ฅผ updateํ•˜๋Š” ๋ฐฉ์‹์„ ๋งํ•ฉ๋‹ˆ๋‹ค.

์ด๋Ÿฐ์‹์œผ๋กœ update๋ฅผ ํ•˜๊ฒŒ๋˜๋Š”๋ฐ ๋ชจ๋“  ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ gradient๋ฅผ ๊ตฌํ•ด์„œ ํ•œ ๋ฒˆ updateํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๊ณ  sampling์„ ํ†ตํ•ด์„œ ์ˆœ์ฐจ์ ์œผ๋กœ updateํ•˜๊ฒ ๋‹ค๋Š” gradient descent๋ฐฉ๋ฒ•์ด stochastic gradient descent์ž…๋‹ˆ๋‹ค. ์•„๋ž˜ ํŽ˜์ด์ง€๋ฅผ ์ฐธ๊ณ ํ•ด๋ณด๋ฉด ๊ทธ๋ ‡๊ฒŒ ํ•  ๊ฒฝ์šฐ ์ˆ˜๋ ดํ•˜๋Š” ์†๋„๊ฐ€ ํ›จ์”ฌ ๋น ๋ฅด๋ฉฐ online์œผ๋กœ๋„ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์žฅ์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ ํ•˜๋‚˜ ์ค‘์š”ํ•œ ์ ์€ gradient descent๋ฐฉ๋ฒ•์€ local optimum์œผ๋กœ ๊ฐˆ ์ˆ˜ ์žˆ๋‹ค๋Š” ๋‹จ์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค. http://sebastianruder.com/optimizing-gradient-descent/

(2) Back-Propagation

Once we have this gradient, how do we update the parameters inside the DNN? Think again about how data flows through the DNN: the input passes through the layers, and the data that reaches the output layer comes out as the output.

When the parameters are updated with SGD, the error signal travels in the opposite direction, from the output back toward the input, which is why the method is called Back-Propagation. When using TensorFlow, these formulas are already implemented in the library, so we will not derive them here.
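
Just to make the idea concrete (TensorFlow automates exactly this), here is a hand-written forward and backward pass for a single hidden layer with a squared-error loss; the sizes, data, and learning rate are assumptions for illustration.

```python
import numpy as np

x = np.random.randn(3)            # input
t = np.random.randn(2)            # target output
W1, b1 = np.random.randn(3, 4) * 0.1, np.zeros(4)
W2, b2 = np.random.randn(4, 2) * 0.1, np.zeros(2)

# Forward pass (left to right).
h = np.maximum(0, x @ W1 + b1)    # ReLU hidden layer
y = h @ W2 + b2                   # output
loss = 0.5 * np.sum((y - t) ** 2)

# Backward pass (right to left): the chain rule pushes the error back.
dy  = y - t                       # dL/dy
dW2 = np.outer(h, dy)             # dL/dW2
db2 = dy
dh  = W2 @ dy                     # error propagated to the hidden layer
dh[h <= 0] = 0                    # through the ReLU
dW1 = np.outer(x, dh)
db1 = dh

# One SGD step on every parameter.
alpha = 0.01
W2 -= alpha * dW2; b2 -= alpha * db2
W1 -= alpha * dW1; b1 -= alpha * db1
```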
