[KDD 2020] USAD: UnSupervised Anomaly Detection on Multivariate Time Series
Paper URL: https://dl.acm.org/doi/pdf/10.1145/3394486.3403392

Author and affiliation

Figure 01 : paper snapshot

Background

In a given training dataset, an observation that is judged to have been generated by a mechanism different from the rest of the observations is called an anomaly sample, and the task of detecting such samples is called anomaly detection (AD). Depending on whether labels for abnormal data are available, anomaly detection is divided into a supervised and an unsupervised setting. In most real-world situations no labels exist, so algorithms that achieve high performance in the unsupervised setting are receiving a lot of attention.
Unsupervised anomaly detection on multivariate time series (USAD) is, as the name suggests, a model that performs anomaly detection on multivariate time series in an unsupervised setting. Let us go over the terminology. Besides anomaly, defined above, the other key phrase is multivariate time series: a time series that carries several values at each time step, as illustrated in Figure 02 [1].
Figure 02 : Multivariate time series example
๋‹ค์Œ์œผ๋กœ unsupervised anomaly detection์€ ํฌ๊ฒŒ autoencoder (AE) ๊ธฐ๋ฐ˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜, variational autoenc-oder (VAE) ๊ธฐ๋ฐ˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜, generative adversarial networks (GANs) ๊ธฐ๋ฐ˜ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ์กด์žฌํ•œ๋‹ค. Unsuperv-ised anomaly detection ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ AE -> VAE ํ˜น์€ AE -> GANs๋กœ ๋ณ€ํ•˜๋Š” ์ถ”์„ธ์ธ๋ฐ ๊ทธ ์ด์œ ์— ๋Œ€ํ•ด์„œ๋Š” ์ถ”ํ›„์— posting ํ•˜๋„๋ก ํ•˜๊ฒ ๋‹ค. USAD์˜ ์ž‘๋™ ๊ณผ์ •์„ ์„ค๋ช…ํ•˜๊ธฐ ์ „์— ๋ณธ ๋…ผ๋ฌธ์˜ contribution์„ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด์„œ AE ๊ธฐ๋ฐ˜ AD์™€ GANs ๊ธฐ๋ฐ˜ AD์˜ ์žฅ๋‹จ์ ์„ ์•Œ์•„๋ณด์ž.

Autoencoder based anomaly detection

First, anomaly detection can be divided into a training phase and a detection phase. In the training phase, an AE-based model compresses and reconstructs normal data while minimizing the reconstruction error, i.e., the difference between the reconstructed and the original time series, so that the model learns to reconstruct normal data well.
An AE trained only on normal data therefore produces a large reconstruction error when it is fed abnormal data: it cannot properly reconstruct data it never saw during training. The detection phase exploits this property. Data in which normal and abnormal samples are mixed is fed into the AE; if the resulting reconstruction error exceeds a threshold the sample is judged abnormal, otherwise normal. The reconstruction error used as this decision criterion is called the anomaly score.
AE-based anomaly detection is easy to train, but it has trouble distinguishing abnormal data whose distribution is close to that of normal data. This is because the AE's compression step discards information that is unnecessary for reconstruction; combined with the fact that only normal data is used for training, the abnormal information needed to detect anomalies gets erased. In other words, an AE trained only on normal data tends to reconstruct even abnormal inputs as if they were normal, so it fails to detect anomaly samples that differ only subtly from normal data.
Figure 03 : Autoencoder based anomaly detection architecture


Generative adversarial networks based anomaly detection

GANs consist of a generator that produces fake data and a discriminator that distinguishes real data from fake data. GANs-based anomaly detection treats real data as normal and fake data as abnormal during training; the overall structure is shown in Figure 04 [2].
Figure 04 : GANs based anomaly detection architecture (GANomaly)
In GANs-based methods, the generator is responsible for compressing and reconstructing the input sequence. Because the generator's goal is to fool the discriminator, its encoder and decoder are forced to encode information about fake data as well as real data. In other words, introducing a discriminator lets the encoder and decoder that handle compression and reconstruction separate abnormal data more finely, which compensates for the weakness of AE-based models. However, as with GANs in computer vision and elsewhere, stable training is difficult.
USAD is a model that takes the best of both approaches: it combines the ease of training and stable results of AEs with the ability of GANs, via the discriminator, to capture abnormal information. It keeps the AE but applies adversarial training to pursue more fine-grained anomaly detection.

Methods

Figure 05 : USAD architecture
USAD์˜ architecture๋Š” Figure 05์™€ ๊ฐ™๋‹ค. Adversarial training์„ ์ ์šฉํ•˜๊ธฐ ์œ„ํ•ด 2๊ฐœ์˜ decoder๋ฅผ ์‚ฌ์šฉํ•˜์˜€๋‹ค๋Š” ์  ์™ธ์—๋Š” ๊ธฐ์กด์˜ AE ๊ธฐ๋ฐ˜์˜ anomaly detection๊ณผ ๋™์ผํ•œ ๊ตฌ์กฐ๋ฅผ ๊ฐ–๋Š”๋‹ค. Notation์„ ์ž ์‹œ ์ •๋ฆฌํ•˜๋ฉด ๋‘ ๊ฐœ์˜ decoder๋Š” ๊ฐ๊ฐ
D1D_1
๊ณผ
D2D_2
๋กœ ํ‘œ๊ธฐ๋˜๋ฉฐ ์ด๋“ค์€ ๋™์ผํ•œ encoder network
E E
๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.
WW
๋ฅผ training data ์•ˆ์— ์žˆ๋Š” sequence์˜ window๋ผ๊ณ  ํ•˜์˜€์„ ๋•Œ ๋‘ ๊ฐœ์˜ Decoder์— ๋”ฐ๋ผ AE๋Š” ๊ฐ๊ฐ
AE1=D1(E(W))AE_1=D_1(E(W))
,
AE2=D2(E(W))AE_2=D_2(E(W))
์™€ ๊ฐ™์ด ํ‘œ๊ธฐ๋œ๋‹ค. ์ด์ œ USAD์˜ ํ•™์Šต ๋‹จ๊ณ„์™€ ํƒ์ง€ ๋‹จ๊ณ„์— ๋Œ€ํ•ด์„œ ์•Œ์•„๋ณด์ž.

Training process

USAD is trained in two stages. The first stage is AE training, which follows the same procedure as training a conventional AE. Expressed as formulas, the objectives are as follows.
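(Restated from the paper in the notation above; $\lVert\cdot\rVert_2$ denotes the 2-norm over a window.)

$$\mathcal{L}_{AE_1} = \lVert W - AE_1(W) \rVert_2\,, \qquad \mathcal{L}_{AE_2} = \lVert W - AE_2(W) \rVert_2$$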
ํ•™์Šต์˜ ์ฒซ ๋ฒˆ์งธ ๋‹จ๊ณ„์ธ AE training ๊ณผ์ •์—์„œ๋Š” ๊ฐ๊ฐ์˜ AE๊ฐ€ training data์— ์†ํ•ด์žˆ๋Š” input
WW
(real, normal)๋ฅผ ์ž˜ ๋ณต์›ํ•˜๋„๋ก ํ•™์Šต์„ ์ง„ํ–‰ํ•œ๋‹ค. ๋‹ค์Œ์œผ๋กœ ๋‘ ๋ฒˆ์งธ ๋‹จ๊ณ„์ธ adversarial training ๊ณผ์ •์„ ์•Œ์•„๋ณด์ž.
In the adversarial training stage, the two autoencoders ($AE_1$, $AE_2$) pursue the following objective:
  • Train $AE_2$ to distinguish the real data from the data coming from $AE_1$, and train $AE_1$ to fool $AE_2$.
$AE_2$ is trained to distinguish the real data $W$ from the fake data $AE_1(W)$ reconstructed by $AE_1$. $AE_1$, in contrast, is trained to degrade $AE_2$'s ability to tell real from fake. In other words, $AE_1$ plays the role of the generator in GANs and $AE_2$ plays the role of the discriminator. The objective functions of the adversarial training stage can be written as follows.
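(Restated from the paper: both autoencoders act on the same quantity, with $AE_1$ minimizing it and $AE_2$ maximizing it.)

$$\min_{AE_1}\,\max_{AE_2}\; \lVert W - AE_2(AE_1(W)) \rVert_2\,, \qquad \text{i.e.} \qquad \mathcal{L}_{AE_1} = +\lVert W - AE_2(AE_1(W)) \rVert_2\,, \quad \mathcal{L}_{AE_2} = -\lVert W - AE_2(AE_1(W)) \rVert_2$$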
๊ฐ๊ฐ์˜ ์ˆ˜์‹์ด ๊ฐ–๋Š” ์˜๋ฏธ๋ฅผ ์‚ดํŽด๋ณด์ž. ๋จผ์ € generator ์—ญํ• ์„ ์ˆ˜ํ–‰ํ•˜๋Š” ์ฒซ ๋ฒˆ์งธ autoencoder๋Š” real data
WW
์™€ fake์— ๋Œ€ํ•œ
AE2AE_2
์˜ output์ธ
AE2(AE1(W))AE_2(AE_1(W))
๊ฐ„์˜ ์ฐจ์ด๋ฅผ ์ตœ์†Œํ™” ์‹œ์ผœ์•ผ discriminator๋ฅผ ์†์ผ ์ˆ˜ ์žˆ๋‹ค. ๋ฐ˜๋Œ€๋กœ discriminator ์—ญํ• ์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๋‘ ๋ฒˆ์งธ autoencoder๋Š” fake data๊ฐ€ input์œผ๋กœ ๋“ค์–ด ์™”์„ ๋•Œ real data์™€์˜ ์ฐจ์ด๋ฅผ ์ตœ๋Œ€ํ™” ์‹œ์ผœ์•ผ(=large reconstruction error) real๊ณผ fake๋ฅผ ์ž˜ ๊ตฌ๋ณ„ํ•˜๊ณ  ์žˆ๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ
AE1AE_1
์€ fake์— ๋Œ€ํ•œ
AE2AE_2
์˜ reconstruction error๋ฅผ ์ค„์ด๋„๋ก ํ•™์Šตํ•˜๊ณ , ๋ฐ˜๋Œ€๋กœ
AE2AE_2
๋Š” fake์— ๋Œ€ํ•œ reconstruction error๋ฅผ ํ‚ค์šฐ๋„๋ก ํ•™์Šตํ•œ๋‹ค.
์œ„์˜ ๋‘ ๊ฐ€์ง€ training phase๋ฅผ ํ•˜๋‚˜์˜ loss function์œผ๋กœ ํ‘œ๊ธฐํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.
Let us walk through the terms. In each loss, the first term is the reconstruction error on real data and corresponds to the AE training loss: it drives the model to reconstruct the original input well. The second term is the adversarial training loss; the minimize and maximize objectives discussed above show up as the + and − signs. Finally, the $n$ appearing in the denominators of the weights is the current training epoch, so early in training the reconstruction error on real data is weighted heavily, while later in training the adversarial term receives more weight.
์ด์ œ ๊ฐ๊ฐ์˜ AE ๊ด€์ ์—์„œ loss๋ฅผ ์ดํ•ดํ•ด๋ณด์ž.
AE1AE_1
์— ์ ์šฉ๋˜๋Š” loss๋ฅผ ์‚ดํŽด๋ณด๋ฉด real data์— ๋Œ€ํ•œ reconstruction error์™€ fake data์— ๋Œ€ํ•œ
AE2AE_2
์˜ reconstruction error๊ฐ€ ๋ชจ๋‘ ์ตœ์†Œ์ผ ๋•Œ ์ตœ์†Œ๊ฐ’์„ ๊ฐ–๋Š”๋‹ค. ์ด๋ฅผ ํ†ตํ•ด
AE2AE_2
๊ฐ€ fake data์— ๋Œ€ํ•œ ํŒ๋ณ„๋ ฅ์ด ๋–จ์–ด์ง€๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ๋ฐ˜๋Œ€๋กœ
AE2AE_2
์— ์ ์šฉ๋˜๋Š” loss๋ฅผ ์‚ดํŽด๋ณด๋ฉด real data์— ๋Œ€ํ•œ reconstruction error๊ฐ€ ์ตœ์†Œ์ด๊ณ , fake data์— ๋Œ€ํ•œ
AE2AE_2
์˜ reconstruction error๊ฐ€ ์ตœ๋Œ€์ผ ๋•Œ ์ตœ์†Œ๊ฐ’์„ ๊ฐ–๋Š”๋‹ค. ์ฆ‰ fake data๊ฐ€ ๋“ค์–ด์™”์„ ๋•Œ anomaly๊ฐ€ ๋“ค์–ด์™”๋‹ค๋Š” ์‹ ํ˜ธ์ธ reconstruction error๋ฅผ ํฌ๊ฒŒ ๋‚ด๋ฑ‰๋„๋ก ํ•™์Šต์ด ์ง„ํ–‰๋œ๋‹ค. GANs ๊ธฐ๋ฐ˜ anomaly detection์—์„œ real์€ normal, fake์€ abnormal๋กœ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์„ ๊ฐ์•ˆํ•˜๋ฉด
AE2AE_2
๋Š” ๊ฒฐ๊ตญ normal๊ณผ abnormal์˜ ๋ฏธ์„ธํ•œ ์ฐจ์ด๋ฅผ ๊ทน๋Œ€ํ™”์‹œํ‚ค๋Š” ์—ญํ• ์„ ํ•˜๊ฒŒ ๋œ๋‹ค. ๊ธฐ์กด์˜ AE ๊ธฐ๋ฐ˜ anomaly detection model๊ณผ ๋น„๊ตํ•˜์—ฌ ๋†’์€ ์„ฑ๋Šฅ์„ ๋ณด์ผ ์ˆ˜ ์žˆ๋Š” ์ด์œ ๊ฐ€ ์—ฌ๊ธฐ์— ์žˆ๋‹ค.
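Below is a minimal sketch of this two-phase training against the hypothetical USAD module defined earlier. The Adam optimizers, the use of mean squared error in place of the paper's 2-norm, and the loader interface (batches of flattened, normalized windows) are my assumptions, not the paper's implementation.

```python
import torch

def train_usad(model, loader, epochs: int, lr: float = 1e-3):
    # One optimizer per autoencoder: AE1 = (E, D1), AE2 = (E, D2).
    opt1 = torch.optim.Adam(list(model.E.parameters()) + list(model.D1.parameters()), lr=lr)
    opt2 = torch.optim.Adam(list(model.E.parameters()) + list(model.D2.parameters()), lr=lr)
    for n in range(1, epochs + 1):
        for w in loader:  # w: batch of flattened, normalized windows
            # Update AE1: reconstruct W and fool AE2 (shrink its error on the fake data).
            w1, _, w12 = model(w)
            loss1 = (1 / n) * ((w - w1) ** 2).mean() + (1 - 1 / n) * ((w - w12) ** 2).mean()
            opt1.zero_grad(); loss1.backward(); opt1.step()

            # Update AE2: reconstruct W but enlarge its error on AE1's fake data.
            w1, w2, w12 = model(w)
            loss2 = (1 / n) * ((w - w2) ** 2).mean() - (1 - 1 / n) * ((w - w12) ** 2).mean()
            opt2.zero_grad(); loss2.backward(); opt2.step()
    return model
```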

Detection process

ํ•™์Šต์ด ์™„๋ฃŒ๋œ USAD๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์‹ค์ œ anomaly detection์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ณผ์ •์„ ์•Œ์•„๋ณด๊ฒ ๋‹ค. ๋จผ์ € USAD์˜ anomaly score ์‚ฐ์ถœ ๊ณต์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.
$\hat{W}$ denotes unseen data, i.e., a new window of a sequence not present in the training data, in which normal and abnormal samples are mixed. USAD's anomaly score is the weighted sum of the reconstruction error between the input and $AE_1$ and $AE_2$'s reconstruction error on the fake data. Thanks to the second term, $AE_2$'s reconstruction error on fake data, USAD can detect abnormal data even when its distribution is very close to that of normal data.
The hyper-parameters $\alpha$ and $\beta$ used to combine the two terms are set so that $\alpha + \beta = 1$, and different settings lead to the following behaviors.
A larger $\beta$ produces a large anomaly score even for small deviations from the normal distribution, so the number of detections increases and detection sensitivity rises. Conversely, weighting the reconstruction error on real data more heavily makes fine-grained anomaly detection impossible, so the number of detections decreases and detection sensitivity drops.
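A minimal scoring sketch against the same hypothetical USAD module is shown below; alpha, beta, and the threshold are knobs to tune, not values prescribed by the paper, and squared error again stands in for the 2-norm.

```python
import torch

def usad_score(model, w_hat: torch.Tensor, alpha: float = 0.5, beta: float = 0.5) -> torch.Tensor:
    """alpha * err(W_hat, AE1(W_hat)) + beta * err(W_hat, AE2(AE1(W_hat))), per window."""
    with torch.no_grad():
        w1, _, w12 = model(w_hat)
    return alpha * ((w_hat - w1) ** 2).mean(dim=1) + beta * ((w_hat - w12) ** 2).mean(dim=1)

# Larger beta  -> more detections, higher sensitivity (small deviations already score high).
# Larger alpha -> fewer detections, lower sensitivity.
# flags = usad_score(model, test_windows, alpha=0.2, beta=0.8) > threshold
```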

Experiments

USAD was evaluated on five public datasets and one private dataset. The five public datasets are:
  • Secure water treatment (SWatT)
  • Water distribution (WADI)
  • Server machine dataset (SMD)
  • Soil moisture active passive (SMAP)
  • Mars science laboratory (MSL)
๊ฐ๊ฐ์˜ datasets์— ๋Œ€ํ•œ ํ‰๊ฐ€ ์ง€ํ‘œ๋Š” precision, recall, F1 score, F1 star๋กœ ๊ตฌ์„ฑ๋˜๋ฉฐ F1 star์˜ ๊ฒฝ์šฐ precision๊ณผ recall์˜ ํ‰๊ท ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋„์ถœ๋œ ๊ฐ’์ด๋‹ค.
๋‹ค์Œ์€ 5๊ฐ€์ง€ datasets์— ๋Œ€ํ•œ ๋น„๊ต ๋ชจ๋ธ๊ณผ์˜ ์„ฑ๋Šฅ ๋น„๊ตํ‘œ์ด๋‹ค. ์ฒซ ๋ฒˆ์งธ ํ‘œ์˜ without/with๋Š” point-adjust ์ ์šฉ ์—ฌ๋ถ€๋ฅผ ๋œปํ•˜๋ฉฐ point-adjust๋ž€ ๊ฐ๊ฐ์˜ observation/time-point์— ๋Œ€ํ•ด์„œ ๋…๋ฆฝ์ ์œผ๋กœ anomaly๋ฅผ detecting ํ•œ๊ฒƒ์œผ๋กœ ์ดํ•ดํ•˜์˜€๋‹ค.
Point-adjust : detect each observation/time-point independently and assigns a label to single time-point
Figure 06 : Performance table 01
๋‹ค์Œ์€ 5๊ฐ€์ง€ datasets์— ๋Œ€ํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜๋ณ„ average performance (standard deviation)์„ ๊ธฐ๋กํ•œ ํ‘œ์ด๋‹ค.
Figure 07 : Performance table 02
Summarizing the comparison tables, USAD performs well in most cases; in particular, it clearly outperforms AE-based anomaly detection. However, whether it can beat GANs-based algorithms such as MAD-GAN and TadGAN remains an open question. Since USAD is built on adversarial training, a comparison against GANs-based anomaly detection would have been a welcome addition.
๋‹ค์Œ์€ SWaT data์— ๋Œ€ํ•œ hyper-parameter๋ณ„ ์„ฑ๋Šฅ์„ ๊ธฐ๋กํ•œ ํ‘œ์ด๋‹ค. ์•ž์„œ ๋งํ•œ๋Œ€๋กœ
ฮฒ\beta
๊ฐ’์ด ์ปค์ง์— ๋”ฐ๋ผ detection์˜ ์ˆ˜๊ฐ€ ๋งŽ์•„์ง์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. ์‹ ๊ธฐํ•˜๊ฒŒ๋„
ฮฒ\beta
๋ฅผ 1๋กœ ๋‘์–ด adversarial term๋งŒ์„ ๋ฐ˜์˜ํ–ˆ์„ ๋•Œ๊ฐ€ F1 score ๊ธฐ์ค€์œผ๋กœ ๊ฐ€์žฅ ์ข‹์•˜๋˜ ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. ์ด๋Š” SWaT data์˜ anomaly sample์ด normal sample๊ณผ์˜ ์ฐจ์ด๊ฐ€ ํฌ๊ฒŒ ๋‚˜์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์ด ์•„๋‹Œ๊ฐ€ ์ถ”์ธกํ•ด๋ณธ๋‹ค.
Figure 08 : Performance table 03
Finally, the results of the ablation study on the effect of adversarial training in USAD are shown below.
Figure 09 : Ablation study (adversarial training)
On the three datasets, purely adversarial training performed worst, while combining it with AE training performed best. It was puzzling that this differs from the hyper-parameter results above.
USAD๋Š” ๊ธฐ์กด์˜ AE ๊ธฐ๋ฐ˜์˜ anomaly detection ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ์žฅ์ ๊ณผ GANs ๊ธฐ๋ฐ˜์˜ anomaly detection ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ์žฅ์ ์„ ๊ฒฐํ•ฉํ•œ ๋ฐฉ๋ฒ•์ด๋‹ค. ํ•ด๋‹น ๋ฐฉ๋ฒ•์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ ํšจ๊ณผ๋Š” ์‹คํ—˜์„ ํ†ตํ•ด์„œ AE ๊ธฐ๋ฐ˜์˜ ๋ฐฉ๋ฒ•๋ณด๋‹ค๋Š” ํšจ์œจ์ ์ž„์„ ๋ณด์˜€์œผ๋‚˜ GANs ๊ธฐ๋ฐ˜์˜ ๋ฐฉ๋ฒ•๊ณผ์˜ ๋น„๊ต๋Š” ์กด์žฌํ•˜์ง€ ์•Š์•˜๋‹ค. ๋˜ํ•œ ๋„์ถœ๋œ anomaly score๋กœ๋ถ€ํ„ฐ abno-rmal์„ ํŒ๋‹จํ•˜๋Š” thresholding์— ๋Œ€ํ•œ ์ž์„ธํ•œ ์„ค๋ช…์ด ์กด์žฌํ•˜์ง€ ์•Š์•˜๋‹ค.

Reference

[1] https://link.springer.com/article/10.1007/s10994-019-05815-0?shared-article-renderer
[2] Akcay, S., Atapour-Abarghouei, A., & Breckon, T. P. (2018, December). Ganomaly: Semi-supervised anomaly detection via adversarial training. In Asian conference on computer vision (pp. 622-637). Springer, Cham.
✓ This post focuses on the paper's concepts and ideas, so there may be errors in detailed equations or content.
✓ This post was written to explain a model used in developing an unsupervised time-series anomaly detection algorithm for SKT AI Fellowship 3.