[ICLR 2020] Distance-Based Learning from Errors for Confidence Calibration

Paper url: https://arxiv.org/pdf/1912.01730.pdf

Author and affiliation

Figure 01 : paper snapshot

Background

์˜ค๋Š˜๋‚  neural net์€ ์˜ˆ์ „๋ณด๋‹ค ์ •ํ™•๋„๋Š” ๋งŽ์ด ํ–ฅ์ƒ๋˜์—ˆ์ง€๋งŒ calibration์€ ์ข‹์ง€ ์•Š์€ ํŽธ์— ์†ํ•œ๋‹ค. Calibration์ด๋ž€ ๋ชจํ˜•์˜ ์ถœ๋ ฅ๊ฐ’์ด ์‹ค์ œ confidence๋ฅผ ๋ฐ˜์˜ํ•˜๋Š” ๊ฒƒ์„ ๋งํ•˜๋ฉฐ calibrated confidence๋ผ๊ณ ๋„ ํ•œ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด COVID-19์˜ ์–‘์„ฑ๊ณผ ์Œ์„ฑ์„ ๋ถ„๋ฅ˜ํ•˜๋Š” task๊ฐ€ ์ฃผ์–ด์กŒ๋‹ค๊ณ  ๊ฐ€์ •ํ•ด๋ณด์ž. ํ™˜์ž A์— ๋Œ€ํ•œ ๋ชจํ˜•์˜ ์ถœ๋ ฅ์ด 0.8์ด ๋‚˜์™”์„ ๋•Œ, ์‹ค์ œ๋กœ๋„ 80% ํ™•๋ฅ ๋กœ ์–‘์„ฑ์ด๋ผ๋ฉด calibration์ด ์ž˜ ์ด๋ฃจ์–ด์กŒ๋‹ค๊ณ  ๋ณธ๋‹ค. ์ฆ‰ ๋ชจํ˜•์˜ ์ถœ๋ ฅ๊ฐ’๊ณผ(confidence) ์‹ค์ œ ํ™•๋ฅ ์„ ๋™์ผํ•˜๊ฒŒ ๋งŒ๋“œ๋Š” ๊ฒƒ์ด model calibration์˜ ๋ชฉ์ ์ด๋‹ค.
How can we verify that a model is well calibrated? If the output reflects true confidence, then confidence and accuracy should match: if the samples a model predicts with confidence 0.8 are classified with accuracy 0.8, confidence and actual probability can be considered equal. For a trained dog-vs-cat classifier, this means that if we collect the samples with confidence 0.8 and measure accuracy on them, the result should be close to 0.8.
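As a minimal sketch of this check (the helper name `accuracy_in_confidence_bin` and the toy numbers are made up for illustration, not taken from the paper), we can collect the predictions whose confidence is near 0.8 and measure their accuracy:

```python
def accuracy_in_confidence_bin(confidences, correct, low, high):
    """Accuracy over the samples whose confidence falls in [low, high)."""
    picked = [ok for conf, ok in zip(confidences, correct) if low <= conf < high]
    if not picked:
        return None  # no sample in this confidence range
    return sum(picked) / len(picked)

# Toy predictions: confidence of the predicted class, and whether it was right.
confidences = [0.82, 0.78, 0.81, 0.79, 0.80, 0.95, 0.97, 0.55]
correct = [1, 1, 1, 0, 1, 1, 1, 0]

# Five samples fall in [0.75, 0.85) and four of them are correct.
acc_near_08 = accuracy_in_confidence_bin(confidences, correct, 0.75, 0.85)
```

For a well-calibrated model the returned accuracy stays close to the confidence level of the bin, which is the case for this toy data.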
As noted above, however, today's neural nets suffer from over-confidence. The figure below is a confidence-accuracy chart for LeNet-5, an early CNN, and ResNet, which has far more parameters. To read the chart: for LeNet-5 on the CIFAR-100 data set, the samples with confidence 0.4 have an accuracy of roughly 0.4 to 0.5. For ResNet, unlike the low-capacity LeNet-5, most confidences are concentrated near 1, and the gap between confidence and accuracy is large at each confidence level. This phenomenon is called over-confidence.
Figure 02 : confidence-accuracy chart
Why does it matter that model confidence reflects the actual probability? Because a miscalibrated neural net can do harm when it replaces human judgment in practice. In medical diagnosis, for instance, a 60% probability and a 90% probability are clearly different, and prescribing on the basis of an over-confident model would lead to over-treatment. Many approaches have therefore been proposed to improve model calibration; representative ones include label smoothing and mixup.
Label smoothing trains on softened labels instead of hard 0 and 1 labels, which keeps the model from fitting the targets too aggressively. For example, instead of assigning 0 and 1 to the dog and cat classes, we train with 0.1 and 0.9. This keeps the model's predictions from running to the extremes, acts as a regularizer, and helps both generalization and calibration.
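A small sketch of the idea, assuming the common formulation that mixes the one-hot label with a uniform distribution (the function name `smooth_labels` and the parameter `epsilon` are illustrative choices, not notation from the paper):

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Move a fraction epsilon of the label mass onto a uniform distribution."""
    num_classes = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / num_classes for y in one_hot]

# Two classes (dog, cat); the hard label for a dog image is [1, 0].
# With epsilon = 0.2 the uniform share is 0.1 per class.
dog = smooth_labels([1.0, 0.0], epsilon=0.2)
```

With two classes and `epsilon=0.2`, the hard label becomes 0.9 for the true class and 0.1 for the other, which matches the 0.1 / 0.9 example above.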
Mixup trains on linear interpolations of two random samples. As in the figure below, the sample images and their labels are combined as a weighted sum according to a mixing coefficient $\lambda$ chosen by the user. With $\lambda = 0.5$, for example, the two images are blended half and half and the labels are mixed as parrot 0.5, lizard 0.5. Like label smoothing, this keeps labels away from the extreme values 0 and 1, which helps calibration.
Figure 03 : Mix-up example
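The weighted sum can be sketched as follows. The pixel values are invented, and the images are flattened to short lists to keep the example small; `mixup` is just an illustrative helper name.

```python
def mixup(x_a, y_a, x_b, y_b, lam):
    """Blend two samples and their one-hot labels with weight lam in [0, 1]."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y

# Hypothetical 3-pixel "images"; labels are one-hot over (parrot, lizard).
parrot_pixels, parrot_label = [0.2, 0.8, 0.4], [1.0, 0.0]
lizard_pixels, lizard_label = [0.6, 0.0, 1.0], [0.0, 1.0]

mixed_x, mixed_y = mixup(parrot_pixels, parrot_label,
                         lizard_pixels, lizard_label, lam=0.5)
```

Here $\lambda$ is fixed at 0.5 to mirror the example in the text; in the original mixup formulation it is drawn from a Beta distribution for every pair.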
However, neither method has an objective function that targets confidence estimation. In other words, the calibration benefit is a by-product of techniques meant to improve classification performance; calibration is not the direct target. Temperature scaling does use confidence scoring directly, but it requires splitting the data between the classification and calibration tasks, so the performance of each task trades off against the other depending on how the data is divided. This paper therefore makes model calibration a direct training objective while addressing that drawback.

Proposed Approach

Figure 04 : The training of DBLE
The overall concept is shown above. The data is first split into a query set and a support set, which are disjoint (no sample appears in both). The support set is used to compute the center of each class ($p_{i}$). The query samples are passed through the classification model and mapped into a new space ($\mu_{i}$), and the classification model is trained so that they move closer to the class centers computed from the support set.
In short, the support set gives the class centers, and the classification model learns the best mapping by reducing the distance between each query sample and its class center. The samples misclassified by the classification model are then used to train a confidence model. The role of the confidence model is to make it possible to estimate confidence on test data that has no ground truth; the details follow later.

Episodic training

Starting with the training stage: unlike the usual batch-wise training of neural nets, DBLE follows episodic training, which uses K-shot, N-way samples at every update. In each episode, N classes are sampled from the full set of M classes (N need not equal M), and the data of those N classes is split into a support set with K samples per class and a query set. The query and support sets therefore change from episode to episode, and after enough episodes the whole data set has been used for training. The support and query sets are denoted as follows.
The centers of the N classes ($p_{i}$) are obtained by averaging the support samples after they pass through the classification model. The notation is as follows.
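The two steps, drawing an episode and averaging the embedded support samples, can be sketched as below. This is only an illustration under stated assumptions: `sample_episode` and `class_centers` are hypothetical helper names, the query set is taken to be whatever remains of each sampled class, and the identity function stands in for the classification model.

```python
import random

def sample_episode(data_by_class, n_way, k_shot, rng):
    """Pick n_way classes; per class, K samples go to support, the rest to query."""
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = {}, {}
    for c in classes:
        items = list(data_by_class[c])
        rng.shuffle(items)
        support[c], query[c] = items[:k_shot], items[k_shot:]
    return support, query

def class_centers(support, embed):
    """Center of class c = mean of the embedded support samples of class c."""
    centers = {}
    for c, items in support.items():
        vectors = [embed(x) for x in items]
        dim = len(vectors[0])
        centers[c] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return centers

# Toy data: 3 classes of 2-D points (M = 3), sampled as 2-way, 3-shot episodes.
data = {
    "a": [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)],
    "b": [(5.0, 5.0), (5.0, 7.0), (7.0, 5.0), (7.0, 7.0)],
    "c": [(9.0, 0.0), (9.0, 2.0), (11.0, 0.0), (11.0, 2.0)],
}
support, query = sample_episode(data, n_way=2, k_shot=3, rng=random.Random(0))
centers = class_centers(support, embed=lambda x: x)
```

Calling `sample_episode` again with a different random state yields a different split, which is how repeated episodes end up covering the whole data set.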
The loss function for episodic training is given below; the details are explained in the sections that follow.

Prototypical loss

The prototypical loss is built by applying a softmax to the distances between the embedded query sample $\mu_{i}$ and the class centers $p_{i}$. It has the same form as an ordinary softmax output; the difference is that the entries are filled in from distances. The equation is as follows.
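In the notation of this post, a distance-based softmax of this kind can be written as below. This follows the description above and the usual prototypical-network form; the distance function $d$ (for example the Euclidean distance) and the class index $k$ are assumptions of this sketch, so check the paper for the exact statement.

$$p(y = k \mid x_{i}) = \frac{\exp(-d(\mu_{i}, p_{k}))}{\sum_{k'} \exp(-d(\mu_{i}, p_{k'}))}, \qquad L = -\sum_{i} \log p(y = y_{i} \mid x_{i})$$

Here $y_{i}$ is the ground-truth class of the query sample $x_{i}$, so minimizing $L$ shrinks the distance to the correct center relative to the distances to all other centers.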
Consider the softmax output conceptually. For a dog-vs-cat problem, the output is a vector with two elements. Each element is the probability of the corresponding class, and with the equation above that probability is measured by the distance between the sample and the class center.
For a dog image to get a high probability for the dog class (a value close to 1), the distance between $\mu_{i}$ and the dog center $p_{i}$ in the equation above must be small relative to the distances to the other centers, ideally close to 0. A distance close to 0 means the sample is mapped very near the dog class center.
Conversely, if a sample is far from a class center, the softmax output for that class is close to 0. As training proceeds, inter-class distances in the embedding space therefore grow and intra-class distances shrink. Using this loss, samples of the same class are mapped to nearby locations, while samples of other classes are pushed away.
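The behaviour described above can be checked numerically with a short sketch. The squared Euclidean distance, the two hand-picked centers, and the helper names are assumptions made for illustration only.

```python
import math

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def distance_softmax(mu, centers):
    """Softmax over negative distances: the closer a center, the higher its probability."""
    logits = [-squared_distance(mu, p) for p in centers]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical centers: index 0 = dog, index 1 = cat.
centers = [[0.0, 0.0], [4.0, 0.0]]

near_dog = distance_softmax([0.1, 0.0], centers)  # embedded right next to the dog center
halfway = distance_softmax([2.0, 0.0], centers)   # equally far from both centers

# Prototypical loss of the first sample if its ground-truth class is dog.
loss_near_dog = -math.log(near_dog[0])
```

A sample next to the dog center gets a dog probability close to 1 and a loss close to 0, while a sample equally far from both centers gets 0.5 for each class.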

MNIST example

Let us work through the roles of the support set, the query set, and the prototypical loss with an MNIST example. Suppose that for each of the 10 classes we take 1000 samples as the support set and the remaining 5000 as the query set. This corresponds to a 1000-shot, 10-way setting.
The 1000 support samples per class are used to compute the centers ($p_{i}$) of classes 1 through 10. The classification model learns to map a sample of class 1 close to $p_{1}$, and the same applies to the samples of the other classes.
Assume training has progressed to some degree. Samples of class 1 that pass through the classification model are mapped close to their own center $p_{1}$ and far from the remaining centers $p_{n}$ ($1 < n \le 10$). From the point of view of the model output, the closer a class center is, the closer its softmax output is to 1, and the farther it is, the closer the output is to 0. For samples of class 1, the first element of the output will therefore have the highest probability.
In other words, the prototypical loss produces outputs of the same form as an ordinary classifier while also pulling each query sample toward its class center. As episodes repeat, the samples assigned to the query and support sets are reshuffled at random, so all of the data ends up being used.

Model inference

Once the model has been trained this way, how are distances computed and predictions made at inference time? First, the center of each class is computed from the training samples. Each test sample is then passed through the model, and the prediction is the label of the center closest to its embedding $\mu_{i}$. The notation is as follows.
Consequently, the farther $\mu_{i}$ is mapped from its ground-truth center, the more likely the sample is to be misclassified. To summarize, the model's performance on a test sample is governed by the distance between that sample and the class centroid vectors, so the distance itself can be regarded as expressing the model's calibrated confidence.
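A minimal sketch of this nearest-center prediction, assuming the Euclidean distance and two hand-picked centers (the `predict` helper is hypothetical):

```python
def predict(mu, centers):
    """Return (label, distance) of the class center closest to the embedding mu."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    label = min(centers, key=lambda c: dist(mu, centers[c]))
    return label, dist(mu, centers[label])

# Centers would be computed from the training samples; these are made-up values.
centers = {"dog": [0.0, 0.0], "cat": [4.0, 0.0]}

label_a, dist_a = predict([0.5, 0.0], centers)  # lands close to the dog center
label_b, dist_b = predict([2.5, 0.0], centers)  # lands nearer the cat center
```

If the second sample were actually a dog, it would be misclassified precisely because its embedding landed 2.5 away from the dog center, which is the link between distance and error described above.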

Calibrated confidence of DBLE

So what does the relationship between distance and confidence actually look like? In the figure above, the x-axis is the distance between each sample and a class center, and the y-axis is the mean test accuracy at that distance. For example, on CIFAR-100, samples at distance 5 from their ground-truth class center have a mean test accuracy of about 0.95. In the legend, $d_{t}$ is the distance from a test sample $x_{i}$ to its ground-truth center, and $d^{'}_{t}$ is the distance to the center of its predicted class. As the chart shows, accuracy falls as the distance to the ground-truth center grows, and the distance to the predicted class center shows the same trend. This suggests that a distance-based method yields well-calibrated confidence. The distance to the predicted class center is, however, not as accurate as the one based on the ground truth.

Confidence modeling by learning from errors

The biggest problem with the approach above is that measuring the distance requires the ground-truth label of the test sample, whereas the problem we actually have to solve is the one where test samples have no labels. The authors therefore propose training, alongside the classifier, a model that can estimate confidence, and they refer to this as joint training.
The authors call the model that estimates confidence the confidence model. It is denoted $g_{\phi}$; it takes the embedding $\mu_{s}$ of a sample and outputs $\sigma_{s}$. A large $\sigma_{s}$ means the sample has low confidence (it is hard to assign it to any particular class).
$$\sigma_{s} = g_{\phi}(\mu_{s})$$
The confidence model is trained only on the samples that the classification model misclassifies. The reason is easiest to understand by thinking of an ordinary class-imbalance situation: once the classification model is trained, samples with a small distance to their ground-truth center (correctly classified) are the majority, and samples with a large distance (misclassified) are the minority. If all samples were used, the confidence model would struggle to focus on the few misclassified ones, which is a serious drawback given that its training goal is to assign a large $\sigma_{s}$ to low-confidence samples. Hence only misclassified samples are used. In the authors' words:
"If all data is used, training of $g_{\phi}$ would be dominated by the small distances of the correctly classified samples which would make it harder for $g_{\phi}$ capture the larger distances for the minor mis-classified samples."
At this point you may be wondering what all this means and why the procedure is needed; it should become clearer if you come back after reading what follows.
Let us first look at how confidence is estimated.
  1. For each misclassified sample, draw a sample $z_{s}$ from a Gaussian distribution parameterized by its embedding $\mu_{s}$ and $\sigma_{s}$. Remember that $\sigma_{s}$ is the output of the confidence model.
  2. Update the confidence model so that the sampled $z_{s}$ is classified correctly. (Initially $\sigma_{s}$ is small, but it grows as updates are repeated.)
That is, the farther a misclassified sample is from its ground-truth class center, the larger its $\sigma_{s}$ becomes. This is illustrated in the figure below.
Put simply: in figure (a), samples a and b are misclassified and c is classified correctly. As mentioned, the confidence model uses only misclassified samples, so the update is performed with samples a and b. Consider sample a first. The confidence model gives us the $\sigma_{a}$ that corresponds to the mean $\mu_{a}$. Because sigma is small at first, a sample $z_{s}$ drawn from this normal distribution (the dotted circle in the figure) cannot cross the decision boundary (the dotted line in the middle of the figure).
The confidence model is therefore trained to output a larger $\sigma_{a}$ than before, so that $z_{s}$ can cross the decision boundary and be classified correctly. Figure (b) shows the situation after training: the confidence model outputs a large $\sigma_{a}$, and the sample $z_{s}$ crosses the decision boundary and is classified correctly. The figure also shows that sample b, which is farther from its ground-truth class center than a, ends up with a larger $\sigma$.
In equations, this reads as follows. As described above, a sample $z_{s}$ is drawn from the Gaussian distribution parameterized by $\mu_{s}$ and $\sigma_{s}$.
The prototypical loss is then optimized using the sample $z_{s}$ and the center of the ground-truth class.
During this optimization $\mu_{s}$ is held fixed, so $g_{\phi}$ is updated to output a larger $\sigma_{s}$ the farther $z_{s}$ is from the center.
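One way to write these two steps in the notation of this post is sketched below. The reparameterized form of the sampling step (so that gradients can reach $g_{\phi}$ through $\sigma_{s}$), the distance function $d$, and the sum over the misclassified samples $s$ are assumptions of this sketch rather than a transcription of the paper's equations.

$$z_{s} = \mu_{s} + \epsilon \odot \sigma_{s}, \qquad \epsilon \sim \mathcal{N}(0, I), \qquad \sigma_{s} = g_{\phi}(\mu_{s})$$

$$L(\phi) = -\sum_{s} \log \frac{\exp(-d(z_{s}, p_{y_{s}}))}{\sum_{k} \exp(-d(z_{s}, p_{k}))}$$

Here $p_{y_{s}}$ is the center of the ground-truth class of sample $s$; only $\phi$ is updated, while $\mu_{s}$ and the centers are treated as constants.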
One question remains. Because we sample from a normal distribution, $z_{s}$ may get lucky and cross the decision boundary, but isn't it more likely that the drawn sample does not? If so, is it really right to simply keep increasing $\sigma_{s}$?
Remarkably, the authors exploit exactly this property to estimate confidence. Once trained as described above, $g_{\phi}$ is used at inference time to produce the confidence according to the following equation.
For each test sample $x_{t}$, we compute its embedding $\mu_{t}$ and the center of its predicted class $p_{y_{t}^{'}}$, and obtain $\sigma_{t}$ from the confidence model. We then draw $z_{t}$ from the corresponding normal distribution U times and measure the confidence of the test sample as the average of the distance-based softmax outputs of those U draws.
If, as described above, a sample is misclassified and mapped far from its ground-truth class centroid vector, so that $\sigma_{t}$ is large, the U softmax outputs will differ widely from one another, and their average will take similar values for all classes. In other words, the more badly misclassified a sample is (the farther it is from its center), the lower the confidence assigned to any particular class.
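The effect of $\sigma_{t}$ on the averaged output can be sketched numerically. Everything specific here is an assumption for illustration: the (non-squared) Euclidean distance, a single scalar sigma shared by all dimensions, the 16-dimensional toy embedding, and sigma values set by hand instead of coming from a trained $g_{\phi}$.

```python
import math
import random

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_softmax(z, centers):
    logits = [-euclidean(z, p) for p in centers]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sampled_confidence(mu, sigma, centers, num_draws, rng):
    """Average softmax output for the predicted class over draws z = mu + sigma * eps."""
    predicted = min(range(len(centers)), key=lambda k: euclidean(mu, centers[k]))
    total = 0.0
    for _ in range(num_draws):
        z = [m + sigma * rng.gauss(0.0, 1.0) for m in mu]
        total += distance_softmax(z, centers)[predicted]
    return total / num_draws

dim = 16
centers = [[0.0] * dim, [4.0] + [0.0] * (dim - 1)]  # two made-up class centers
mu = [0.5] + [0.0] * (dim - 1)                       # embedding of one test sample
rng = random.Random(0)

conf_small_sigma = sampled_confidence(mu, 0.01, centers, num_draws=200, rng=rng)
conf_large_sigma = sampled_confidence(mu, 5.0, centers, num_draws=200, rng=rng)
```

With a small sigma every draw stays next to the embedding and the averaged confidence stays high; with a large sigma the draws scatter, the distances to both centers become nearly equal, and the averaged confidence drops toward that of a coin flip.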
The figure above makes this easy to see. As explained, samples that are misclassified and far from their ground-truth centroid vector have a large sigma. A value $z_{s}$ sampled from that normal distribution therefore yields a different softmax output each time it is drawn (left side of the figure). Averaging these gives low confidence for each class (a low probability of belonging to any particular class).
In contrast, samples that are classified correctly and lie close to their ground-truth centroid vector have a small sigma, so the sampled $z_{s}$ values differ little from one another. Even after averaging, they keep high confidence for one particular class.
To summarize, the confidence model increases sigma for misclassified samples, which makes the U sampled outputs inconsistent; averaging them lowers the confidence so that the sample does not clearly belong to any class. With this method, confidence can be computed even when the test sample has no label.

Experiments

To evaluate the calibration of distance-based learning from errors (DBLE), the authors use vanilla training, MC-Dropout, Temperature scaling, Mixup, Label smoothing, and Trust Score as baselines.
The datasets are MNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet; the suffixes in the table below, such as -MLP and VGG11, denote the model used.
The evaluation metrics are accuracy, expected calibration error (ECE), and negative log likelihood (NLL). The equations for ECE and NLL are as follows. ECE is defined through the difference between accuracy and confidence, which ties in with the discussion at the beginning of this post.
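Both metrics can be sketched in a few lines. The binning scheme (equal-width bins, 10 by default, left-closed) and the function names are assumptions of this sketch; the paper's exact binning may differ.

```python
import math

def expected_calibration_error(confidences, correct, num_bins=10):
    """Bin-size-weighted average of |accuracy - mean confidence| over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * num_bins), num_bins - 1)  # conf == 1.0 goes to the last bin
        bins[index].append((conf, ok))
    ece = 0.0
    for members in bins:
        if members:
            accuracy = sum(ok for _, ok in members) / len(members)
            mean_conf = sum(conf for conf, _ in members) / len(members)
            ece += len(members) / n * abs(accuracy - mean_conf)
    return ece

def negative_log_likelihood(true_class_probs):
    """Mean of -log(probability the model assigned to the true class)."""
    return -sum(math.log(p) for p in true_class_probs) / len(true_class_probs)

# Confidence 0.75 with 3 of 4 correct: confidence matches accuracy.
calibrated = expected_calibration_error([0.75, 0.75, 0.75, 0.75], [1, 1, 1, 0])
# Confidence 0.99 with only 2 of 4 correct: over-confident.
overconfident = expected_calibration_error([0.99, 0.99, 0.99, 0.99], [1, 1, 0, 0])
nll = negative_log_likelihood([0.5, 0.5])
```

The first toy model has an ECE of 0 because its confidence equals its accuracy, while the over-confident one is off by the full gap between 0.99 and 0.5.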
The performance comparison with the baselines is shown below.
The table below confirms that training the confidence model only on misclassified samples performs far better.
This strikes me as an important stance for anyone writing a paper: it feels like a result that can only be reached by continuing to explore, without giving up, when the proposed method does not lead to good results at first.
✓ This post focuses on the concept and ideas of the paper, so the detailed equations or content may contain errors.