[Transformer] Quantifying Attention Flow in Transformers

성진팍 2021. 3. 13. 13:32

This paper proposes two methods, attention rollout and attention flow, for approximating the attention paid to input tokens when attention weights are used as a measure of the relative relevance of those tokens.
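To make the rollout idea concrete, here is a minimal sketch in plain Python. It assumes each layer's attention has already been averaged over heads into one row-stochastic matrix, and that the residual connection is modeled as 0.5*A + 0.5*I before multiplying the layers together. The function names and the toy matrices are made up for illustration and are not taken from the post.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add_residual(attn):
    """Mix raw attention with the identity (0.5*A + 0.5*I) to account for the skip connection."""
    n = len(attn)
    return [[0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def attention_rollout(layers):
    """Multiply residual-adjusted attention matrices from the first layer upward.

    layers[0] is the lowest layer. attn[i][j] is how much position i attends
    to position j in the layer below. The result maps top-layer positions
    to input tokens.
    """
    rollout = add_residual(layers[0])
    for attn in layers[1:]:
        rollout = matmul(add_residual(attn), rollout)
    return rollout


# Toy example (hypothetical values): 2 layers, 2 tokens.
layer1 = [[1.0, 0.0],
          [0.5, 0.5]]
layer2 = [[0.0, 1.0],
          [0.0, 1.0]]

rollout = attention_rollout([layer1, layer2])
for row in rollout:
    print(row)
```

In the toy example, the raw attention of the top layer says position 0 looks only at position 1, yet the rollout row for position 0 still puts most of its weight on input token 0, because the residual path and the lower layer are taken into account.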

The two methods give complementary views of the information flow. Compared with raw attention, both of them correlate more strongly with input-token importance scores obtained from an ablation method and from input gradients.

$y=a_x$

Verb number prediction: predicting whether the verb is singular or plural.

The task and dataset:

arxiv.org/pdf/2005.00928.pdf

License: Attribution-NonCommercial-NoDerivatives
