(NLP Research) The Long-Document Transformer 03.10
앨런튜링_
2022. 3. 24. 09:09
- Let's read Efficient Transformers: A Survey
- https://arxiv.org/pdf/2009.06732.pdf
- Introduction
- For on-device applications, models are supposed to be able to operate with a limited computational budget
- In this paper, we propose a taxonomy of efficient Transformer models, characterizing them by the technical innovation and primary use case
- The efficiency might also refer to computational costs, e.g. number of FLOPs, both during training and inference.
- FLOPS (FLoating point Operations Per Second) is the unit commonly used to quantify computer performance as a throughput rate; note that the survey's "number of FLOPs" counts the total number of floating-point operations performed, not a per-second rate.
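Since the survey measures cost in the number of FLOPs, a quick back-of-the-envelope count for a single attention head helps make the unit concrete. This is a minimal sketch assuming a multiply-add counts as two FLOPs and using illustrative sizes (N = 4096 tokens, head dimension d_k = 64) that are not taken from the paper:

```python
# Rough FLOP count for one self-attention head (assumption: a multiply-add
# is counted as 2 FLOPs; N and d_k are illustrative, not from the survey).
N, d_k = 4096, 64

qk_flops = 2 * N * N * d_k   # Q @ K^T -> (N, N) score matrix
av_flops = 2 * N * N * d_k   # softmax(scores) @ V -> (N, d_k) output
print(f"{(qk_flops + av_flops) / 1e9:.1f} GFLOPs per head")  # ~4.3 GFLOPs
```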
- Background on Transformers
- Multi-Head Self-Attention
- The operation for a single head is defined as the scaled dot-product attention Softmax(QKᵀ / √d_k) V, computed over that head's query, key, and value projections
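A minimal PyTorch sketch of that single-head operation; the function name and toy shapes are my own choices, not the paper's reference code:

```python
import torch

def single_head_attention(Q, K, V):
    """Scaled dot-product attention for one head.
    Q, K, V: (N, d_k) tensors; returns an (N, d_k) tensor."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # (N, N) attention logits
    weights = torch.softmax(scores, dim=-1)        # row-wise normalization
    return weights @ V

# Toy usage: N = 8 tokens, d_k = 4
Q, K, V = (torch.randn(8, 4) for _ in range(3))
print(single_head_attention(Q, K, V).shape)  # torch.Size([8, 4])
```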
- On the scalability of Self-Attention
- At this point, it is apparent that the memory and computational complexity required to compute the attention matrix is quadratic in the input sequence length, i.e., N × N. In particular, the QKᵀ matrix multiplication operation alone consumes N² time and memory. This restricts the overall utility of self-attentive models in applications which demand the processing of long sequences. In subsequent sections, we discuss methods that reduce the cost of self-attention.
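To see how quickly that N × N term grows, here is a small loop over a few sequence lengths (the lengths are chosen purely for illustration, not quoted from the survey):

```python
# Memory needed just to hold the (N, N) attention matrix in fp32
# for a single head (illustrative sequence lengths).
for N in (1_024, 4_096, 16_384, 65_536):
    print(f"N = {N:>6}: {N * N * 4 / 2**30:8.3f} GiB")
# Each 4x increase in sequence length costs 16x the memory.
```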
Local attention code implementation (see the sketch below)
Check the tendency of each aggregation -> looking at the whole sentence
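For the local-attention implementation planned above, here is a minimal masked sliding-window sketch. The window size, masking strategy, and function name are my own assumptions rather than any specific model's API, and the full N × N score matrix is still materialized, so this shows the attention pattern rather than an O(N·w) optimized kernel:

```python
import torch

def local_attention(Q, K, V, window: int = 4):
    """Sliding-window self-attention: each query attends only to keys
    within `window` positions on either side."""
    N, d_k = Q.shape
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5           # (N, N) logits
    idx = torch.arange(N)
    blocked = (idx[None, :] - idx[:, None]).abs() > window  # True = masked out
    scores = scores.masked_fill(blocked, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ V, weights

Q, K, V = (torch.randn(16, 8) for _ in range(3))
out, attn = local_attention(Q, K, V, window=2)
print(attn[0].nonzero().flatten())  # token 0 attends only to positions 0, 1, 2
```

This masked formulation is only meant for inspecting the attention pattern, e.g. for the note above about whether attention spreads toward the whole sentence; efficient long-document models such as Longformer compute only the in-window scores, so their cost scales linearly with sequence length.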