(NLP Research) The Long-Document Transformer 03.02

2022.03.02

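Should the attention layer be applied identically to every token?
- Would it make more sense to apply full attention to some tokens and sparse attention to others? In other words, could individual tokens end up overfit or underfit under a single uniform attention pattern?
- Is the paper's description of this as "attention optimization" accurate?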
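The first question above is close to what Longformer itself does: most tokens use sliding-window (sparse) attention, while a few designated global tokens attend to the whole sequence and are attended to by every token. Below is a minimal sketch of such a mask, assuming a symmetric window of `half_window` tokens on each side (the paper's window size w corresponds to 2 × `half_window`). The function name and parameters are hypothetical; this is an illustration only, not the allenai/longformer implementation, which is designed to avoid building a dense n × n matrix like this one.

```python
from typing import Iterable, List


def longformer_style_mask(seq_len: int, half_window: int,
                          global_idx: Iterable[int]) -> List[List[bool]]:
    """Hypothetical helper: mask[i][j] is True if query i may attend to key j.

    Assumes a symmetric sliding window plus Longformer-style global tokens.
    Builds a dense O(n^2) mask purely for illustration.
    """
    mask = [[False] * seq_len for _ in range(seq_len)]

    # Local (sparse) attention: each token sees neighbours within the window.
    for i in range(seq_len):
        lo = max(0, i - half_window)
        hi = min(seq_len, i + half_window + 1)
        for j in range(lo, hi):
            mask[i][j] = True

    # Global attention: a global token attends to every position,
    # and every position attends to it (row and column both set).
    for g in global_idx:
        for j in range(seq_len):
            mask[g][j] = True
            mask[j][g] = True

    return mask


# Usage: 8 tokens, one neighbour on each side, token 0 (e.g. [CLS]) global.
mask = longformer_style_mask(seq_len=8, half_window=1, global_idx=[0])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Setting both the row and the column for each global token is what makes its attention symmetric, which is how the Longformer paper describes global attention. Only a handful of tokens get full attention, so the total number of allowed pairs stays roughly linear in sequence length.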
Week Plan
- Read attention-related papers (this week)
  - Longformer: The Long-Document Transformer (2020, 708 citations / AllenAI)
    - https://arxiv.org/pdf/2004.05150.pdf
    - https://github.com/allenai/longformer
  - ETC: Encoding Long and Structured Inputs in Transformers (EMNLP 2020, 87 citations / Google Research)
    - https://arxiv.org/pdf/2004.08483.pdf
    - https://github.com/google-research/google-research/tree/master/etcmodel
  - BigBird (NeurIPS 2020, 358 citations / Google Research)
    - https://arxiv.org/pdf/2007.14062.pdf
Today Plan
- Study attention (write up the paper analysis)
- Install Anaconda and set up the environment
- BigBird git repository
- Resolve Python version issues
