NLP Notes (Updating)
Published:
NLP Notes (updating)
Natural Language Generation
GPT-2
PPLM: github code
GEM Benchmark: paper
table-to-text
https://arxiv.org/abs/2004.14373
story generation
https://arxiv.org/abs/2010.01717
Summarization
Examples:
- BART: https://huggingface.co/docs/transformers/model_doc/bart
- distilBART: https://huggingface.co/sshleifer/distilbart-cnn-12-6
- PEGASUS: https://huggingface.co/google/pegasus-large
Image Captioning
DenseCap
Johnson, J., Karpathy, A., & Fei-Fei, L. (2016). Densecap: Fully convolutional localization networks for dense captioning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4565-4574).
Karpathy, A., & Fei-Fei, L. (2015). Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3128-3137).
Code: DenseCap
Sentences to paragraph:
Krause, J., Johnson, J., Krishna, R., & Fei-Fei, L. (2017). A hierarchical approach for generating descriptive image paragraphs. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 317-325).
Other related key words: image description, visual storytelling
Evaluation
Gehrmann, S., Adewumi, T., Aggarwal, K., Ammanamanchi, P. S., Anuoluwapo, A., Bosselut, A., … & Zhou, J. (2021). The gem benchmark: Natural language generation, its evaluation and metrics. arXiv preprint arXiv:2102.01672.
Automated metrics
Lexical similarity:
- BLEU
- Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation.
- ROUGE-1/2/L
- Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries.
- ROUGE can be improved by increased the output length of the model
- METEOR
- Satanjeev Banerjee and Alon Lavie. 2005. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments.
Semantic Equivalence
- BERTScore
- Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2020. Bertscore: Evaluating text generation with BERT.
- BLEURT
- Thibault Sellam, Dipanjan Das, and Ankur Parikh. 2020. BLEURT: Learning robust metrics for text generation.
Diversity
- Shannon Entropy over unigrams and bigrams (H_1, H_2)
- Claude E Shannon and Warren Weaver. 1963. A mathematical theory of communication.
- Mean Segmented Type Token Ratio over segment lengths of 100 (MSTTR)
- Wendell Johnson. 1944. Studies in language be- havior: A program of research.
- The ratio of distinct n-grams over the total number of n-grams (Distinct_1, 2)
- The count of n-grams that only appear once across the entire test output (Unique_1, 2)
- Jiwei Li, Michel Galley, Chris Brockett, Jian- feng Gao, and Bill Dolan. 2016. A diversity- promoting objective function for neural conversation models.
Human evaluation
Anya Belz, Simon Mille, and David M. Howcroft. 2020. Disentangling the properties of human evaluation methods: A classification system to support comparability, meta-evaluation and re-producibility testing.
David M. Howcroft, Anya Belz, Miruna-Adriana Clinciu, Dimitra Gkatzia, Sadid A. Hasan, Saad Mahamood, Simon Mille, Emiel van Miltenburg, Sashank Santhanam, and Verena Rieser. 2020. Twenty years of confusion in human evaluation: NLG needs evaluation sheets and standardised definitions.
Anastasia Shimorina and Anya Belz. 2021. The human evaluation datasheet 1.0: A template for recording details of human evaluation experiments in nlp.
Others
CEFR level
6 levels: A1 (beginners), A2 (pre-intermediate), B1 (intermediate), B2 (upper-internediate), C1 (advanced), C2 (proficiency)
- Code in github: Vocabulary level grader
- Online websites: Text Analyzer (for free), Text Inspector (not free)
Classification
- sentiment detection
- Offensive language: https://arxiv.org/pdf/2006.07235.pdf
- Offensive language: https://aclanthology.org/S19-2010/
- natural language inference
- fact verification / propaganda
Structured prediction
- parsing
- relation / event extraction: https://aclanthology.org/P19-1074/
- conference resolution
Question answering
- retrieval
- reading comprehension
- KB-QA
- closed-book QA