Generalization through Memorization: Nearest Neighbor Language Models (ICLR 2020)


Abstract

1. Introduction

2. Nearest Neighbor Language Modeling

[Figure 1: overview of the kNN-LM — a datastore of (context representation, next word) pairs, nearest-neighbor retrieval at test time, and interpolation with the base LM]

Datastore

\[(\mathcal{K}, \mathcal{V}) = \{(f(c_i), w_i) | (c_i, w_i) \in \mathcal{D}\}\]
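Each key is the LM's fixed-size context representation $f(c_i)$ and each value is the ground-truth next word $w_i$, collected in a single forward pass over the training set $\mathcal{D}$. A minimal sketch of the construction, assuming a hypothetical `model.encode(context)` that returns the hidden state used as $f(\cdot)$ (this interface is mine, not the paper's code):

```python
import numpy as np

def build_datastore(model, corpus):
    """Build the (K, V) datastore: one entry per training token.
    key   = f(c_i), the model's representation of the context
    value = w_i,    the ground-truth next token id
    model.encode() is a hypothetical interface, not the paper's code."""
    keys, values = [], []
    for context, next_token in corpus:        # (c_i, w_i) pairs from D
        keys.append(model.encode(context))    # f(c_i)
        values.append(next_token)             # w_i
    return np.stack(keys), np.array(values)
```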

Inference

\[p_{\text{kNN}}(y|x) \propto \sum_{(k_i, v_i)\in\mathcal{N}} \mathbb{1}_{y=v_i} \exp(-d(k_i, f(x)))\]

\[p(y|x) = \lambda \; p_{\text{kNN}}(y|x) + (1-\lambda) \; p_{\text{LM}}(y|x)\]
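Here $\mathcal{N}$ is the set of $k$ (key, value) pairs retrieved from the datastore for the test context $x$, and $d(\cdot,\cdot)$ is the L2 distance. Neighbors sharing the same target word have their weights summed, the result is normalized into $p_{\text{kNN}}$, and the final distribution interpolates it with the base LM via $\lambda$. A rough NumPy sketch assuming the datastore arrays from above; the brute-force search is illustrative, and $k{=}1024$, $\lambda{=}0.25$ are the paper's tuned WikiText-103 values:

```python
import numpy as np

def knn_lm_prob(query, keys, values, p_lm, k=1024, lam=0.25):
    """Interpolate the base LM with the kNN distribution.
    query:  f(x), the current context representation
    keys, values: the datastore (see the sketch above)
    p_lm:   base LM distribution over the vocabulary at this step
    Brute-force search here is purely illustrative."""
    dists = np.sum((keys - query) ** 2, axis=1)    # squared L2, d(k_i, f(x))
    nn = np.argsort(dists)[:k]                     # N: the k nearest entries
    w = np.exp(-(dists[nn] - dists[nn].min()))     # exp(-d), shifted for stability
    p_knn = np.zeros_like(p_lm)
    np.add.at(p_knn, values[nn], w)                # aggregate mass per target v_i
    p_knn /= p_knn.sum()                           # normalize over the vocabulary
    return lam * p_knn + (1 - lam) * p_lm          # p = λ p_kNN + (1-λ) p_LM
```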

Implementation
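The datastore holds one entry per training-set token (roughly 103M for WikiText-103, up to 3B for the Wiki-3B experiments), so exact brute-force search is impractical; the paper uses FAISS for approximate nearest neighbor search over quantized keys. A sketch of how such an index might be set up, assuming `keys` is the float32 key matrix from above; the specific index parameters are illustrative, not the paper's exact configuration:

```python
import faiss
import numpy as np

d = 1024                                   # key dimension (Transformer hidden size)
quantizer = faiss.IndexFlatL2(d)           # coarse quantizer over cluster centroids
index = faiss.IndexIVFPQ(quantizer, d,
                         4096,             # number of inverted-list clusters
                         64, 8)            # 64-byte product-quantized codes
sample = keys[np.random.choice(len(keys), 1_000_000)]
index.train(sample)                        # learn the index on a sample of keys
index.add(keys)                            # add all datastore keys
index.nprobe = 32                          # clusters searched per query
dists, ids = index.search(queries, 1024)   # k = 1024 neighbors per query
```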

3. Experimental Setup

Data

Model Architecture

Evaluation

$k$NN-LM

Computational Cost

4. Experiments

4.1. Using the Training Data as the Datastore

[Table 1: perplexity on WikiText-103]

[Table 2: perplexity on Books]

4.2. More Data without Training

[Table 3: Wiki-100M-trained model with a Wiki-3B datastore vs. a model trained on Wiki-3B]

[Figure 2: perplexity as the datastore size grows]

4.3. Domain Adaptation

[Table 4: domain adaptation — Wikipedia-trained LM evaluated on Books with a Books datastore]

5. Tuning Nearest Neighbor Search

Key Function

[Figure 3 and Table 5: which intermediate Transformer state works best as the key function f(·)]

Number of Neighbors per Query

[Figure 4 and Figure 5: effect of the number of neighbors k and of the interpolation parameter λ]

Interpolation Parameter

Precision of Similarity Function

6. Analysis

[Figure 6]

Qualitative Analysis

Simple vs Neural Representation

[Figure 7 and Figure 8]

Implicit vs Explicit Memory

8. Conclusion and Future Work
