Much of the knowledge contained in neural language models may be
expressed in terms of relations.
For example, the fact that Miles Davis is a trumpet player can be written as a **relation**
(*plays the instrument*) connecting a **subject** (*Miles Davis*) to an **object** (*trumpet*).

One might expect a language model to decode a relation through a sequence of complex, non-linear
computations spanning multiple layers. However, in this paper we show that for a subset of
relations this (highly non-linear) decoding procedure can be well-approximated by a single
*linear transformation* (**LRE**) applied to the subject representation **s** after some
intermediate layer.

A linear approximation of the form
LRE(**s**) = *W***s** + *b*
can be obtained by taking a first-order Taylor series approximation to the LM computation, where
*W* is the local derivative (Jacobian) of the LM computation at some subject representation
**s₀**.
For a range of relations, we find that averaging the estimated LRE parameters over just 5 samples
is enough to get a faithful approximation of LM decoding.
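Written out, the first-order expansion around **s₀** takes the following form. This is the standard Taylor expansion; the explicit expression for *b* is what that expansion implies, stated here as an assumption rather than quoted from the text above.

```latex
\begin{aligned}
F(\mathbf{s}, c) &\approx F(\mathbf{s}_0, c) + W(\mathbf{s} - \mathbf{s}_0) \\
                 &= W\mathbf{s} + b, \\
W &= \left.\frac{\partial F}{\partial \mathbf{s}}\right|_{(\mathbf{s}_0,\, c)}, \qquad
b = F(\mathbf{s}_0, c) - W\mathbf{s}_0 .
\end{aligned}
```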

Here, *F* represents how the LM obtains the object representation **o** from the subject
representation **s** introduced within a textual context *c*.
Please refer to our paper for further details.
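To make the estimation procedure concrete, here is a minimal Python sketch under stated assumptions. `toy_F` and `A_TOY` are hypothetical stand-ins for the LM computation *F*, not a real model. The Jacobian is approximated with central finite differences, whereas a real LM would call for automatic differentiation. The bias for each sample is taken to be *F*(**s**ᵢ, *c*ᵢ) − *W*ᵢ**s**ᵢ, which is what a first-order expansion implies. None of the function names come from the paper's code.

```python
import numpy as np

# Hypothetical weights for a toy non-linear map (3-dim subject -> 2-dim object).
A_TOY = np.array([[0.5, -0.2, 0.1],
                  [0.3, 0.8, -0.4]])


def toy_F(s, c):
    """Toy stand-in for the LM computation F(s, c); NOT a real language model."""
    return np.tanh(A_TOY @ s + c)


def numerical_jacobian(f, s, eps=1e-5):
    """Central finite-difference Jacobian of f at s, shape (out_dim, in_dim)."""
    s = np.asarray(s, dtype=float)
    cols = []
    for i in range(s.size):
        step = np.zeros_like(s)
        step[i] = eps
        cols.append((f(s + step) - f(s - step)) / (2 * eps))
    return np.stack(cols, axis=1)


def estimate_lre(f, samples):
    """Average first-order Taylor parameters (W, b) over (s_i, c_i) samples."""
    Ws, bs = [], []
    for s_i, c_i in samples:
        f_i = lambda s, c=c_i: f(s, c)      # fix the context for this sample
        W_i = numerical_jacobian(f_i, s_i)  # local derivative at s_i
        b_i = f_i(s_i) - W_i @ s_i          # bias implied by the expansion
        Ws.append(W_i)
        bs.append(b_i)
    return np.mean(Ws, axis=0), np.mean(bs, axis=0)


# Five synthetic (subject state, context) samples, mirroring the "5 samples" above.
rng = np.random.default_rng(0)
samples = [(rng.normal(scale=0.1, size=3), rng.normal(scale=0.1, size=2))
           for _ in range(5)]
W, b = estimate_lre(toy_F, samples)


def lre(s):
    """Apply the estimated linear approximation LRE(s) = W s + b."""
    return W @ s + b
```

With a single sample, `W @ s_0 + b` reproduces `toy_F(s_0, c_0)` exactly up to rounding, since the expansion is exact at its own expansion point. Averaging over several samples trades that local exactness for a single (*W*, *b*) pair shared across subjects.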

We evaluate the LRE approximations on a set of 47 relations spanning 4 categories:
*factual associations*, *commonsense knowledge*, *implicit biases*, and *linguistic knowledge*.
We find that for almost half of the relations LRE faithfully recovers subject-object mappings for a
majority of the subjects in the test set.
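One plausible way to score how well an LRE recovers subject-object mappings is top-1 agreement: the fraction of test subjects for which decoding the LRE output yields the same token as decoding the LM's own object representation. The sketch below assumes that definition; the `faithfulness` function, the `decoder` matrix, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def faithfulness(lm_objects, lre_objects, decoder):
    """Fraction of rows where both representations decode to the same top-1 token.

    lm_objects, lre_objects: arrays of shape (n_subjects, dim)
    decoder: hypothetical decoding matrix of shape (vocab_size, dim)
    """
    lm_top = np.argmax(lm_objects @ decoder.T, axis=1)
    lre_top = np.argmax(lre_objects @ decoder.T, axis=1)
    return float(np.mean(lm_top == lre_top))


# Toy data: 4 test subjects, 3-dim states, 3-token vocabulary (identity decoder).
decoder = np.eye(3)
lm_objects = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0],
                       [0.0, 1.0, 0.0]])
lre_objects = np.array([[0.9, 0.1, 0.0],
                        [0.2, 0.7, 0.1],
                        [0.6, 0.1, 0.3],   # decodes to token 0, LM says token 2
                        [0.0, 0.8, 0.2]])

score = faithfulness(lm_objects, lre_objects, decoder)
print(score)  # 0.75: three of the four subjects agree
```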

We also identify a set of relations for which we could not find a good LRE approximation. For most of these relations, the range consists of names of people and companies. We think the range of these relations is so large that the LM cannot encode it in a single state and instead relies on a more complex, non-linear decoding procedure.

This work is not yet peer-reviewed. The preprint can be cited as follows.

Evan Hernandez, Arnab Sen Sharma, Tal Haklay, Kevin Meng, Martin Wattenberg, Jacob Andreas,
Yonatan Belinkov, and David Bau. "*Linearity of Relation Decoding in Transformer Language
Models.*" arXiv preprint arXiv:2308.09124 (2023).

@article{hernandez2023linearity,
  title={Linearity of Relation Decoding in Transformer Language Models},
  author={Evan Hernandez and Arnab Sen Sharma and Tal Haklay and Kevin Meng and Martin Wattenberg and Jacob Andreas and Yonatan Belinkov and David Bau},
  year={2023},
  eprint={2308.09124},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}