A relation network (RN) is an artificial neural network component with a structure that can reason about relations among objects. An example category of such relations is spatial relations (above, below, left, right, in front of, behind).[1]
RNs can infer relations, they are data efficient, and they operate on a set of objects without regard to the objects' order.[1]
History
In June 2017, DeepMind announced the first relation network. It claimed that the technology had achieved "superhuman" performance on multiple question-answering problem sets.[1]
| Dataset | Accuracy | Notes | 
|---|---|---|
| CLEVR (pixel) | 95.5% | Images of 3D objects, such as spheres and cylinders. Question types include "attribute" queries ("What color is the sphere?"), "compare attribute" queries ("Is the cube the same material as the cylinder?"), and "count" queries ("How many spheres?"). |
| CLEVR (state description) | 96.4% | Images represented by state description matrices. Each row in the matrix contains a single object's features: coordinates (x, y, z); color (r, g, b); shape (cube, cylinder,...); material (rubber, metal,...); size (small, large,...). |
| Sort-of-CLEVR | 94% | 2D images, each containing 6 squares and/or circles of 6 colors. Questions are coded as fixed-length binary numbers, eliminating natural language parsing complications. Each image is paired with 10 relational questions ("What is the shape of the object that is farthest from the gray object?") and 10 non-relational questions ("What is the shape of the gray object?"). |
| bAbI | 90% | Textual data. 20 tasks, each requiring a particular type of reasoning, such as deduction, induction, or counting. Each question is associated with a set of supporting sentences. For example, the sentences "Sandra picked up the football" and "Sandra went to the office" support the question "Where is the football?" (answer: "office"). Each sentence is processed separately. The success threshold is 95%. 10k entries. | 
| Dynamic physical system | 93% (connections) / 95% (counting) | Balls moving on a surface, with elastic and inelastic connections. One test determined whether pairs of balls were connected. The other determined how many were connected. |
Design
RNs constrain the functional form of a neural network to capture the common properties of relational reasoning. These properties are explicitly built into the system rather than established by learning, just as the capacity to reason about spatial, translation-invariant properties is explicitly part of convolutional neural networks (CNNs). The data to be considered can be presented as a simple list or as a directed graph whose nodes are objects and whose edges are the pairs of objects whose relationships are to be considered. The RN is a composite function:
$$\mathrm{RN}(O) = f_\phi\left(\sum_{i,j} g_\theta(o_i, o_j, q)\right)$$
where the input is a set of "objects" $O = \{o_1, o_2, \ldots, o_n\}$, $o_i$ is the ith object, fφ and gθ are functions with parameters φ and θ, respectively, and q is the question. fφ and gθ are multilayer perceptrons, and the parameters φ and θ are learnable synaptic weights. RNs are differentiable. The output of gθ is a "relation"; therefore, the role of gθ is to infer the ways, if any, in which two objects are related.[1]
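The composite function can be illustrated with a minimal sketch in plain Python: gθ is applied to every ordered pair of objects together with the question, the resulting "relations" are summed, and fφ is applied to the sum. The helper names (`make_mlp`, `mlp_forward`, `relation_network`), the layer sizes, and the random untrained weights are assumptions chosen for illustration, not details taken from the original work.

```python
import itertools
import math
import random

def make_mlp(sizes, rng):
    """Build a list of (weights, biases) layers with small random weights."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((w, b))
    return layers

def mlp_forward(layers, x):
    """Apply each layer in turn, with ReLU on all but the last layer."""
    for idx, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

def relation_network(objects, question, g_layers, f_layers):
    """Compute f(sum over all ordered pairs (i, j) of g(o_i, o_j, q))."""
    total = None
    for o_i, o_j in itertools.product(objects, repeat=2):
        # g sees the two objects and the question concatenated into one vector
        relation = mlp_forward(g_layers, o_i + o_j + question)
        total = relation if total is None else [t + r for t, r in zip(total, relation)]
    return mlp_forward(f_layers, total)

# Hypothetical sizes: 5 objects of 3 features each, a 2-dimensional question
rng = random.Random(0)
obj_dim, q_dim, rel_dim, out_dim = 3, 2, 8, 4
g_layers = make_mlp([2 * obj_dim + q_dim, 16, rel_dim], rng)
f_layers = make_mlp([rel_dim, 16, out_dim], rng)

objects = [[rng.uniform(-1, 1) for _ in range(obj_dim)] for _ in range(5)]
question = [1.0, 0.0]

out = relation_network(objects, question, g_layers, f_layers)
# Presenting the same objects in reverse order should give the same output
out_shuffled = relation_network(list(reversed(objects)), question, g_layers, f_layers)
```

Because the pairwise outputs are summed before fφ is applied, `out` and `out_shuffled` agree up to floating-point rounding, which reflects the property that an RN operates on a set of objects without regard to their order.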
Images (128×128 pixels) were processed with a 4-layer CNN. Outputs from the CNN were treated as the objects for relation analysis, without regard for what those "objects" explicitly represent. Questions were processed with a long short-term memory network.[1]
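One way to picture how CNN outputs become "objects" is the following sketch, which flattens a small grid of feature vectors into a list of objects and enumerates the pairs an RN would consider. The function name `feature_map_to_objects`, the 2×2 grid, and the feature values are hypothetical; appending each cell's grid coordinates to its feature vector is an assumption made here so that position information is available to the relation function.

```python
import itertools

def feature_map_to_objects(feature_map, tag_coordinates=True):
    """Flatten a d x d grid of k-dimensional feature vectors into d*d objects."""
    objects = []
    for row, cells in enumerate(feature_map):
        for col, cell in enumerate(cells):
            obj = list(cell)
            if tag_coordinates:
                # Assumption: append the cell's grid position to its features
                obj += [float(row), float(col)]
            objects.append(obj)
    return objects

# Hypothetical 2 x 2 feature map with k = 3 channels per cell
feature_map = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]],
]

objects = feature_map_to_objects(feature_map)
# Every ordered pair of objects is a candidate relation
pairs = list(itertools.product(objects, repeat=2))
```

A d×d feature map yields d² objects and therefore d⁴ ordered pairs, so the number of pairwise evaluations grows quickly with the resolution of the final CNN layer.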