Linking Neural Networks, Manifolds, and Linguistic Signals
This was originally written as a short essay for a seminar I attended earlier this year, so the style may come off a bit starched. I’ve posted it here because I think it’s a meaningful attempt at drawing lines connecting several fields in modern science.
The manifold hypothesis is a critical (yet unproven) basis for deep learning. It claims that, while data in the real world is high-dimensional, it largely lies on low-dimensional manifolds, which means it can be efficiently categorized and reduced into simpler descriptions. The counterpoint to the opacity of deep learning algorithms is that they can efficiently learn the manifolds of real-world data, whereas humans cannot meaningfully describe these manifolds. Learning manifolds is also the most efficient way for machines to deal with Big Data — as updating manifolds is computationally easy — especially when the data comes from places where there exist no well-defined generative models.
Manifold learning and generative modelling have been compared to the psychological notions of bottom-up and top-down learning (Brockman 2019). Concepts, or more accurately partitions of the input space, in manifold learning are derived from basic reinforcement processes, closely paralleling the bottom-up style of operant reinforcement. In this case, the agent has no explicit knowledge of the pattern according to which they partition inputs (although, as we will discuss later, humans can formulate approximate rules). “Intuition” is an extreme form of this type of learning. In top-down learning, a general set of rules for dealing with inputs is given, and these rules often define how to modify the model in response to new inputs. One might refer to the principles-and-parameters model of generative linguistics, which hypothesizes that, for example, children “know” that all human languages have a fixed head position (a principle), and can, upon hearing sentences from a language, determine whether that language is head-final or head-initial (a parameter). We can say that such a generative model also defines a manifold, but one that allows only a limited amount of freedom.
Manifolds may be linked to a phenomenon popularized by Thomas Kuhn known as theory loading. The idea of theory loading is that even basic perceptions are formulated within preexisting assumptions about the way the world operates (scientific theories being one such set of assumptions). Kuhn’s clearest example of this is an experiment wherein experimenters ask subjects to identify cards from a standard 52-card deck, but, unbeknownst to the subjects, some of the cards are colored incorrectly, e.g. “a red six of spades and a black four of hearts” (Kuhn 1962). Subjects, assuming that the cards are “normal”, do not notice the errors until after prolonged exposure. A manifold-based explanation of this would be: the subject has a manifold for recognizing “normal” cards, and both the red four of hearts and the black four of spades are categorizations on this manifold. A black four of hearts is, unlike noisy observations of standard cards, very far from the manifold, but will nonetheless be pulled towards the nearest spot and assimilated. In general, perception assimilates all raw stimuli to a manifold given by theory. This parallel may provide an explanation for the manifold hypothesis: if human data necessarily lies on a manifold, then naturally the data provided to ML algorithms by humans would as well.
This argument provides a potential link between human psychological phenomena and neural network training. There is at least one respect in which it is lacking: while NN manifolds are purely a result of reinforcement, humans can modify their theories using language or other signals as discriminative stimuli (nobody has failed to see the gorilla after being told it was there). Since, in a reductive sense, any sort of signal (down to the molecular signals used between bacteria for quorum sensing) performs this task of “conveying information” about the state of the world, this is not a particularly philosophical problem. Replicator dynamics and reinforcement learning are generally sufficient for agents to learn to signal ex nihilo. However, to my knowledge, signaling, as a facet of nonlinear dynamics, and machine learning, with its focus on differential equations, have not been put together, despite their similarities, and thus it is not clear how signal control of manifolds might work. To remedy this, consider the following simulation:
We have a sender agent and an actor agent in a cooperative game. Nature picks a state 1, 2, 3, or 4. Nature reveals to the sender whether the state is ≤2 or not. Nature reveals to the actor whether the state is odd or not. The sender may send the actor one of two signals, but at the beginning these signals have no “meaning”. The actor must perform an action corresponding to the state, and if they perform the correct action, then the sender and actor are both reinforced.
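Before any neural networks are involved, the game as stated can be tried out with plain urn-style reinforcement (of the Roth-Erev kind). What follows is only a minimal sketch under my own assumptions: the function name `simulate`, the initial urn weight of 1 per option, and the reward of 1 per success are all illustrative choices, not part of the setup described above.

```python
import random

def simulate(rounds=20000, seed=0):
    """Urn-reinforcement sketch of the sender/actor game (illustrative only)."""
    rng = random.Random(seed)
    # sender[is_low] holds one weight per signal (0 or 1).
    sender = {low: [1.0, 1.0] for low in (True, False)}
    # actor[(is_odd, signal)] holds one weight per action (1..4).
    actor = {(odd, sig): [1.0] * 4 for odd in (True, False) for sig in (0, 1)}
    outcomes = []
    for _ in range(rounds):
        state = rng.randint(1, 4)                 # nature picks a state
        low, odd = state <= 2, state % 2 == 1     # the two partial views
        signal = rng.choices((0, 1), weights=sender[low])[0]
        action = rng.choices((1, 2, 3, 4), weights=actor[(odd, signal)])[0]
        success = action == state
        if success:
            # Both agents are reinforced for the choices they just made.
            sender[low][signal] += 1.0
            actor[(odd, signal)][action - 1] += 1.0
        outcomes.append(success)
    tail = outcomes[-(rounds // 10):] or outcomes  # last tenth of play
    return sum(tail) / len(tail), sender, actor

if __name__ == "__main__":
    rate, sender, actor = simulate()
    print("success rate over the last tenth of rounds:", rate)
    print("sender urns:", sender)
```

Inspecting the returned urns shows whether the two signals have acquired a "meaning": if the sender's weights for the low and high views concentrate on different signals, the actor's urns for each (parity, signal) pair should concentrate on a single action.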
The game above is a variation on an example provided in Signals (Skyrms 2010, p138–141), in which there are two senders who each see a partial state, and the actor receives a signal from each of them. Our version is not too different, and we should likewise expect the agents to learn consistently. In either case, the actor must perform logical composition over the two partial states to determine the “true” state. We can relate this back to the problem of discriminative control of manifolds by first making states more “opaque” (perhaps in the form of images), and then forcing the agents to parse them by turning the agents into neural networks. The sender’s network should yield two categories (two signals) and the actor’s four (four possible actions). We add the signal from the sender as an input to the actor’s network. At the end of training, we want an image corresponding to an odd state to evoke exclusively action-1 when the sender gives one signal, and exclusively action-3 when the sender gives the other. If this works (it should, given its similarity to existing machine learning models based on information bottlenecking), it would show that the manifold of the receiver’s network is sensitive to signals sent from other agents.
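The wiring of the two networks might look something like the sketch below. It is heavily simplified and rests on assumptions of mine: each "network" is a single linear layer with a softmax (`LinearPolicy`), the "images" are replaced by noisy vectors produced by a hypothetical `observe` helper, and training uses a plain REINFORCE update. None of these choices come from Skyrms or from an existing model; the only detail taken from the text is that the sender's signal is appended to the actor's input.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

class LinearPolicy:
    """One linear layer plus softmax; a stand-in for a deeper network."""
    def __init__(self, n_in, n_out, rng):
        self.w = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
        self.rng = rng

    def probs(self, x):
        return softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in self.w])

    def sample(self, x):
        return self.rng.choices(range(len(self.w)), weights=self.probs(x))[0]

    def reinforce(self, x, choice, reward, lr=0.1):
        # REINFORCE step: d log pi(choice | x) / dW[k][j] = (onehot[k] - p[k]) * x[j].
        p = self.probs(x)
        for k, row in enumerate(self.w):
            g = (1.0 if k == choice else 0.0) - p[k]
            for j, xj in enumerate(x):
                row[j] += lr * reward * g * xj

def observe(bit, rng, dim=8, noise=0.3):
    """Hypothetical 'opaque' observation: a noisy vector whose mean encodes one bit."""
    sign = 1.0 if bit else -1.0
    return [sign + rng.gauss(0.0, noise) for _ in range(dim)]

def train(rounds=5000, seed=0, dim=8):
    rng = random.Random(seed)
    sender = LinearPolicy(dim, 2, rng)       # two signals
    actor = LinearPolicy(dim + 2, 4, rng)    # observation plus one-hot signal, four actions
    for _ in range(rounds):
        state = rng.randint(1, 4)
        x_sender = observe(state <= 2, rng, dim)      # sender's partial view
        x_image = observe(state % 2 == 1, rng, dim)   # actor's partial view
        signal = sender.sample(x_sender)
        x_actor = x_image + [1.0 if signal == s else 0.0 for s in (0, 1)]
        action = actor.sample(x_actor)                # 0..3 stands for action-1..action-4
        reward = 1.0 if action + 1 == state else 0.0
        sender.reinforce(x_sender, signal, reward)
        actor.reinforce(x_actor, action, reward)
    return sender, actor
```

To probe the property of interest, one could feed the trained actor the same "odd" observation twice, once with each one-hot signal appended, and compare the two outputs of `actor.probs`. Signal sensitivity would show up as the probability mass moving between action-1 and action-3. Whether this toy version actually converges that way is something to check empirically rather than assume.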
Notably, humans can make much of their “intuitive” or bottom-up knowledge explicit by describing their own behavior. B.F. Skinner provides a general analysis of this sort of model-building in “An Operant Analysis of Problem Solving”, under the concepts of contingency-shaped (corresponding to bottom-up learning and basic operant training) and rule-governed (top-down) behavior. Rules specify a type of action and its consequences, and can be used to control behavior without any interaction with those consequences. For example, Skinner cites a song used by blacksmiths’ apprentices:
Up high, down low,
Up quick, down slow —
And that’s the way to blow. (Skinner 1984)
This song describes both a method of handling a forge’s bellows and a generic reinforcer. An apprentice can use it to properly handle a forge, even if they “do not see the effect on the fire” and therefore are not subject to traditional operant reinforcement. This pattern is no different from using generative models. The interesting question is how these rules are generated from contingency-shaped (bottom-up or “intuitive”) behavior: how did someone come up with the song in the first place? For Skinner, it is a matter of demand by the verbal community:
The community asks: What did you do? What are you doing? What are you going to do? And why? And the answers describe behavior and relate it to effective variables… the expression I grabbed the plate because it was going to fall refers to a response (grabbing) and a property of the occasion (it was going to fall) and implies a reinforcer (its falling would have been aversive to the speaker or others).
A description of action can quite easily become a rule when it acquires social authority. The step that allows humans to jump from contingency-shaped or bottom-up learning to rule-governed or top-down learning is thus fundamentally a linguistic move embedded within a verbal and deontic community. It is also the same move that is lacking in the “opacity” of deep models. This makes it difficult to expect of deep networks, which at present have no significant linguistic capabilities.
However, there is also a theoretical problem (or a solution, if you are optimistic) with asking deep networks to describe their own behavior, even if they had language or some other mediating method, such as hardcoded responses (a topic in driverless car research). This is the Kuhnian problem of incommensurability. Imagine that, during the Ptolemaic era, someone feeds the results from many observations of celestial movement into a deep network with human-like rule-generation capabilities. Even though these observations were formulated within the Ptolemaic system, it is theoretically possible that the network concludes, as Copernicus eventually would, that the manifold is most accurately codified as a heliocentric model. Since deep networks can be highly sensitive to distinctions that humans generally ignore, we might have a situation where a computer tells us that our long-held scientific theories are wrong, based on reasons that we would find difficult to even entertain, before any decisive evidence can be provided. Again, this may be a problem or a solution, but the debates will likely not be pretty. It is then for the better (or for worse) that the most immediate applications of explaining behavior, such as driving, concern actions that, unlike scientific research, cannot meaningfully challenge the institutions they take part in.
Overall, contingency-shaped behavior (alternatively bottom-up knowledge or manifold learning) and rule-governed behavior (alternatively top-down knowledge or generative modelling) provide different approaches to both human and machine knowledge. While humans frequently bridge the gap back and forth, this is more difficult for machines, insofar as language seems to be key. This incongruity does not need to be permanent, and the incorporation of signaling from evolutionary game theory into machine learning may provide a step towards understanding how the gap works on a computational level.
Brockman, John, ed. Possible Minds: 25 Ways of Looking at AI. 2019. Chapter 21.
Goodfellow, Ian et al. Deep Learning. 2016.
Kuhn, Thomas. The Structure of Scientific Revolutions. 1962.
Skinner, B.F. “An Operant Analysis of Problem Solving”. The Behavioral and Brain Sciences. Vol 7 (1984). p583–591.
Skyrms, Brian. Signals: Evolution, Learning, and Information. 2010.