A core thread of cognitive neuroscience research is deciphering the physiological make-up of epistemology. What are the root properties of visual knowledge? What are the neural mechanisms for interpreting sensory data? What exactly is the brain looking for when it looks at something? Michael Tarr, Cowan Professor of Cognitive Neuroscience at Carnegie Mellon University and co-director of CMU's Center for the Neural Basis of Cognition, is trying to parse the brain's visual vocabulary. "There's nothing in the visual world that's self-specifying," he notes. "Vision is inferential." Tarr's father Joel, whose work comes up in this interview, is the Richard S. Caliguiri University Professor of History & Policy, also at Carnegie Mellon. Visual artist and Triple Canopy contributor Benjamin Tiven (“Distant Objects Becoming Near”) speaks with Tarr about how we read images. This interview is part of Common Minds, a series of essays and conversations on the contemporary infatuation with the brain.
Benjamin Tiven: Tell me about the main areas of your research.
Michael Tarr: I work on issues within the visual system: how the brain receives and produces images. One major problem I've worked on is invariance, that is, how do you know that two seemingly different things are in fact the same thing, when they look different in an image. There are some nominal solutions for this in computer vision, but it’s still a fundamental problem. How the brain really does this remains unknown.
BT: And what are you working on right now?
MT: We've tried to elucidate what goes on during the chain of computation as information moves through the visual system. But now we're really focusing on what I think is the core, underlying question: What constitutes a "molar" or essential feature? What is the root building block in the visual system, once you get past early vision? Vision is a complex series of processes going from all the way back here [points to back of head] to all the way up here [points to eyes]. We do have some notion that things get more complicated as they go up the stream, but no one can tell you what the encoding is. What is it exactly that's being represented about an object, or a face? What are the critical features? How do we invariantly recognize it? That's the root question: How is the brain representing information about objects? Don't think of vision as a "camera" that takes images and makes them 3-D. It’s a cognitive process, where you take visual information and break it into building blocks or pieces that allow you to generate new objects or interpret objects in ways you hadn't seen before, or imagine them, or manipulate them.
BT: Is this analogous to a pixel, or to a bit—some kind of consistent, measurable root unit of data?
MT: Well, I'm trying to understand that—what the language of vision is, or the predicate or syntax of vision. I'd like to be able to know how we know that something we know is indeed what we know it to be. But to your question, no—I don't think there's a bit, or a pixel type of data per se, but there's a complex vocabulary.
BT: You mean, of shapes?
MT: Of shape, surface, materials, the whole thing.
BT: And the brain sees them all as one interlinked matrix of details?
MT: No, it creates that matrix. The visual world is just rays of light on the eye. Physically, the root unit of visual data would be a photon or a light ray that's striking the retina. But there's nothing in the visual world that's self-specifying. Vision is inferential. The data that you're given as a sensory being is very continuous—light on the retina, a sound wave striking your ear—and from those you infer structure about the nature of the world. But none of that, while in the data, is labeled as such. You have to infer all of it.
BT: So how does the brain store an image? A blind person can remember an image seen prior to their blindness. Does that mean their brain retains this same encoding platform of visual information?
MT: Gosh, who knows. That's another building-block question that we don't understand yet. We just don't know. My sense is that the way blind individuals represent space neurally and mentally tells us a lot about the potential of the visual system. The fact that spatial representation seems to occur in similar parts of the brain in blind people as in sighted people tells you that the visual system isn't purely driven by visual input, but that auditory or navigational information also shapes the process of conceiving and mapping space.
BT: Does your work account for understanding glitches in our visual system? That is, things like doubles, duplicates, forgeries, or discrepancies of resolution? These have all proven powerful triggers in the literary, aesthetic, or behavioral registers. But are they at all meaningful in a cognitive or neurological sense?
MT: As for the kind of similarity or overlaps you're asking about, I can only say that neurologically, coding information is related to the tasks one is doing. Sure, two different things can seem the same under the right conditions—it depends what the brain is looking for. Our monitoring of the world is really much less continuous and accurate than we think it is. Experience is the conversion of energy into data. The project of all life is to correlate the interpretation with the energy source, since the better your ability to interpret reality, the more likely you are to survive and pass on your genes. Now, how close or causal is the relationship between the energy we experience and our interpretation of it—that's a different question. In fact, something like illusion or magic is based on a discrepancy between the information we're taking in and our interpretation. Illusion occurs when there isn't a causal relationship between what we experience and what's actually there. The physical reality is different from what you end up experiencing, probably because your interpretive system has made bad assumptions.
BT: To what degree does your work overlap with the wider question of "consciousness"?
MT: [Shakes head wearily, leans back, deep breath] Everyone's into consciousness now, and they have all these ideas, and I mean, no one really knows.
BT: But at the end of all your research into molar features and procedural patterns, is there an extension of that logic to ...
MT: You mean how does it all get conscious?
BT: Yeah.
MT: Who the hell knows?! People write all these books, but they have no data one way or the other. Maybe it’s just a synchrony of oscillating neurons, all firing at the same pace, and that's what defines consciousness. You are your brain, and maybe what you become aware of is just some synchrony, some frequency. But there's probably not a consciousness "spot" that everything channels into.
BT: Is there a theory you find more plausible, or less laughable, than others?
MT: It seems likely that it’s just some read out of your brain state. Some temporal synchrony of firing neurons. This is what's happening in studies of attention, and perhaps that's the same as consciousness—that the way you deploy attention is by training neurons to synchronize. So maybe what you're aware of are things that are all in a particular cycle, together. So, your conscious awareness is just the baseline level of neural synchrony that's monitoring what the world around you is like. Maybe it’s that simple. It may be just some emergent property of whatever way the brain works.
BT: So do you think all the interest in consciousness is a false poetic, in that way?
MT: It may be. Consciousness is something people love to talk about, but until we have a better understanding of the brain, it’s hard to say, and a number of theories have decent claims to some plausibility. The brain is the most complex device in the known universe. So it’s probably true that most experienceable phenomena are due to complex and overlapping neural mechanisms and their attendant behavioral consequences. Nature? Nurture? Of course it’s a combination. But the brain is not a cleanly engineered device; it’s built on layers and layers of adaptations, and each one is kind of ad hoc.
BT: That's exactly what your father said about cities: that, infrastructurally speaking, they were dense, vast agglomerations of solutions to various now-lost problems, which had accrued over a tremendous span of time and which couldn't be disentangled from each other. Our brains retain all the engineering patterns that successfully passed evolution's tests on our ancestors. In some ways, urban construction is parallel, where earlier efforts to shape life or solve a problem (say, the sewage or power grid) are these systems we now have no choice but to work within, even if now they're not ideal. It sounds like cities and brains evolve in the same way.
MT: Well, sure. Different time scales, but sure. The structures that form our brain have been in evolution almost continuously for a billion years. Each advantageous mutation just takes you incrementally further from whatever structure was previously there. The brain we have now is the result of a random walk through a certain evolutionary space, and that random walk would have been a lot different if the mutations had been different early on. But the world is the way it is, and the brain has always been building itself in response to that world. You end up with structures that seem optimal for the problems they're faced with.
BT: This seems like a hard-science version of a more recent claim in some fields of spatial psychology or existential philosophy. In the book Human Space, Otto Friedrich Bollnow writes, "The concrete space of human life is organized by purposeful activity in such a way that everything has its assigned place. This spatial order is created only in the smallest detail by the individual. For the most part, we find it already present as a supra-individual order, into which we are born. But this too has come into being as the result of a purposeful human activity. It is in this purposeful form that the world becomes comprehensible to us, and only because of it can I move meaningfully in space." What do you make of this? Do you think it’s just a different expression of the same neurological basis we just described? Do you think that architecture might somehow map or reflect some neurological coding?
MT: Well, we build it, so it must reflect something about our cognition. But I wouldn't push that too far, and I don't know how strong a statement of it I would make. I am skeptical about how much architecture is really constrained or shaped by our cognitive systems, which obviously allow for a pretty wide range of architectural typologies and modes. I think it’s determined more culturally than neurologically. Also, four walls and a ceiling is a pretty efficient way to create a reasonably usable volume, in simple material terms, which would really point more towards physical or technological limits. Now, I do think that the more subtle aspects of architecture (the ceiling height, or the direction from which natural light enters the building) might be more cognitively driven. But I'm just guessing.