Stephen Wilson 0:05 Welcome to Episode 25 of the Language Neuroscience Podcast. I'm Stephen Wilson. Thank you for listening. It's been a while since I recorded an episode. Sorry for the delay. But I'm very excited to be back to the podcast and happy to say that the first episode back is a really great one. I've got some more guests lined up too. So there shouldn't be such a long wait for the next episodes. To make a long story short, the reason I didn't record any episodes lately is that I moved back to Australia with my family at the start of this year. There has been a lot going on. I moved to the US when I was twenty three with nothing but two suitcases. It was easy. I thought nothing of it. But moving with my whole family at this stage of life is a whole other thing. We're now living in Brisbane, which is where my parents and many extended family members live. And it's lovely to see them all the time. Instead of having to wait years at a time. I started a new position at the University of Queensland, where I'm in the School of Health and Rehabilitation Sciences. There are great new colleagues here who I'm enjoying getting to know and I'm looking forward to developing some new lines of research. I need to build up a new lab. So if you're a potential grad student or postdoc, please don't hesitate to get in touch. I really miss my friends and colleagues at Vanderbilt, which was an awesome place to work. But fortunately, we're continuing to collaborate on our Aphasia Recovery Project, even though I'm living on the other side of the world. Okay, on to the podcast. My guest today is Alex Huth, Assistant Professor of Neuroscience and Computer Science at the University of Texas, Austin. Alex uses functional imaging and advanced computational methods to model how the brain processes language and represents meaning. He's done a series of extremely elegant studies over the last 10 or so years. And we're going to talk about a few of the highlights, including a really cool study that is just about to come out in Nature Neuroscience with first author Jerry Tang, by the time you hear this, it will be out. Okay, let's get to it. Hi, Alex. How are you today? Alexander Huth 1:51 Hey, Stephen. I'm doing well. Thanks for having me on your podcast. Stephen Wilson 1:54 Yeah. Well, thanks very much for joining me, and I'm pretty sure that we've never met before, right? Alexander Huth 1:58 I don't think so. No. Stephen Wilson 2:00 Yeah. I mean, do you go to a lot of conferences? Alexander Huth 2:02 I mean, you know, in the before times, I went to like, SFN frequently and SNL every now and then. Stephen Wilson 2:10 Ok. Alexander Huth 2:12 But yeah, hoping to start going back again. But uh, yeah. Stephen Wilson 2:15 Yeah, I also haven't been to many for a few years and when I did, it would mostly be SNL, and usually only when it was in the US. So, I'm not very much out and about, you know. And yeah, so where are you joining me from today? Alexander Huth 2:30 I'm in Austin, Texas. Stephen Wilson 2:32 Yeah. Alexander Huth 2:33 Sunny Austin. Stephen Wilson 2:34 It looks nice. I can see outside your window and it looks like a beautiful day. Alexander Huth 2:37 It is. Spring has sprung. Allergens are in the air, you know. Stephen Wilson 2:41 Right. Yeah. I’m now in Brisbane, Australia. And it's also a beautiful day here. First thing in the morning and it's apparently fall, which we call autumn. I'm just trying to get my head back around the Aussie lingo. But you know, all the seasons are pretty much the same here. 
Alexander Huth 2:59 Lovely. That's how I like it. I grew up in Southern California. That's par for the course there. Stephen Wilson 3:03 All right. So you don't know anything about weather? Alexander Huth 3:05 I've learned a little bit in Texas, which kind of surprised me. But yeah. Stephen Wilson 3:09 Yeah. Cool. So, I always like to kind of start talking with people by learning about like, how they came to be the kind of scientist that they are and like, basically childhood interests and how that led you to where you are. So, you know, were you a nerdy kid? What were you into when you were a kid? Alexander Huth 3:24 A very nerdy kid. I don't know. That's, that's how it goes. Right? I was very into Star Trek. I was like really, really into Star Trek: The Next Generation. That was, that was my jam. Especially Data. Like, I loved Data. He was, he was just, I don't know, a weird robot hero to me. So, I don't know, I got this kind of fascination with artificial intelligence based on that, and just generally liking science fiction. But when I was starting college, AI was kind of in a slump. It was in a, it was in a bad period. This is like the early 2000s. So, you know, I sort of looked around and it seemed like one of the really interesting ways forward, if we want to build machines that think like humans, is to figure out how humans think. So, I started getting into neuroscience. I remember the first neuroscience talk I ever saw was Mark Konishi describing the binaural auditory localization circuit and I was like, this is really cool. I want to, I want to do this kind of stuff. So, I started sort of getting into, interested in neuroscience through that. Stephen Wilson Okay, so you went to undergrad at Caltech. Is that right? Alexander Huth Yeah, yeah. That's right. Stephen Wilson 4:42 Okay. And is that close to where you're from? Like you mentioned Southern California. Alexander Huth 4:46 Yeah. Yeah. So I grew up a little bit north of Los Angeles in a little town called Ojai, California. Stephen Wilson 4:51 Oh, right. Yeah. My wife is from Thousand Oaks, so I do know that area. Yeah. Alexander Huth 4:56 Right next door. Yeah. Stephen Wilson 4:57 Yeah. Alexander Huth 4:58 Yeah, so Pasadena was like a little ways down the road. And Caltech is a good school. I was excited to go there. So yeah, I like started doing neuroscience stuff. But I really kind of started enjoying things when I started doing research. So, that was in Christof Koch's lab, when he was still at Caltech. We were working with blind subjects, looking at auditory motion perception in blind subjects, which was exciting and interesting and ended up being a J Neurosci paper in 2008. Stephen Wilson 5:05 Yeah. Alexander Huth 5:31 And then from there, there was, there was one real moment that was a sort of eye opener, that changed things for me, and that was Jack Gallant, who I ended up doing my PhD with. He came to Caltech, and he gave a talk, and it was about vision. It was about, like, V2. What does V2 do? How do we model V2? But the thing that he really talked about was, was this approach of just record what the neurons do when they see natural stimuli. Right, show images. Show images that actually are things that, you know, an animal might see, record what neurons do and then build models from that.
Try to figure out, you know, if you get thousands of these images, thousands of responses from these neurons, can you figure out, like, what is it about the image that makes this neuron fire. And something about, like, just that perspective of doing things, not from the very, like, controlled experiment, kind of deductive approach, but doing this, this inductive thing, where you just kind of lean on computational modeling, and say, like, I'm going to let the model figure out, like, what this neuron is actually doing. I, I just became like, insanely excited about this. So when I was applying to grad schools, Berkeley and working with Jack was, like, one of my kind of top interests. Stephen Wilson 6:55 Yeah. Okay. So, you know, I definitely have noticed that like you, you've leaned in very much to natural stimuli throughout your career. You even have a paper on it with, with Liberty Hamilton, specifically about that. But like, so that was really actually a driving force for you from the very beginning, right? And so, you know, what did you see as the advantages? And was there anything you were worried about giving up? As in like, kind of, well, you weren't really getting away from controlled designs, because you never did controlled designs. But like, I'm sure you knew about the, you know, the literature, and were you worried about giving up anything as you moved in this new direction? Alexander Huth 7:32 Yeah. Yeah. No, certainly, I had been doing, like, actually visual, like, psychophysics experiments. Mostly, it was like designing and running little psychophysics experiments while I was in Christof's lab. So, I had been, like, kind of familiar with this, and learning how to do things there. But just the, the idea of this natural stimulus thing, where it's like, the work of doing the science, of figuring out what's going on, you kind of move it from one place to another, or you move it from the experimental design kind of phase into this modeling phase. And I just really liked that idea. Of course, you know, there are definitely things we give up, right? So there are correlations in natural stimuli and those are hard to break. And sometimes you can't tell, like, what is actually causing this response. And sometimes you have to go in and, like, do kind of focused experiments to break those correlations, and then figure out, like, what is actually responsible for, you know, what this brain area is doing, or what this neuron is doing or whatever. But overall, this idea of just kind of, like, replacing this elaborate, you know, let's-test-one-little-thing-at-a-time approach with the big picture, like, let's just see how the whole system works, and then, you know, kind of let it sort itself out as we figure out how to model it. That just really resonated with me. I love that idea. Stephen Wilson 8:54 Yeah, well, it's worked out really well. So let's kind of give our listeners a more concrete idea about what you're talking about when you're talking about these experiments using natural stimuli. And like I told you in an email, the plan is, we kind of go through some of your work chronologically, because I do think each big paper kind of builds on the previous ones in a very satisfying way. So, I thought we might start with a paper which appears to be your dissertation research, which is 2012, published in Neuron. Is that right? Alexander Huth 9:32 Yep.
Stephen Wilson 9:32 And it's called 'A continuous semantic space describes the representation of thousands of object and action categories across the human brain'. It's not quite a language paper, but that's okay. Like it's, it's semantics and that's, that's close enough for me. Alexander Huth 9:48 Yeah. Yeah. So I can give you kind of the backstory of this. So when I joined Jack's lab, this is like 2009, I thought I was going to be doing vision. I was interested in vision. I was like, this is, this is where the cool stuff is happening. It's the vision world. I thought like, you know, the interesting problem is this mid-level vision problem. Like, we know how V1 works. We know, you know, that there's FFA and PPA and whatever the high level visual areas, but like what's happening in the middle, like, what are the transformations that actually get us there? But Jack sat me down, like when I, when I joined the lab, this was just after Tom Mitchell's Science paper had come out. This 2008 paper, where they were building these encoding models in a lot of the same way as we do, for words. So they had shown people individual words, and then they had these feature vectors for the words and they were, you know, predicting the brain responses to the words and so on. And I think Jack had seen this and was really excited by this. And so he sat me down when I first joined the lab, and he said, you know, I have a plan for you. I want you to work on language. And I said, wait, I don't know anything about language, like I've never done this before. And he said, okay, so we have a good collaborator, Tom Griffiths, he knows a lot about language, you're going to learn this stuff, and you're going to do it. And I said, okay, fine, let's do it. So, we started kind of going down that road, and like designing language experiments, which were a lot crummier than what we ended up eventually publishing. But sort of along that road, we started trying to build models of word representation. That was kind of what we were interested in at the time, these really, like, word embedding models. And it was early days of word embedding models. We were mostly focused on things like LSA, that's latent semantic analysis, which is, like, 1989, or something like this, right? Sort of classic word embedding models. So, I was building these models and I was collecting corpora of text and doing this kind of early computational linguistic stuff. And then, we didn't have good fMRI data for that yet. So we thought like, how do I actually test this? How do I figure out if it's actually working? And what we did have in the lab was a lot of data from people watching movies. Data from people like looking at images. Stephen Wilson 12:08 Yup. Alexander Huth 12:08 So we just thought, like hey, let's, let's try this out on somebody watching a movie. Like, you know, if we're looking at images, right, we have some labels of the things that are in these images. Let's just plug that into a word embedding model and see if it works. And it worked really well, it turned out. It was quite effective.
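To make the "classic word embedding model" idea concrete, here is a minimal sketch of an LSA-style embedding built with scikit-learn. The toy corpus, the three dimensions, and the variable names are all invented for illustration; this is not the lab's actual pipeline, just the general technique being referred to.

```python
# LSA-style word embeddings, in miniature: build a document-term count matrix,
# then take a truncated SVD so every vocabulary word gets a short dense vector.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

corpus = [
    "the dog chased the cat across the yard",
    "a wolf is a wild relative of the dog",
    "the car drove past the tall building",
    "trucks and cars filled the city street",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)              # documents x vocabulary counts

svd = TruncatedSVD(n_components=3, random_state=0)
svd.fit(X)

# Rows of svd.components_.T are crude "semantic" vectors, one per vocabulary word;
# with a real corpus, related words end up with similar vectors.
word_vectors = dict(zip(vectorizer.get_feature_names_out(), svd.components_.T))
print(word_vectors["dog"])                        # a length-3 vector
```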
And so this kind of led, led me down this alternate path of, like, fitting these kinds of models. We ended up actually kind of changing things up quite a bit. Instead of word embeddings, we were using these WordNet labels, which are… Stephen Wilson 12:38 Yeah, I mean, because in that first paper, it's not trivial how you would go from these movies that you had people watching, these little movie clips, to, you know, your semantic representations, right? I mean, that's actually not trivial. Can you, can you sort of describe how that works, just so that people can understand the, the paper? Alexander Huth 12:58 Yeah, yeah. So, the experiment was, we had people watch a little more than two hours of natural video, natural movies. This was a stimulus set that I actually didn't design. This was designed by Shinji Nishimoto, who had a fantastic paper in 2011, that was decoding video from, from human visual cortex, which is, like, just an incredibly impressive thing. Even whatever it is, like 12 years later, I'm blown away by that work. And a lot of his ideas from that have carried forward into our current work. Stephen Wilson 13:29 Right. Alexander Huth 13:29 But, um, so we had this data around, we just didn't know, like, what was actually happening in these movies, right? We needed to know, like, what is this a movie of, so, so we can use these techniques. Stephen Wilson 13:41 Yeah, I mean, as a human you know what the movie is of, but you need to quantify it for your models, right? Alexander Huth 13:45 Exactly, exactly. We need to turn this into numbers so that we can model it, right? So, I started exploring, like, you know, do I use Mechanical Turk to label this or something. And we tried that, and it ended up being just kind of messy, and the results were not great. So we ended up doing a thing, which I, I tell my students the story all the time, because I think it's a, it's an example of, like, just be a little bit less lazy than the next person and things can work out for you. I just sat down and labeled all the movies myself. So it took, I don't know, like, two months or something, just like spend an hour, an hour or two a day, like, labeling, you know, a few dozen seconds of movie or something like this. So we labeled it one second at a time. Each second, I wrote down, like, what are all the things, and like, are there verbs that are happening or are there actions that are happening? So… Stephen Wilson 14:37 Yeah. Alexander Huth 14:37 This process took a while. Stephen Wilson 14:38 I thought so! (Laughter) Alexander Huth 14:39 But I felt confident in the labels because I did them. And I knew that they were consistent across the whole set, unlike the Mechanical Turk labels, which were very, again, like, messy in a lot of ways. So, once we had these labels, then we could very easily kind of fit these models. So we can convert, you know, if a, if a scene has like a, you know, a dog in it, we know that there's a dog there, we can convert that into some numerical vector that says, like, what is the dog? And then we can fit these regression models that predict the fMRI data, predict how the brain responds based on these vectors of, like, semantic information. Stephen Wilson 15:19 Yeah, okay. Just quick meta point, like, I really hear you about, like, the time, the laboriousness, right. When I read the paper, I was like, that must have taken a really long time, and I was wondering, did he do that himself? Or did he, like, browbeat somebody into doing it.
But it's so true, right? You have to, especially like, early in your career, like you really do sometimes have to, like, suck it up and do something really time consuming. And I've done that a bunch when I was, you know, earlier in my career, and now like, whenever I give my students some, some awful task, I sort of have this, like, you know, pocket full of things I can tell them: well, you know, when I was a kid, I did this, I carried my father on my back through the snow, so you can, you know, transcribe this speech sample. Alexander Huth 15:59 Exactly, exactly. There's, there is a lot of value in that. I think, like, a lot of people just see something hard and, like, give up, or like, can I find some shortcut around this? And just sitting down and doing it, oftentimes, is not that bad. Stephen Wilson 16:11 Yeah, sometimes you just need to do it. Yeah. So okay, so you've got the movies, you've kind of like, you know, you've labeled them with words as to what's in them, and then you, and then you mentioned, from those words, you get vectors of numbers that are going to describe the meaning of the words. So that's what, you know, that's kind of an encoding, like a, what do you call it? A… Alexander Huth 16:31 An encoding model. Stephen Wilson 16:31 Encoding model. So can you describe how you get that vector of numbers, like, for those who have not seen that approach before? Alexander Huth 16:39 Yeah. Yeah. So, for this paper, we used WordNet, which some people might be familiar with. It's essentially a dictionary. So we have these unique entries that are tied to the definition of, like, this entry. Which, which is nice because then this disambiguates, like, words that have different senses, right? So dog can be a verb, it can be a noun, there's like 10 different senses of dog, the noun. But you know, I know that, like, dog dot n dot 01 is the word I label for, like, this is a, you know, a mammal, of the canid family, whatever. Stephen Wilson 17:18 Yeah. Alexander Huth 17:18 So, with WordNet you get that kind of, sort of, detail, but then you also get information about what kind of thing that is. So WordNet contains a lot of these hyponymy relationships. So like, a dog is a canid, a canid is a carnivore, a carnivore is a placental mammal, a placental mammal is a vertebrate, etc. So you have this kind of chain. That's sort of an easy one, because it's taxonomy. But it covers all kinds of things, right? I don't know how many, like, tens of thousands of definitions there are in WordNet, but it's very extensive. So, we could label all these things in WordNet. And then we could use actually that information to help us out a little bit. So what we did is, one kind of simple thing you could do is say, there were 1300-something unique labels in this movie dataset. So, let's just convert each frame or each second of the movie into, like, a length-1300 vector, that is zeros, except for the places that correspond to, like, the categories that are present. Stephen Wilson 18:27 Right. So say there's a dog or not dog. Alexander Huth 18:30 Right, right. So there'll be a one if there's a dog in the scene and zero if there's not. So that's fine. That's, that's an OK model. It turns out, it doesn't work terribly well as a model. Because it doesn't have a lot of information that is actually quite important. Right, so like, say, there's a dog in some scenes, and a wolf in other scenes, right? If you just had these as, like, one-zero labels, then your model has no idea that, like, a dog is like a wolf.
You have to separately fit weights for, you know, how does the brain respond to dog? And how does the brain respond to wolf? Which is kind of inefficient. But also it means that you can't, like, generalize to new categories that you didn't see. Right? So if your training data contains a dog, but then you're testing this model on some video that contains a wolf, if you had no wolf training data, you just wouldn't be able to predict that at all. Stephen Wilson 19:21 Right. Alexander Huth 19:21 But if your model knew that, like, a wolf is actually a lot like a dog, these are very similar things, then maybe you could guess that the response to wolf should be like dog, and then everything works better. So what we did in this model is essentially just add these hypernyms. So we extended it from, like, the 1300 categories that were actually labeled to 1705 categories, I think, that were the full set, which included all the hypernyms of the labels that we had. So instead of, you know, if the scene just had a dog in it, instead of just having one sort of indicator that there was a dog there, there would also be an indicator that there's a canine, that there's a carnivore, a mammal. Stephen Wilson 20:03 A mammal. Yeah. Yeah. Alexander Huth 20:05 Which, like, later, we kind of actually worked out what this meant mathematically in an interesting way. In that, this really actually kind of represented, we could think of it as like a prior on the weights in the model, that we kind of push closer together weights for categories that are related. So, it would kind of enforce that, like, the response to dog and wolf should be similar. Stephen Wilson 20:32 Yeah, by having a common covariate really with them. Right? Alexander Huth 20:37 Exactly. Yeah. It turns out that that model works much better. Like, it's much better at predicting brain responses. So, this is another sort of critical part of the kind of natural stimulus paradigm, I think, is that we build these models on natural stimuli and then we also test them by predicting brain responses to, like, new natural stimuli, which I think, you know, I argue this strongly in some papers, I think this is really kind of a gold standard for testing theories of how the brain does things. Right? It's like, we want to understand how the brain processes visual information, or how it processes language. Let's just record, like, what happens when the brain is processing language and then let's say, how well can we predict that? Stephen Wilson 21:19 Yeah. Alexander Huth 21:19 How well can we guess how the brain is going to respond to this thing? So…. Stephen Wilson 21:25 It's fascinating how quantifiable it is, right? You know whether you're understanding things or not. Alexander Huth 21:30 Yeah, it gives you a score to just, like, make it go up. Right? You can keep tweaking things and make that…. Stephen Wilson 21:35 And I think that's what we're going to see, as we discuss some of these papers today. Your models keep on getting more and more sophisticated, right? And so this is a pretty old school model at this point, this paper is ten years old, maybe even older, when you actually did the work. We've come a long way since then. But I do want to start here, because a lot of the concepts kind of run through it all and just the models get better. Alexander Huth 21:39 Yeah, yeah, absolutely. And I think they get better in a way that is quantifiable to us. Stephen Wilson 22:04 Yeah. Alexander Huth 22:04 Which I think is very nice.
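To make the indicator-plus-hypernyms idea concrete, here is a minimal sketch using NLTK's WordNet interface. The labels and the small category vocabulary below are invented for illustration; the actual study used its own hand-labeled set of roughly 1,300 synsets, expanded to 1,705 features.

```python
# Indicator features plus hypernyms, in miniature.
# Requires: pip install nltk; then nltk.download("wordnet") once.
import numpy as np
from nltk.corpus import wordnet as wn

def expand_with_hypernyms(synset_names):
    """Return the labeled synsets plus every hypernym on their chains."""
    expanded = set()
    for name in synset_names:
        s = wn.synset(name)
        expanded.add(s)
        expanded.update(s.closure(lambda x: x.hypernyms()))
    return expanded

# One second of movie, labeled with the things visible in it (toy example).
labels_this_second = ["dog.n.01", "car.n.01"]

# A fixed category vocabulary: all labels in the dataset plus all of their hypernyms.
vocab = sorted({s.name() for s in expand_with_hypernyms(["dog.n.01", "wolf.n.01", "car.n.01"])})

present = {s.name() for s in expand_with_hypernyms(labels_this_second)}
feature_vector = np.array([1.0 if cat in present else 0.0 for cat in vocab])
# Labeling "dog.n.01" now also lights up canine, carnivore, and mammal-level
# features, which is what lets the model generalize from dog to wolf.
```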
Because it's, it's easy in a lot of ways, you know, to think of, like, you know, for studying some psychological phenomenon, we can say, like, here's a simple model, and then we're going to elaborate, elaborate and elaborate that model. And maybe it predicts, like, other little things about it, but it becomes unwieldy in a way. It's like, it's unclear with, like, big, elaborate models, how well do they actually explain things? Whereas here, we have this, like, quantifiable metric, right? We can say, like, it makes the number go up, it makes our prediction of how the brain actually responds to real things that we care about better. Stephen Wilson 22:39 Yeah. Very cool. Okay. So, you've kind of explained how you get the numbers, how you turn the movies into numbers that represent their meaning. Now, I can't remember if you said, I think you have two hours of movies in the study. Is that right? Alexander Huth 22:54 That's right. Stephen Wilson 22:54 And you don't have a huge number of participants. I think you have five participants. One is yourself and you've got another couple of co-authors in there. I noticed there is a participant called Jay G. But on careful examination, it is not Jack Gallant. Alexander Huth 23:09 It is not. That's right. Stephen Wilson 23:11 Did you try to get him into the scanner to be one of the….(Laughter) Alexander Huth 23:17 Yeah, he was scanned for some things, but not for this one. Stephen Wilson 23:19 So, a substitute JG in his place? Alexander Huth 23:22 Yeah. That's right. Stephen Wilson 23:22 So yeah, kind of the classic psychophysics tradition of, you know, small numbers of participants, most of whom are the authors of the study, because that's who will tolerate two hours of scanning. Alexander Huth 23:34 Yeah. Yeah. Stephen Wilson 23:35 Two hours might seem like a lot, but it's going to get more. So, anyway, how many explanatory variables do you end up having, at the end of the day, when you fit that to your data? Like it must be well over a thousand, right? Alexander Huth 23:45 Yeah, yeah. So, the feature space in that paper goes to 1705 parameters. And then we also do a thing where, you know, if you're trying to predict how the brain is responding to this, like, ongoing natural movie, you also need to capture the hemodynamic response function. Right? And the standard way to do this is just to, like, convolve your design matrix, which would be, you know, the 1700-dimensional thing, with a canonical HRF. But, you know, a thing that they had found in Jack's lab, before I even got there, this is really, I think, work by Kendrick Kay, that showed this very nicely, is that that doesn't actually work terribly well, especially if you have this opportunity to, like, measure how well are you actually doing? Like, how are you predicting? And it turns out that using the canonical HRF is kind of bad, or you're leaving a lot of, like, variance on the table. So, we use this approach, the finite impulse response model, where essentially we fit separate weights for each feature for several different delays. So we're kind of fitting a little HRF for each feature. So… Stephen Wilson 24:54 Yeah. Alexander Huth 24:54 For the dog feature, we get a little HRF and so on. Stephen Wilson 24:58 Yeah. Okay, you have four different delays, like two seconds each? So you're basically modeling the first eight seconds and letting the response take any shape it does in that time. Alexander Huth 25:06 Exactly. I think it's really actually three delays.
It was like four, six and eight seconds. Stephen Wilson 25:11 Okay. Alexander Huth 25:12 We expanded to four delays later for language. Because language actually has an earlier, earlier kind of take-off, it turns out. Visual cortex has the slow HRF. Which is kind of weird when you think about it, because the canonical HRF is built based on, like, V1, and it turns out V1 doesn't have, like, the most standard HRF across the brain. It's quite different. Auditory cortex has, like, a very short HRF in comparison. Motor cortex has a different sort of style of HRF. Stephen Wilson 25:39 Yeah. Alexander Huth 25:39 There are different things happening everywhere. But using this kind of method, it blows up your parameter space. Stephen Wilson 25:44 You have to multiply it by three in this case. Alexander Huth 25:47 Exactly. Yeah. The 1705 times three features. Stephen Wilson 25:51 Yeah. Alexander Huth 25:51 But it still works better than using the canonical HRF. Because you're capturing all these sort of details in, in the hemodynamic responses across the brain. Stephen Wilson 26:02 Yeah, I mean, I'd love to talk about HRFs. I could talk about HRFs for, like, an hour with you, but we probably should focus on language. (Laughter). But yeah, no, I mean, the relevant thing here is that it turns what's already a large number of explanatory variables into a very large number. So, are there issues with fitting linear models that have 5000 explanatory variables? Or do you have enough data to do that? Alexander Huth 26:24 So, there are definitely issues. There are always issues. There are always issues in fitting linear models. I don't know. In the time that I was in Jack's lab, I'd say maybe a good, like, 80% of everyone's time and effort was devoted to this question of, like, how do we fit better linear models? Like that was, that was really central to, like, everything we did. It's weird, because, like, that didn't look like the scientific output. Right? It's like, we didn't publish a lot of papers about, like, how do you fit these linear models? It just ended up being, like, a tool that we used. But that was a massive amount of the, like, intellectual effort there. It was just, like, how do we do this well? Yeah. So you know, we have to use regularized linear regression. That's, like, a super important tool. There were a lot of different, like, styles of this that were used for different projects in the lab. It turned out for sort of vision models, if you're modeling, like, visual cortex, then one style of model works terrifically well. This, like, sparse linear regression, because visual cortex responses are actually kind of sparse, they only care about, like, one little part of the visual field. Whereas, for these, like, semantic models that I was using, that actually didn't work that well, which was surprising. And the thing that worked really well was ridge regression, and that's been kind of the mainstay of everything we do since then. So this is an L2-regularized regression. I don't know. We can…. Stephen Wilson 27:42 It's definitely interesting. Yeah. And I think that the big picture point is clear. And it's just a kind of, it's just an interesting general point of just how much of science often is just these behind-the-scenes details that make or break the papers. And when you read the paper, it just basically says, we did L2-regularized regression, and what the reader doesn't always know is that, like, that was a year of pain to get to that or, you know. So that's very interesting.
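As a rough illustration of the two ingredients just described, delayed copies of the feature matrix standing in for a per-feature FIR "HRF", plus ridge (L2-regularized) regression to handle the thousands of correlated regressors, here is a toy sketch. The array shapes, the delays, and the single alpha value are placeholders, not the settings from the actual papers, which also tuned the regularization by cross-validation.

```python
# FIR delays + ridge regression on toy data (random numbers stand in for real
# semantic features and BOLD responses).
import numpy as np
from sklearn.linear_model import Ridge

n_trs, n_features, n_voxels = 600, 1705, 500
rng = np.random.default_rng(0)
X = rng.standard_normal((n_trs, n_features))      # semantic features per TR
Y = rng.standard_normal((n_trs, n_voxels))        # BOLD responses (toy data)

def add_fir_delays(X, delays=(2, 3, 4)):
    """Stack copies of X shifted by the given numbers of TRs (e.g. 4, 6, 8 s at a 2 s TR)."""
    delayed = []
    for d in delays:
        Xd = np.zeros_like(X)
        Xd[d:] = X[:-d]                           # feature at time t predicts response at t + d
        delayed.append(Xd)
    return np.hstack(delayed)                     # n_trs x (n_features * n_delays)

X_delayed = add_fir_delays(X)

model = Ridge(alpha=100.0)                        # alpha would really be chosen by cross-validation
model.fit(X_delayed, Y)                           # one weight per feature, delay, and voxel

# Evaluate on held-out data: correlate predicted and actual time courses, per voxel.
X_test = add_fir_delays(rng.standard_normal((100, n_features)))
Y_test = rng.standard_normal((100, n_voxels))
Y_pred = model.predict(X_test)
scores = np.array([np.corrcoef(Y_pred[:, v], Y_test[:, v])[0, 1] for v in range(n_voxels)])
```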
I always, it's just always fascinating to hear about the process behind these papers. Because, like, I think a good paper just reads like that was the most obvious thing in the world to do. But, like, sometimes it wasn't obvious at all. Okay, so you fit models at every voxel, these models that have 5000 explanatory variables based on the semantic feature representation and flexible HRF. And then what you find is that different parts of the brain, or different voxels, have very different response profiles. And you demonstrate this in the paper with a voxel from the PPA, which stands for parahippocampal place area, and another voxel in the precuneus. Can you kind of talk about the different responses that you saw across the brain? Alexander Huth 28:52 Yeah, yeah. So that's kind of the second stage of this style of, like, encoding model science, is, you know, we can fit the models, we can test them, we see they work well. And then we can say, like, what is it that's actually causing some piece of brain to activate? Right? Like, what are the features that are important to this, you know, this voxel, this chunk of brain? So, you know, one thing we can do is just, like, look at the weights, right? So we just pick out a voxel and say, like, what do the weights look like? There are high weights for the PPA case. I don't have it in front of me, but it's like high weights for buildings and cars. So it likes sort of seeing things that are constructed, and things in motion. Whereas the precuneus voxel, I think, is much more selective for, like, other people and animals and this kind of thing. So, you know, we can do that, we can look in detail, quite a bit of detail, at, like, individual voxels and say, like, you know, what does this voxel care about? What does this voxel care about? But that has its limits. There's a lot of numbers to look at there and there's a lot of voxels in the, in the brain, right? And this is, you know, we're not doing this on groups of subjects. We're fitting this separately on each individual subject. Um, there's a lot of voxels to look at. So, what we did instead was, we kind of tried to summarize the weights by reducing their dimensionality. So, we just applied, like, a standard sort of machine learning, data mining technique, Principal Component Analysis, used that to squeeze down these things from 1705 weights, averaged across the HRF delays, down to just three or four dimensions. And say, like, you know, what are the major kind of axes of variation across, across the brain? Right? If we had to summarize what these voxels are doing with, like, three or four numbers, what does that say? Stephen Wilson 30:37 And those, those top three or four dimensions capture kind of interpretable aspects of semantics, right, in this paper, at least. Alexander Huth 30:46 Interpretable-ish. So, like, then interpreting those ends up being a whole, a whole thing. Like, that's, that's difficult and, like, contentious and yeah, yeah. So, I mean, they do separate some things that are sort of natural, and we would expect to see, so like animate versus inanimate categories. It's a big distinction, essentially, human versus non-human. I think, like, sort of buildings, artifacts, and vehicles versus other things. So these are the kind of, like, major dimensions that we see pop out.
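A sketch of the dimensionality-reduction step just described, assuming you already have a voxels-by-categories-by-delays weight array from the ridge fit. Random numbers stand in for real weights, and the choice of four components simply mirrors the discussion; none of this is the paper's actual code.

```python
# Summarize per-voxel semantic weights with PCA: average over the FIR delays to get
# one weight per category per voxel, then find the main axes of variation across voxels.
import numpy as np
from sklearn.decomposition import PCA

n_voxels, n_categories, n_delays = 500, 1705, 3
rng = np.random.default_rng(1)
W = rng.standard_normal((n_voxels, n_categories, n_delays))   # fitted encoding weights (toy)

W_avg = W.mean(axis=2)                        # voxels x categories, averaged over delays

pca = PCA(n_components=4)
voxel_scores = pca.fit_transform(W_avg)       # each voxel's position on the top 4 axes
category_loadings = pca.components_           # 4 x categories: which categories define each axis

# category_loadings tells you what each axis "means"; interpreting those axes is
# the hard (and contentious) part, as discussed next.
```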
And one thing we did in that paper was also, like, we compared these dimensions to, there's been a lot of literature in the, the field of people looking at ventral, ventral temporal cortex, of, like, what are the major dimensions of representation. So, one of those was animacy. So this is from Jim Haxby's lab. Andy Connolly had this great work showing that, like, there seems to be this, like, gradient of animacy across ventral temporal cortex. And that came out, like, very naturally in our data. We really saw that, that there was, like, this, this beautiful, you know, animacy kind of dimension. Other things we found, like, less evidence for, but yeah, we still got to kind of explore that space. Stephen Wilson 32:07 Yeah, like my notes on it. And these are just, like, my notes from when I read it, which is a couple of weeks ago: first component, motion slash animacy; second component, social interaction; third component, civilization versus nature; fourth component, biological versus non-biological. So, like you said, I'm not, like, trying to hold you to those. But I think it's kind of interesting that, like, these are the big, big organizing principles. It's kind of like a bit of a, it's a window into, like, what's really important for humans, right? Like, these are the major axes on which the semantic system, you know, has the most variance. Alexander Huth 32:45 Yeah, yeah. And that's, that's something that I really like about this approach, too, right? Is that, like, we're not taking those things as given. Stephen Wilson 32:55 No. Alexander Huth 32:55 We're not baking those things into our experimental design. We're just saying, like, watch a bunch of videos, and let's see what falls out. Right? Like, yeah, what are the differences across the brain? What are the major distinctions the brain makes? Stephen Wilson 33:05 Yeah. And the brain did. And I was surprised in this paper that the brain didn't care about object size, which actually, maybe, is not that surprising. Like, maybe it shouldn't, right? Alexander Huth 33:16 Maybe it shouldn't. That's one of these, like, sort of proposals for what was the major organizing kind of feature of, of visual cortex, was object size, and we found less evidence for that. I don't know what the current kind of status of that theory is. Stephen Wilson 33:29 Yeah. Me neither. Okay, so like, just kind of big picture, like, you know, can you describe which brain areas were responsive to these semantic distinctions in this study? Alexander Huth 33:43 Yeah, so big picture. You know, what we saw here was that this sort of semantic selectivity for visual concepts was not isolated to just, like, these few areas in higher visual cortex. Which is kind of the picture that we had, loosely, from a lot of studies that came before this, right? So, we knew about, like, the place selective areas, we knew about face selective areas, body selective areas. And those were kind of the major, you know, sets of things that we knew about. Stephen Wilson 34:15 Yeah. Alexander Huth 34:16 And those turned out to be quite important and very clearly, like, come out in this kind of data. But we think of those more as, like, you know, peaks in a terrain, right? So there's this actually complicated terrain, kind of all across higher visual cortex. Like, if you go outside of retinotopic visual cortex, there's kind of a band of, of cortex that stretches around that, like, all the way around the brain.
And all across this band, we see selectivity for, like, some kind of, you know, specific semantic features. So that's kind of the majority of where this happens, is sort of in this band, like, around higher visual cortex. We also see quite a bit of stuff happening, you know, there's, there's kind of these weird, like, spears that come out of visual cortex. So, up to the, to the pSTS, there are some visual representations up there. Through the intraparietal sulcus and sort of onto the medial surface of the cortex, there's another kind of spear of visual cortex. And then up in prefrontal cortex, there's also, like, a few select areas that are quite visually selective. So there's some face patches, and, like, the frontal eye fields are very visually responsive. Stephen Wilson 35:26 It's funny, you're calling them visual, but like, don't you think they're semantic? Alexander Huth 35:31 Yes. (Laughter) Because all of these things, you know, we've seen later, are like, you know, they don't really care that it's vision per se. That's not quite true. So, FEF does care that it's vision. FEF kind of only responds to visual stuff. Stephen Wilson 35:46 Frontal eye field. Yeah. Alexander Huth 35:48 Yeah. Intraparietal sulcus, it's quite visual. So, that's kind of a gap in the other maps that we see. But a lot of this stuff in higher visual cortex, especially, we call it visual cortex because that's how we were studying it, but it turns out that that overlaps, like, very heavily with other modalities of representation. Stephen Wilson 36:09 Cool. Yeah, I want to talk more about, like, the anatomy of the areas that you find, but maybe best in the context of the next paper, where I think it comes out more clearly. So shall we move on to the next one now? Alexander Huth 36:21 Sure thing. Stephen Wilson 36:22 So this is your 2016 Nature paper. And, you know, I've always noticed that, like, Nature doesn't publish fMRI, you know? Like Science does, or they did, you know, like, back in the heyday of fMRI, where, you know, every man and his dog was getting, you know, these high profile papers, it was only Science that was buying the Kool-Aid, you know? Nature, the only fMRI papers they ever published was, like, sorry, I'm blanking on the one, Logothetis et al., where, you know, they actually do simultaneous fMRI and, you know, direct cortical recording. I mean, that was good enough for Nature. But generally speaking, Nature does not publish fMRI. So…. Alexander Huth 37:05 We were pleased that this happened. Stephen Wilson 37:06 Congratulations. Yeah. If any paper was going to be in Nature, I think this is a worthy, a worthy one. Alexander Huth 37:15 Thank you. So, it was, it was a big effort, yeah. Stephen Wilson 37:19 Definitely. So in this one, like, it's definitely a bit more language than the other one. Because the stimuli are language, right? So, can you tell us about the stimuli? Alexander Huth 37:28 Yeah, yeah. So this was, you know, what I said before, is like, you know, I kind of started off, we wanted to do language, we wanted to do encoding models for language. I did this kind of offshoot project into vision, that was kind of using the models that we were designing to study language to do vision instead. But at the same time, we were sort of continuing down this path of doing language. So, we'd done some, some tests with other kinds of stimuli. So, we tried things like having people read sentences with RSVP, just like one word at a time.
That didn't work terribly well. It turns out it does work fine, we were just kind of doing things poorly at the time. We were worried about timing. So we were worried about, you know, like, when did the words happen? So, my co-author Wendy de Heer and I, we really, like, developed this experiment together. We spent a lot of time doing things like recording ourselves speaking stories at a rate of one word per second. (Laughter) Which is the most mind-bendingly, like, awful thing to listen to. Like, imagine, it's shocking, like, just how boring, like, verging on, like, painfully boring, that is. It's awful. Stephen Wilson 38:03 It teaches us something about prosody, the fact that that's not going to work. Alexander Huth 38:44 Right? So, we did these, like, one word per second experiments. Terrible. But it was, it was controlled, in the sense that, you know, we knew when every word happened. And then Wendy said, why don't we just try something, like, actually natural. She had been listening a lot to The Moth, which is a storytelling podcast and radio show. And she said, why don't we try listening to one of these stories? And I was like, you know, how are we going to deal with the timing? And she said, oh, this is a thing that linguists have totally figured out. You need to transcribe it, use this forced aligner, you can figure out when each word is. Fine. And I was like, okay, if you know how to do that, that's great. So we did it. We collected some data with, I think, just the two of us listening to these Moth stories. In fact, just, like, one Moth story to start. And it just immediately worked extremely well. We got beautiful signal quality, like, all across the brain. We do a sort of pretest on a lot of the stimuli that we use, which is, we just have someone listen to the same stimulus multiple times. Right? We just, like, have this person listen to the same story twice. Stephen Wilson 39:52 Intra-subject correlation, huh? Alexander Huth 39:55 Intra-subject. Yeah, exactly. So it's not inter-subject correlation, intra-subject. It's just, within each voxel, how correlated are the responses? Kind of a measure of, like, how big are the functional signals that we're getting. Right? We'd done that for these, like, you know, one word per second stimuli. It was god-awful. And then we did it for The Moth story, and it was just beautiful, like, the whole brain was responding… Stephen Wilson 40:14 But you don't know, yeah, sorry. You don't know yet what's driving that. Right? If you're… Alexander Huth 40:19 Right. Stephen Wilson 40:19 I mean, it could be anything. It could be something really trivial, like, you know, auditory amplitude, right? But it's not, but you don't know yet, right? Alexander Huth 40:28 That was actually what Wendy was interested in. So she was a grad student in Frederic Theunissen's lab, they study sort of low-level auditory processing, mostly in songbirds, but now also in fMRI. So she was interested in sort of building these acoustic models, like spectral representation models of auditory cortex. So, she was totally fine with that. Yeah, but we didn't know why this activation was happening. So, we got these maps. We were like, this is beautiful. Let's keep doing this. So we just collected a bunch of data, of us at first, just listening to these Moth stories, listening to lots of Moth stories. We collected two sessions of five stories each of listening to these Moth stories. We went through the transcription and alignment process, which I learned eventually.
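The repeat-reliability pretest described here is simple to sketch: play the same story to the same listener twice and correlate each voxel's two time courses. The arrays and shapes below are placeholders standing in for real preprocessed BOLD data.

```python
# Intra-subject correlation: one reliability value per voxel from two presentations
# of the same stimulus to the same subject.
import numpy as np

rng = np.random.default_rng(2)
run1 = rng.standard_normal((300, 1000))   # first listen:  300 TRs x 1000 voxels (toy data)
run2 = rng.standard_normal((300, 1000))   # second listen: same story, same subject

def intra_subject_correlation(a, b):
    """Pearson correlation between the two runs, computed separately for each voxel."""
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    return (a * b).mean(axis=0)

reliability = intra_subject_correlation(run1, run2)   # high values = big, repeatable signals
```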
So then, you know, we had this information about, like, every single word in the stories, and exactly when all those words were, were spoken, right? So we had this aligned transcript. And then we could start to do kind of interesting things, right? So, we also have phonetic transcripts, we can build phonetic models. So, just say, you know, our feature space is the English phonemes. What does each voxel respond to in phoneme space? We could build acoustic models with this data. So get sound spectra for the stories, and then use that to model the brain data. And we could get these semantic models, right? So this is using these, again, kind of primitive word embedding models, latent semantic analysis, and so on. And those worked very well across, like, a big chunk of brain. And that was, that was very exciting. That actually happened, I think, in like 2012. So that was around the time that the earlier, like, movie paper was coming out, when we had these first results that were showing that this is really starting to work and starting to show us something. Stephen Wilson 42:13 Okay, so in this case, you're not having to go through and laboriously identify what's in it, like, because previously, you were looking at the movies and saying what is there semantically, right? Like, whereas here, you're just actually taking the words that are transcribable and then deriving semantic representations for the words automatically from that point, right? Like… Alexander Huth 42:35 Yeah. Stephen Wilson 42:35 So, it's a lot less labor intensive. Alexander Huth 42:38 Definitely. And I mean, the transcription process, I think, is the most similar to, like, the labeling process. Stephen Wilson 42:43 But it's sort of much more deterministic. Alexander Huth 42:45 Yeah. It's much easier, much easier. No, no judgment calls to be made, really. You can listen to something at half speed, if you're pretty good, and just bang it out. Yeah. Stephen Wilson 42:57 Okay. So, yeah, I know. And it works a lot better, right? Like, compared to the previous paper, the semantic maps here are just a lot more clear and expansive, right? Alexander Huth 43:13 Yeah, yeah. So, where the, you know, the visual stimuli really elicited these responses, like, in this band around visual cortex, with these story stimuli, we got stuff everywhere, got stuff everywhere. It was all over, sort of, temporoparietal association cortex, all the way from, like, ventral temporal cortex, up through sort of lateral parietal cortex and down into the precuneus. And then also, all across prefrontal cortex, we found these strong, predictable responses to language, with these semantic features. So, this, um, just worked really well, kind of quickly. One thing at this point that we were, like, kind of freaked out by, was, we just didn't find any asymmetry in terms of model performance between the two hemispheres. It was like the right hemisphere was responding just as much, or, you know, our models were working just as well in the right hemisphere as in the left hemisphere. Stephen Wilson 44:09 Sorry, when you talk about working, I really want to be precise when I'm talking about this, because it's super interesting. I just wanna make sure we understand what we're saying exactly. So, when you say working well, are you talking about predicting held out data, or are you just talking about having a lot of variance explained by semantics? Alexander Huth 44:24 Yes, yes. Sorry, by that I mean predicting held out data. So we can….
Stephen Wilson 44:28 So, yeah, so you can, you're listening to two hours of stories and then predicting another 10 minutes. Alexander Huth 44:34 Right. Stephen Wilson 44:35 What the brain should look like, on a story that the model hasn't seen before, and many voxels in the brain are really good at this, but not all, and then you make these maps of, like, where are the voxels where you're able to predict. Because, I mean, forgive me for breaking it down really simply, but if a voxel is not able to predict held out data, that probably means it doesn't have semantic representations. Because if it doesn't have semantic representations, then it wouldn't be able to, why would it? And then if it does, then it should, right? So, it's kind of a really good window into, like, where in the brain there are semantic representations. Alexander Huth 45:11 Exactly. That's kind of one of, you just verbalized one of the core kind of tenets of this kind of encoding model style science. Which is that, if you can predict what a voxel is doing, on held out data, using some feature space, then we take that as evidence that that voxel's representation is tied up with that feature space. That it is related to that feature space in some way. And of course, there can be spurious correlations, and we see this, and, you know, we can try to explain those away in various ways. But basically, that's the kind of inference that we try to make. Right? So, so we found that, like, right hemisphere, we could predict right hemisphere just as well as we could predict left hemisphere, there was no real asymmetry in prediction there. I remember showing this to another grad student when I'd first found this, and he said, nobody's gonna believe you in the language world. (Laughter) Like, too weird. If you don't find left lateralization, like, nobody's gonna believe you. Which, I don't know, it has ended up being very interesting in terms of how people think about lateralization. So, um, Liberty Hamilton, who's a longtime collaborator of mine, and who I'm married to, she also, you know, this is kind of a bugaboo that we have together, is, you know, she's seen in a lot of her work in electrophysiology that right hemisphere auditory cortex, like, definitely represents language stuff, for perception at least. And that's really what we saw here, too, is that, like, for language perception, right hemisphere was engaged in sort of semantic representation to the same kind of degree as left hemisphere. Stephen Wilson 46:47 Yeah. And so what do you, I mean, what do you think that is? I mean, I feel like, when I first saw your paper, that was what jumped out at me too. And I struggled with it briefly and then I came to terms with it. Um, so, it doesn't trouble my mental model anymore. How do you, how do you interpret it? I mean, specifically, how do you square it with the fact that aphasia only results from left hemisphere damage? Alexander Huth 47:18 Yeah. So, I mean, I think there's a broader question of, like, how we square a lot of these results with the aphasia literature, which is difficult, right? Because the literature says that there's only kind of a small selection of areas where, if you have damage to those areas, it causes, you know, this loss of semantic information, right? Loss of, like, word meaning information.
But we saw these, like, much broader, like, big distributed things all across, like, prefrontal cortex, parietal cortex, whatever. This just really did not match what people had seen in the aphasia literature. And, you know, especially then for the right hemisphere as well, right, we see it all over the right hemisphere, and that, again, just didn't match. So, I think of this in kind of two ways. So, one is that, you know, what we're showing here is kind of, what types of information are correlated with activity in these voxels? Right? So, you know, if somebody is listening to a story, and the story is talking about, you know, the relationships between people and so on, and you're trying to process that information, then there's a bunch of voxels that become active, there's a bunch of brain areas that become active, that start to turn on in the presence of this kind of social information. That doesn't necessarily mean that, like, those are the areas that, you know, link the words that you're hearing to the meaning that you're hearing. They may be downstream of that, and I think most of them actually are downstream of that. They're involved in, like, some kind of cognition around this concept, but not necessarily, like, just the process of linking words to meaning, linking meanings of words together. Stephen Wilson 48:58 I think the word linking is so critical here. Yeah. Alexander Huth 49:02 Yeah. So we kind of can't disentangle that here, and I think that is probably one of the real kind of drivers of the mismatch with what people think from the aphasia literature. And I know this is a popular topic there. The other thing that I kind of point to, in trying to square this with aphasia, is the fact that we see quite a bit of, like, redundant representation across cortex, right? We don't see that there's just, you know, one patch of cortex that cares about the social information, or one patch of cortex that cares about, I don't know, time words, for example, that's another category that we saw kind of pop out. Actually you have many patches that care about these things, right? We have a whole kind of network for each of these kinds of concepts, kinds of topics. So, I think it's very plausible that, like, even if these areas are really, like, causally involved in representing and processing that kind of information, damaging some of them won't be sufficient to actually cause the deficits that you would see in aphasia. Stephen Wilson 50:05 Yeah. Okay. Yeah. I mean, I definitely agree with your interpretation. I mean, I think I would put it like, you are really studying thought here, right? In a way, like, it's downstream of these links, right? So I think we think with both hemispheres, and, you know, language links up with that, but the links are in the left hemisphere, right? So if you took a person with aphasia, and did your study on them, like a severe receptive aphasia, maybe, I think you probably wouldn't see those semantic representations in the right hemisphere, either, even though the right hemisphere would be intact, right? Because it would never get there, because the links, which are left lateralized, would not exist, and therefore you wouldn't be able to generate those semantic representations from the linguistic input. Right?
So I actually don't think it's inconsistent at all, on, on second thought. Although, like, you know, before your paper came out, I think a lot of us kind of thought, oh yeah, the semantic representations are left lateralized. But I don't know if I thought that, I hope I didn't think that, because it would be silly. Because, you know, one thing that I know from working with people with aphasia is that they understand the world that they live in, right? They're not, like, walking around confused and not knowing what's going on. Alexander Huth 51:20 Absolutely. Stephen Wilson 51:21 It's a language, I mean, it's a language deficit. And the only patients that don't understand the world around them are neurodegenerative patients who have bilateral damage, specifically, like, semantic dementia when it's advanced, or, you know, Alzheimer's when it's advanced, right. But, like, with any kind of lateralized brain damage, you don't really get real semantic deficits. Like, you get, like, semantic control deficits. They can't do semantic tasks, maybe, for a myriad of reasons. But, you know, you never get somebody that doesn't understand what's going on. So it actually makes total sense. Alexander Huth 51:57 It does. Stephen Wilson 51:58 It's really, it's nice to kind of square it. Because, yeah, it definitely, like, struck me. That was the thing that really leapt out at me, was how bilateral it was, and symmetrical. Alexander Huth 52:09 Yeah, so it's definitely symmetrical in terms of what we've been talking about, of just, like, how well can we predict these areas. But it turns out that it's actually not super symmetrical in terms of exactly what information is represented there. Stephen Wilson 52:22 Oh, really? Alexander Huth 52:22 So, we see a little bit of a hint of that in, in this 2016 paper, where there's an asymmetry in terms of representation of specifically, like, concrete words, concrete concepts, that seems to be more left lateralized than right lateralized. So, it's like the representations are as strong, in a way, but they're just kind of of different things. In more recent work that we've yet to publish, but we're very excited about, which is doing a similar kind of thing, like we did in this paper, but using more modern, like, language models, so we're looking at sort of phrase representations instead of word representations, we see that this asymmetry is really pronounced in terms of sort of representational timescale, where the right hemisphere seems to represent sort of longer timescale information than the left hemisphere. Stephen Wilson 53:13 That's interesting. Alexander Huth 53:15 It's maybe tied into this, like, concrete abstract distinction, which is also sort of associated with timescale. This is my student Shailee Jain who's working on this. We're very excited about what this is gonna show. Stephen Wilson 53:26 Okay, cool. Yeah, actually, the next thing I want to talk about is one by her, but not, not the one you're mentioning, I think. But, um, yeah, one more thing about this one before we move on from it, well, two more things, actually. So, the lateralization might have come as a surprise to language people, but within each hemisphere, the areas where you see the semantic predictability match pretty well onto what the language neuroscience community had kind of settled on as the semantic network of the brain, right? Alexander Huth 54:00 Absolutely. Yeah. No, I really love this, I think it's the 2009 review paper from Jeff Binder.
Stephen Wilson 54:06 Yeah, it's one of the most useful papers, I go back to it again and again. Just love that one.

Alexander Huth 54:13 It's beautiful. I love how they break down, you know, the different types of experiments and, you know, what kind of approaches they like and don't like. But there's a figure there that I often show in talks, which is just a brain with little dots on it in every place that's been reported as an activation in some paper for some semantic task, and it matches so well what we find in this work, right?

Stephen Wilson 54:36 It does.

Alexander Huth 54:36 The entire prefrontal cortex, the entire kind of parietotemporal cortex. I often say this is…

Stephen Wilson 54:42 Midline areas, too, you know? You have the same midline areas, like you both have, you know, your medial prefrontal, and then precuneus in the middle. And yet it's not obvious at first glance, because you guys use flat maps, right? So everything is kind of flattened out. This is a way you can tell you grew up in vision, because, like…

Alexander Huth 55:03 Exactly.

Stephen Wilson 55:03 You use flat maps. But as soon as you, like, just take a step back, you realize, oh, that's just the semantic network.

Alexander Huth 55:10 Exactly. Yeah, I don't know, sometimes I say it's easier to name the parts of the brain that don't represent semantic information, which is, like, somatomotor cortex and visual cortex. Those are the big ones. There are two big holes in this map. Everything else kind of cares to some degree or another.

Stephen Wilson 55:29 Yeah. Okay. Oh, yeah. The other thing I wanted to talk to you about with this paper: you know, it's just a masterpiece of visualization, among everything else, right? The figures are so beautiful, and you've got all these, like…

Alexander Huth 55:45 Thank you.

Stephen Wilson 55:46 You've got all these three-dimensional animations that can be found on the web. Can you kind of talk about that aspect of it? Is that something you really enjoy? How did you develop those skills? You know, there's definitely a piece there that's pretty special.

Alexander Huth 56:01 Yeah, I love it. I love visualization. I think partway through writing this paper, I went to a seminar, an Edward Tufte seminar. And so I started trying to do everything in Tufte style, whatever. I love it. I love his idea of, what does he call it, supergraphics, super infographics, something that's just incredibly dense with information, that you can stare at for a long time and you keep seeing new stuff. I really liked that idea, and so that's what I kind of tried to replicate here. So a lot of this work, the visualizations, are based on a software package that we developed in Jack's lab, pycortex. This was really led by James Gao, who was a grad student there with me. Brilliant programmer, like a polymath, he can do so many things. So, you know, he's a neuroscientist, but then he's also able to write these massive amounts of low-level code for showing these brain images in a web browser. Really just fantastic stuff. So I worked with James on developing this pycortex package, and then a couple of other people in the lab, especially Mark Lescroart, were big drivers of this as well.
And, you know, because we were also developing the visualization package alongside doing the science, we could make it do anything that we wanted to do, right? Any idea that we had, like, we should make it look like this, we could just do that, we could spend some time and implement it. So I really liked the close commingling of those two things, developing the visualization software and the science at the same time. And I think that's nice. I think it's powerful.

Stephen Wilson 57:45 Yeah, it's very powerful. I mean, because the data is so multidimensional, right? If you try to display it with conventional tools, you're not going to be able to convey it. You know, it's so funny that you mention Tufte, because I love Tufte as well. My wife, actually, she's a librarian, and I think she introduced me to Tufte a long time ago. She gave me a Tufte book for Christmas, probably 15 years ago. It's 'The Visual Display of Quantitative Information'. You've probably read it.

Alexander Huth 58:14 Wonderful book. Wonderful book. Yeah, I read that in college. My roommate got it, and I was like, this is cool.

Stephen Wilson 58:17 Yeah, so she gave it to me for Christmas, and I was just transfixed by it. I spent the rest of the holiday reading it and studying it. And I went back to my work so invigorated, and I worked on this paper, a 2009 paper in NeuroImage about support vector machines predicting PPA subtypes. But all the visualizations, like, I was now really inspired by this Tufte book, and I spent a lot of time on those figures, and I'd be thinking, how would Tufte make this figure?

Alexander Huth 58:46 It's nice.

Stephen Wilson 58:46 That's cool. And so yeah, you guys use Python. What other kinds of technical infrastructure do you use for developing your stuff?

Alexander Huth 58:59 Yeah, so um, it's mostly Python, but, you know, a lot of the visualization is actually done in the web browser, through JavaScript. So this was, I forget, back in like 2013, maybe. James had been working on pycortex for a little while, and we were trying to move away from Mayavi, which is a 3D display library that is very powerful, but also very clunky and hard to use in a lot of ways. And I said, like, let's make a standalone 3D application that you can run on your computer, you know, in Python, whatever. And James said, no, no, let's do it in the browser, let's actually send information to the browser. And I was like, that's crazy, that seems really hard, why would we do that? And he completely ignored me, which was 100% the right thing to do, and he wrote this thing that was Python interacting with the browser, and tens of thousands of lines of JavaScript code to display these brain images. And it's just fantastic. So, you can interact with the brain images in all kinds of fun ways. We can build these interactive viewers, like we have for this paper, where you can click on different parts of the brain, and it'll show you exactly what that little piece of brain is doing. So, I think that was a really big part of it, getting it all to work in the browser, because then it's also very easy to share with other people.

Stephen Wilson 1:00:20 It's transportable. Yeah. Yeah, that makes sense.
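For anyone who wants to try the kind of browser-based viewer Alex describes, here is a minimal sketch using pycortex's public API. It assumes the demo subject "S1" and transform "fullhead" that ship with the package's example data; the interactive viewers published with the papers are, of course, far more elaborate.

```python
import cortex

# Random per-voxel demo data on the example subject/transform bundled with pycortex.
vol = cortex.Volume.random(subject="S1", xfmname="fullhead")

cortex.quickshow(vol)   # static flatmap rendered with matplotlib
cortex.webshow(vol)     # interactive WebGL viewer served to the browser
```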
Like, it's funny, in my lab we're also developing this portal into our aphasia data, which is not released yet, we've been working on it for a while. And just like you, I was originally envisaging a standalone application, and I was telling the SLPs in my lab, who collect all this data, like, oh, this is what I want to do, I want to have an application, you know, you'll install it on your computer, and then you can look at the data. And they just looked at me like, what's wrong with you? You know, like, you'd have to install an application? What, you're going to download it? What is that even like?

Alexander Huth 1:00:55 What is that? Is it the 90s? God!

Stephen Wilson 1:00:57 And they convinced me to do it in the web. And yeah, that's what we've done. It's now built in JavaScript, and it's going to be a lot more accessible as a result.

Alexander Huth 1:01:06 Nice! Yeah, absolutely.

Stephen Wilson 1:01:08 Okay, cool. So, let's move on to the next paper, if we can. This is by Shailee Jain and yourself in 2018, called 'Incorporating context into language encoding models for fMRI'. I think it's an underappreciated paper. I mean, it's got a bunch of citations, but I hadn't seen it before. It's published in, you know, kind of a CS…

Alexander Huth 1:01:33 Conference, yeah, NeurIPS.

Stephen Wilson 1:01:35 But very interesting. Because I think it's the first fMRI encoding paper that actually takes context into account, that goes beyond the word level. Right? Is that right? I don't think there is anything previous.

Alexander Huth 1:01:51 I think so. There had been one CS paper.

Stephen Wilson 1:01:53 And a MEG study, by Leila Wehbe. A few, but that's MEG.

Alexander Huth 1:01:58 Yeah. Leila had done this in MEG in like 2014. Leila is a close collaborator of our lab, we do a lot of work together on this kind of stuff. There was one other CS conference paper, I think from the year before, that was a little bit messy, the results were kind of mixed. But yeah, I think we were at least one of the first to really use these neural network language models, which were new and exciting at the time. So, this is actually the first paper out of my lab. Shailee was my first grad student, and this was her first project in the lab. (Laughter) It was to try out these new things, neural network language models. Like, let's see what they do. We'd been doing everything with these word embedding vectors before that, which are great, they're beautiful, you can reason about them in really nice ways, you can interpret them in nice ways. But of course, they're just words, right? You're predicting the brain response to somebody telling a story, but you're actually individually predicting the response to each word and then saying that the response to the story is the sum of the responses to the words, which is just obviously blatantly false, right? So those models couldn't capture the kind of richness of language. So, this was right at the time when language models were starting to become exciting in the computer science, natural language processing world, right around when the first language models, ELMo and BERT, came out. And so this was the days of the Sesame Street language models, which people found were really, interestingly, useful.
But if you just train this neural network to predict what the next word is in a piece of text, or to predict a random masked-out word from a piece of text, then it's kind of forced to learn a lot of interesting stuff about how language works.

Stephen Wilson 1:03:47 Yeah, can we just flesh that out a little bit? Because this is so fundamental to all this work. And I think everybody's heard of ChatGPT and so on, but I don't know how many of us sort of understand how much the fundamental technology is built on this concept of predicting the next word. So, can you just try to explain that in a little more detail? What's the input to these models? What's their... I mean, okay, not that much detail, obviously, it's a podcast, not a CS paper. But so what's the input? What's the output? And then, I guess, what's the architecture?

Alexander Huth 1:04:25 Yeah. So, it's pretty simple. You have a big corpus of text, right? Documents that are thousands of words long, potentially. Say all of Wikipedia, for example, which we used to train some of these models. You feed the model the words one by one. It reads the words, and at every step, every time you feed it a word, you ask it to guess what the next word is. And it produces a probability distribution across all the words, of what it thinks the next word might be. In these early models, we used recurrent neural networks, specifically LSTMs, long short-term memory networks, which were pretty popular as language models at the time. So, this network, you feed it words one at a time, and it keeps its state, right? What it computes at each time step is a function of the word that came in and what its state was at the previous time step. So, it combines those two things to try to guess what the next word is going to be. So, this seems kind of elementary, right? It's pretty simple. You just guess what the next word is. But what turned out to be really cool about this, and why people were so excited about this in the natural language processing world, was that in order to do this effectively, in order to guess the next word accurately, you need to do a lot of stuff. You need to know how syntax works, you need to know about parts of speech, you need to know a lot about how semantics works, right? You need to know which words go together. You know, what are…

Stephen Wilson 1:06:02 What are the representations of the individual words here? Because you still have to vectorize the individual words, right?

Alexander Huth 1:06:08 Yeah. So, in this model, there's a word embedding layer. That's the initial thing. So we go from a vocabulary of, I don't know, 10,000 words, down to like a 400-dimensional vector. And the embedding is also learned as a part of this model.

Stephen Wilson 1:06:25 How does it learn that?

Alexander Huth 1:06:27 Backpropagation. It's the key to all this. So, you start off with the words being one-hot vectors. That's actually a lie. Let me back up. You start off by assigning a random embedding vector to each word.

Stephen Wilson 1:06:44 Oh, okay.
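To make the next-word-prediction setup concrete, here is a minimal sketch in PyTorch (purely illustrative, not the model from the paper): a randomly initialized embedding layer, an LSTM that carries state from word to word, and a softmax over the vocabulary, all trained end to end, so the same backpropagation step that Alex describes next also updates the embedding table.

```python
import torch
import torch.nn as nn

class TinyLSTMLM(nn.Module):
    """Minimal next-word-prediction language model (illustrative only)."""
    def __init__(self, vocab_size=10_000, emb_dim=400, hidden_dim=1000):
        super().__init__()
        # Embeddings start random and are learned jointly with the rest of the network.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        x = self.embed(tokens)            # (batch, time, emb_dim)
        h, state = self.lstm(x, state)    # state carries information across words
        return self.out(h), state         # logits over the next word at every step

model = TinyLSTMLM()
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on a toy batch of token ids: predict word t+1 from words up to t.
tokens = torch.randint(0, 10_000, (8, 21))        # (batch of 8 sequences, 21 words)
inputs, targets = tokens[:, :-1], tokens[:, 1:]
logits, _ = model(inputs)
loss = loss_fn(logits.reshape(-1, 10_000), targets.reshape(-1))
loss.backward()        # gradients flow back into the embedding parameters too
opt.step()
```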
Alexander Huth 1:06:45 And then those embedding parameters, just the values in that embedding, you can compute a gradient. You take the loss at the end of the model, which is, how wrong was it in predicting the next word, and then just take the derivative of that loss with respect to these embedding parameters, and use that to change the embedding parameters a little bit. You keep doing this for thousands and thousands of steps, and it learns these word embeddings. It learns very effective word embeddings.

Stephen Wilson 1:07:12 Okay, I didn't understand that. I had thought that you still had to put in kind of an old-school word embedding as the first step. But you're saying that's actually part of the whole architecture, that the same algorithm that gives you the sensitivity to past words is also developing the representation of each individual word. Okay, I hadn't understood that.

Alexander Huth 1:07:36 In the early days, a lot of people did this, where they initialized with preset word embeddings. But it was found pretty quickly that, as long as you have sufficient data to train the model, things work much better if you train the word embeddings at the same time as the rest of the model.

Stephen Wilson 1:07:50 Okay. Yeah. So, I'm sure you've looked at this, I don't know if it's in any of your papers, but it probably is. Getting away from the contextual aspect of this paper, if you use word embeddings that are derived in this way, does that work much better than the sort of ones that you used in your 2016 Nature paper? Or does it work much the same?

Alexander Huth 1:08:12 It's about the same. They end up being actually very similar in a lot of ways.

Stephen Wilson 1:08:16 Okay.

Alexander Huth 1:08:16 Yeah, it's interesting. There are a bunch of different ways of generating word embeddings. I could geek out about this for a long time. The very old-school word embeddings, like latent semantic analysis, are generated by looking at how words co-occur across documents. The newer things, like Word2Vec and GloVe, are also looking at word co-occurrence. Word2Vec is actually a neural network model trained to do this; GloVe is just a statistical thing. The word embeddings that we used in my 2016 paper were a bespoke thing that I came up with, but they capture really the same kind of thing, right? It was just using kind of predefined dimensions instead of these learnt dimensions. And the word embeddings that you get out of neural network models are super similar. They act very similar, because they're just capturing the same thing.

Stephen Wilson 1:09:07 So, co-occurrence is a big part of it, right?

Alexander Huth 1:09:09 Yeah, definitely.

Stephen Wilson 1:09:10 Okay. Okay, so now I understand that. So, you start with random representations of the words, which are learned, becoming differentiated so that they represent the semantics of your words. And then you've got hidden layers that are representing previous words that have been seen. What kind of range do you look at in this paper? Like, how far back can it look?

Alexander Huth 1:09:33 Yeah, um, I think we look back only like 20 words here. So, we're manipulating how many words go into the model,
like how many words of context it sees before the current word. And what we found is basically, the more words we feed in, the better it gets, right? As it sees more context, the representations are better matched to whatever's happening in the brain. Our model predictions get better. So we can use this to look, in a coarse way, at context sensitivity across cortex: which parts of cortex really are affected by information 20 words back, and which ones maybe only care about what the most recent word is. And, you know, we see kind of the things we'd expect to see. Auditory cortex only cares about the current word, for the most part, right? It's mostly caring about the sound of the word, so that's not unexpected. Whereas areas like precuneus and TPJ really care more about what happened a while back, or maybe some integration of information across a couple dozen words.

Stephen Wilson 1:10:38 Right. Yeah. So you can kind of map out the language, well, the semantic network, I guess I'd say, in terms of how deep it is, how contextually dependent it is, going from single words to longer strings. And most of these models are outperforming the single-word models pretty much throughout the brain, right?

Alexander Huth 1:10:59 Yeah, very handily, which was just the central exciting result to us. We'd had these word embeddings, honestly, our word embedding models had been fixed since 2013 at that point, so it was like five years of just messing with word embeddings and finding nothing that worked better, right? We tried a thousand different variations of word embeddings and, like, nothing actually…

Stephen Wilson 1:11:18 I guess, yeah, in my previous question you'd basically told me, like, yeah, these state-of-the-art single-word representations don't really do better than your bespoke ones from the 2016 paper. So that wasn't really an avenue for improvement. But this is.

Alexander Huth 1:11:32 Yeah, yeah. But this instantly was just better. It was head and shoulders better than the word embeddings. That was really exciting, right? This was the first time that we were kind of getting at representations of longer-term meaning, which is something that, of course, we want to look at…

Stephen Wilson 1:11:52 Yeah. And combinatorial meaning, presumably. As you go further and further, you're getting more and more language, you know? (Laughter)

Alexander Huth 1:12:02 Exactly. And it keeps working better and better at predicting the brain. Right? Number go up…

Stephen Wilson 1:12:06 Yeah. Not surprising. Yeah. So this is a very cool paper, showing, you know, how much you gain by adding context.

Alexander Huth 1:12:15 Yeah. Yeah. I think it's kind of known to people in our little field, but most neuroscientists don't read these CS conference papers. So I think a lot of people just haven't seen it.

Stephen Wilson 1:12:27 No, it has plenty, like I said, it has plenty of citations, well over 100. But I hadn't seen it until I started, you know, looking in more depth so that I could talk to you. That's a shame, because it's really good.

Alexander Huth 1:12:42 Good. Thank you.
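The general encoding-model recipe being discussed can be sketched roughly as follows. This is a simplification: the published pipeline also handles hemodynamic delays, cross-validated regularization, and story-wise train/test splits. The idea is to extract one feature vector per fMRI time point from a language model given some number of context words, fit a regularized linear regression per voxel, and compare held-out prediction accuracy across context lengths; the random arrays below are toy stand-ins for those real features and responses.

```python
import numpy as np
from sklearn.linear_model import Ridge

def encoding_score(features, bold, n_train):
    """Fit ridge regression on the first n_train timepoints, return per-voxel
    correlation between predicted and measured responses on the held-out rest."""
    model = Ridge(alpha=100.0).fit(features[:n_train], bold[:n_train])
    pred, true = model.predict(features[n_train:]), bold[n_train:]
    pred = (pred - pred.mean(0)) / pred.std(0)
    true = (true - true.mean(0)) / true.std(0)
    return (pred * true).mean(0)          # Pearson r for each voxel

# Toy stand-ins: in practice `features` would come from a language model given
# n_context preceding words, downsampled/aligned to the fMRI TRs.
T, n_voxels, n_train = 1000, 5000, 800
bold = np.random.randn(T, n_voxels)
for n_context in (1, 5, 20):
    features = np.random.randn(T, 400)    # placeholder for features extracted with n_context words
    r = encoding_score(features, bold, n_train)
    print(f"context={n_context:2d}  mean r={r.mean():.3f}")
```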
Stephen Wilson 1:12:44 So let's move now to a paper that is not currently published, but will be published by the time people hear this podcast. This is Tang et al., and by the time you hear this, it will be just out in Nature Neuroscience.

Alexander Huth 1:13:00 Yep.

Stephen Wilson 1:13:01 Super cool paper. Can you tell me about what you've done in this one?

Alexander Huth 1:13:06 Yeah, yeah. So, this is our decoding paper. We're no longer focused on encoding, on just trying to predict how the brain responds to language. We're now trying to reverse that, right, to take the brain responses and turn them into, like, what were the words that the person was hearing?

Stephen Wilson 1:13:23 Okay, you're reading minds, basically.

Alexander Huth 1:13:26 We try to avoid that term. But yeah, same idea. So, our approach here is really driven by things that were done back in Jack Gallant's lab. Shinji Nishimoto in particular, who had done this video decoding work, developed this whole framework, he and some other folks there, Thomas Naselaris and Kendrick Kay in particular. They developed this framework for, how do you turn an encoding model into a decoding model? Right? We know how to build these very good encoding models. But if you want to do decoding, if you want to figure out what the stimulus was from brain activity, how do you do that? If you fit a direct decoding model, where you just try to do regression in the opposite direction, so take the brain data as your input and your stimulus as the output, that ends up not working, or being very difficult, in a number of ways, mostly to do with statistical dependence between things in the stimulus. If you're predicting multiple stimulus features, you're not predicting them in a way that actually respects the covariance between those features, and that ends up being pretty important for getting this stuff to work. So, in this paper, we use this kind of Bayesian decoding framework that they developed. The basic idea is, you just kind of guess. We guess, what might the stimulus be? What words might the person have heard? And then we can check how good that guess is by using our encoding model. So, in this paper, you know, we've had a couple of years of advancement in language models. It's just an insanely rapidly developing field right now. When we started working on this decoding stuff, we were using GPT, like the original OG GPT, from 2018, 2019, and that's what's in the published paper. Of course, things have changed a lot in the intervening years, but it still is good enough for this to work. So, these GPT-based encoding models work terribly well. They're doing more or less the same thing as the language models in Shailee's paper; in fact, she developed these GPT encoding models.

Stephen Wilson 1:15:33 Yeah, can we just pause and get a little detail on that? So, you know, everybody's heard of ChatGPT, but can you sort of explain (a) what it stands for, and (b) what's the crucial difference between that and the long short-term memory models that you used in the 2018 paper?

Alexander Huth 1:15:56 Yeah, so GPT is a generative pre-trained transformer. That was its original moniker. And the basic idea is, it's using a different architecture. It's no longer using a recurrent neural network. It's using a network called a transformer, which was invented in 2017.
Stephen Wilson 1:16:15 By Google, right?

Alexander Huth 1:16:16 Yeah, yeah. Ashish Vaswani, who is actually an author on Leila's RNN-predicting-MEG paper back in 2014, which is an interesting detail. He was one of the authors on the original transformer paper too. So, transformers use very different mechanisms than recurrent neural networks. They use what we call an attention mechanism, the self-attention mechanism, where essentially, to build up a representation of each word or each token in an input, the model can attend across its inputs, attend across its context. So it can pick out information from many words that have come before and use that to inform what it's doing right now. And what's really different about this, compared to recurrent neural networks, is that transformers don't have the kind of limited memory capacity that RNNs have. And I think that's really one of the fundamental things that makes them work so darn well for so many things. You know, the recurrent neural network, you feed it one word at a time, and then it has to pack information into its internal state. So, it maybe has like 1000 dimensions of internal state, right? That's its entire memory, everything it knows is in those 1000 dimensions. And if you feed it hundreds of words, and it wants to remember something about those hundreds of words, it has to pack it all into that 1000-dimensional vector.

Stephen Wilson 1:17:44 Right.

Alexander Huth 1:17:44 So, it's hard to do. And especially because the kind of supervisory signals at long ranges end up being very sparse in language. It's rare for something 200 words ago to really influence what the next word is in a piece of text. It's very important when it does, but it's pretty rare. So, that ends up being kind of too weak a signal for RNNs to really pick up on. But these transformer models can just kind of arbitrarily look back. They can say, you know, what from anything in my past was relevant to this thing that I'm looking at right now? And then just pick that thing out and use it. And that means that it doesn't have this limited capacity in the same way. It has a much greater memory capacity, working memory capacity, effectively, which just makes it incredibly powerful at doing these things. There are also other reasons why transformers have kind of taken over this world now. They end up being extremely efficient to train and run on our current GPU hardware, which is kind of a weird reason for a model to be very good, but it's a technical reason why people could train much bigger transformer models much more effectively than big RNN models. They've really taken over this world now.

Stephen Wilson 1:18:57 Okay, now that was really useful.

Alexander Huth 1:18:59 I teach a class on neural networks in the computer science department here. I just finished that last week. And our last module was on transformers, which was a lot of fun. So we talked about how transformers work.

Stephen Wilson 1:19:10 Okay, I'd like to audit that.

Alexander Huth 1:19:14 It's fun. It's a fun class.

Stephen Wilson 1:19:15 Yeah, I bet. Okay. So let's get back to your study. So, you know, how are you using GPT models instead of the RNNs?
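A bare-bones sketch of the causal self-attention operation Alex describes, with a single head and none of the extra machinery of a full transformer (no multi-head projections, layer norm, or feed-forward blocks): every position computes similarity scores against all earlier positions, a mask stops it from looking at the future, and the softmax weights decide which past words to pull information from.

```python
import math
import torch
import torch.nn.functional as F

def causal_self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention with a causal mask.
    x: (time, d_model); wq/wk/wv: (d_model, d_head) projection matrices."""
    q, k, v = x @ wq, x @ wk, x @ wv                     # queries, keys, values: (time, d_head)
    scores = q @ k.T / math.sqrt(q.shape[-1])            # similarity of each position to every other
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))     # can't look at future words
    weights = F.softmax(scores, dim=-1)                  # which past words are relevant right now
    return weights @ v                                   # weighted pull of information from the past

T, d_model, d_head = 10, 64, 16
x = torch.randn(T, d_model)
wq, wk, wv = (torch.randn(d_model, d_head) for _ in range(3))
out = causal_self_attention(x, wq, wk, wv)               # (10, 16)
```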
And yeah, you were about to tell me about the challenges of decoding rather than encoding, right?

Alexander Huth 1:19:37 Yeah, yeah. So, you know, we replaced the RNNs with these GPT models, the transformer-based models. Again, we get a big boost in encoding performance, we can predict the brain much better. But now what we're going to do is try to reverse this, do the decoding. So, we are using this Bayesian decoding approach, where essentially we're just guessing sequences of words, and then for any guessed sequence, we can use our encoding model to say, how well does this match the actual brain data that we see? Right? We get a sequence of words, we predict how the brain would respond to that sequence of words, and then we compare that prediction to the actual brain response that we observe.

Stephen Wilson 1:20:15 Yep.

Alexander Huth 1:20:16 Like, this is kind of the core loop in this method. And then…

Stephen Wilson 1:20:19 It's called a beam search algorithm, right?

Alexander Huth 1:20:23 Yeah. So, beam search is really, we keep multiple guesses active at a time. We guess what the next word is in each of these multiple guesses, and then we throw out the worst ones, but we keep this sort of active set of 20 to 100 different current hypotheses for what the text was that we're trying to decode. This ends up being kind of important, because it helps us correct for the sluggishness of the hemodynamic response function, which is one of the real challenges in doing this kind of decoding, right? We're trying to pick out the language that somebody is hearing. Lots of words happen in language, and words can happen pretty quickly. And with fMRI, one snapshot brain image is summing across like 20, 30 words, maybe, if somebody's speaking at a pretty rapid clip.

Stephen Wilson 1:21:10 Yeah.

Alexander Huth 1:21:10 So doing this beam search, where we have multiple hypotheses, means the model can make a mistake and then kind of go back and correct it, right? Because it's not locked into one best guess. That ends up being really important for being able to correct for the fact that the information comes in slowly: it can get something at first and then see later information that makes it update what happened before.

Stephen Wilson 1:21:41 Right. Can we, sorry, I'm just trying to think about how to make all this clear. Can we talk about the structure of the experiment? Because I think it will really help to understand what the participants do, and then what task you set yourself in terms of decoding their brains, because then I think the mechanisms will make more sense. Do you know what I mean?

Alexander Huth 1:22:10 Yeah, absolutely. So the basic experiment that we do is just the same as what we've been doing before: we have people lying in the scanner and listening to podcasts, mostly The Moth, still.

Stephen Wilson 1:22:21 You really should have them listen to the Language Neuroscience Podcast.

Alexander Huth 1:22:25 Oh, yeah, that'd be good.

Stephen Wilson 1:22:25 I think it would work a lot better. But in this one, you've got them doing it for 16 hours, right?

Alexander Huth 1:22:32 Yeah. Yeah. So…

Stephen Wilson 1:22:33 That's a lot of data.
Alexander Huth 1:22:34 We're making bigger datasets, which ends up being really important when we're looking at something high level, like semantics, right? If we were just trying to build models of phonetic representation, there are only like 40 phonemes in English, you know. You can hear them in many combinations, but you only need so much variability in your data to map out a phonetic representation. TIMIT is great for this, right, the TIMIT corpus, it's every phoneme in many different combinations. So you can get good stuff from TIMIT. This is what Eddie Chang's lab does a lot of. But there are a lot more different kinds of semantics, right? There are a lot more different kinds of ideas that can be expressed, you can think a lot of different kinds of thoughts. So, to map this out in detail, you need to go much deeper, much broader, in terms of what you look at. So that's why we just keep adding more data. In this case, yeah, we had people come back more than a dozen times, over the course of months. We just keep scanning them over and over again, each time listening to new stories, different stories, right? So we see different aspects of these ideas and how they're represented in these people's brains. And what we find is that encoding performance really relies on the amount of data, especially for these semantic encoding models. As you increase the amount of data, it just keeps getting better. The same for the decoding performance, it just keeps getting better as we add more data. So, you know, we have this ton of data of people listening to stories, we can build our encoding models, that's well and good. Our initial tests of the decoder were basically just, we had somebody listen to a story, that's our test story that we use for predicting brain responses, and we just tried to decode the words in that story instead of predicting the brain responses to that story. And eventually, through quite a bit of trial and error and figuring out what the important aspects were, we got that to work pretty well. We were pretty excited by this. It started spitting out, you know, not exactly the words in the story, but a pretty decent paraphrase of what the words in the story were.

Stephen Wilson 1:22:34 Okay. So just to make it clear, you train them on like 16 hours of data, build these models, and then you take, let's say, 10 minutes of data. I don't know exactly how much it is, but something small.

Alexander Huth 1:24:01 Exactly. Yeah. Ten.

Stephen Wilson 1:24:23 And you then basically feed the model the brain data from the person listening to this unseen story, and you try and get the model to generate what the story was that the person heard. Right?

Alexander Huth 1:25:05 Yes.

Stephen Wilson 1:25:06 So in other words, the only way that's going to work is if, I know you hate to use the phrase, but you have to read their mind, because the model has no access to what story they were played. So, the only way the model is gonna know the story is by reading their mind.

Alexander Huth 1:25:19 It's reading their brain. Like, we don't know where the mind is, it's somewhere near the brain. It's definitely reading what's happening in the brain.

Stephen Wilson 1:25:26 It's the brain. As you know, it's the same thing. Okay. So yeah, you had some success with that.
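A rough sketch of the guess-and-check loop being described, with hypothetical placeholder functions: `propose_continuations` stands in for the language model's suggested next words, `encoding_model_predict` for the predicted BOLD response to a candidate transcript. The published method scores candidates with a likelihood under the encoding model rather than the simple squared-error score used here.

```python
import numpy as np

# Hypothetical placeholders standing in for the real components:
#   propose_continuations(text, k)  -> k likely next words from a language model
#   encoding_model_predict(text)    -> predicted BOLD (time x voxels) for that text

def score(guess_text, observed_bold):
    """Higher is better: how closely the predicted BOLD matches what was recorded
    (a crude proxy for the likelihood used in the actual method)."""
    pred = encoding_model_predict(guess_text)
    return -np.sum((pred - observed_bold) ** 2)

def decode(observed_bold, n_steps=50, beam_width=20, branch=5):
    beams = [""]                                            # active hypotheses
    for _ in range(n_steps):
        candidates = []
        for text in beams:
            for word in propose_continuations(text, branch):
                candidates.append((text + " " + word).strip())
        # Keep only the hypotheses whose predicted brain response best matches
        # the measured one; the rest are pruned.
        candidates.sort(key=lambda t: score(t, observed_bold), reverse=True)
        beams = candidates[:beam_width]
    return beams[0]
```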
Alexander Huth 1:25:36 Yeah, yeah. So um, there was kind of a startling moment. I think this was during the pandemic, we were all working at home, and Jerry showed me some results that were like, oh my God, this works. This is giving us things that sound like the story. It's actually pretty accurate at this point. This was very exciting to us. So, you know, we can now decode a story that somebody is hearing, which is kind of step one. That's interesting by itself, but it's not even really, potentially, that useful. So at that point, we went back and did some follow-up experiments. So we took the same subjects that we'd been scanning and…

Stephen Wilson 1:26:13 Oh, hang on. Can we talk about that result from your paper? Can we kind of just share it with our listeners?

Alexander Huth 1:26:19 Yeah. Absolutely.

Stephen Wilson 1:26:20 We're talking figure one here, right?

Alexander Huth 1:26:22 Yes.

Stephen Wilson 1:26:22 Okay. So, you say it's not that interesting. I think it's very interesting. (Laughter) Okay, I'm gonna say the actual stimulus that the subject heard, and you're going to tell me the decoded stimulus that your model produced based on reading their mind, or whatever you want to call it. Okay. I got up from the air mattress and pressed my face against the glass of the bedroom window, expecting to see eyes staring back at me, but instead finding only darkness.

Alexander Huth 1:26:50 I just continued to walk up to the window and open the glass. I stood on my toes and peered out. I didn't see anything and looked up again. I saw nothing.

Stephen Wilson 1:26:59 Wow. Okay, let's do some more. This is good. I didn't know whether to scream or cry or run away. Instead, I said, leave me alone. I don't need your help. Adam disappeared, and I cleaned up alone, crying.

Alexander Huth 1:27:13 Started to scream and cry. And then she just said, I told you to leave me alone. You can't hurt me. I'm sorry. And then he stormed off. I thought he had left. I started to cry.

Stephen Wilson 1:27:24 Let's do one more. That night, I went upstairs to what had been our bedroom, and not knowing what else to do, I turned out the lights and lay down on the floor.

Alexander Huth 1:27:34 We got back to my dorm room. I had no idea where my bed was. I just assumed I would sleep on it. But instead I lay down on the floor.

Stephen Wilson 1:27:42 That's pretty amazing. You know…

Alexander Huth 1:27:44 Can we do the last one? The last one, the one I always use as the demo.

Stephen Wilson 1:27:47 Okay, last one. I don't have my driver's license yet, and I just jumped out right when I needed to. And she says, well, why don't you come back to my house and I'll give you a ride. I say okay.

Alexander Huth 1:28:00 She's not ready. She has not even started to learn to drive yet. I had to push her out of the car. I said, we will take her home now. And she agreed.

Stephen Wilson 1:28:07 It's incredible.

Alexander Huth 1:28:08 Right? It actually works. And we're getting this out of fMRI data. fMRI, which is like the worst of all neuroimaging methods, except for all the other ones.

Stephen Wilson 1:28:16 Except it is the best. Yeah. Okay.

Alexander Huth 1:28:17 It is awful in so many ways, and yet we're getting this out. It's not word for word. In fact, the word error rate is god-awful, right? Sorry, my dog is excited about something. The word error rate is like 94% here. It's not getting the exact words, for the most part.

Stephen Wilson 1:28:35 Sure.
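Word error rate, for reference, is just word-level edit distance divided by the length of the reference transcript, so a paraphrase that preserves the gist while sharing almost no exact words can still score near 100%. A quick sketch:

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete a reference word
                           dp[i][j - 1] + 1,         # insert a hypothesis word
                           dp[i - 1][j - 1] + sub)   # substitute (or match)
    return dp[-1][-1] / len(ref)

# A made-up paraphrase with the right gist but no shared words scores 1.0 (100%):
print(word_error_rate("the lights were out", "it was completely dark"))   # 1.0
```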
Alexander Huth 1:28:35 It's getting the gist, right? It's getting the paraphrase of what's happening.

Stephen Wilson 1:28:39 And you have some kind of intuitive ways of quantifying how well it's doing that, which don't rely on it being a word-for-word match. It's all quite intuitive and explained well in the paper.

Alexander Huth 1:28:52 Yeah, yeah. So this was, I mean, very exciting when we saw this, that we could read out the story that somebody was hearing. And the fact that it was a paraphrase was also interesting to us, that we're not getting some low-level representation, we're getting something high level, right?

Stephen Wilson 1:29:11 And you wouldn't with fMRI, right? Like, maybe with Eddie Chang's data you could read the phonemes and get it that way.

Alexander Huth 1:29:18 Right. Which they do beautifully.

Stephen Wilson 1:29:20 You're never gonna be able to do that with fMRI.

Alexander Huth 1:29:22 Yeah. Yeah. But the ideas, right? Like, what's the thought behind the sentence? That probably changes slowly enough that we can see it, that it's kind of isolatable with fMRI. The individual words, they're all mashed up, that's a mess. But each idea kind of evolves over a few seconds, and that's something that we have a hope of pulling out with fMRI.

Stephen Wilson 1:29:47 Yeah. Okay, so very cool. And then you take it in a lot of other directions from there. Which one should we talk about? Let's talk about, okay, do you need the whole brain to do this? Or can you do this with just parts of the brain?

Alexander Huth This was a fun analysis, and I think one of the more scientifically interesting parts of this paper. So Jerry tried doing this decoding with... so, for the most part, we're just using voxels from all over the brain, whatever seemed kind of useful for doing this, we have some criteria that are detailed, but, you know, it's parts of prefrontal cortex, parts of parietal and temporal cortex, the speech network, whatever, it's everywhere in both hemispheres. But we asked, what if we only use prefrontal cortex? What if we only use temporoparietal association areas? What if we only use the speech network? What if we only use the right hemisphere? What if we only use the left hemisphere? How well does this work? So generally, what we see is that the performance is degraded if we don't use all of the data that we have, which I think is not terribly surprising, you know, we're just getting more signal if we use more parts of the brain. But it's degraded kind of uniformly. So it's not like we're getting one kind of thing out of each brain area, it seems pretty redundant. How well we're able to decode from each brain area ends up being very correlated: right hemisphere, left hemisphere, prefrontal cortex, temporoparietal cortex, all of these seem to be carrying very similar information, at least in terms of what we're decoding here. So that was exciting to us as a scientific demonstration, kind of a proof of redundancy, to some degree, across these areas. But then also in terms of practical applications of this kind of technology.
That means that even if somebody has, say, damage to one of these regions, or if only some of these regions are accessible to some neuroimaging technology, because, you know, if somebody uses a decoder like this for some practical purpose, it's probably not gonna be fMRI, then it doesn't need to see the whole brain. It can see only some part of it and still be quite effective.

Stephen Wilson Yeah, it's definitely got that sort of practical application. And I do think it reinforces what you mentioned earlier, the redundancy of these semantic representations. I mean, I would quibble, you know, one of the networks you look at is what you call a language network, and to me, I don't remember exactly how you localized it, but it seems a little low level, it looks like it's got, you know, STG rather than STS, for instance. So, yeah, I don't think it matters terribly much what the networks are. Maybe the main fact is that you can take any small piece of the whole thing and get pretty good performance.

Alexander Huth Yeah, yeah. Especially that it's, you know, the right hemisphere, too. So this is, I think, even more proof positive that the right hemisphere is actually doing things.

Stephen Wilson Well, thought. I mean, yeah. Yeah. I mean, again, let's do a thought experiment. If it was a person with aphasia, I bet you wouldn't get it in the right hemisphere, even if the right hemisphere was intact, right? Because it would never get there from the auditory story. But, more relevant to your actual future application, a person with aphasia is thinking with their intact right hemisphere and having thoughts that are fully formed and normal in their intact right hemisphere. And you could read those off and produce language from them, right? Yeah, but it wouldn't work on the input side. Which is fine. Because, well, I mean, maybe it's fine. But.

Alexander Huth It'd be hard to train the model in that case.

Stephen Wilson Well, would it though? I mean, that raises the question of to what extent can you do this with cross-training? Can you train it on somebody other than the individual that you're using it on?

Alexander Huth The answer to that is currently, like, definitely no, we can't do that. So we tested that as one of the... Actually, you know, after we got these results... can we take a step back before we talk about this part? Yeah. So in addition to the listening to stories, we tested a couple of other things. We had subjects go in the scanner and tell stories in their heads. So the way this actually worked is, we had people memorize one-minute segments of stories, not memorize them word for word, but, you know, practice retelling them in their own words a bunch of times. We recorded them telling these one-minute segments of stories, and then we immediately stuck them in the scanner and had them do exactly the same task, where they were telling the stories, just not out loud. So just do it in your head, without saying the words out loud. We put that data into our decoder, and what we got was a little bit less good than what we had for the language perception, but it was still pretty clearly related to the content of the story. Right?
It was still a decent paraphrase of the story that they were telling in their heads. This was kind of the clincher. This is, like, you know, there are actually thoughts happening here that a person is not expressing, and we are reading out the brain activity related to those thoughts and turning that into words. Right?

Stephen Wilson Yep, yeah.

Alexander Huth This is the real deal.

Stephen Wilson Now, it's super cool. I guess for me, because I always interpret your work as mostly showing thought rather than language per se, it doesn't surprise me when you make that next step, right? Like, once I'd seen it read off, like the one that we did kind of together out loud a few minutes ago, once I'd seen that, I was like, well, yeah, that'll work. That'll work for, you know, production as well. And it did. Yeah, but I can see how that's actually a pretty big step.

Alexander Huth Yeah, yeah. So when we saw that, it actually got a little scary. We were like, this is to the level of... it makes us a little nervous. This is a bit freaky, what's happening here. So Jerry went off and did a ton of reading on mental privacy and neuroethics and what people are saying about this. What are the things that people are concerned about? What are the categories of things that people care about in this area? And he designed a set of experiments to kind of test these privacy concerns, right? So one of the first ones is what we started talking about a moment ago. Do you need the subject's cooperation? Do you need data from the subject to train the model, right, to get the training data to form the model in the first place? So to test that, we just tried to map models from one subject to another. And we tried a bunch of different approaches for this. We did anatomical mapping from one subject to another, we did functional cross-mapping using some functional data from each subject. And it just didn't work. It just fundamentally didn't work. Even though, you know, in broad strokes, the semantic maps are pretty similar across people, whatever this model is getting at, whatever is making it able to pull out this pretty accurate representation of what the language is, that's got to be kind of buried in the details. Right?

Stephen Wilson Yeah. And I guess, yeah, I think I forgot to ask you before, but I know that in your 2016 paper, and the other one too, one of the remarkable things was that the semantic maps were similar across participants in their actual specific layout of which concepts were represented where. But you're saying that even though it's similar, it's not similar enough to work in this context, where there are several more layers of inferencing happening in between, right?

Alexander Huth Yeah, that's right. So this was, you know, on the one hand, kind of bad for practical applications, because it means you need a ton of training data from someone to get this to work. But kind of good for mental privacy, because it means that you can't just chuck someone in an MRI scanner and read what they're thinking. Which is good. I think that's a good thing, that that's not something that just exists in the world. We also tested some other kinds of privacy questions. So…

Stephen Wilson I mean, I'm sorry.
I mean, to be honest, I think it might only be a temporary delay, because conceptually, it's all there to read across minds, I think. I'm not saying you need more data per se, because you've pretty much got the max data, or maybe, I don't know. But, you know, the pieces are in place that you should be able to train it across subjects. Yeah. I mean, it's probably only a technical limitation.

Alexander Huth I agree with that. Yeah, definitely. Maybe better anatomical MRI, some way of, like, mapping areas from person to person.

Stephen Wilson Yeah, because you're just doing anatomical alignment, right? If you did some kind of functionally based alignment, like…

Alexander Huth We actually tried functional alignment, and that didn't work. At least at the coarse level that we were doing it, that didn't work. But certainly that is not to say that it cannot work, right? And yeah, we say this actually at the end of the paper, that this particular thing is probably a temporary problem for this model, right? Or, you know... you know what I'm saying? Yep. So yeah.

Stephen Wilson So you test, what do you call it, resistance? Conscious resistance?

Alexander Huth Yeah. Yeah, so do we need the subject's cooperation to run the decoder, right? To train it, we know that we kind of do, at least for now. But to actually run the model, to decode words from somebody's brain, do they need to be cooperating with us? So for this, we had people go in the scanner and listen to a story, but then we told them to do something else while they were listening to the story. Instead of trying to actively listen to the story, we had them try to actively think about something else. So we did a mental math task, like counting backwards by sevens; a task where we had people name as many animals as they could; or, like, trying to tell a different story in your head, one of these imagined stories. And then we tried to decode from that. And it turned out that, at least especially for the naming-animals and telling-a-different-story conditions, the decoding was ruined. There's just no usable information in it. It was above chance, but there's nothing that you can infer from it. The words just kind of don't make sense, or they don't hang together as part of the story, and they don't look like the thing that the person is trying to do either. It's not like we decode a bunch of animal names while they're trying to name animals. It seems like these things are interfering in some way that's just making it not really get anything out. The mental math task was not as effective, which I also found... you can kind of introspect on this. Just try to do it in your mind, like count backwards by sevens. Eventually you get kind of good at it, and then you can start listening to somebody talk while you're doing it, right? And I think that's kind of what can happen here. I mean…

Stephen Wilson I think you could have been counting back by sevens through this whole conversation and I wouldn't even have noticed.

Alexander Huth Goodness. But it was the more semantic tasks, right, like the animals, that really did interfere.

Stephen Wilson Yeah. Okay. So I like it that you kind of went straight to wondering about the ethical implications of this.
I mean, I think when you're publishing a paper like this, it's good to start talking about that from the get-go.

Alexander Huth Yeah, I gotta say, this was really all Jerry's doing. Very clever person.

Stephen Wilson Yeah. Is he still in your lab, or has he moved on?

Alexander Huth Yeah, so he's a computer science grad student in my lab, graduating in, I don't know, about a year. This is really the bulk of his thesis work here. And it's awesome. I'm very proud of him.

Stephen Wilson Looking forward to seeing what he does next. Yeah, very cool. Okay, well, we've talked for a long time. I've taken up, I guess, your evening.

Alexander Huth Yeah, not so bad, and it's been a fun conversation.

Stephen Wilson Yeah, very much so. I really enjoyed reviewing your work to talk to you, and hearing about it from you firsthand. So yeah, like I said, I haven't made a podcast for a while, because I moved and things were up in the air, but this was a really great one to get back to it with.

Alexander Huth Awesome. Yeah. Thanks again so much for having me on. This was a lot of fun, and I love talking about this stuff. I like that we could geek out about the little things too. I don't often get to talk to people about, like, HRFs.

Stephen Wilson Yeah, no, I think, well, that's the thing that's kind of unique about my podcast, is it's not really for a general audience. Most of the people listening are kind of doing science, and they know about these things, and they're interested in them. And so it's more like a conversation you might have at a conference than a conversation you'd have, you know, with a neighbor. Speaking of conferences, any chance you can make it to France this year for SNL?

Alexander Huth I'm planning on it. I was invited to give a talk at a symposium. I don't know if that symposium is accepted yet. But we'll see.

Stephen Wilson I hope so. It'd be great to see you there, if you can make it.

Alexander Huth Yeah, I'd love to see you there. You're gonna be there?

Stephen Wilson Yeah, I'm the Program Chair, so I have to be there. Whether I will have any headspace to do anything other than worry about, you know, PC and Mac connections to the audiovisual equipment, that I'm less sure about. But I will be there.

Alexander Huth Oh, God, I hate that. So, you know, I've been organizing some workshops at computer science conferences for the past couple of years, where we get this intersection of neuroscience and computer science people. And it's the biggest bummer, because these are the topics that I'm really excited about, and I really want to hear these people talk, but your headspace is just wrong when you're organizing. You're worried about so many other things, and it just doesn't work as well as just going and enjoying something.

Stephen Wilson Yeah, I think it's going to be like that. But I'm still looking forward to going to France.

Alexander Huth Yeah, it'll be nice.

Stephen Wilson Well, thank you very much, and I hope to catch up with you soon.

Alexander Huth Yeah, you bet, Stephen. Thanks again.

Stephen Wilson All right.

Alexander Huth Have a good morning.

Stephen Wilson Take care, bye.

Alexander Huth You too.