Martin Schrimpf

Hello

Posted on 17 November 2016, updated 27 August 2024 by Martin Schrimpf

I am a tenure-track assistant professor at the EPFL Neuro-X institute, where I head the NeuroAI Lab, with appointments at the School of Life Sciences and the School of Computer and Communication Sciences.
My research focuses on a computational understanding of the neural mechanisms underlying natural intelligence in vision and language. To achieve this goal, I bridge Deep Learning, Neuroscience, and Cognitive Science, building artificial neural network models that match the brain’s neural representations in their internal processing and are aligned to human behavior in their outputs.
I completed my PhD at the MIT Brain and Cognitive Sciences department with Jim DiCarlo, following Bachelor’s and Master’s degrees in computer science at TUM, LMU, and UNA. Previous work includes research in human-like vision at Harvard, natural language processing + reinforcement learning at Salesforce, as well as other projects in industry. I love translating discoveries into applications and have founded — and worked in — several startups. My work has been recognized in the news at Science magazine, MIT News, and Scientific American.

The LLM Language Network

Posted on 6 January 2025 by Martin Schrimpf

Can neuroscience localizers uncover brain-like functional specializations in LLMs? Yes! We analyzed 18 LLMs and found units mirroring the brain’s language, theory of mind, and multiple demand networks!

[preprint] [Social Media]

Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream

Posted on 6 January 2025 by Martin Schrimpf

For a while, gains in AI have translated into gains in modeling the brain. We test if that will continue to be the case with recent advances in scaling. Surprisingly, we get mixed results: while increased scale advances model alignment to behavior, neural alignment saturates.

[preprint] [social media]

Topographic model of language processing in the brain

Posted on 28 October 2024 by Martin Schrimpf

Functional responses in the brain to linguistic inputs are spatially organized — but why? We show that a simple smoothness loss added to language model training explains a range of topographic phenomena in neuroscience: arxiv.org/abs/2410.11516, Twitter Thread.

New paper: a simple untrained model of language in the brain

Posted on 25 June 2024, updated 5 July 2024 by Martin Schrimpf

In 2021 we were surprised to find that untrained language models are already decent predictors of activity in the human language system (http://doi.org/10.1073/pnas.2105646118). Badr Alkhamissi in the lab figured out the core components underlying the alignment of untrained models: tokenization and aggregation. With these findings, we built a simple untrained network “SUMA” with state-of-the-art alignment to brain and behavioral data; this feature encoder provides representations that are then useful for efficient language modeling. Directly mapping our model onto the brain, these results characterize the human language system as a generic feature encoder that aggregates incoming sensory representations for downstream use. If you disagree, we hope you consider breaking our model (soon on Brain-Score/GitHub).

See here for social media posts:
https://x.com/bkhmsi/status/1805595986510717136
https://x.com/martin_schrimpf/status/1805599047098470793
https://x.com/GretaTuckute/status/1805676221189308491
https://x.com/ABosselut/status/1805600725537370119

5 abstracts accepted to CCN

Posted on 13 May 2024, updated 9 July 2024 by Martin Schrimpf

My group will present 5 abstracts at the Cognitive Computational Neuroscience conference at MIT in Boston this fall! The projects cover new models of vision and language, new ways to evaluate these models on their brain alignment, and ideas to make use of the best models.

Current DNNs are Unable to Integrate Visual Information Across Object Discontinuities

Topographic Deep ANN Models Predict the Perceptual Effects of Direct IT Cortical Interventions

A Simple Untrained Recurrent Attention Architecture Aligns to the Human Language Network

Inferotemporal Cortex Underlies Primate Generalization Capabilities and Brain-Aligned Models Generalize Better

Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream

2024 Brain-Score Benchmarking Competition

Posted on 26 March 2024, updated 6 January 2025 by Martin Schrimpf

Announcing the 2024 Brain-Score Benchmarking Competition! This year, we have turned the tables! We invite experimentalists and the community at large to expose the explanatory gaps between current models of primate vision and the biological brain.

brain-score.org/competition [social media]

Driving and Suppressing the Human Language System

Posted on 18 April 2023, updated 24 August 2023 by Martin Schrimpf

We previously found GPT (2) to be a strong model of the human language system (pnas.org/doi/10.1073/pn).

In a new paper led by Greta Tuckute, we push on this further and test how well model-selected sentences can modulate neural activity. Turns out you can almost double or completely suppress activity relative to baseline.

A subtle finding in this work that I find really interesting:

Under reasonable assumptions of inter-subject noise, prediction accuracy of neural activity is ~70% as good as it could possibly be. So even with these edge-case stimuli, gpt2-xl accounts for over 2/3 of the variance in the human language system.

Faculty position at EPFL

Posted on 23 September 2022, updated 3 April 2023 by Martin Schrimpf

Very excited to finally make this public: I will be joining EPFL in the summer of 2023 to establish a research group focused on brain-like models of vision and language, and potential applications of these models. My appointment has been confirmed by the ETH Rat today.

The focus of my group will be:

Building better (= more behaviorally and neurally aligned) models of vision and language. To build these models, we will use neural recordings from non-human primates and humans as well as human behavioral benchmarks, from the Brain-Score platform [perspective paper, technical paper, CORnet model, VOneNet model, language paper].
Integrating multimodal representations: I think there is a lot of power in shared invariant representations; creating models that go all the way from pixel input to downstream language representations would allow us to potentially harness shared representations between these two domains, and test if we can better map such a model onto brain hierarchy.
Towards clinical translation: in my mind, one of the end goals of these brain-modeling efforts is to apply them and improve people’s lives. This could involve helping blind patients (preliminary work), or people with dyslexia.

I will be hiring at all levels; please get in touch if these topics sound interesting to you.

Wiring Up Vision paper Spotlight at ICLR 2022

Posted on 22 April 2022, updated 23 April 2022 by Martin Schrimpf

Our paper on reducing the number of supervised synaptic updates in computational models of vision was accepted to ICLR as a Spotlight! https://openreview.net/forum?id=g1SzIRLQXMM

I think the paper has improved quite a bit since the preprint. In particular, we made a stronger connection to Machine Learning by showing that our proposed techniques outperform other approaches to drastically reduce the number of parameters. We retain over 40% ImageNet top-1 performance with only ~3% of parameters relative to a fully-trained network.

2022 Brain-Score Competition

Posted on 21 December 2021 by Martin Schrimpf

We are excited to announce that submissions to the 2022 Brain-Score competition are open until February 15, 2022!

The first edition of the Brain-Score Competition proposes to evaluate computational models of primate object recognition in over 30 neuronal and behavioral benchmarks and will award $6,000 to the best submissions over three tracks: overall Brain-Score, V1, and object recognition behavior. In addition, selected participants will be invited to present their work in a Cosyne workshop which will feature some of the leading experts in vision neuroscience and computer vision.

For more information, please visit the competition website, follow Brain-Score on Twitter, and join our Slack workspace! Good luck!

ThreeDWorld

Posted on 7 December 2021 by Martin Schrimpf

Our virtual ThreeDWorld is now public: www.threedworld.org

We provide a fully controllable virtual world striving to be ~photorealistic, based on the Unity engine. ThreeDWorld provides visual and audio rendering with physically realistic behavior that users can interact with through an extensive Python API. Check out the code here: github.com/threedworld-mit/tdw

MIT News also wrote a great article summarizing the platform: news.mit.edu

Connecting artificial and biological language processing published at PNAS

Posted on 1 December 2021, updated 3 December 2021 by Martin Schrimpf

Our work modeling the human language system with neural network language models is published in PNAS! https://www.pnas.org/content/118/45/e2105646118

The article received widespread press coverage, e.g. by MIT News, Axios, and Scientific American (Press).

McGovern fellowship and open science prize

Posted on 20 November 2021, updated 7 December 2021 by Martin Schrimpf

I was awarded a Friends of the McGovern fellowship and won an Open Science Prize from the Neuro – Irv and Helga Cooper Foundation for my work on Brain-Score.

Teaching Award

Posted on 3 June 2021 by Martin Schrimpf

I was awarded the Walle Nauta Award for Continuing Dedication in Teaching for the Systems 2 class (Neural Mechanisms of Cognitive Computations) that Mike Halassa and I have been teaching for the past 3 years.

Artificial Neural Networks Accurately Predict Language Processing in the Brain

Posted on 27 June 2020, updated 29 June 2020 by Martin Schrimpf

Computational neuroscience has lately had great success at modeling perception with ANNs, but it has been unclear if this approach translates to higher cognitive systems. We made some exciting progress in modeling human language processing: https://www.biorxiv.org/content/10.1101/2020.06.26.174482v1.
This work is the result of a terrific collaboration with Idan A. Blank, Greta Tuckute, Carina Kauf, Eghbal A. Hosseini, Nancy Kanwisher, Josh Tenenbaum and Ev Fedorenko.

Work by Ev Fedorenko and others has localized the language network as a set of regions that support high-level language processing (e.g. https://www.sciencedirect.com/science/article/pii/S136466131300288X) BUT the actual mechanisms underlying human language processing have remained unknown.

To evaluate model candidates of mechanisms, we use previously published human recordings: fMRI activations to short passages (Pereira et al., 2018), ECoG recordings to single words in diverse sentences (Fedorenko et al., 2016), and fMRI to story fragments (Blank et al., 2014). More specifically, we present the same stimuli to models that were presented to humans and “record” model activations. We then fit a regression from model activations to human recordings on a subset of the stimuli, and compute a correlation score of how well the model predicts human recordings on the held-out stimuli.
Since we also want to figure out how close model predictions are to the internal reliability of the data, we extrapolate a ceiling of how well an “infinite number of subjects” could predict individual subjects in the data. Scores are normalized by this estimated ceiling.
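The scoring procedure described above can be sketched roughly as follows. This is a minimal illustration, not the released analysis code: the use of ridge regression, the 5-fold split, the median over recording sites, and all function names are assumptions made for the example. The ceiling is taken as a given number here; the extrapolation to an "infinite number of subjects" is not implemented.

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def neural_predictivity(model_acts, recordings, n_splits=5, ridge=1.0, seed=0):
    """Cross-validated score of how well model activations predict recordings.

    model_acts: (n_stimuli, n_features) model "recordings"
    recordings: (n_stimuli, n_sites) human recordings to the same stimuli
    Ridge regression and the fold/median aggregation are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(model_acts))
    fold_scores = []
    for test_idx in np.array_split(order, n_splits):
        train_idx = np.setdiff1d(order, test_idx)
        X_train, X_test = model_acts[train_idx], model_acts[test_idx]
        Y_train, Y_test = recordings[train_idx], recordings[test_idx]
        # Center with training statistics only, so nothing leaks from the test fold.
        x_mean, y_mean = X_train.mean(axis=0), Y_train.mean(axis=0)
        Xc = X_train - x_mean
        # Closed-form ridge solution: (X^T X + lambda I)^-1 X^T Y
        gram = Xc.T @ Xc + ridge * np.eye(Xc.shape[1])
        weights = np.linalg.solve(gram, Xc.T @ (Y_train - y_mean))
        Y_pred = (X_test - x_mean) @ weights + y_mean
        # Correlate predicted and actual responses per recording site.
        site_r = [pearson_r(Y_pred[:, i], Y_test[:, i]) for i in range(Y_test.shape[1])]
        fold_scores.append(np.median(site_r))
    return float(np.mean(fold_scores))


def ceiling_normalized(raw_score, ceiling):
    """Express a raw score as a fraction of the estimated data ceiling."""
    return raw_score / ceiling
```

For example, a raw correlation of 0.35 against an estimated ceiling of 0.5 would be reported as a normalized score of 0.7.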

So how well do models actually predict our recordings? We tested 43 diverse language models, incl. embedding, recurrent, and transformer models. Specific models (GPT2-xl) predict some of the data near perfectly, and consistently across datasets. Embeddings like GloVe do not.
Model scores are further predicted by how well the models predict the next word on the WikiText-2 language modeling dataset (evaluated as perplexity, lower is better), but NOT by task performance on any of the GLUE benchmarks.
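For readers unfamiliar with perplexity, here is a minimal sketch of the standard definition: the exponential of the average negative log-probability a model assigns to each observed next token. The function name is hypothetical, and whether probabilities are averaged over words or sub-word tokens is an assumption that depends on the evaluation setup.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of each observed next token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_neg_log_prob = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_neg_log_prob)
```

A model that always spreads its probability evenly over four candidate tokens has a perplexity of 4; a model that predicts every token with certainty has a perplexity of 1.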
Since we only care about neurons because they support interesting behaviors, we tested how well models predict human reading times: specific models again do well and their success correlates with 1) their neural scores, and 2) their performance on the next-word prediction task.
We also explored the relative contributions to brain predictivity of two different aspects of model design: network architecture and training experience, ~akin to evolutionary and learning-based optimization (see also this recent work). Intrinsic architectural properties (like size and directionality) in some models already yield representational spaces that, without any training, reliably predict brain activity. These untrained scores predict scores after training. While deep learning is mostly focused on the learning part, architecture alone works surprisingly well even on the next-word prediction task. Critically for the brain datasets, a random embedding with the same number of features as GPT2-xl does not yield reliable predictions.

Summary: 1) specific models accurately predict human language data; 2) their neural predictivity is correlated with their performance at predicting the next word; 3) and with their ability to predict human reading times; 4) architecture alone already yields reasonable scores. These results suggest that predicting future inputs may shape human language processing, and they enable using ANNs as embodied hypotheses of brain mechanisms. To fuel future generations of neurally plausible models, we will soon release all our code and data.

Wiring Up Vision: Minimizing Supervised Synaptic Updates Needed to Produce a Primate Ventral Stream

Posted on 16 June 2020 by Martin Schrimpf

Certain ANNs are surprisingly good models of primate vision, but require millions of supervised synaptic updates — this unbiological development has been the recent focus of many discussions in neuroscience. Is all this training really necessary? We approach this in new work https://www.biorxiv.org/content/10.1101/2020.06.08.140111v1.

Neuroscientists have argued for innate structure with only thin learning on top, i.e. where the genome dictates the structure of brain connectivity, which is then leveraged for rapid experience-dependent development. We took first steps at this with more brain-like neural networks.

We started from CORnet-S, the current top model on neural and behavioral benchmarks in Brain-Score.org. We first found that variants of this model which are trained for only 2% of supervised updates (epochs x images) already achieve 80% of the trained model’s score.

Even without any updates, the models’ brain predictivities are well above chance. Examining this “at-birth” synaptic connectivity and improving it with a new method “Weight Compression”, we can reach 54% without any training at all.

However, to be more brain-like we require at least some training — but ideally this would not change millions of synapses requiring precise machinery to coordinate the updates. By training only critical down-sampling layers, we achieve 80% when updating only 5% of synapses.

Applying these three strategies in combination (reducing supervised epochs x images + improved at-birth connectivity + reducing synaptic updates), we achieve ~80% of a fully trained model’s brain predictivity with two orders of magnitude fewer supervised synaptic updates.
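To make the bookkeeping concrete, here is a small sketch assuming that the number of supervised synaptic updates is counted as epochs x images x synapses that are trained. The function names and every number below are hypothetical round values chosen for illustration, not figures from the paper.

```python
def supervised_synaptic_updates(epochs, images_per_epoch, trained_synapses):
    """Count updates as epochs x images x synapses that receive supervised updates."""
    return epochs * images_per_epoch * trained_synapses


def reduction_factor(full_updates, reduced_updates):
    """How many times fewer updates the reduced training regime needs."""
    return full_updates / reduced_updates


# Hypothetical fully trained model: all synapses updated on every image.
full = supervised_synaptic_updates(
    epochs=10, images_per_epoch=1_000_000, trained_synapses=50_000_000
)
# Hypothetical reduced regime: 10x fewer epochs x images, 10x fewer trained synapses.
reduced = supervised_synaptic_updates(
    epochs=1, images_per_epoch=1_000_000, trained_synapses=5_000_000
)
reduction = reduction_factor(full, reduced)
```

Because the count is a product, savings from shorter training and from training fewer synapses multiply: in this made-up case, two tenfold reductions combine into a hundredfold reduction, i.e. two orders of magnitude.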

Taking a step back, we think these are first steps to model not just primate adult visual processing during inference, but also how the system is wired up from an evolutionary birth state encoded in the genome and by developmental update rules. Lots more work to do!
