Pavlovian sign- and goal-tracking in humans reflects model-free and model-based learning

Individuals differ in how they learn from experience. Using computational modelling, eye-tracking and fMRI, we find a double dissociation between those who learn summary values through dopaminergic reward prediction errors, and those who learn a generative model through state prediction errors.
Published in Social Sciences

Dopamine neurons in the midbrain encode a reward prediction error signal (Schultz, Dayan, & Montague, 1997) that can be used for learning. Over the course of learning, cues predictive of reward become ‘wanted’, i.e., the dopaminergic signals turn them into motivationally relevant stimuli (Berridge & Robinson, 1998, 2003). However, in a 2011 study in rats, Flagel et al. reported that only half of their subjects, called “sign-trackers”, relied on these dopaminergic signals for learning, whereas the other half, called “goal-trackers”, did not straightforwardly rely on dopamine for learning and did not come to ‘want’ cues predictive of reward. This provided a key demonstration that only in sign-trackers does Pavlovian conditioning depend on dopamine, and only in sign-trackers does dopamine report a prediction-error signal. It also showed that a different, non-dopaminergic learning mechanism must exist in goal-trackers. Because sign- and goal-tracking can apparently be genetically determined, and because sign-tracking is a phenotype associated with risk for addictive behaviours, this finding had the potential to link genetic mechanisms to addiction via learning processes.
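To make the idea of a reward prediction error concrete, here is a minimal sketch of a Rescorla-Wagner-style update for a single cue. It is only an illustration of the general principle, not the model fitted in our study; the function name, the learning rate `alpha` and the reward sequence are assumptions chosen for the example.

```python
def rescorla_wagner(rewards, alpha=0.1, v0=0.0):
    """Learn a summary value for one cue from a sequence of rewards.

    Illustrative sketch only: `alpha` (learning rate) and `v0` (initial
    value) are assumed parameters, not values from the study.
    Returns the value after each trial and the prediction error on each trial.
    """
    v = v0
    values, errors = [], []
    for r in rewards:
        delta = r - v          # reward prediction error: outcome minus expectation
        v = v + alpha * delta  # move the cue value a fraction alpha towards the outcome
        errors.append(delta)
        values.append(v)
    return values, errors


# A cue that is always followed by a reward of 1.0
values, errors = rescorla_wagner([1.0] * 50, alpha=0.2)
```

As the cue value approaches the delivered reward, the prediction error shrinks towards zero: a fully predicted reward no longer drives learning.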

One possibility we had suggested before was that goal-trackers may rely on model-based reinforcement learning algorithms (Berridge & Dayan, 2014; Huys et al., 2014), which involve active anticipation of future outcomes. If that were true, then learning in goal-trackers should be accompanied by the signals necessary to learn a model. In a simple Pavlovian conditioning paradigm, these would likely be prediction errors of state rather than reward (Gläscher et al., 2010). However, there was no direct evidence at the time, and indeed it was not known whether the distinction between sign- and goal-trackers existed in humans. It was also unclear how one could simultaneously identify sign-tracking in humans and measure the underlying neural correlates, because sign-tracking is identified through behavioural approach. Like a number of other groups at the time, we thought that it might be possible to use eye-tracking to identify sign- and goal-trackers. However, thinking in more detail about the required design rapidly suggested that the sample size would have to be relatively large: the experimental design would make it necessary to fix the timing between conditioned stimulus and unconditioned stimulus, and this would likely reduce any fMRI signal.
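A state prediction error can be sketched in the same spirit. The snippet below follows the general form described by Gläscher et al. (2010), reduced to a Pavlovian setting without actions: the learner tracks how likely each outcome state is after a cue, and the error is the surprise about which state actually occurred. The dictionary layout, the names `cue_A`, `reward` and `no_reward`, and the learning rate `eta` are assumptions for illustration, not the model used in our analyses.

```python
def update_transitions(T, cue, outcome, eta=0.2):
    """Update estimated cue -> outcome transition probabilities in place.

    Illustrative sketch only. `T` maps each cue to a dict of
    {outcome_state: probability}; `eta` is an assumed learning rate.
    Returns the state prediction error for the observed outcome.
    """
    probs = T[cue]
    spe = 1.0 - probs[outcome]        # surprise about the observed outcome state
    for state in probs:
        if state == outcome:
            probs[state] += eta * spe  # strengthen the observed transition
        else:
            probs[state] *= 1.0 - eta  # shrink the others so probabilities still sum to 1
    return spe


# Start from an uninformed model, then observe the cue followed by reward 20 times
T = {"cue_A": {"reward": 0.5, "no_reward": 0.5}}
spes = [update_transitions(T, "cue_A", "reward") for _ in range(20)]
```

Unlike the reward prediction error, this signal does not depend on how good the outcome is, only on how unexpected the observed state was given the learner's current model.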

An opportunity then arose to examine this in the context of a study on the contribution of learning processes to alcohol dependence (LeAD). The aim was to test reward-learning processes in 200 healthy 18-year-old male subjects. Although it was risky, we decided this was a rare opportunity and added eye-tracking in the fMRI scanner during a Pavlovian conditioning session.

We first focused on replicating the sign-/goal-tracker dichotomy in humans. We found that, as expected from the animal results, neural correlates of reward prediction errors were only present amongst putative sign-trackers, that is, individuals with an approach-like gaze response. In addition, we examined the learning processes in the goal-trackers by looking at state-prediction-error signals, and indeed found these to be more prominent in goal-trackers. Computational modelling of gaze and pupil size, as well as behavioural analyses, further supported this neural double dissociation. The results thus suggest that, as in animals (Flagel et al., 2011), model-free reward-prediction errors drive learning in sign-trackers, and additionally that goal-trackers instead rely on model-based reinforcement learning.

Drugs of addiction are known to act via the dopamine system (Dayan, 2009). Given that goal-trackers don’t rely on dopamine for Pavlovian learning, they should have a reduced risk of developing addiction. Indeed, this prediction is supported by data in rats (Saunders & Robinson, 2013), which in turn suggests that sign-tracking should be a risk factor for the development of addiction.

Daniel J. Schad & Quentin J. M. Huys
