In these models, we treat fast and slow responses categorically (as in a two-armed bandit task) and predict their probability of occurrence with a standard softmax choice function, with parameters optimized by maximum likelihood (as opposed to the standard model, which minimizes squared error between predicted and actual RT). We consider models in which the reward structure of these categorical responses is acquired via either Bayesian integration or reinforcement learning (Q-learning). To summarize, then, model fits provide subject-specific, trial-by-trial estimates of reward prediction error (δ+, δ−), the mean expected values of the likelihood of a positive prediction error for fast and slow responses (μslow, μfast), and the uncertainties about these estimates (σslow, σfast). The model also provides an estimate of each participant's reliance on relative uncertainty to explore (ε).
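To make the structure of these model fits concrete, the sketch below illustrates one way the Bayesian-integration variant could be implemented for the categorical fast/slow choice. It is a minimal illustration, not the authors' implementation: the Beta-distribution beliefs, the uniform priors, the per-option uncertainty bonus (which, with only two options, is equivalent to weighting the relative uncertainty σslow − σfast), and all function and parameter names are assumptions introduced here.

```python
import numpy as np

# Indices for the two categorical response types assumed in this sketch
FAST, SLOW = 0, 1


def beta_stats(a, b):
    """Mean and standard deviation of a Beta(a, b) belief distribution."""
    mean = a / (a + b)
    sd = np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


def choice_nll(choices, pos_pe, inv_temp, epsilon):
    """Negative log-likelihood of observed fast/slow choices under a softmax
    rule with an uncertainty-driven exploration bonus (illustrative sketch).

    choices  : 0/1 array, the response category chosen on each trial
    pos_pe   : 0/1 array, whether that trial produced a positive prediction error
    inv_temp : softmax inverse temperature (assumed free parameter)
    epsilon  : reliance on relative uncertainty for exploration
    """
    a = np.ones(2)  # Beta(1, 1) priors over P(positive prediction error)
    b = np.ones(2)
    nll = 0.0
    for c, pe in zip(choices, pos_pe):
        mu, sigma = zip(*(beta_stats(a[k], b[k]) for k in (FAST, SLOW)))
        mu, sigma = np.array(mu), np.array(sigma)

        # Utility of each response: expected value plus an uncertainty bonus.
        # With two options this is equivalent to weighting the *relative*
        # uncertainty (sigma_slow - sigma_fast) in the choice probability.
        util = inv_temp * (mu + epsilon * sigma)
        p = np.exp(util - util.max())
        p /= p.sum()

        nll -= np.log(p[int(c)])

        # Bayesian update of the belief for the chosen response category
        a[int(c)] += pe
        b[int(c)] += 1 - pe
    return nll
```

In practice, the free parameters (here inv_temp and epsilon) would be estimated separately for each subject by minimizing this negative log-likelihood, for example with scipy.optimize.minimize; the Q-learning variant would replace the Beta updates with incremental, error-driven value updates.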

We used these estimates to analyze our fMRI data and to provide an explicit test of the hypothesis that RLPFC tracks relative uncertainty to strategically guide exploration (see Supplemental Analysis and Figure S1 for the analysis of reward prediction error).

Across conditions (Figure 1), participants reliably adjusted RTs in the direction indicative of learning (Figure 3A). During the second half of each learning block, RTs in the decreasing expected value (DEV) condition were significantly faster than in the constant expected value condition [CEV; F(1,14) = 13.95, p < 0.005]. Likewise, RTs in the increasing expected value (IEV) condition were significantly slower than in CEV [F(1,14) = 5.6, p < 0.05] during the second half of each learning block. Within each condition, participants reliably sped up from the first to the second half of trials in DEV [F(1,14) = 8.2, p < 0.05] and slowed down in IEV [F(1,14) = 5.1, p < 0.05]. There were no reliable differences in RT from the first to the second half of trials in the CEV or constant expected value-reversed (CEVR) conditions (p values > 0.5). These incremental RT adaptations over the course of learning were well captured by the mathematical model (Figure 3B). As in prior studies, these adaptations were observed in the average learning curve within and across individuals. In contrast, trial-by-trial changes in RT were not incremental but were characterized by large "RT swings" (Frank et al., 2009). The model captured some of the variance in these swings by assuming that they reflect exploratory RT adjustments in the direction of greater uncertainty about the reward statistics (Figure 3C). Across subjects, the r values reflecting the correlation between the direction of the RT swing from one trial to the next and the model's estimate of relative uncertainty were reliably greater than zero (t = 3.9; p < 0.05).
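The group-level test just described can be sketched as follows. This is an illustrative reconstruction under assumptions, not the authors' analysis code: per subject, the trial-to-trial RT change is correlated with the model's relative-uncertainty estimate, and the resulting r values are then tested against zero across subjects. The alignment of the uncertainty signal with the subsequent swing, and all names, are assumptions introduced here.

```python
import numpy as np
from scipy import stats


def rt_swing_uncertainty_test(rts_by_subject, rel_uncertainty_by_subject):
    """Group-level test that RT swings track relative uncertainty.

    rts_by_subject             : list of per-subject RT arrays (one value per trial)
    rel_uncertainty_by_subject : list of matching arrays of the model's
                                 relative-uncertainty estimates (sigma_slow - sigma_fast)
    """
    r_values = []
    for rts, rel_unc in zip(rts_by_subject, rel_uncertainty_by_subject):
        # Trial-to-trial RT swing (its sign gives the direction of adjustment)
        rt_change = np.diff(rts)
        # Correlate each swing with the relative uncertainty available going
        # into that trial (alignment convention assumed here)
        r, _ = stats.pearsonr(rel_unc[1:], rt_change)
        r_values.append(r)

    # One-sample t-test across subjects: are the correlations reliably > 0?
    t, p_two_sided = stats.ttest_1samp(r_values, 0.0)
    # Halving the two-sided p gives the one-sided p when t is positive
    return np.array(r_values), t, p_two_sided / 2
```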
