Results

Optimal training difficulty for binary classification tasks. In a standard binary classification task, an animal or machine ‘agent’ makes binary decisions about simple stimuli. For example, in the classic Random Dot Motion paradigm from psychology and neuroscience [15, 16], stimuli consist of a patch of moving dots (most moving randomly, but a small fraction moving coherently either to the left or the right), and participants must decide in which direction the coherent dots are moving. A major factor in determining the difficulty of this perceptual decision is the fraction of coherently moving dots, which can be manipulated by the experimenter to achieve a fixed error rate during training using a procedure known as ‘staircasing’ [17].

We assume that agents make their decision on the basis of a scalar, subjective decision variable, h, which is computed from a stimulus that can be represented as a vector x (e.g., the direction of motion of all dots):

    h = Φ(x; ϕ)    (1)

where Φ(⋅) is a function of the stimulus and the (tunable) parameters ϕ. We assume that this transformation of the stimulus x into the subjective decision variable h yields a noisy representation of the true decision variable Δ (e.g., the fraction of dots moving left). That is, we write

    h = Δ + n    (2)

where the noise, n, arises from the imperfect representation of the decision variable. We further assume that this noise is random and sampled from a zero-mean Gaussian distribution with standard deviation σ (Fig. 1a).

If the decision boundary is set to 0, such that the model chooses option A when h > 0, option B when h < 0, and randomly when h = 0, then the noise in the representation of the decision variable leads to errors with probability

    ER = ∫_{−∞}^{0} p(h | Δ, σ) dh = F(−Δ/σ) = F(−βΔ)    (3)

where F(x) is the cumulative distribution function of the standardized noise distribution, p(x) = p(x | 0, 1), and β = 1/σ quantifies the precision of the representation of Δ and hence the agent’s skill at the task. As shown in Fig. 1b, this error rate decreases as the decision gets easier (Δ increases) and as the agent becomes more accomplished at the task (β increases).

The goal of learning is to tune the parameters ϕ such that the subjective decision variable, h, is a better reflection of the true decision variable, Δ. That is, the model should aim to adjust the parameters ϕ so as to decrease the magnitude of the noise σ or, equivalently, increase the precision β. One way to achieve this tuning is to adjust the parameters by gradient descent on the error rate, i.e., changing the parameters over time t according to

    dϕ/dt = −η ∇_ϕ ER    (4)

where η is the learning rate and ∇_ϕ ER is the gradient of the error rate with respect to the parameters ϕ. This gradient can be
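
To make Eqs. (1)–(4) concrete, the following short Python sketch (illustrative, not part of the original article) implements the error rate of Eq. (3) and the gradient-descent update of Eq. (4), assuming the Gaussian noise model of Eq. (2), so that F is the standard normal cumulative distribution function. For simplicity it treats the precision β itself as the single tunable parameter, standing in for the full parameter vector ϕ; the function names error_rate and d_error_rate_d_beta are ours.

    from scipy.stats import norm

    def error_rate(delta, beta):
        """Eq. (3): ER = F(-beta * delta), with F the standard normal CDF
        under the Gaussian-noise assumption of Eq. (2)."""
        return norm.cdf(-beta * delta)

    def d_error_rate_d_beta(delta, beta):
        """Derivative of Eq. (3) with respect to beta:
        d ER / d beta = -delta * N(-beta * delta), with N the standard normal pdf."""
        return -delta * norm.pdf(-beta * delta)

    # Gradient descent on the error rate, Eq. (4), with beta standing in for phi.
    beta = 0.5    # initial precision (an unskilled agent)
    eta = 0.1     # learning rate
    delta = 1.0   # fixed task difficulty
    for _ in range(1000):
        beta -= eta * d_error_rate_d_beta(delta, beta)  # beta grows, ER falls

    print(f"final precision beta = {beta:.2f}, error rate = {error_rate(delta, beta):.3f}")

Note that the size of each update in this sketch, η Δ N(−βΔ), depends on both the difficulty Δ and the current precision β, which is the dependence examined in the remainder of the section.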