Lecture 2 — Modeling Individual Decision Making (Van Maanen)

Paper: Palada, H., Neal, A., Vuckovic, A., Russell, M., Samuels, K. & Heathcote, A. (2016). Evidence accumulation in a complex task: Making choices about concurrent multiattribute stimuli under time pressure. J. Experimental Psychology: Applied, 22(1), 1-23.

Type: Methodological. First of the three "methodological" lectures, which move up a level of description (individual decision-making → autonomy/interaction → collective patterns).

Must-know core → Minimum to Pass

Newell time scales (this = Cognitive) · 3 integration levels (anecdotal/computational/algorithmic) · Linear Ballistic Accumulator (LBA) 4 parameters (start, drift, threshold, \(t_0\); fastest accumulator wins) · difficulty → drift, instruction → threshold · Palada 2016: selective influence (difficulty → drift rate, workload → threshold).

Big-picture framing

Van Maanen's pitch: cognitive psychology gives you formal models of how people actually decide. Most AI is built without those models — it just optimises objective functions and assumes users behave rationally. If you want AI that interacts well with humans (think: Netflix recommendations, lane-keep assist, training tools, alerts to air-traffic controllers), you need a model of the user's cognition.

The lecture's exemplar paper (Palada et al. 2016) shows this isn't an academic curio: the same model that explains why people make speed-accuracy trade-offs in laboratory dot-motion tasks also accounts for how operators behave in a simulated unmanned aerial vehicle (UAV) surveillance task under workload.

Key concept 1 — Newell & Simon and the time scales of human action

Herbert Simon (Nobel laureate 1978, "decision-making process within economic organizations") and Allen Newell are presented as the founders linking cognitive psych to AI (both attended the 1956 Dartmouth workshop). Simon → Administrative Behavior; Newell → Unified Theories of Cognition.

Newell's time scales of human action is the framework that organises the methodological lectures. Four bands:

Band	Time	Examples	Lecture
Social	days–months (10⁵–10⁷ s)	Political opinion shifts, group consensus	L4 (Klein)
Rational	minutes–hours (10²–10⁴ s)	Tasks, reasoning, causal inference	L3 (Hortensius)
Cognitive	100 ms–10 s (10⁻¹–10¹ s)	Perception, memory recall, attention, simple choices	L2 (this lecture)
Biological	μs–ms (10⁻⁴–10⁻² s)	Neurons, neural circuits	not covered

L2→L4 = "progressively higher level of description." L5–7 = application-oriented.

Key concept 2 — Three levels of integration of cognitive theory

(From Liefooghe & Van Maanen, Frontiers in AI, 2023)

When you build interactive AI, you can integrate cognitive science at three levels of formality:

Anecdotal — general intuitions about cognition. E.g. "spacing effect exists, so let's repeat flashcards at intervals."
Computational — hypothesised computations individuals perform. Specifies inputs/outputs of cognition without committing to mechanism.
Algorithmic — hypothesised cognitive processes. Specifies a mechanism (e.g. a memory equation, an evidence-accumulation rule). This is the most demanding but yields the best predictions and lets you explain individual differences.

The slide showing performance increase across levels is the most exam-likely figure: anecdotal-level flashcard apps gain ~6.7 on a French test; algorithmic-level apps (ACT-R-based "smart fact learning") gain ~7.5. The algorithmic level also lets you (i) explain individual decay rates and (ii) predict success on a first test.

This is a core methodological argument of the course: better cognitive models → better AI.

Key concept 3 — Two worked examples

A. Smart fact learning (memory)

Anecdotal level: Leitner (1972) flashcards — three decks; correct cards move forward, incorrect move back. Implements practice, testing and spacing effects without any equation.
Algorithmic level: an Adaptive Control of Thought-Rational (ACT-R) memory model (Anderson & Schooler, 1991, Rational Analysis of Memory). Base-level activation is \( B_i = \ln\!\left(\sum_{j=1}^{n} t_j^{-d}\right) \): a function of frequency (the number of prior uses \(n\)), recency (time since each use \(t_j\)) and decay (\(d\)). The "need probability" is how likely the item is to be required in the near future; if \(p(\text{Activation}) \times \text{Gain} < \text{Cost}\), forget it. Apps using this model (Van Rijn, Van Maanen, Van Woudenberg, ICCM 2009; Sense et al., Front. Educ. 2018) predict individual recall latencies and outperform anecdotal apps.
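
As a minimal sketch of the base-level activation equation above, the snippet below computes \(B_i\) from a list of times since each prior use. The function name `base_level_activation`, the decay value `d=0.5` and the rehearsal times are illustrative assumptions, not values taken from the lecture or the cited apps.

```python
import math

def base_level_activation(ages, d=0.5):
    """B_i = ln(sum_j t_j^(-d)), where ages holds the time since each prior use (s)."""
    if not ages or any(t <= 0 for t in ages):
        raise ValueError("ages must be a non-empty list of positive times")
    return math.log(sum(t ** (-d) for t in ages))

# Hypothetical item rehearsed 10 s, 100 s and 1000 s ago (assumed values).
recent_and_frequent = base_level_activation([10.0, 100.0, 1000.0])
# The same item with only the oldest rehearsal: fewer uses, none of them recent.
old_and_rare = base_level_activation([1000.0])

print(round(recent_and_frequent, 3), round(old_and_rare, 3))
```

More uses and more recent uses both raise the activation, which is the frequency and recency dependence described above; a larger `d` makes every past use fade faster.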

B. Decision-making (perception → choice)

The Linear Ballistic Accumulator (LBA) model:

Each response option has its own accumulator that races toward a shared threshold \(b\).
Each accumulator has:
Start point, sampled from a uniform distribution \(U[0, A]\)
Drift rate, sampled from a normal distribution \(N(v, s)\): the rate at which evidence builds up (explained below)
Threshold \(b\)
Non-decision time \(t_0\) (stimulus encoding plus response production, i.e. everything outside the decision itself)
Fastest accumulator wins: the first to reach \(b\) is the chosen option. Total response time = decision time + \(t_0\).
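
The race described above can be sketched in a few lines of simulation. This is an illustrative sketch, not the fitting code used in the lecture or in Palada et al.: the function name `simulate_lba_trial`, the parameter values, and the choice to resample a trial when every sampled drift rate is non-positive are all assumptions made here.

```python
import random

def simulate_lba_trial(drift_means, A=0.5, b=1.0, s=0.25, t0=0.3, rng=random):
    """Simulate one LBA trial; return (index of winning accumulator, RT in seconds)."""
    while True:
        finish_times = []
        for v in drift_means:
            start = rng.uniform(0.0, A)   # start point ~ U[0, A]
            drift = rng.gauss(v, s)       # drift rate ~ N(v, s)
            # Linear, noise-free rise: time to threshold is distance / slope.
            # A non-positive drift never reaches the threshold.
            finish_times.append((b - start) / drift if drift > 0 else float("inf"))
        fastest = min(finish_times)
        if fastest != float("inf"):       # assumed convention: resample otherwise
            return finish_times.index(fastest), fastest + t0

# Usage: accumulator 0 stands for the correct option (higher mean drift, assumed).
rng = random.Random(1)
trials = [simulate_lba_trial([1.2, 0.8], rng=rng) for _ in range(2000)]
accuracy = sum(choice == 0 for choice, _ in trials) / len(trials)
mean_rt = sum(rt for _, rt in trials) / len(trials)
print(f"accuracy={accuracy:.2f}, mean RT={mean_rt:.2f} s")
```

Because the only randomness is in the start point and the drift rate, each trial reduces to one division per accumulator, which is what "linear ballistic" refers to.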

What the drift rate actually is: it is the slope of an accumulator's straight-line rise, i.e. how fast and how reliably evidence for that option builds up. It indexes the quality of the evidence (the stimulus's signal-to-noise ratio). A high drift rate means a clear stimulus, so evidence accumulates quickly and the decision is fast and accurate; a low drift rate means an ambiguous or degraded stimulus (for example a target hidden behind thick cloud), so accumulation is slow and error-prone. The drift rate is therefore the model's measure of discriminability / task difficulty, distinct from the threshold, which is the amount of evidence the person chooses to require.
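
A small worked calculation makes the slope interpretation concrete. The symbols \(a_i\) (sampled start point) and \(d_i\) (sampled drift rate) for accumulator \(i\) are notation introduced here, and the numbers are illustrative rather than fitted values.

\[
T_i = \frac{b - a_i}{d_i}, \qquad \text{RT} = \min_i T_i + t_0
\]

With \(b = 1.0\), \(a_i = 0.2\) and \(t_0 = 0.3\) s, a clear stimulus with \(d_i = 2.0\) gives \(T_i = 0.8 / 2.0 = 0.4\) s, so RT = 0.7 s if this accumulator wins the race. A degraded stimulus with \(d_i = 1.0\) gives \(T_i = 0.8\) s and RT = 1.1 s. Halving the drift rate doubles the decision time even though the threshold has not moved.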

Behaviourally calibrated in a simple two-alternative forced-choice (2AFC) random-dot-motion task with speed-versus-accuracy instructions:
- Drift rate is influenced by task difficulty (the similarity between target and distractor).
- Threshold is influenced by instruction (speed versus accuracy).
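
The two mappings can be illustrated with a small simulation in which difficulty is implemented as a smaller gap between the two mean drift rates and a speed instruction as a lower threshold. This is a sketch under assumed parameter values; `lba_summary` and every number in it are hypothetical and are not estimates from any dataset.

```python
import random

def lba_summary(b=1.0, drift_means=(1.2, 0.8), A=0.5, s=0.25, t0=0.3,
                n=5000, seed=0):
    """Return (accuracy, mean RT in s) over n simulated two-choice LBA trials."""
    rng = random.Random(seed)
    n_correct, total_rt, n_done = 0, 0.0, 0
    while n_done < n:
        times = []
        for v in drift_means:
            start = rng.uniform(0.0, A)   # start point ~ U[0, A]
            drift = rng.gauss(v, s)       # drift rate ~ N(v, s)
            times.append((b - start) / drift if drift > 0 else float("inf"))
        fastest = min(times)
        if fastest == float("inf"):
            continue                      # assumed convention: resample the trial
        n_correct += times.index(fastest) == 0   # accumulator 0 = correct option
        total_rt += fastest + t0
        n_done += 1
    return n_correct / n, total_rt / n

baseline = lba_summary()                          # easy stimulus, neutral threshold
harder = lba_summary(drift_means=(1.05, 0.95))    # difficulty: drift rates closer
speeded = lba_summary(b=0.6)                      # speed instruction: lower threshold

for label, (acc, rt) in [("baseline", baseline), ("harder", harder), ("speeded", speeded)]:
    print(f"{label:8s} accuracy={acc:.2f} mean RT={rt:.2f} s")
```

In this sketch the harder condition loses accuracy because the evidence itself is poorer, whereas the speeded condition loses accuracy because less evidence is required before responding.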

The LBA reproduces both correct and error response-time (RT) distributions (not just the means), which is why it beats simpler regression-style models.

Speed–accuracy trade-off explained

"Focus on speed" → participants lower their threshold → decide faster but on less-accumulated (noisier) evidence → more errors. This is a mechanistic explanation for an effect that, at the anecdotal level, is just "trade-off exists."

Body-temperature experiment (Van Maanen et al., Sci Rep 2019/2021)

Manipulated core body temperature via hot-tub immersion.
Hot condition → people misperceive deadlines as closer than they are ("perceived deadline" < "true deadline").
LBA fit: warm subjects show lower choice thresholds at the end of immersion — they sacrifice accuracy because subjectively time feels short.
Generalises to: soldiers, construction workers, anyone making decisions under heat stress.

SWOT of cognitive-model integration + a named failure case (on the slides)

The slides frame algorithmic-level integration with a SWOT: Strengths — better model of human cognition, more user autonomy; Weakness — the model may be mis-specified; Opportunity — a more precise user model; Threats — over-specification, less autonomy.

The concrete cautionary example is iBorderCtrl — an EU-funded border "lie detector" that tried to infer deception from facial micro-expressions on a ~50 ms timescale. With huge individual variability and contested validity, it's the slides' worked example of what goes wrong when you push an algorithmic cognitive model into a high-stakes deployment. Strong essay fodder for "risks of algorithmic-level integration."

Paper 2 — Palada et al. (2016): Evidence accumulation under workload

Task

A complex, applied decision-making task: a simulated unmanned aerial vehicle (UAV) surveillance display in which subjects judge whether ship stimuli are targets or non-targets. The ships are concurrent multiattribute stimuli, partly obscured by cloud. (Air traffic control is an analogous applied domain often used to motivate the work, but the actual task was UAV surveillance.)

Two manipulations

Classification difficulty, via cloud opacity (three levels of cloud obscuring the ships): more opaque cloud makes the target/non-target discrimination harder.
Time pressure / workload, via the number of ships present at once (four levels): more ships means each decision must be made faster.

Findings (paper Figs 5, 9 and 10; shown in the lecture)

As workload rises, mean correct response time (RT) falls markedly (over 0.5 s faster at very-high versus low workload), while the error rate rises only slightly (about 3%).
The LBA (and the diffusion model) were fit to the data. The best-fitting models, chosen by the Akaike Information Criterion (AIC), show clean selective influence, each factor driving exactly one parameter:
- Difficulty → drift rate only: cloud-obscured stimuli lower the rate of evidence accumulation (target error rates rose markedly, by about 19%).
- Workload → threshold only: under time pressure people lower the threshold to respond faster; they do not raise the accumulation rate (no "super-capacity" effect).
- This is the headline result, and it corrects the loose shorthand "workload lowers both": the two factors map to two different parameters.
Fitting the model per participant classifies people as "good / medium / poor decision makers" along three axes: cautiousness (threshold), processing efficiency (drift rate) and execution time (non-decision time \(t_0\)). (The three-axis classification figure shown in the lecture is from Katsimpokis et al., 2020, not from Palada.)
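
The AIC-based model selection mentioned above can be sketched as follows. The formula \(\text{AIC} = 2k - 2\ln \hat{L}\) is standard, but the log-likelihoods and parameter counts below are made-up illustrative numbers, not values from Palada et al., and the two model labels are simplified stand-ins for the paper's candidate models.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L-hat); lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits (assumed numbers, for illustration only).
candidates = {
    "selective influence (difficulty->drift, workload->threshold)": aic(-1200.0, 9),
    "full (both factors on both parameters)": aic(-1198.5, 14),
}

best = min(candidates, key=candidates.get)
for name, value in candidates.items():
    print(f"{value:8.1f}  {name}")
print("preferred:", best)
```

In this toy comparison the fuller model fits slightly better but pays a larger penalty for its extra parameters, so the simpler selective-influence model is preferred. That is the logic by which AIC can favour a model in which each factor drives only one parameter.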

Why this matters for an open society (Van Maanen's framing)

The LBA gives you user models for any speeded, high-stakes, repeated decision-making task:
- Air traffic control
- Surveillance / threat assessment (→ links to L5 Van der Vegt)
- Factory floor / line work
- Future of Work platform (IOS pillar B): selection, training, monitoring

Two specific IOS-relevant uses called out in the lecture:
- Future of Work platform (Transitions & Wellbeing): modelling individual workers helps allocate, train and protect them.
- In/Equality platform (Equity & Diversity): algorithmic decisions in hiring/finance can be re-examined when you know what human deciders' biases and trade-offs actually look like.

Likely essay-question angles

"Describe the LBA model. How can a cognitive model of decision-making contribute to AI for an open society? Use Palada et al.'s UAV-surveillance study or another example."
"Distinguish the anecdotal, computational and algorithmic levels of integration. Why might algorithmic-level integration be preferable for AI training/monitoring systems? What are the costs?"
"Newell's time scales of human action structure the methodological half of this course. Where does the LBA sit, and how does this compare to the level of analysis in Lectures 3 and 4?"

Quick self-test

What does the LBA assume about how a decision is made? List its four parameters.
Which LBA parameter changes with task difficulty? Which with instruction? Why does that distinction matter empirically?
State the three levels of integration of cognitive theory. Give one example of each in the domain of fact learning.
In Palada et al., which model parameter does workload change, and which does difficulty change (the selective-influence result)? What's the practical implication for AI alerting / user-interface (UI) design at a UAV surveillance station?
How would you map the LBA / individual-differences finding (cautiousness × efficiency × execution time) onto one of the IOS platforms? Pick a concrete platform and argue.

Source slides

AIOS_lecture2_Cognition.pdf