Exploration vs Exploitation
A rat in a T-maze faces a choice: one arm contains cheese (reward), the other a shock (punishment). A cue in the central arm reveals which side has the reward.
Option 1: Go Directly (Exploit)
Gamble on 50/50 odds: might get cheese, might get shocked.
Option 2: Seek Information First (Explore)
Visit the cue first, learn where the reward is, then go there.
"This choice speaks to the classical exploration-exploitation dilemma: a dilemma resolved under Active Inference."
-- Parr, Pezzulo & Friston (2022), Chapter 7, p. 130
Information-Gathering Actions
The cue is any observation that resolves uncertainty before a consequential choice. Seeking it out is an epistemic action, taken for information rather than immediate reward.
- Checking a calendar before deciding which café
- Reading reviews before purchasing
- Asking a question before committing
- Checking the weather before dressing
"If he does not know what day it is, he has to first select an action with epistemic value."
-- Parr, Pezzulo & Friston (2022), Chapter 2, p. 34
Building a POMDP
The task is formalised as a Partially Observable Markov Decision Process:
A-matrices (Likelihood)
Map hidden states to observations. What will I see?
B-matrices (Transitions)
How states change given actions.
C-vectors (Preferences)
Cheese: +6, Shock: −6.
D-vectors (Initial Beliefs)
Both contexts equally likely (50/50).
-- Chapter 7, pp. 131-135
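The four components above can be sketched in plain Python. This is a minimal illustration, not the book's specification: the state layout, the names (CONTEXTS, outcome_likelihood, cue_likelihood, transition), the neutral outcome with preference 0, the fully reliable cue, and the absorbing arms are all assumptions made for the example.

```python
# Simplified T-maze generative model. All names and the state layout are
# illustrative assumptions, not the book's exact specification.
CONTEXTS = ["reward_left", "reward_right"]   # hidden context
LOCATIONS = ["centre", "cue", "left", "right"]
OUTCOMES = ["neutral", "cheese", "shock"]

# D: initial beliefs about the context (50/50).
D = {"reward_left": 0.5, "reward_right": 0.5}

# C: preferences over outcomes (values from the text; neutral = 0 is assumed).
C = {"neutral": 0.0, "cheese": 6.0, "shock": -6.0}


def outcome_likelihood(location, context):
    """A: P(outcome | location, context). Only the arms give cheese or shock."""
    if location in ("centre", "cue"):
        return {"neutral": 1.0, "cheese": 0.0, "shock": 0.0}
    rewarded = (location == "left") == (context == "reward_left")
    return {
        "neutral": 0.0,
        "cheese": 1.0 if rewarded else 0.0,
        "shock": 0.0 if rewarded else 1.0,
    }


def cue_likelihood(context, reliability=1.0):
    """A: P(cue signal | context) when the rat is at the cue location."""
    points_left = reliability if context == "reward_left" else 1.0 - reliability
    return {"cue_left": points_left, "cue_right": 1.0 - points_left}


def transition(location, action):
    """B: next location given an action ("go to <location>").

    Assumption: moves are deterministic and the two arms are absorbing.
    """
    if location in ("left", "right"):
        return location
    return action
```

Functions stand in for the A and B matrices here only to keep the sketch readable; each one could be tabulated into an array indexed by state, outcome and action.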
Policy Selection
Policies minimise expected free energy G(π):
Expected Free Energy
G(π) = −Epistemic − Pragmatic
Epistemic Value
How much will this reduce uncertainty?
Pragmatic Value
How likely is this to bring preferred outcomes?
"We do not need to balance exploration and exploitation. Both serve the same function."
-- Chapter 7, p. 131
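As a rough illustration of the decomposition, the sketch below evaluates a single step: going to the cue versus staying in the centre. It rests on several assumptions. Epistemic value is computed as expected information gain about the context, pragmatic value as the expected preference over outcomes, the cue is taken to be fully reliable, and cue signals are given a preference of 0. The function name expected_free_energy and its arguments are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def expected_free_energy(prior, likelihood, preferences):
    """One-step G = -epistemic - pragmatic.

    prior[s]         : belief over hidden states
    likelihood[s][o] : P(outcome o | state s) at the location visited
    preferences[o]   : preference for outcome o (assumed log-preference scale)
    """
    n_states, n_obs = len(prior), len(preferences)
    # Predicted outcomes: Q(o) = sum_s P(o | s) P(s)
    q_o = [sum(prior[s] * likelihood[s][o] for s in range(n_states))
           for o in range(n_obs)]
    # Epistemic value: prior entropy minus expected posterior entropy.
    expected_posterior_entropy = 0.0
    for o in range(n_obs):
        if q_o[o] == 0:
            continue
        posterior = [prior[s] * likelihood[s][o] / q_o[o]
                     for s in range(n_states)]
        expected_posterior_entropy += q_o[o] * entropy(posterior)
    epistemic = entropy(prior) - expected_posterior_entropy
    # Pragmatic value: expected preference under the predicted outcomes.
    pragmatic = sum(q * c for q, c in zip(q_o, preferences))
    return -epistemic - pragmatic, epistemic, pragmatic


prior = [0.5, 0.5]                       # reward left / reward right
cue = [[1.0, 0.0], [0.0, 1.0]]           # signal reveals the context
centre = [[1.0], [1.0]]                  # same observation in both contexts

G_cue, epi_cue, prag_cue = expected_free_energy(prior, cue, [0.0, 0.0])
G_stay, epi_stay, prag_stay = expected_free_energy(prior, centre, [0.0])
print(f"cue:  G={G_cue:.3f}  epistemic={epi_cue:.3f}")
print(f"stay: G={G_stay:.3f}  epistemic={epi_stay:.3f}")
```

Under these assumptions both options are pragmatically neutral, so the cue wins purely on epistemic value (ln 2 nats of uncertainty resolved). Comparing against the arms properly needs a lookahead of more than one step, which this sketch leaves out.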
Controlling the Balance
Precision (γ) controls how deterministic policy selection is:
Policy Probability
P(π) = σ(−γ · G(π))
Low Precision (γ → 0)
More random/exploratory.
High Precision (γ → ∞)
More deterministic.
Try the Precision slider!
-- Chapter 4, p. 72
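The effect of precision can be checked numerically with a small softmax, here reading σ as the softmax function. The G values below are hypothetical numbers chosen for illustration, and policy_probabilities is a name made up for this sketch.

```python
import math


def policy_probabilities(G, gamma):
    """P(pi) = softmax(-gamma * G(pi)), computed in a numerically stable way."""
    scores = [-gamma * g for g in G]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical expected free energies for two policies (lower is better).
G = [-0.69, 0.0]

for gamma in (0.0, 1.0, 16.0):
    probs = policy_probabilities(G, gamma)
    print(gamma, [round(p, 4) for p in probs])
```

With γ = 0 every policy is equally likely regardless of G. As γ grows, probability mass concentrates on the policy with the lowest expected free energy.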
Active Inference in Action
Step 1: Go to Cue (Epistemic)
Visit the informative cue first.
Step 2: Go to Reward (Pragmatic)
After seeing the cue, go to the correct arm.
"The rat chooses to sample the informative cue - the location with greatest epistemic value."
Press "AUTO" to watch!
-- Chapter 7, p. 135
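The two steps can be sketched as a Bayesian belief update followed by a choice of arm. This is a simplified illustration rather than the simulation's actual code: the cue reliability of 0.98, the signal names, and the helper functions update_context_belief and choose_arm are all assumptions.

```python
def update_context_belief(prior, cue_signal, reliability=0.98):
    """Step 1: Bayes' rule over contexts after observing the cue.

    reliability is the assumed probability that the cue points to the
    rewarded side.
    """
    likelihood = {
        "reward_left": reliability if cue_signal == "cue_left" else 1.0 - reliability,
        "reward_right": reliability if cue_signal == "cue_right" else 1.0 - reliability,
    }
    unnormalised = {c: prior[c] * likelihood[c] for c in prior}
    evidence = sum(unnormalised.values())
    return {c: v / evidence for c, v in unnormalised.items()}


def choose_arm(belief):
    """Step 2: go to the arm that is more likely to hold the reward."""
    return "left" if belief["reward_left"] >= belief["reward_right"] else "right"


prior = {"reward_left": 0.5, "reward_right": 0.5}
posterior = update_context_belief(prior, "cue_right")
print(posterior, "->", choose_arm(posterior))
```

Before the cue the belief is flat, so any arm is a gamble. After the cue almost all probability sits on one context, and the pragmatic choice becomes straightforward.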