TY - JOUR
T1 - Intrinsically motivated action-outcome learning and goal-based action recall
T2 - A system-level bio-constrained computational model
AU - Baldassarre, Gianluca
AU - Mannella, Francesco
AU - Fiore, Vincenzo G.
AU - Redgrave, Peter
AU - Gurney, Kevin
AU - Mirolli, Marco
N1 - Funding Information:
This research has received funds from the 7th Framework Programme of the European Community (FP7/2007–2013), Challenge 2 (Cognitive Systems, Interaction, Robotics), Grant Agreement No. ICT-IP-231722, Project IM-CLeVeR (Intrinsically Motivated Cumulative Learning Versatile Robots).
PY - 2013/5
Y1 - 2013/5
AB - Reinforcement (trial-and-error) learning in animals is driven by a multitude of processes. Most animals have evolved several sophisticated systems of 'extrinsic motivations' (EMs) that guide them to acquire behaviours allowing them to maintain their bodies, defend against threat, and reproduce. Animals have also evolved various systems of 'intrinsic motivations' (IMs) that allow them to acquire actions in the absence of extrinsic rewards. These actions are used later to pursue such rewards when they become available. Intrinsic motivations have been studied in Psychology for many decades and their biological substrates are now being elucidated by neuroscientists. In the last two decades, investigators in computational modelling, robotics and machine learning have proposed various mechanisms that capture certain aspects of IMs. However, we still lack models of IMs that attempt to integrate all key aspects of intrinsically motivated learning and behaviour while taking into account the relevant neurobiological constraints. This paper proposes a bio-constrained system-level model that contributes a major step towards this integration. The model focusses on three processes related to IMs and on the neural mechanisms underlying them: (a) the acquisition of action-outcome associations (internal models of the agent-environment interaction) driven by phasic dopamine signals caused by sudden, unexpected changes in the environment; (b) the transient focussing of visual gaze and actions on salient portions of the environment; (c) the subsequent recall of actions to pursue extrinsic rewards based on goal-directed reactivation of the representations of their outcomes. The tests of the model, including a series of selective lesions, show how the focussing processes lead to a faster learning of action-outcome associations, and how these associations can be recruited for accomplishing goal-directed behaviours. The model, together with the background knowledge reviewed in the paper, represents a framework that can be used to guide the design and interpretation of empirical experiments on IMs, and to computationally validate and further develop theories on them.
KW - Attention
KW - Basal ganglia selection
KW - Dopamine
KW - Intrinsic motivations
KW - Parietal, premotor, prefrontal cortex
KW - Repetition bias
KW - Striato-cortical loops
KW - Superior colliculus
KW - Trial-and-error learning
UR - http://www.scopus.com/inward/record.url?scp=84875892345&partnerID=8YFLogxK
DO - 10.1016/j.neunet.2012.09.015
M3 - Article
C2 - 23098753
AN - SCOPUS:84875892345
SN - 0893-6080
VL - 41
SP - 168
EP - 187
JO - Neural Networks
JF - Neural Networks
ER -