A Model Predictive Control and Deep Q Learning Approach to Wayfinding
Published:
For the last 200 years, research into human cognition and decision-making has revolved around rational choice theory, which assumes that actors are utility maximizers capable of searching for and finding rational, optimal decisions. However, human behavioral data does not support the assumption that we have either infinite time or infinite cognitive energy to search the full space of candidate choices and produce optimal decisions. Take, for example, a hiker who has an infinite number of paths to choose from. It is irrational to assume that the hiker is capable of fully evaluating each of the infinite paths available to them and then choosing the most rational, optimal path. And yet that is the operating assumption behind most modern cognitive models.

To remedy this problem, we propose three axiomatic principles of energy-efficient decision making, along with an optimal-control-based model and Deep Q learning architectures that capture those principles for the purpose of modeling animal wayfinding. We also explore the task of LLM sentence generation as a subset of the broader class of wayfinding problems.
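To make the wayfinding framing concrete, here is a minimal sketch of the hiker's path choice cast as a small grid-world MDP with a step-wise energy cost. This is an illustrative assumption, not the paper's model: the grid size, rewards, and energy penalty are invented for the example, and tabular Q-learning stands in for the Deep Q learning architectures discussed later.

```python
import numpy as np

# Illustrative sketch (assumed setup, not the paper's model): the hiker's
# wayfinding problem as a 5x5 grid-world MDP. Tabular Q-learning stands in
# for a deep Q-network; grid size, rewards, and energy cost are assumptions.
rng = np.random.default_rng(0)
SIZE = 5                                       # 5x5 grid of locations
GOAL = (4, 4)                                  # destination the hiker seeks
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
STEP_COST = -0.1                               # each step expends energy
GOAL_REWARD = 10.0

def step(state, action):
    """Apply an action, clipping at grid edges; return (next_state, reward, done)."""
    r = min(max(state[0] + action[0], 0), SIZE - 1)
    c = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (r, c)
    if nxt == GOAL:
        return nxt, GOAL_REWARD, True
    return nxt, STEP_COST, False

Q = np.zeros((SIZE, SIZE, len(ACTIONS)))       # value of each (location, action)
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(500):
    state, done = (0, 0), False
    while not done:
        # Epsilon-greedy: mostly exploit current estimates, occasionally explore.
        if rng.random() < eps:
            a = int(rng.integers(len(ACTIONS)))
        else:
            a = int(np.argmax(Q[state[0], state[1]]))
        nxt, reward, done = step(state, ACTIONS[a])
        # One-step Q-learning update (a DQN's loss targets the same quantity).
        target = reward + (0.0 if done else gamma * np.max(Q[nxt[0], nxt[1]]))
        Q[state[0], state[1], a] += alpha * (target - Q[state[0], state[1], a])
        state = nxt

# After training, acting greedily with respect to Q traces an energy-efficient
# path from (0, 0) to the goal without ever enumerating all possible paths.
```

The point of the sketch is the last comment: the agent never evaluates the full space of paths, yet still finds a good one, which is the behavior the energy-efficient decision-making principles are meant to capture.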