What RL algorithm can I use to model exploration/exploitation in my multi-armed bandit?
I am developing an agent-based model where agents are placed on an NK-landscape (a combinatorial problem) and are tasked with finding the highest point by manipulating a bit string. Agents do not interact with one another and have a choice between two different climbing methods. At each time step, the agent needs to decide which climbing method to use based on the information available to it (current fitness, potentially previous fitness). After applying a climbing method, it receives feedback (its new altitude). I see this as a version of the multi-armed bandit problem with each agent being its own bandit.
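To make the framing concrete, here is a minimal sketch of the decision loop described above, treating the two climbing methods as the two arms of a bandit. Everything in it is an assumption for illustration: epsilon-greedy is just one simple candidate rule, the reward is taken to be the change in fitness after a step, and the names `EpsilonGreedyAgent`, `run_episode`, `climb_methods`, and `fitness` are hypothetical placeholders rather than part of any existing model or library.

```python
import random


class EpsilonGreedyAgent:
    """One agent choosing between climbing methods (arms) with epsilon-greedy."""

    def __init__(self, n_arms=2, epsilon=0.1, rng=None):
        self.epsilon = epsilon                 # probability of exploring (assumed value)
        self.rng = rng or random.Random()
        self.counts = [0] * n_arms             # how often each method was used
        self.values = [0.0] * n_arms           # running mean reward per method

    def choose(self):
        # Explore: pick a method uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        # Exploit: pick the method with the best estimate, breaking ties randomly.
        best = max(self.values)
        return self.rng.choice([i for i, v in enumerate(self.values) if v == best])

    def update(self, arm, reward):
        # Incremental update of the mean reward observed for this method.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def run_episode(agent, climb_methods, fitness, start, steps=100):
    """Run one agent; reward is assumed to be the fitness change per step."""
    state = start
    current = fitness(state)
    for _ in range(steps):
        arm = agent.choose()
        new_state = climb_methods[arm](state)  # apply the chosen climbing method
        new = fitness(new_state)               # feedback: the new altitude
        agent.update(arm, new - current)
        state, current = new_state, new
    return state, current
```

Here `climb_methods` would be a list of two functions that each map a bit string to a new bit string, and `fitness` would be the NK-landscape evaluation. Note that a plain bandit rule like this ignores the current fitness when choosing; if the choice should depend on current or previous fitness, the problem is closer to a contextual bandit.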