verve (vûrv) n.

  1. Vitality; liveliness.
  2. Aptitude; talent.

What is Verve?

The Verve library provides general-purpose agents that can learn to control things. For example, they could learn to drive simulated vehicles, control robots operating in the real world, or control non-player characters (NPCs) in video games. In general, a Verve agent can be plugged into any system that can supply it with sensory inputs and apply its output actions.

Verve aspires to be...

  • General purpose - applicable to a wide variety of problems
  • Easy to use - simple API, full documentation, good examples

Features

  • Agents learn to solve reinforcement learning tasks using temporal difference learning with eligibility traces in an actor-critic architecture.
  • Agents use a dynamically growing radial basis function (RBF) state representation that combines sensory inputs into higher-level features and generalizes to unseen inputs, allocating resources for new states as they are experienced.
  • Agents use a softmax action selection scheme that maintains a separate selection probability for each action.
  • The library can be used for discrete environments or for real-time control in continuous environments, making it well suited to games and robotics.
  • Agents can use any number of sensory inputs and actions. Sensors can be discrete (e.g. battery power is low, medium, or high) or continuous (e.g. a distance value returned by a laser rangefinder). Continuous sensors have a “resolution” setting which determines their acuity.
  • Agents learn a predictive model of their environment through experience. This improves learning performance by allowing them to learn from simulated experiences (i.e. planning).
  • An internal uncertainty estimate automatically determines the length of planning sequences: once uncertainty grows too high, further planning yields little benefit.
  • Agents use a model of curiosity which drives them to explore new situations. This helps them to improve their predictive models.
  • Once an agent learns a task proficiently (i.e. finishes its training phase), learning can be disabled to save computational resources.
  • Agents can be saved to and loaded from XML files.
  • The distribution includes Python bindings (generated with SWIG).
  • The library is unit tested.
  • The source code is heavily commented.
  • The library can output value function data to a text file for visualization, and the distribution includes a separate application that generates PNG images from that data for agents with either one or two sensors.
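To make the core ideas above concrete, here is a toy sketch in Python (the language of the included bindings) of TD(λ) actor-critic learning with eligibility traces, softmax action selection, and normalized Gaussian RBF features for a continuous sensor. This is purely illustrative and is not Verve's actual API; every name here (rbf_features, ActorCritic, and all parameters) is an assumption for the example.

```python
import math
import random

def rbf_features(x, centers, width):
    """Normalized Gaussian RBF activations for one continuous sensor value.
    More centers correspond to a finer 'resolution' over the sensor range."""
    acts = [math.exp(-((x - c) / width) ** 2) for c in centers]
    total = sum(acts)
    return [a / total for a in acts]

def softmax(prefs, temperature=1.0):
    """Convert per-action preferences into selection probabilities."""
    m = max(prefs)  # subtract max for numerical stability
    exps = [math.exp((p - m) / temperature) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

class ActorCritic:
    """Linear-over-features TD(lambda) actor-critic with eligibility traces."""
    def __init__(self, n_features, n_actions,
                 alpha=0.1, gamma=0.95, lam=0.8):
        self.v = [0.0] * n_features                   # critic (value) weights
        self.prefs = [[0.0] * n_features for _ in range(n_actions)]  # actor
        self.e_v = [0.0] * n_features                 # critic traces
        self.e_p = [[0.0] * n_features for _ in range(n_actions)]
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def act(self, phi):
        """Sample an action from the softmax over preference values."""
        probs = softmax([sum(w * f for w, f in zip(row, phi))
                         for row in self.prefs])
        r, cum = random.random(), 0.0
        for a, p in enumerate(probs):
            cum += p
            if r <= cum:
                return a
        return len(probs) - 1

    def learn(self, phi, action, reward, phi_next, done):
        """One TD update: compute the TD error, then update critic and
        actor weights along their decayed eligibility traces."""
        v = sum(w * f for w, f in zip(self.v, phi))
        v_next = 0.0 if done else sum(w * f for w, f in zip(self.v, phi_next))
        delta = reward + self.gamma * v_next - v      # TD error
        for i, f in enumerate(phi):
            self.e_v[i] = self.gamma * self.lam * self.e_v[i] + f
            self.v[i] += self.alpha * delta * self.e_v[i]
            for a in range(len(self.prefs)):
                self.e_p[a][i] = (self.gamma * self.lam * self.e_p[a][i]
                                  + (f if a == action else 0.0))
                self.prefs[a][i] += self.alpha * delta * self.e_p[a][i]
        if done:  # traces reset at episode boundaries
            self.e_v = [0.0] * len(self.e_v)
            self.e_p = [[0.0] * len(self.e_v) for _ in self.e_p]
        return delta
```

A typical step would compute `phi = rbf_features(sensor_value, centers, width)`, choose `a = agent.act(phi)`, apply the action, observe a reward, and then call `agent.learn(phi, a, reward, phi_next, done)`.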

Limitations / Future Development

  • Computational space and time requirements grow exponentially with the number of inputs, mainly due to the combinatorial state representation that combines all inputs into a higher-level representation. Possible solutions include dimensionality reduction (e.g. using PCA or ICA) and hierarchical representations of states and actions.
  • Agents can learn to select from a finite number of user-defined actions, but they do not learn continuous control signals. Future implementations will autonomously learn continuous action signals instead of simply acting as a switching system.
  • Agents have no temporal state representation, so they cannot predict future events at specific times. One solution is an explicit representation of time, e.g. augmenting the state representation to include a short history of previous states.
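The history-based workaround mentioned in the last point can be sketched as follows. This helper is purely illustrative and not part of Verve; the function name and the window length are assumptions.

```python
from collections import deque

def make_history_state(window=3):
    """Return a function that maps each new observation to an augmented
    state containing the last `window` observations (oldest first)."""
    buf = deque(maxlen=window)
    def augment(obs):
        buf.append(tuple(obs))
        # Pad with the earliest observation until the buffer fills,
        # so the augmented state always has a fixed length.
        padded = [buf[0]] * (window - len(buf)) + list(buf)
        return tuple(f for snapshot in padded for f in snapshot)
    return augment
```

Feeding the augmented tuple (instead of the raw observation) into the state representation gives the agent access to recent history, at the cost of multiplying the input dimensionality by the window length.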

License Information

Verve is dual-licensed under the BSD and LGPL licenses; the full license texts are distributed with the library.

about.txt · Last modified: 2011/04/18 05:33 by tylerstreeter