The markov library

markov defines a reusable abstract software interface for online learning agents that interact with real systems.

Many existing applications of online learning agents to problems, both inside and outside of Autonomous Cyber Defence (ACD), involve two significant constraints:

  1. The agent interacts with a discrete-time system, representable by a state machine.
  2. The interactions between the agent and its environment are turn-based.

Real-world systems are typically continuous-time and concurrently-running. markov presents a simple model that generalises beyond these two constraints, providing a reusable software interface for applying online learning agents to continuous-time, concurrently-running systems.

Design

The digraph below shows how an agent's policy consumes a state, returning:

  1. A new policy. If this policy differs from the previous policy, the agent is 'training'.
  2. An action to be executed in the surrounding system.
  3. An 'observer': a function that will return a new state.

Requiring that the policy return an observer provides a useful generalisation to continuous-time systems: the policy itself controls when observations occur, and potentially what they consist of.

policy-steps.png
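
As a rough illustration of this shape, a single policy step could be modelled with the hypothetical types below. These names and signatures are illustrative only; they are not markov's actual interfaces.

type ('state, 'action) step =
  { policy : ('state, 'action) policy   (* 1. a (possibly updated) policy *)
  ; action : 'action                    (* 2. an action to execute in the surrounding system *)
  ; observe : unit -> 'state            (* 3. an observer that produces the next state *)
  }
and ('state, 'action) policy = 'state -> ('state, 'action) step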

For example, in the context of cyber defence, consider two scenarios:

  1. In response to a high threat state, a policy might produce an observer function that returns quickly with information that is critical to that specific threat.
  2. In a low threat state, it might instead produce an observer function that returns after a longer delay, perhaps with a state containing a broader set of information.

In the application of online learning to real systems, resource bounds will influence the optimal behaviour of the agent. The observer model lets the policy itself manage these resource constraints and optimise its behaviour under them.
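
To make the two scenarios above concrete, here is a hypothetical policy written against the illustrative types sketched earlier in this section. It selects its observer according to a toy threat level; the actions, threshold, and scan functions are placeholders, not part of markov.

(* Placeholder observers over a toy state (an integer alert level):
   one fast and narrowly focused, one slower and broader. *)
let quick_critical_scan () = 9   (* e.g. poll only the sensors relevant to the threat *)
let broad_slow_scan () = 2       (* e.g. take a wider, less frequent system snapshot *)

let rec threat_aware_policy alert_level =
  let action =
    if alert_level > 7 then `Isolate_host else `Continue_monitoring
  in
  let observe =
    if alert_level > 7
    then quick_critical_scan   (* high threat: observe again soon, with focused information *)
    else broad_slow_scan       (* low threat: observe later, with broader information *)
  in
  { policy = threat_aware_policy; action; observe }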

The Agent module

The Agent module exposes a functor (what is a functor?), Agent.Make, which when applied returns a module of type Agent.S.
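
For readers unfamiliar with the term: an OCaml functor is a module-level function, taking one or more modules as arguments and returning a new module. The example below is a minimal, generic illustration of the mechanism; it is unrelated to markov's actual interfaces.

module type Printable = sig
  type t
  val to_string : t -> string
end

(* A functor: given any Printable module, build a module that can log its values. *)
module MakeLogger (P : Printable) = struct
  let log (x : P.t) = print_endline (P.to_string x)
end

module IntLogger = MakeLogger (struct
  type t = int
  let to_string = string_of_int
end)

(* IntLogger.log 42 prints "42". *)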

Example usage:

module MarkovCompressor = struct
  (* Your implementation here *) ...
end

module Reward = struct
  (* Your implementation here *) ...
end

module Policy = struct
  (* Your implementation here *) ...
end

module Agent = Markov.Agent.Make (MarkovCompressor) (Reward) (Policy)

let () =
  Agent.act (Agent.init_policy ())

The Agent.S.act loop

In the example above,

  Agent.act (Agent.init_policy ())

kicks off an infinite loop modelling a Markov Decision Process (MDP). Each iteration of the loop involves:

  1. measuring the state of the MDP
  2. inferring an action to take
  3. inferring how next to measure the state of the MDP (in other words, producing an observer)
  4. executing the action

The implementation of this loop is included in the markov library, within the Agent.Make functor.
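
For intuition, a loop of this shape could be written roughly as follows. This is a hypothetical sketch reusing the illustrative record type from the Design section; it is not the library's actual implementation, and the real loop may order or structure these steps differently.

let rec act ~execute ~observe policy =
  let state = observe () in                          (* 1. measure the state of the MDP *)
  let { policy; action; observe } = policy state in  (* 2 & 3. infer an action and the next observer *)
  execute action;                                    (* 4. execute the action *)
  act ~execute ~observe policy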

Reference Implementation

The blue library provides a simple reference implementation of the markov interfaces.