The markov library

markov defines a reusable abstract software interface for online learning agents that interact with real systems.

Many existing applications of online learning agents to problems, both inside and outside of Autonomous Cyber Defence (ACD), involve two significant constraints:

  1. The agent interacts with a discrete-time system, representable by a state machine.
  2. The interactions between the agent and its environment are turn-based.

Real-world systems are typically continuous-time and concurrently-running. markov presents a simple model that generalises beyond these two constraints, providing a reusable software interface for applying online learning agents to continuous-time, concurrently-running systems.

Design

The digraph below shows how an agent's policy consumes a state, returning:

  1. A new policy. If this policy differs from the previous policy, the agent is 'training'.
  2. An action to be executed in the surrounding system.
  3. An 'observer': a function that will return a new state.

Requiring that the policy return an observer provides a useful generalisation to continuous-time systems: the policy itself controls when observations occur, and potentially what they consist of.

policy-steps.png
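
As a rough illustration of this shape, a single policy step could be modelled with the hypothetical types below. These names and signatures are illustrative only; they are not markov's actual interfaces.

type ('state, 'action) step =
  { policy : ('state, 'action) policy   (* 1. a (possibly updated) policy *)
  ; action : 'action                    (* 2. an action to execute in the surrounding system *)
  ; observe : unit -> 'state            (* 3. an observer that produces the next state *)
  }
and ('state, 'action) policy = 'state -> ('state, 'action) step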

For example, in the context of cyber defence, consider two scenarios:

  1. In response to a high threat state, a policy might produce an observer function that returns quickly with information that is critical to that specific threat.
  2. In a low threat state, it might instead produce an observer function that returns after a longer delay, perhaps with a state containing a broader set of information.

In the application of online learning to real systems, resource bounds will influence the optimal behaviour of the agent. The observer model lets the policy itself manage these resource constraints and optimise its behaviour under them.
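
To make the two scenarios above concrete, here is a hypothetical policy written against the illustrative types sketched earlier in this section. It selects its observer according to a toy threat level; the actions, threshold, and scan functions are placeholders, not part of markov.

(* Placeholder observers over a toy state (an integer alert level):
   one fast and narrowly focused, one slower and broader. *)
let quick_critical_scan () = 9   (* e.g. poll only the sensors relevant to the threat *)
let broad_slow_scan () = 2       (* e.g. take a wider, less frequent system snapshot *)

let rec threat_aware_policy alert_level =
  let action =
    if alert_level > 7 then `Isolate_host else `Continue_monitoring
  in
  let observe =
    if alert_level > 7
    then quick_critical_scan   (* high threat: observe again soon, with focused information *)
    else broad_slow_scan       (* low threat: observe later, with broader information *)
  in
  { policy = threat_aware_policy; action; observe }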

The Agent module

The Agent module exposes a functor (what is a functor?), Agent.Make, which when applied returns a module of type Agent.S.
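
For readers unfamiliar with the term: an OCaml functor is a module-level function, taking one or more modules as arguments and returning a new module. The example below is a minimal, generic illustration of the mechanism; it is unrelated to markov's actual interfaces.

module type Printable = sig
  type t
  val to_string : t -> string
end

(* A functor: given any Printable module, build a module that can log its values. *)
module MakeLogger (P : Printable) = struct
  let log (x : P.t) = print_endline (P.to_string x)
end

module IntLogger = MakeLogger (struct
  type t = int
  let to_string = string_of_int
end)

(* IntLogger.log 42 prints "42". *)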

Example usage:

module MarkovCompressor = struct
  (* Your implementation here *) ...
end

module Reward = struct
  (* Your implementation here *) ...
end

module Policy = struct
  (* Your implementation here *) ...
end

module Agent = Markov.Agent.Make (MarkovCompressor) (Reward) (Policy)

let () =
  Agent.act (Agent.init_policy ())

The Agent.S.act loop

In the example above,

  Agent.act (Agent.init_policy ())

kicks off an infinite loop modelling a Markov Decision Process (MDP). Each iteration of the loop involves:

  1. measuring the state of the MDP
  2. inferring an action to take
  3. inferring how next to measure the state of the MDP (in other words, producing an observer)
  4. executing the action

The implementation of this loop is included in the markov library, within the Agent.Make functor.
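
For intuition, a loop of this shape could be written roughly as follows. This is a hypothetical sketch reusing the illustrative record type from the Design section; it is not the library's actual implementation, and the real loop may order or structure these steps differently.

let rec act ~execute ~observe policy =
  let state = observe () in                          (* 1. measure the state of the MDP *)
  let { policy; action; observe } = policy state in  (* 2 & 3. infer an action and the next observer *)
  execute action;                                    (* 4. execute the action *)
  act ~execute ~observe policy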

Reference Implementation

The blue library provides a simple reference implementation of the markov interfaces.