# markov library

`markov` defines a reusable abstract software interface for online learning agents that interact with real systems.
Many existing applications of online learning agents to problems, both inside and outside of Autonomous Cyber Defence (ACD), are subject to two significant constraints:

- Real-world systems are typically continuous-time.
- Real-world systems are typically concurrently-running.

`markov` presents a simple model that generalises over the two constraints above, providing a reusable software interface for the application of online learning agents to continuous-time, concurrently-running systems.
The digraph below shows how an agent's policy consumes a state, returning:

- an action to apply to the system, and
- an observer, which determines when the next observation of the system is taken.
Requiring that the policy return an observer introduces a helpful generalisation to continuous-time systems: one in which the policy itself controls when observations occur, and potentially what they consist of.
For example, in the context of cyber defence, consider two scenarios:
In the application of online learning to real systems, resource bounds will influence the optimal behaviour of the agent. The observer model allows the policy itself to control, and optimise over, the trade-offs posed by resource constraints.
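As a concrete illustration of the observer idea, the sketch below uses hypothetical types (the type and value names are illustrative assumptions, not taken from the library's interface): the policy returns an action together with a thunk that decides how long to wait before producing the next observation.

```ocaml
(* Illustrative sketch only: these types are assumptions for exposition and
   are not part of the library's actual interface. *)
type state = { suspicious_logins : int }
type action = Isolate_host | Do_nothing
type observation = string list          (* e.g. freshly collected log lines *)
type observer = unit -> observation     (* blocks until it chooses to observe *)
type policy = state -> action * observer

(* A toy policy: observe again sooner when the state looks suspicious, so that
   scarce monitoring resources are spent where they matter most.
   [Unix.sleepf] requires linking against the unix library. *)
let toy_policy : policy =
 fun s ->
  let delay = if s.suspicious_logins > 3 then 1.0 else 60.0 in
  let observe () =
    Unix.sleepf delay;
    [ "auth.log: example line" ]
  in
  ((if s.suspicious_logins > 10 then Isolate_host else Do_nothing), observe)
```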
## Agent module

The `Agent` module exposes a functor (what is a functor?), `Agent.Make`, that when applied returns a module of type `Agent.S`.
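For readers unfamiliar with OCaml functors: a functor is a module parameterised by one or more other modules, and applying it to concrete modules yields a new module. The snippet below is a generic, self-contained illustration of the mechanism; it does not use this library's signatures.

```ocaml
(* A generic OCaml functor example, unrelated to this library's interface. *)
module type SHOW = sig
  type t
  val show : t -> string
end

(* [MakePrinter] is a functor: given any module satisfying [SHOW], it
   returns a new module with a [print] function for that type. *)
module MakePrinter (S : SHOW) = struct
  let print (x : S.t) = print_endline (S.show x)
end

module IntShow = struct
  type t = int
  let show = string_of_int
end

(* Applying the functor yields a concrete module, just as [Agent.Make]
   yields a module of type [Agent.S]. *)
module IntPrinter = MakePrinter (IntShow)

let () = IntPrinter.print 42
```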
Example usage:
```ocaml
module MarkovCompressor = struct
  (* Your implementation here *) ...
end

module Reward = struct
  (* Your implementation here *) ...
end

module Policy = struct
  (* Your implementation here *) ...
end

module Agent = Markov.Agent.Make (MarkovCompressor) (Reward) (Policy)

let () =
  Agent.act (Agent.init_policy ())
```
## `Agent.S.act` loop

In the example above, `Agent.act (Agent.init_policy ())` kicks off an infinite loop modelling a Markov Decision Process (MDP). Broadly, each iteration of the loop involves:

- compressing the latest observation of the system into a Markov state,
- computing the reward associated with that state,
- consulting the policy for the next action and observer, and
- applying the action and waiting on the observer for the next observation.
The implementation of this loop is included in the `markov` library, in the body of the `Agent.Make` functor.
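For intuition, here is a minimal sketch of the kind of recursive loop such a functor could build. The module type, function names, and signatures below are assumptions made for illustration; they are not the library's actual implementation.

```ocaml
(* Hypothetical sketch of an agent loop; names and signatures are assumptions,
   not the library's actual code. *)
module type POLICY = sig
  type state
  type action
  type observation

  (* Return the next action together with an observer that produces the
     next raw observation when called. *)
  val act : state -> action * (unit -> observation)
end

module Loop (P : POLICY) = struct
  (* One unbounded MDP episode: choose an action, apply it to the real
     system, wait on the observer, compress the new observation into the
     next Markov state, account for the reward, and recurse. *)
  let rec run ~apply ~compress ~reward state =
    let action, observe = P.act state in
    apply action;
    let state' = compress (observe ()) in
    let (_ : float) = reward state' in
    run ~apply ~compress ~reward state'
end
```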
The `blue` library implements the `Markov` interfaces (albeit with a simple implementation).