Next: Introduction, Previous: (dir), Up: (dir) [Contents][Index]
This is the micmac Reference Manual, version 0.0.2, generated automatically by Declt version 3.0 "Montgomery Scott" on Fri Jun 26 11:45:32 2020 GMT+0.
• Introduction | What micmac is all about | |
• Systems | The systems documentation | |
• Modules | The modules documentation | |
• Files | The files documentation | |
• Packages | The packages documentation | |
• Definitions | The symbols documentation | |
• Indexes | Concepts, functions, variables and data types |
# Micmac Manual ###### \[in package MICMAC\] ## micmac ASDF System Details - Version: 0.0.2 - Description: Micmac is mainly a library of graph search algorithms such as alpha-beta, UCT and beam search, but it also has some MCMC and other slightly unrelated stuff. - Licence: MIT, see COPYING. - Author: Gábor Melis- Mailto: [mega@retes.hu](mailto:mega@retes.hu) - Homepage: [http://melisgl.github.io/mgl-gpr](http://melisgl.github.io/mgl-gpr) - Bug tracker: [https://github.com/melisgl/mgl-gpr/issues](https://github.com/melisgl/mgl-gpr/issues) - Source control: [GIT](https://github.com/melisgl/mgl-gpr.git) ## Introduction ### Overview MICMAC is a Common Lisp library by [Gábor Melis](http://quotenil.com) focusing on [graph search](http://en.wikipedia.org/wiki/Graph_traversal) algorithms. ### Links Here is the [official repository](https://github.com/melisgl/micmac) and the [HTML documentation](http://melisgl.github.io/mgl-pax-world/micmac-manual.html) for the latest version. ## Graph Search - [function] ALPHA-BETA STATE &KEY (DEPTH 0) ALPHA BETA CALL-WITH-ACTION MAYBE-EVALUATE-STATE LIST-ACTIONS RECORD-BEST Alpha-beta pruning for two player, zero-sum maximax (like minimax but both players maximize and the score is negated when passed between depths). Return the score of the game STATE from the point of view of the player to move at DEPTH and as the second value the list of actions of the principal variant. CALL-WITH-ACTION is a function of (STATE DEPTH ACTION FN). It carries out ACTION (returned by LIST-ACTIONS or NIL) to get the state corresponding to DEPTH and calls FN with that state. It may destructively modify STATE provided it undoes the damage after FN returns. CALL-WITH-ACTION is called with NIL as ACTION for the root of the tree, in this case STATE need not be changed. FN returns the same kinds of values as ALPHA-BETA. They may be useful for logging. MAYBE-EVALUATE-STATE is a function of (STATE DEPTH). If STATE at DEPTH is a terminal node then it returns the score from the point of view of the player to move and as the second value a list of actions that lead from STATE to the position that was evaluated. The list of actions is typically empty. If we are not at a terminal node then MAYBE-EVALUATE-STATE returns NIL. LIST-ACTIONS is a function of (STATE DEPTH) and returns a non-empty list of legal candidate moves for non-terminal nodes. Actions are tried in the order LIST-ACTIONS returns them: stronger moves CALL-WITH-ACTION, MAYBE-EVALUATE-STATE and LIST-ACTIONS are mandatory. RECORD-BEST, if non-NIL, is a function of (DEPTH SCORE ACTIONS). It is called when at DEPTH a new best action is found. ACTIONS is a list of all the actions in the principle variant corresonding to the newly found best score. RECORD-BEST is useful for graceful degradation in case of timeout. ALPHA and BETA are typically NIL (equivalent to -infinity, +infinity) but any real number is allowed if the range of scores can be boxed. See `test/test-alpha-beta.lisp` for an example. - [function] BEAM-SEARCH START-NODES &KEY MAX-DEPTH (N-SOLUTIONS 1) (BEAM-WIDTH (LENGTH START-NODES)) EXPAND-NODE-FN EXPAND-BEAM-FN SCORE-FN UPPER-BOUND-FN SOLUTIONP-FN (FINISHEDP-FN SOLUTIONP-FN) In a graph, search for nodes that with the best scores with [beam search](http://en.wikipedia.org/wiki/Beam_search). That is, starting from START-NODES perform a breadth-first search but at each depth only keep BEAM-WIDTH number of nodes with the best scores. Keep the best N-SOLUTIONS (at most) complete solutions. Discard nodes known to be unable to get into the best N-SOLUTIONS (due to UPPER-BOUND-FN). Finally, return the solutions and the active nodes (the *beam*) as adjustable arrays sorted by score in descending order. START-NODES (a sequence of elements of arbitrary type). SCORE-FN, UPPER-BOUND-FN, SOLUTIONP-FN, FINISHEDP-FN are all functions of one argument: the node. SOLUTIONP-FN checks whether a node represents a complete solution (i.e. some goal is reached). SCORE-FN returns a real number that's to be maximized, it's only called for node for which SOLUTIONP-FN returned true. UPPER-BOUND-FN (if not NIL) returns a real number that equal or greater than the score of all solutions reachable from that node. FINISHEDP-FN returns true iff there is nowhere to go from the node. EXPAND-NODE-FN is also a function of a single node argument. It returns a sequence of nodes to 'one step away' from its argument node. EXPAND-BEAM-FN is similar, but it takes a vector of nodes and returns all nodes one step away from any of them. It's enough provide either EXPAND-NODE-FN or EXPAND-BEAM-FN. The purpose of EXPAND-BEAM-FN. is to allow more efficient, batch-like operations. See `test/test-beam-search.lisp` for an example. - [function] PARALLEL-BEAM-SEARCH START-NODE-SEQS &KEY MAX-DEPTH (N-SOLUTIONS 1) BEAM-WIDTH EXPAND-NODE-FN EXPAND-BEAMS-FN SCORE-FN UPPER-BOUND-FN SOLUTIONP-FN (FINISHEDP-FN SOLUTIONP-FN) This is very much like BEAM-SEARCH except it solves a number of instances of the same search problem starting from different sets of nodes. The sole purpose of PARALLEL-BEAM-SEARCH is to amortize the cost EXPAND-BEAM-FN if possible. EXPAND-BEAMS-FN is called with sequence of beams (i.e. it's a sequence of sequence of nodes) and it must return another sequence of sequences of nodes. Each element of the returned sequence is the reachable nodes of the nodes in the corresponding element of its argument sequence. PARALLEL-BEAM-SEARCH returns a sequence of solutions sequences, and a sequence of active node sequences. See `test/test-beam-search.lisp` for an example. ### UCT ###### \[in package MICMAC.UCT\] UCT Monte Carlo tree search. This is what makes current Go programs tick. And Hex programs as well, for that matter. This is a cleanup and generalization of code originally created in course of the Google AI Challenge 2010. For now, the documentation is just a reference. See `test/test-uct.lisp` for an example. - [class] UCT-NODE A node in the UCT tree. Roughly translates to a state in the search space. Note that the state itself is not stored explicity, but it can be recovered by \`replaying' the actions from the starting state or by customizing MAKE-UCT-NODE. - [reader] DEPTH UCT-NODE (:DEPTH = 0) - [accessor] EDGES UCT-NODE Outgoing edges. - [accessor] AVERAGE-REWARD UCT-NODE (:AVERAGE-REWARD = 0) Average reward over random playouts started from below this node. See UPDATE-UCT-STATISTICS and REWARD. - [class] UCT-EDGE An edge in the UCT tree. Represents an action taken from a state. The value of an action is the value of its target state which is not quite as generic as it could be; feel free to specialize AVERAGE-REWARD for the edges if that's not the case. - [reader] ACTION UCT-EDGE (:ACTION) The action represented by the edge. - [accessor] FROM-NODE UCT-EDGE (:FROM-NODE) The node this edge starts from. - [accessor] TO-NODE UCT-EDGE (= NIL) The node this edge points to if the edge has been visited or NIL. - [function] VISITED-EDGES NODE - [function] UNVISITED-EDGES NODE - [generic-function] EDGE-SCORE NODE EDGE EXPLORATION-BIAS - [generic-function] SELECT-EDGE NODE EXPLORATION-BIAS Choose an action to take from a state, in other words an edge to follow from NODE in the tree. The default implementation chooses randomly from the yet unvisited edges or if there is none moves down the edge with the maximum EDGE-SCORE. If you are thinking of customizing this, for example to make it choose the minimum at odd depths, the you may want to consider specializing REWARD or UPDATE-UCT-STATISTICS instead. - [generic-function] OUTCOME->REWARD NODE OUTCOME Compute the reward for a node in the tree from OUTCOME that is the result of a playout. This is called by the default implementation of UPDATE-UCT-STATISTICS. This is where one typically negates depending on the parity of DEPTH in two player games. - [generic-function] UPDATE-UCT-STATISTICS ROOT PATH OUTCOME Increment the number of visits and update the average reward in nodes and edges of PATH. By default, edges simply get their visit counter incremented while nodes also get an update to AVERAGE-REWARD based on what OUTCOME->REWARD returns. - [generic-function] MAKE-UCT-NODE PARENT EDGE PARENT-STATE Create a node representing the state of that EDGE leads to from PARENT. Specialize this if you want to keep track of the state which is not done by default as it can be expensive, especially in light of TAKE-ACTION mutating it. The default implementation simply creates an instance of the class of PARENT so that one can start from a subclass of UCT-NODE and be sure that that class is going to be used for nodes below it. - [generic-function] STATE NODE PARENT EDGE PARENT-STATE Return the state that corresponds to NODE. This is not a straightforward accessor unless NODE is customized to store it. The rest of the parameters are provided so that one can reconstruct the state by taking the action of EDGE in the PARENT-STATE of PARENT. It's okay to destroy PARENT-STATE in the process as long as it's not stored elsewhere. This function must be customized. - [generic-function] LIST-EDGES NODE STATE Return a list of edges representing the possible actions from NODE with STATE. This function must be customized. - [generic-function] PLAY-OUT NODE STATE REVERSE-PATH Play a random game from NODE with STATE and return the outcome that's fed into UPDATE-UCT-STATISTICS. The way the random game is played is referred to as \`default policy' and that's what makes or breaks UCT search. This function must be customized. - [function] UCT &KEY ROOT FRESH-ROOT-STATE EXPLORATION-BIAS MAX-N-PLAYOUTS Starting from the ROOT node search the tree expanding it one node for each playout. Finally return the mutated ROOT. ROOT may be the root node of any tree, need not be a single node with no edges. FRESH-ROOT-STATE is a function that returns a fresh state corresponding to ROOT. This state will be destroyed unless special care is taken in STATE. ## Metropolis Hastings ###### \[in package MICMAC.METROPOLIS-HASTINGS\] Generic interface for the Metropolis-Hastings algorithm, also Metropolis Coupled MCMC. References: - http://en.wikipedia.org/wiki/Metropolis–Hastings\_algorithm - Markov Chain Monte Carlo and Gibbs Sampling Lecture Notes for EEB 581, version 26 April 2004 c B. Walsh 2004 http://web.mit.edu/~wingated/www/introductions/mcmc-gibbs-intro.pdf - Geyer, C.J. (1991) Markov chain Monte Carlo maximum likelihood For now, the documentation is just a reference. See `test/test-metropolis-hastings.lisp` for an example. - [class] MC-CHAIN A simple markov chain for Metropolis Hastings. With temperature it is suitable for MC3. - [accessor] TEMPERATURE MC-CHAIN (:TEMPERATURE = 1.0d0) The PROBABILITY-RATIO of samples is raised to the power of 1 / TEMPERATURE before calculating the acceptance probability. This effectively flattens the peaks if TEMPERATURE > 1 which makes it easier for the chain to traverse deep valleys. - [reader] STATE MC-CHAIN (:STATE) This is the current sample where the chain is. - [function] JUMP-TO-SAMPLE CHAIN JUMP &KEY (RESULT-SAMPLE (STATE CHAIN)) From the current state of CHAIN make JUMP (from the current distribution of CHAIN) and return the sample where we landed. Reuse RESULT-SAMPLE when possible. - [generic-function] JUMP-TO-SAMPLE* CHAIN JUMP RESULT-SAMPLE This function is called by JUMP-TO-SAMPLE. It is where JUMP-TO-SAMPLE behaviour shall be customized. - [generic-function] PREPARE-JUMP-DISTRIBUTION CHAIN Prepare for sampling from the F(X) = Q(SAMPLE->X) distribution. Called by RANDOM-JUMP. The around method ensures that nothing is done unless there was a state change. - [generic-function] RANDOM-JUMP CHAIN Sample a jump from the current distribution of jumps that was computed by PREPARE-JUMP-DISTRIBUTION. - [generic-function] LOG-PROBABILITY-RATIO CHAIN SAMPLE1 SAMPLE2 Return P(SAMPLE1)/P(SAMPLE2). It's in the log domain to avoid overflows and the ratio part is because that it may allow computational shortcuts as opposed to calculating unnormalized probabilities separately. - [generic-function] LOG-PROBABILITY-RATIO-TO-JUMP-TARGET CHAIN JUMP TARGET Return P(TARGET)/P(STATE) where JUMP is from the current state of CHAIN to TARGET sample. This can be specialized for speed. The default implementation just falls back on LOG-PROBABILITY-RATIO. - [generic-function] LOG-JUMP-PROBABILITY-RATIO CHAIN JUMP TARGET Return Q(TARGET->STATE) / Q(STATE->TARGET) where Q is the jump distribution and JUMP is from the current STATE of CHAIN to TARGET sample. - [generic-function] ACCEPTANCE-PROBABILITY CHAIN JUMP CANDIDATE Calculate the acceptance probability of CANDIDATE to which JUMP leads from the current STATE of CHAIN. - [generic-function] ACCEPT-JUMP CHAIN JUMP CANDIDATE Called when CHAIN accepts JUMP to CANDIDATE. - [generic-function] REJECT-JUMP CHAIN JUMP CANDIDATE Called when CHAIN rejects JUMP to CANDIDATE. It does nothing by default, it's just a convenience for debugging. - [generic-function] MAYBE-JUMP CHAIN JUMP CANDIDATE ACCEPTANCE-PROBABILITY Randomly accept or reject JUMP to CANDIDATE from the current state of CHAIN with ACCEPTANCE-PROBABILITY. - [generic-function] JUMP CHAIN Take a step on the markov chain. Return a boolean indicating whether the proposed jump was accepted. - [class] MC3-CHAIN MC-CHAIN High probability island separated by low valley make the chain poorly mixing. MC3-CHAIN has a number of HOT-CHAINS that have state probabilities similar to that of the main chain but less jagged. Often it suffices to set the temperatures of the HOT-CHAINS higher use the very same base probability distribution. - [generic-function] ACCEPT-SWAP-CHAIN-STATES MC3 CHAIN1 CHAIN2 Swap the states of CHAIN1 and CHAIN2 of MC3. - [generic-function] REJECT-SWAP-CHAIN-STATES MC3 CHAIN1 CHAIN2 Called when the swap of states of CHAIN1 and CHAIN2 is rejected. It does nothing by default, it's just a convenience for debugging. - [generic-function] MAYBE-SWAP-CHAIN-STATES MC3 CHAIN1 CHAIN2 ACCEPTANCE-PROBABILITY Swap of states of CHAIN1 and CHAIN2 of MC3 with ACCEPTANCE-PROBABILITY. - [generic-function] JUMP-BETWEEN-CHAINS MC3 Choose two chains randomly and swap their states with MC3 acceptance probability. - [class] ENUMERATING-CHAIN MC-CHAIN A simple abstract chain subclass that explicitly enumerates the probabilities of the distribution. - [class] TRACING-CHAIN Mix this in with your chain to have it print trace of acceptances/rejections. ## Game Theory ###### \[in package MICMAC.GAME-THEORY\] - [function] FIND-NASH-EQUILIBRIUM PAYOFF &KEY (N-ITERATIONS 100) Find a Nash equilibrium of a zero-sum game represented by PAYOFF matrix (a 2d matrix or a nested list). PAYOFF is from the point of view of the row player: the player who choses column wants to minimize, the row player wants to maximize. The first value returned is a vector of unnormalized probabilities assigned to each action of the row player, the second value is the same for the column player and the third is the expected payoff of the row player in the nash equilibrium represented by the oddment vectors. * * * ###### \[generated by [MGL-PAX](https://github.com/melisgl/mgl-pax)\]
Next: Modules, Previous: Introduction, Up: Top [Contents][Index]
The main system appears first, followed by any subsystem dependency.
• The micmac system |
Gábor Melis <mega@retes.hu>
(:git "https://github.com/melisgl/mgl-gpr.git")
MIT, see COPYING.
Micmac is mainly a library of graph search algorithms
such as alpha-beta, UCT and beam search, but it also has some MCMC
and other slightly unrelated stuff.
0.0.2
mgl-pax
micmac.asd (file)
src (module)
Modules are listed depth-first from the system components tree.
• The micmac/src module |
micmac (system)
src/
Files are sorted by type and then listed depth-first from the systems components trees.
• Lisp files |
Next: The micmac/src/package․lisp file, Previous: Lisp files, Up: Lisp files [Contents][Index]
micmac.asd
micmac (system)
Next: The micmac/src/uct․lisp file, Previous: The micmac․asd file, Up: Lisp files [Contents][Index]
src (module)
src/package.lisp
Next: The micmac/src/graph-search․lisp file, Previous: The micmac/src/package․lisp file, Up: Lisp files [Contents][Index]
package.lisp (file)
src (module)
src/uct.lisp
Next: The micmac/src/metropolis-hastings․lisp file, Previous: The micmac/src/uct․lisp file, Up: Lisp files [Contents][Index]
uct.lisp (file)
src (module)
src/graph-search.lisp
Next: The micmac/src/game-theory․lisp file, Previous: The micmac/src/graph-search․lisp file, Up: Lisp files [Contents][Index]
graph-search.lisp (file)
src (module)
src/metropolis-hastings.lisp
Next: The micmac/src/micmac․lisp file, Previous: The micmac/src/metropolis-hastings․lisp file, Up: Lisp files [Contents][Index]
metropolis-hastings.lisp (file)
src (module)
src/game-theory.lisp
Next: The micmac/src/doc․lisp file, Previous: The micmac/src/game-theory․lisp file, Up: Lisp files [Contents][Index]
game-theory.lisp (file)
src (module)
src/micmac.lisp
Previous: The micmac/src/micmac․lisp file, Up: Lisp files [Contents][Index]
micmac.lisp (file)
src (module)
src/doc.lisp
Next: Definitions, Previous: Files, Up: Top [Contents][Index]
Packages are listed by definition order.
• The micmac.game-theory package | ||
• The micmac package | ||
• The micmac.metropolis-hastings package | ||
• The micmac.uct package |
Next: The micmac package, Previous: Packages, Up: Packages [Contents][Index]
See MICMAC.GAME-THEORY:@MICMAC-GAME-THEORY.
package.lisp (file)
Next: The micmac․metropolis-hastings package, Previous: The micmac․game-theory package, Up: Packages [Contents][Index]
See MICMAC:@MICMAC-MANUAL.
package.lisp (file)
Next: The micmac․uct package, Previous: The micmac package, Up: Packages [Contents][Index]
See MICMAC.METROPOLIS-HASTINGS:@MICMAC-METROPOLIS-HASTINGS.
package.lisp (file)
micmac.mh
Previous: The micmac․metropolis-hastings package, Up: Packages [Contents][Index]
See MICMAC.UCT:@MICMAC-UCT.
package.lisp (file)
Definitions are sorted by export status, category, package, and then by lexicographic order.
• Exported definitions | ||
• Internal definitions |
Next: Internal definitions, Previous: Definitions, Up: Definitions [Contents][Index]
• Exported special variables | ||
• Exported functions | ||
• Exported generic functions | ||
• Exported classes |
Next: Exported functions, Previous: Exported definitions, Up: Exported definitions [Contents][Index]
game-theory.lisp (file)
graph-search.lisp (file)
micmac.lisp (file)
micmac.lisp (file)
micmac.lisp (file)
metropolis-hastings.lisp (file)
micmac.lisp (file)
uct.lisp (file)
Next: Exported generic functions, Previous: Exported special variables, Up: Exported definitions [Contents][Index]
Alpha-beta pruning for two player, zero-sum maximax (like minimax
but both players maximize and the score is negated when passed
between depths). Return the score of the game STATE from the point
of view of the player to move at DEPTH and as the second value the
list of actions of the principal variant.
CALL-WITH-ACTION is a function of (STATE DEPTH ACTION FN). It carries out ACTION (returned by LIST-ACTIONS or NIL) to get the state corresponding to DEPTH and calls FN with that state. It may destructively modify STATE provided it undoes the damage after FN returns. CALL-WITH-ACTION is called with NIL as ACTION for the root of the tree, in this case STATE need not be changed. FN returns the same kinds of values as ALPHA-BETA. They may be useful for logging.
MAYBE-EVALUATE-STATE is a function of (STATE DEPTH). If STATE at
DEPTH is a terminal node then it returns the score from the point of
view of the player to move and as the second value a list of actions
that lead from STATE to the position that was evaluated. The list of
actions is typically empty. If we are not at a terminal node then
MAYBE-EVALUATE-STATE returns NIL.
LIST-ACTIONS is a function of (STATE DEPTH) and returns a non-empty
list of legal candidate moves for non-terminal nodes. Actions are
tried in the order LIST-ACTIONS returns them: stronger moves
CALL-WITH-ACTION, MAYBE-EVALUATE-STATE and LIST-ACTIONS are
mandatory.
RECORD-BEST, if non-NIL, is a function of (DEPTH SCORE ACTIONS). It
is called when at DEPTH a new best action is found. ACTIONS is a
list of all the actions in the principle variant corresonding to the
newly found best score. RECORD-BEST is useful for graceful
degradation in case of timeout.
ALPHA and BETA are typically NIL (equivalent to -infinity,
+infinity) but any real number is allowed if the range of scores can
be boxed.
See ‘test/test-alpha-beta.lisp‘ for an example.
graph-search.lisp (file)
In a graph, search for nodes that with the best scores with [beam
search](http://en.wikipedia.org/wiki/Beam_search). That is, starting
from START-NODES perform a breadth-first search but at each depth
only keep BEAM-WIDTH number of nodes with the best scores. Keep the
best N-SOLUTIONS (at most) complete solutions. Discard nodes known
to be unable to get into the best N-SOLUTIONS (due to
UPPER-BOUND-FN). Finally, return the solutions and the active
nodes (the _beam_) as adjustable arrays sorted by score in
descending order.
START-NODES (a sequence of elements of arbitrary type). SCORE-FN,
UPPER-BOUND-FN, SOLUTIONP-FN, FINISHEDP-FN are all functions of one
argument: the node. SOLUTIONP-FN checks whether a node represents a
complete solution (i.e. some goal is reached). SCORE-FN returns a
real number that’s to be maximized, it’s only called for node for
which SOLUTIONP-FN returned true. UPPER-BOUND-FN (if not NIL)
returns a real number that equal or greater than the score of all
solutions reachable from that node. FINISHEDP-FN returns true iff
there is nowhere to go from the node.
EXPAND-NODE-FN is also a function of a single node argument. It returns a sequence of nodes to ’one step away’ from its argument node. EXPAND-BEAM-FN is similar, but it takes a vector of nodes and returns all nodes one step away from any of them. It’s enough provide either EXPAND-NODE-FN or EXPAND-BEAM-FN. The purpose of EXPAND-BEAM-FN. is to allow more efficient, batch-like operations.
See ‘test/test-beam-search.lisp‘ for an example.
graph-search.lisp (file)
Find a Nash equilibrium of a zero-sum game represented by PAYOFF matrix (a 2d matrix or a nested list). PAYOFF is from the point of view of the row player: the player who choses column wants to minimize, the row player wants to maximize. The first value returned is a vector of unnormalized probabilities assigned to each action of the row player, the second value is the same for the column player and the third is the expected payoff of the row player in the nash equilibrium represented by the oddment vectors.
game-theory.lisp (file)
From the current state of CHAIN make JUMP (from the current distribution of CHAIN) and return the sample where we landed. Reuse RESULT-SAMPLE when possible.
metropolis-hastings.lisp (file)
This is very much like BEAM-SEARCH except it solves a number of
instances of the same search problem starting from different sets of
nodes. The sole purpose of PARALLEL-BEAM-SEARCH is to amortize the
cost EXPAND-BEAM-FN if possible.
EXPAND-BEAMS-FN is called with sequence of beams (i.e. it’s a
sequence of sequence of nodes) and it must return another sequence
of sequences of nodes. Each element of the returned sequence is the
reachable nodes of the nodes in the corresponding element of its
argument sequence.
PARALLEL-BEAM-SEARCH returns a sequence of solutions sequences, and
a sequence of active node sequences.
See ‘test/test-beam-search.lisp‘ for an example.
graph-search.lisp (file)
Starting from the ROOT node search the tree expanding it one node for each playout. Finally return the mutated ROOT. ROOT may be the root node of any tree, need not be a single node with no edges. FRESH-ROOT-STATE is a function that returns a fresh state corresponding to ROOT. This state will be destroyed unless special care is taken in STATE.
uct.lisp (file)
uct.lisp (file)
uct.lisp (file)
Next: Exported classes, Previous: Exported functions, Up: Exported definitions [Contents][Index]
Called when CHAIN accepts JUMP to CANDIDATE.
metropolis-hastings.lisp (file)
Swap the states of CHAIN1 and CHAIN2 of MC3.
metropolis-hastings.lisp (file)
Calculate the acceptance probability of CANDIDATE to which JUMP leads from the current STATE of CHAIN.
metropolis-hastings.lisp (file)
The action represented by the edge.
uct.lisp (file)
uct.lisp (file)
Average reward over random playouts started from
below this node. See UPDATE-UCT-STATISTICS and REWARD.
uct.lisp (file)
automatically generated reader method
uct.lisp (file)
uct.lisp (file)
Outgoing edges.
uct.lisp (file)
The node this edge starts from.
uct.lisp (file)
Take a step on the markov chain. Return a boolean indicating whether the proposed jump was accepted.
metropolis-hastings.lisp (file)
Choose two chains randomly and swap their states with MC3 acceptance probability.
metropolis-hastings.lisp (file)
This function is called by JUMP-TO-SAMPLE. It is where JUMP-TO-SAMPLE behaviour shall be customized.
metropolis-hastings.lisp (file)
Return a list of edges representing the possible
actions from NODE with STATE. This function must be customized.
uct.lisp (file)
Return Q(TARGET->STATE) / Q(STATE->TARGET) where Q
is the jump distribution and JUMP is from the current STATE of CHAIN
to TARGET sample.
metropolis-hastings.lisp (file)
Return P(SAMPLE1)/P(SAMPLE2). It’s in the log
domain to avoid overflows and the ratio part is because that it may
allow computational shortcuts as opposed to calculating unnormalized
probabilities separately.
metropolis-hastings.lisp (file)
Return P(TARGET)/P(STATE) where JUMP is from the
current state of CHAIN to TARGET sample. This can be specialized for
speed. The default implementation just falls back on
LOG-PROBABILITY-RATIO.
metropolis-hastings.lisp (file)
Create a node representing the state of that EDGE
leads to from PARENT. Specialize this if you want to keep track of
the state which is not done by default as it can be expensive,
especially in light of TAKE-ACTION mutating it. The default
implementation simply creates an instance of the class of PARENT so
that one can start from a subclass of UCT-NODE and be sure that that
class is going to be used for nodes below it.
uct.lisp (file)
Randomly accept or reject JUMP to CANDIDATE from
the current state of CHAIN with ACCEPTANCE-PROBABILITY.
metropolis-hastings.lisp (file)
Swap of states of CHAIN1 and CHAIN2 of MC3 with ACCEPTANCE-PROBABILITY.
metropolis-hastings.lisp (file)
Compute the reward for a node in the tree from
OUTCOME that is the result of a playout. This is called by the
default implementation of UPDATE-UCT-STATISTICS. This is where one
typically negates depending on the parity of DEPTH in two player
games.
uct.lisp (file)
Play a random game from NODE with STATE and return
the outcome that’s fed into UPDATE-UCT-STATISTICS. The way the
random game is played is referred to as ‘default policy’ and that’s
what makes or breaks UCT search. This function must be
customized.
uct.lisp (file)
Prepare for sampling from the F(X) = Q(SAMPLE->X)
distribution. Called by RANDOM-JUMP. The around method ensures that
nothing is done unless there was a state change.
metropolis-hastings.lisp (file)
Sample a jump from the current distribution of jumps that was computed by PREPARE-JUMP-DISTRIBUTION.
metropolis-hastings.lisp (file)
Called when CHAIN rejects JUMP to CANDIDATE. It
does nothing by default, it’s just a convenience for debugging.
metropolis-hastings.lisp (file)
Called when the swap of states of CHAIN1 and CHAIN2
is rejected. It does nothing by default, it’s just a convenience for
debugging.
metropolis-hastings.lisp (file)
Choose an action to take from a state, in other
words an edge to follow from NODE in the tree. The default
implementation chooses randomly from the yet unvisited edges or if
there is none moves down the edge with the maximum EDGE-SCORE. If
you are thinking of customizing this, for example to make it choose
the minimum at odd depths, the you may want to consider specializing
REWARD or UPDATE-UCT-STATISTICS instead.
uct.lisp (file)
(setf state) (setf expander)
This is the current sample where the chain is.
metropolis-hastings.lisp (file)
metropolis-hastings.lisp (file)
state (generic function)
Return the state that corresponds to NODE. This is
not a straightforward accessor unless NODE is customized to store
it. The rest of the parameters are provided so that one can
reconstruct the state by taking the action of EDGE in the
PARENT-STATE of PARENT. It’s okay to destroy PARENT-STATE in the
process as long as it’s not stored elsewhere. This function must be
customized.
uct.lisp (file)
The PROBABILITY-RATIO of samples is raised to the
power of 1 / TEMPERATURE before calculating the acceptance
probability. This effectively flattens the peaks if TEMPERATURE >
1 which makes it easier for the chain to traverse deep valleys.
metropolis-hastings.lisp (file)
The node this edge points to if the edge has been visited or NIL.
uct.lisp (file)
Increment the number of visits and update the
average reward in nodes and edges of PATH. By default, edges simply
get their visit counter incremented while nodes also get an update
to AVERAGE-REWARD based on what OUTCOME->REWARD returns.
uct.lisp (file)
Previous: Exported generic functions, Up: Exported definitions [Contents][Index]
A simple abstract chain subclass that explicitly enumerates the probabilities of the distribution.
metropolis-hastings.lisp (file)
mc-chain (class)
p-jumps (generic function)
A simple markov chain for Metropolis Hastings. With temperature it is suitable for MC3.
metropolis-hastings.lisp (file)
standard-object (class)
The PROBABILITY-RATIO of samples is raised to the
power of 1 / TEMPERATURE before calculating the acceptance
probability. This effectively flattens the peaks if TEMPERATURE >
1 which makes it easier for the chain to traverse deep valleys.
:temperature
1.0d0
temperature (generic function)
(setf temperature) (generic function)
This is the current sample where the chain is.
:state
state (generic function)
candidate (generic function)
(setf candidate) (generic function)
jump-distribution-prepared-p (generic function)
(setf jump-distribution-prepared-p) (generic function)
High probability island separated by low valley
make the chain poorly mixing. MC3-CHAIN has a number of HOT-CHAINS
that have state probabilities similar to that of the main chain but
less jagged. Often it suffices to set the temperatures of the
HOT-CHAINS higher use the very same base probability
distribution.
metropolis-hastings.lisp (file)
mc-chain (class)
list
:hot-chains
hot-chains (generic function)
Mix this in with your chain to have it print trace of acceptances/rejections.
metropolis-hastings.lisp (file)
standard-object (class)
An edge in the UCT tree. Represents an action taken
from a state. The value of an action is the value of its target
state which is not quite as generic as it could be; feel free to
specialize AVERAGE-REWARD for the edges if that’s not the case.
uct.lisp (file)
standard-object (class)
The action represented by the edge.
:action
action (generic function)
The node this edge starts from.
:from-node
from-node (generic function)
(setf from-node) (generic function)
The node this edge points to if the edge has been visited or NIL.
to-node (generic function)
(setf to-node) (generic function)
The number of times this action was taken from the parent state.
0
n-visits (generic function)
(setf n-visits) (generic function)
A node in the UCT tree. Roughly translates to a
state in the search space. Note that the state itself is not stored
explicity, but it can be recovered by ‘replaying’ the actions from
the starting state or by customizing MAKE-UCT-NODE.
uct.lisp (file)
standard-object (class)
:depth
0
depth (generic function)
The number of times this node was visited.
0
n-visits (generic function)
(setf n-visits) (generic function)
Outgoing edges.
edges (generic function)
(setf edges) (generic function)
Average reward over random playouts started from
below this node. See UPDATE-UCT-STATISTICS and REWARD.
:average-reward
0
average-reward (generic function)
(setf average-reward) (generic function)
Previous: Exported definitions, Up: Definitions [Contents][Index]
• Internal macros | ||
• Internal functions | ||
• Internal generic functions |
Next: Internal functions, Previous: Internal definitions, Up: Internal definitions [Contents][Index]
graph-search.lisp (file)
graph-search.lisp (file)
Next: Internal generic functions, Previous: Internal macros, Up: Internal definitions [Contents][Index]
Find the first extreme value of the [START,END) subsequence of VECTOR and return it along with its index. The extremum is what would be the first element of VECTOR sorted by SORT with PRED.
game-theory.lisp (file)
uct.lisp (file)
Insert ITEM into VECTOR while keeping it sorted by PRED. Extend the vector if needed while respecting MAX-LENGTH.
graph-search.lisp (file)
game-theory.lisp (file)
metropolis-hastings.lisp (file)
Choose an element randomly from the [START,END) subsequence of SEQ with given probabilities. KEY returns the unormalized probability of an element, SUM is the sum of these unnormalized probalities contains unnormalized probabilties. Return the element chosen and its index.
metropolis-hastings.lisp (file)
uct.lisp (file)
metropolis-hastings.lisp (file)
Return the sum of elements in the [START,END) subsequence of SEQ.
metropolis-hastings.lisp (file)
Previous: Internal functions, Up: Internal definitions [Contents][Index]
automatically generated reader method
metropolis-hastings.lisp (file)
automatically generated writer method
metropolis-hastings.lisp (file)
automatically generated reader method
metropolis-hastings.lisp (file)
automatically generated reader method
metropolis-hastings.lisp (file)
automatically generated writer method
metropolis-hastings.lisp (file)
The number of times this action was taken from the parent state.
uct.lisp (file)
The number of times this node was visited.
uct.lisp (file)
automatically generated reader method
metropolis-hastings.lisp (file)
metropolis-hastings.lisp (file)
Previous: Definitions, Up: Top [Contents][Index]
• Concept index | ||
• Function index | ||
• Variable index | ||
• Data type index |
Next: Function index, Previous: Indexes, Up: Indexes [Contents][Index]
Jump to: | F L M |
---|
Jump to: | F L M |
---|
Next: Variable index, Previous: Concept index, Up: Indexes [Contents][Index]
Jump to: | (
A B C D E F G H I J L M N O P R S T U V |
---|
Jump to: | (
A B C D E F G H I J L M N O P R S T U V |
---|
Next: Data type index, Previous: Function index, Up: Indexes [Contents][Index]
Jump to: | @
A C D E F H J N P S T |
---|
Jump to: | @
A C D E F H J N P S T |
---|
Previous: Variable index, Up: Indexes [Contents][Index]
Jump to: | C E M P S T U |
---|
Jump to: | C E M P S T U |
---|