
status | not read | reprioritisations | ||
---|---|---|---|---|

last reprioritisation on | reading queue position [%] | |||

started reading on | finished reading on |

Tags

#artificial-intelligence #reinforcement-learning

Question

What two steps does DeepCubeA use to solve Rubik's cubes?

Answer

- A neural network is trained to approximate the cost-to-go function using Autodidactic Iteration: a scramble depth k is sampled uniformly from 1 to K, and that many random moves are applied to a solved cube to generate training states.
- Batch Weighted A* Search is used to actually solve cubes: weighted A* search that, in each iteration, expands the children of the N open nodes with the lowest predicted cost, so that the (batched) network evaluations use computation efficiently.
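The batch-expansion idea can be sketched as follows. This is a minimal illustration, not DeepCubeA's implementation: `cost_to_go` stands in for the trained network, and all names (`batch_weighted_astar`, `lam`, `N`) are illustrative.

```python
import heapq

def batch_weighted_astar(start, is_goal, neighbors, cost_to_go, lam=0.6, N=4):
    """Sketch of batch weighted A*: per iteration, pop the N open nodes with
    the lowest f = lam * g + h and expand them together, so an expensive
    learned heuristic can be evaluated in batches. Returns the path cost g
    of the first goal reached, or None. `neighbors(u)` yields (v, w) pairs."""
    g = {start: 0.0}
    open_heap = [(cost_to_go(start), start)]
    closed = set()
    while open_heap:
        # Take up to N best nodes at once (in DeepCubeA this enables
        # batched GPU evaluation of the heuristic network).
        batch = [heapq.heappop(open_heap) for _ in range(min(N, len(open_heap)))]
        for _, u in batch:
            if is_goal(u):
                return g[u]
            if u in closed:
                continue
            closed.add(u)
            for v, w in neighbors(u):
                if v not in g or g[u] + w < g[v]:
                    g[v] = g[u] + w
                    heapq.heappush(open_heap, (lam * g[v] + cost_to_go(v), v))
    return None
```

On a toy chain graph 0 → 1 → 2 → 3 with unit edge costs and `cost_to_go(u) = 3 - u`, the search returns the solution cost 3.0.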

status | not learned | measured difficulty | 37% [default] | last interval [days] | |||
---|---|---|---|---|---|---|---|

repetition number in this series | 0 | memorised on | scheduled repetition | ||||

scheduled repetition interval | last repetition or drill |

Tags

#artificial-intelligence

Question

What is the point of using Weighted A* over A*?

Answer

The heuristic h often drastically underestimates the true distance, so we can obtain a more realistic estimate by scaling up its influence by some weight parameter. Although this compromises optimality, it can lead to a significant speedup, which makes weighted A* an appropriate choice when searching under time or space constraints.


Tags

#artificial-intelligence

Question

If we parameterize **[...]** with \(l\in[0,1]\), we obtain a continuous range of best-first search variants \(A_l\), also denoted as weighted A*.

Answer

\(f_l(u) = l\cdot h(u) + (1-l)\cdot g(u)\)
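This priority function drops straight into a best-first search loop. A minimal sketch (all names are illustrative; `neighbors(u)` is assumed to yield `(v, w(u, v))` pairs):

```python
import heapq

def weighted_astar(start, goal, neighbors, h, l=0.5):
    """Best-first search with priority f_l(u) = l*h(u) + (1-l)*g(u).

    l = 0.5 expands nodes in the same order as ordinary A*;
    l = 1   is greedy best-first search on h alone;
    l = 0   orders purely by path cost g.
    """
    g = {start: 0.0}
    parent = {start: None}
    frontier = [(l * h(start), start)]  # g(start) = 0, so f_l = l*h(start)
    while frontier:
        _, u = heapq.heappop(frontier)
        if u == goal:
            path = []
            while u is not None:  # reconstruct path via parent links
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v, w in neighbors(u):
            if v not in g or g[u] + w < g[v]:
                g[v] = g[u] + w
                parent[v] = u
                heapq.heappush(frontier, (l * h(v) + (1 - l) * g[v], v))
    return None
```

For example, on the chain graph 0 → 1 → 2 → 3 with unit edge costs and h(u) = 3 − u, the call `weighted_astar(0, 3, ..., l=0.5)` returns the path `[0, 1, 2, 3]`.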


Tags

#artificial-intelligence

Question

If we parameterize \(f_l(u) = l\cdot h(u) + (1-l)\cdot g(u)\) with \(l\in[0,1]\), we obtain a continuous range of best-first search variants \(A_l\), also denoted as weighted A*. For l = 0, we simulate a **[...]** traversal of the problem space; for l = 1, we have **[...]** . Algorithm \(A_{0.5}\) selects nodes in the same order as original A*.

Answer

breadth-first

greedy best-first search


Tags

#artificial-intelligence

Question

What is an admissible heuristic function?

Answer

Definition 1.8. (Admissible Heuristic) An estimate h is an admissible heuristic if it is a lower bound for the optimal solution costs; that is, h(u) ≤ δ(u,T) for all u ∈ V


Tags

#artificial-intelligence

Question

When is a heuristic function *consistent?*

Answer

A goal estimate h is a consistent heuristic if h(u) ≤ h(v) + w(u, v) for all edges e = (u, v) ∈ E.


Tags

#artificial-intelligence

Question

When is a heuristic function *monotone?*

Answer

Let (\(u_0\), . . . , \(u_k\)) be any path, g(\(u_i\)) be the path cost of (\(u_0\), . . . , \(u_i\)), and define f(\(u_i\)) = g(\(u_i\)) + h(\(u_i\)). A goal estimate h is a monotone heuristic if f(\(u_j\)) ≥ f(\(u_i\)) for all j > i, 0 ≤ i, j ≤ k; that is, the estimate of the total path cost is nondecreasing from a node to its successors.


Tags

#artificial-intelligence

Question

What is the relation of *consistent* and *monotone* heuristic functions?

Answer

Theorem 1.1. (Equivalence of Consistent and Monotone Heuristics) A heuristic is consistent if and only if it is monotone.


Tags

#artificial-intelligence

Question

What is the relation between *consistent* and *admissible* heuristics?

Answer

Theorem 1.2. (Consistency and Admissibility) Consistent estimates are admissible.


Tags

#artificial-intelligence

Question

Under a consistent heuristic, when is weighted A* search monotone?

Answer

Lemma 6.1. For l ≤ 0.5 and a consistent estimate h, \(f_l\) is monotone.

(with \(f_l(u) = l\cdot h(u) + (1-l)\cdot g(u)\) for \(l\in[0,1]\))


Tags

#artificial-intelligence

Question

Define \(\epsilon\)-optimality of a search algorithm.

Answer

Definition 6.1. (\(\epsilon\)-Optimality) A search algorithm is \(\epsilon\)-optimal if it terminates with a solution of cost at most (1 + \(\epsilon\)) · δ(s, T), with \(\epsilon\) denoting an arbitrarily small positive constant (s is the start state, T is a goal state).

#artificial-intelligence #reinforcement-learning

Monte-Carlo evaluation consists in estimating a position by averaging the outcome of several random continuations, and can serve as an evaluation function at the leaves of a min-max tree. This paper presents a new framework to combine tree search with Monte-Carlo evaluation, that does not separate between a min-max phase and a Monte-Carlo phase. Instead of backing-up the min-max value close to the root, and the average value at some depth, a more general backup operator is defined that progressively changes from averaging to min-max as the number of simulations grows. This approach provides a fine-grained control of the tree growth, at the level of individual simulations, and allows efficient selectivity methods. This algorithm was implemented in a 9 × 9 Go-playing program, Crazy Stone, that won the 10th KGS computer-Go tournament.
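The idea of a backup operator that shifts from averaging to min-max as simulations accumulate can be illustrated with a toy interpolation. This is not the paper's actual operator, only a sketch of the concept; the names `mixed_backup` and `alpha` are hypothetical.

```python
def mixed_backup(child_values, child_visits, alpha=50.0):
    """Illustrative backup for a max node: with few simulations the node's
    value stays close to the visit-weighted mean of its children (Monte-Carlo
    averaging); as total visits grow, it approaches the best child's value
    (min-max behaviour). `alpha` controls how fast the transition happens."""
    total = sum(child_visits)
    mean = sum(v * n for v, n in zip(child_values, child_visits)) / total
    best = max(child_values)
    w = total / (total + alpha)  # near 0 with few simulations, -> 1 with many
    return (1 - w) * mean + w * best
```

With children valued 0.2 and 0.8, one visit each yields roughly the mean 0.5, while 500 visits each yields a value close to the maximum 0.8.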

Monte Carlo Tree Search – summary. Introduction: Monte Carlo Tree Search was introduced by Rémi Coulom in 2006 as a building block of Crazy Stone – a Go playing engine with an impressive performance. From a helicopter view, Monte Carlo Tree Search has one main purpose: given a game state, to choose the most promising next move. Throughout the rest of this post we will try to take a look at the details of Monte Carlo Tree Search to understand what we mean by that. We will also refer back to AlphaGo/Zero from time to time, trying to explain what MCTS variant has been used in Deepmind's AI. Finite Two Person Zero-Sum Sequential Game: the framework/environment that Monte Carlo Tree Search operates within is a game, which by itself is a very abstract, broad term, therefore here