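**Definition** (Value function). The value function $V_{\nu}^{\pi}: \mathcal{H} \to [0,1]$ with:
- a policy $\pi$
- in an environment $\nu$
- a discount factor $\gamma \in (0,1)$
- a history $a_1 o_1 r_1 \dots a_{t-1} o_{t-1} r_{t-1} = h_{<t} \in \mathcal{H}$

is defined as:
$$V_{\nu}^{\pi}(h_{<t}) := (1-\gamma)\,\mathbb{E}_{\nu}^{\pi}\left[\sum_{k=t}^{\infty} \gamma^{k-t} r_{k} \,\middle|\, h_{<t}\right]$$
where rewards lie in $[0,1]$ and the factor $(1-\gamma)$ normalizes the discounted sum so that the value lies in $[0,1]$. The sum starts at $k = t$ because the rewards $r_1, \dots, r_{t-1}$ are already fixed by $h_{<t}$.

- The optimal value is defined as $V_{\nu}^{*}(h_{<t}) := \sup_{\pi} V_{\nu}^{\pi}(h_{<t})$.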
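The definition can be made concrete with a small sketch. It assumes finite action and percept sets, rewards in $[0,1]$, and the $(1-\gamma)$ normalization that keeps values in $[0,1]$. Under those assumptions the value satisfies the one-step recursion $V_{\nu}^{\pi}(h_{<t}) = \mathbb{E}_{\nu}^{\pi}\left[(1-\gamma) r_t + \gamma V_{\nu}^{\pi}(h_{<t+1}) \mid h_{<t}\right]$, and cutting the recursion off after $m$ steps changes the value by at most $\gamma^m$. For the optimal value, the supremum over policies is replaced by expectimax (a maximum over actions at each step), which computes the optimum of the truncated objective. All names here (`value`, `optimal_value`, `bandit`, `always_one`, `ACTIONS`) and the two-armed Bernoulli bandit environment are hypothetical illustrations, not part of the original definition.

```python
from typing import Callable, Dict, Tuple

# A history h_{<t} = a1 o1 r1 ... a_{t-1} o_{t-1} r_{t-1}, stored as (action, observation, reward) triples.
History = Tuple[Tuple[int, int, float], ...]
# Hypothetical interfaces: nu(h, a) is a distribution over percepts (o, r);
# pi(h) is a distribution over actions.
Env = Callable[[History, int], Dict[Tuple[int, float], float]]
Policy = Callable[[History], Dict[int, float]]

ACTIONS = (0, 1)  # assumed finite action set


def value(nu: Env, pi: Policy, h: History, gamma: float, horizon: int) -> float:
    """Normalized V^pi_nu(h), truncated after `horizon` steps (error at most gamma**horizon)."""
    if horizon == 0:
        return 0.0
    total = 0.0
    for a, p_a in pi(h).items():
        for (o, r), p_e in nu(h, a).items():
            # One-step unrolling: (1 - gamma) * r_t + gamma * V(h_{<t} a o r).
            rest = value(nu, pi, h + ((a, o, r),), gamma, horizon - 1)
            total += p_a * p_e * ((1 - gamma) * r + gamma * rest)
    return total


def optimal_value(nu: Env, h: History, gamma: float, horizon: int) -> float:
    """Truncated V*_nu(h) via expectimax: maximize over actions instead of averaging under pi."""
    if horizon == 0:
        return 0.0

    def q(a: int) -> float:
        # Expected normalized return of taking action a now and acting optimally afterwards.
        return sum(
            p_e * ((1 - gamma) * r + gamma * optimal_value(nu, h + ((a, o, r),), gamma, horizon - 1))
            for (o, r), p_e in nu(h, a).items()
        )

    return max(q(a) for a in ACTIONS)


def bandit(h: History, a: int) -> Dict[Tuple[int, float], float]:
    """Hypothetical two-armed Bernoulli bandit; ignores the history, observation is always 0."""
    p = 0.8 if a == 1 else 0.3
    return {(0, 1.0): p, (0, 0.0): 1.0 - p}


def always_one(h: History) -> Dict[int, float]:
    """Hypothetical deterministic policy that always pulls arm 1."""
    return {1: 1.0}


gamma, m = 0.5, 8
print(value(bandit, always_one, (), gamma, m))
print(optimal_value(bandit, (), gamma, m))
```

With $\gamma = 0.5$ and $m = 8$ both printed values come out at approximately $0.8(1 - 0.5^8) \approx 0.797$: always pulling arm 1 is optimal in this toy environment, so $V_{\nu}^{\pi} = V_{\nu}^{*}$ for that policy. The expectimax tree grows exponentially in $m$, so this brute-force recursion is only practical for very short horizons.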