With pure exploration, the value would simply be the mean of whatever outcomes exist. Over any finite decision window, we would have only limited control over those probabilities. In that case, it seems an optimal choice could be to reduce the exploration component to the smallest non-zero value possible. That stacks everything in favor of the most beneficial outcomes while still allowing for the possibility of a random event.
I wonder what form such an algorithm could take... and how long can the period of a clock built from finite resources be? That's probably a whole other science in itself.
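One familiar way to express "keep exploration small but non-zero" is an epsilon-greedy rule from the multi-armed bandit setting. This is my own mapping of the idea above, not something the text specifies. The sketch assumes Bernoulli-reward arms with made-up success probabilities and a hypothetical `epsilon_greedy` helper. With `epsilon = 1.0` the agent only explores, so its average reward lands near the plain mean of the arms. Shrinking `epsilon` lets it mostly exploit the best-looking arm while still taking the occasional random action.

```python
import random


def epsilon_greedy(arm_probs, epsilon, steps, seed=0):
    """Run an epsilon-greedy agent on Bernoulli arms and return its average reward.

    arm_probs: success probability of each arm (hypothetical test environment).
    epsilon:   probability of taking a uniformly random (exploratory) action.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total = 0.0

    for t in range(steps):
        if t < n_arms:
            arm = t  # pull every arm once so each estimate starts from real data
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick any arm uniformly at random
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit best estimate
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental sample-average update of the chosen arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward

    return total / steps


if __name__ == "__main__":
    arms = [0.2, 0.5, 0.8]  # mean of the arms is 0.5; the best arm pays 0.8
    for eps in (1.0, 0.1, 0.01):
        avg = epsilon_greedy(arms, eps, 20_000)
        print(f"epsilon={eps:<5} average reward={avg:.3f}")
```

Once the estimates settle, the long-run average for these arms is roughly (1 - epsilon) * 0.8 + epsilon * 0.5, so a smaller epsilon approaches the best arm's 0.8. Setting epsilon to exactly zero removes the random element entirely and can leave the agent stuck on whichever arm happened to look best early on.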