Cooperative off-policy prediction of Markov decision processes in adaptive networks