Multi-agent off-policy actor-critic reinforcement learning for partially observable environments