Graph exploration for effective multi-agent Q-learning