The family of linear recurrent neural networks (RNNs) has shown strong performance when used as recurrent memory modules in partially observable reinforcement learning. To better understand why such structures are effective, we simplify the setting to a hidden Markov model (HMM) and investigate two instantiations of the linear recurrence in logit space: (i) one that, under a deterministic transition matrix, exactly reproduces the pre-softmax logits of the belief vector and therefore serves as a sufficient statistic for optimal policy learning, and (ii) another that, under a nearly deterministic transition matrix, achieves vanishing state-decoding error comparable to that of the belief-based decoder, thus reducing state ambiguity to near zero. In both cases, the hidden size is fixed to match the number of environment states. We verify our main theoretical results through numerical experiments and further show that the constructed instantiation provides a strong learning target in a practical card-game reinforcement learning task.
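To make the logit-space construction in (i) concrete, the following is a minimal sketch using standard HMM notation; the symbols $b_t$, $T$, $O$, $P$, and $\ell_t$ are our own and not taken verbatim from the main text, and we assume here that the deterministic transition matrix is invertible, i.e., a permutation. The belief recursion is $b_{t+1} \propto \operatorname{diag}(O_{:,o_{t+1}})\, T^{\top} b_t$, and when $T^{\top}$ is a permutation matrix $P$, taking logarithms gives, up to the normalizing constant,
\[
\ell_{t+1} \;=\; P\,\ell_t \;+\; \log O_{:,o_{t+1}} \;+\; c_{t+1}\mathbf{1},
\qquad \ell_t := \log b_t,
\]
so the pre-softmax logits obey an affine linear recurrence that a linear RNN with hidden size equal to the number of environment states can represent exactly.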