Effective Policy Gradient Search for Reinforcement Learning through NEAT based Feature Extraction


To improve the effectiveness of commonly used Policy Gradient Search (PGS) algorithms for Reinforcement Learning (RL), many existing works have considered the importance of extracting useful state features from raw environment inputs. However, these works studied only the feature extraction process and did not demonstrate that the learned features improve RL performance. In this paper, we consider NeuroEvolution of Augmenting Topologies (NEAT) for automated feature extraction, since it can evolve neural networks whose topologies are suited to extracting useful features. Following this idea, we develop a new algorithm called NEAT with Regular Actor Critic for Policy Gradient Search, which integrates a popular Actor-Critic PGS algorithm, Regular Actor-Critic (RAC), with NEAT-based feature extraction. The algorithm learns useful state features as well as good policies for tackling complex RL problems. Results on benchmark problems confirm that our proposed algorithm is significantly more effective than NEAT in terms of learning performance, and that the features it learns on one learning problem remain effective when used with RAC on another, related learning problem.