Academic Journal

Multi-view reinforcement learning for sequential decision-making with insufficient state information.

Bibliographic Details
Title: Multi-view reinforcement learning for sequential decision-making with insufficient state information.
Authors: Li, Min; Zhu, William; Wang, Shiping
Source: International Journal of Machine Learning & Cybernetics; Apr 2024, Vol. 15 Issue 4, p1533-1552, 20p
Abstract: Most reinforcement learning methods describe sequential decision-making as a Markov decision process, in which the effect of an action is determined solely by the current state. This is reasonable only if the state is correctly defined and the state information is sufficiently observed, so the learning efficiency of reinforcement learning methods based on the Markov decision process is limited when state information is insufficient. The partially observable Markov decision process and the history-based decision process have been proposed to describe sequential decision-making with insufficient state information. However, these two processes tend to ignore important information in the currently observed state, so the learning efficiency of reinforcement learning methods based on them is likewise limited when state information is insufficient. In this paper, we propose a multi-view reinforcement learning method to solve this problem. The motivation is that the interaction between the agent and its environment should be considered from the views of history, present, and future to overcome the insufficiency of state information. Based on these views, we construct a multi-view decision process to describe sequential decision-making with insufficient state information. A multi-view reinforcement learning method is then proposed by combining the multi-view decision process with the actor-critic framework. In the proposed method, multi-view clustering is performed to ensure that each type of sample is sufficiently exploited. Experiments show that the proposed method is more effective than state-of-the-art baselines. The source code can be downloaded from https://github.com/jamieliuestc/MVRL. [ABSTRACT FROM AUTHOR]
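
The abstract describes fusing history, present, and (anticipated) future views of the agent-environment interaction inside an actor-critic learner. Below is a minimal sketch of that idea, not the authors' implementation (see the linked repository for that); all module names, dimensions, and the one-step "future" prediction head are assumptions made for illustration.

```python
# Hypothetical sketch of a multi-view actor-critic state encoder.
# Not the authors' code (see https://github.com/jamieliuestc/MVRL);
# all names, dimensions, and the learned "future" view are assumptions.
import torch
import torch.nn as nn

class MultiViewEncoder(nn.Module):
    def __init__(self, obs_dim, hidden_dim=64):
        super().__init__()
        self.history = nn.GRU(obs_dim, hidden_dim, batch_first=True)  # history view
        self.present = nn.Linear(obs_dim, hidden_dim)                 # present view
        self.future = nn.Linear(hidden_dim + obs_dim, hidden_dim)     # predicted future view

    def forward(self, obs_seq):
        # obs_seq: (batch, time, obs_dim); the last step is the current observation
        _, h = self.history(obs_seq)        # GRU summary of past observations
        h = h.squeeze(0)                    # (batch, hidden_dim)
        p = torch.relu(self.present(obs_seq[:, -1]))
        f = torch.relu(self.future(torch.cat([h, obs_seq[:, -1]], dim=-1)))
        return torch.cat([h, p, f], dim=-1)  # joint multi-view state

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden_dim=64):
        super().__init__()
        self.encoder = MultiViewEncoder(obs_dim, hidden_dim)
        self.actor = nn.Linear(3 * hidden_dim, n_actions)  # policy head
        self.critic = nn.Linear(3 * hidden_dim, 1)         # value head

    def forward(self, obs_seq):
        z = self.encoder(obs_seq)
        return torch.distributions.Categorical(logits=self.actor(z)), self.critic(z)

# Usage: a batch of 8 length-5 observation windows from a 10-dim environment.
model = ActorCritic(obs_dim=10, n_actions=4)
dist, value = model(torch.randn(8, 5, 10))
action = dist.sample()
```

The multi-view clustering step the abstract mentions (grouping experience samples across views so each type is sufficiently exploited) is omitted here for brevity.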
Copyright of International Journal of Machine Learning & Cybernetics is the property of Springer Nature and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
ISSN: 1868-8071
DOI: 10.1007/s13042-023-01981-9
Database: Complementary Index