Unsupervised learning is more desirable because not only the labeling cost is high but also the labels annotated by human may be inconsistent and insufficient  Kim and Mnih
proposed an unsupervised factor VAE, which enhanced disentanglement over [beta]-VAE by introducing the total correlation penalty .
, "Bayesian probabilistic matrix factorization using markov chain Monte Carlo," in Proceedings of the 25th International Conference on Machine Learning, pp.
 Salakhutdinov R, Mnih
A, "Bayesian probabilistic matrix factorization using markov chain monte carlo," in Proc.
The platform is used to compare, among others, approaches such as RL (see, for example, Mnih
et al ), model learning, model-based planning, imitation learning, and transfer learning.
The approach is proposed by Salakhutdinov and Mnih
. They use the user rating matrix and the probability density function of the Gaussian distribution to factorize it into users and items feature specific matrices, respectively.
Deep reinforcement learning (DRL)  first was proposed by Mnih
et al., which used deep neural networks to capture and infer hidden states, but this method still apply to MDP.
Although the platform has led to some high-profile success stories, including the much-publicized Deep Q-Networks of Volodymyr Mnih
and colleagues, it still offers many unsolved challenges.