In the Latest Machine Learning Research, UC Berkeley Researchers Propose an Efficient, Expressive, Multimodal Parameterization Called Adaptive Categorical Discretization (ADACAT) for Autoregressive Models
Autoregressive generative models can estimate complex continuous data distributions, such as trajectory rollouts in an RL environment, image intensities, and audio waveforms. Traditional techniques discretize continuous data into a fixed set of uniform bins and approximate the continuous data distribution with a categorical distribution over those bins. This approximation is parameter-inefficient, as it cannot express abrupt changes in density without a significant number of additional bins. This paper proposes Adaptive Categorical Discretization (ADACAT), a parameterization of 1-D conditionals that is expressive, parameter-efficient, and multimodal. An ADACAT distribution is parameterized by a vector of interval widths and a vector of interval masses. Figure 1 in the paper contrasts the traditional uniform categorical discretization approach with the proposed ADACAT.
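To make the parameterization concrete, here is a minimal sketch (not the authors' code) of evaluating the log-density of a 1-D ADACAT-style distribution on [0, 1]; the function names and the three-bin example are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def adacat_log_prob(x, width_logits, mass_logits):
    """Sketch of an ADACAT-style 1-D log-density on [0, 1]."""
    widths = softmax(width_logits)                      # bin widths, sum to 1
    masses = softmax(mass_logits)                       # bin masses, sum to 1
    edges = np.concatenate(([0.0], np.cumsum(widths)))  # K + 1 bin boundaries
    k = min(np.searchsorted(edges, x, side="right") - 1, len(widths) - 1)
    # density is piecewise constant: mass_k spread uniformly over width_k
    return np.log(masses[k]) - np.log(widths[k])

# A narrow bin carrying a large mass yields a sharp density spike,
# something a uniform discretization would need many bins to imitate.
print(adacat_log_prob(0.5, np.array([0., -3., 0.]), np.array([0., 3., 0.])))
```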
Each component of an ADACAT distribution has non-overlapping support, making it a specific subfamily of mixtures of uniform distributions that generalizes uniformly discretized 1-D categorical distributions. Because the bin widths are learnable, ADACAT approximates the modes of a mixture of two Gaussians far more closely than a uniformly discretized categorical with the same number of bins, making it more expressive than the latter. A distribution's support can also be discretized with quantile-based discretization, which places bin edges so that each bin contains a similar number of observed data points. In problems with more than one dimension, ADACAT uses deep autoregressive frameworks to factorize the joint density into a product of 1-D conditional ADACAT distributions.
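As a rough illustration of the quantile-based discretization mentioned above, the following sketch places bin edges at empirical quantiles so that each bin holds a similar number of points; the synthetic data and bin count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: a dense cluster near -2 and a sparse, wider cluster near +2.
data = np.concatenate([rng.normal(-2.0, 0.1, 900), rng.normal(2.0, 0.5, 100)])

K = 8  # number of bins, chosen arbitrarily for illustration
edges = np.quantile(data, np.linspace(0.0, 1.0, K + 1))
counts, _ = np.histogram(data, bins=edges)
print(np.round(edges, 2))  # tight edges where data is dense, wide elsewhere
print(counts)              # roughly 125 points per bin
```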
The autoregressive model can adapt the discretization of each dimension's conditional distribution to the observed context. To fit autoregressive models with ADACAT conditionals, the Kullback-Leibler (KL) divergence between the target distribution and the model density is minimized, which coincides with maximum-likelihood training up to an additive constant.
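The sketch below (a simplification, not the paper's implementation) fits the two logit vectors of a single 1-D conditional by minimizing the negative log-likelihood; in the full model these logits would be emitted by an autoregressive network conditioned on earlier dimensions:

```python
import torch

K = 8
width_logits = torch.zeros(K, requires_grad=True)
mass_logits = torch.zeros(K, requires_grad=True)
opt = torch.optim.Adam([width_logits, mass_logits], lr=1e-2)

def log_prob(x):
    widths = torch.softmax(width_logits, dim=0)
    masses = torch.softmax(mass_logits, dim=0)
    # bin upper boundaries; detached because bin assignment is discrete
    edges = torch.cumsum(widths, dim=0).detach()
    k = torch.searchsorted(edges, x).clamp(max=K - 1)  # containing bin index
    return torch.log(masses[k]) - torch.log(widths[k])

data = 0.4 + 0.2 * torch.rand(1024)  # toy targets concentrated in [0.4, 0.6]
for _ in range(300):
    opt.zero_grad()
    loss = -log_prob(data).mean()    # negative log-likelihood
    loss.backward()
    opt.step()
# Note: trained this naively, the occupied bins can shrink toward the
# bin collapse the paper describes, which its target smoothing addresses.
```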
The target-smoothed loss regularizes the conditional output distribution by applying perturbations in the output space, while the generative model remains conditioned on clean, unperturbed observations. The authors find that bin collapse occurs even on continuous data such as Gaussian mixtures, and that estimating the smoothing expectation with a single sample does not prevent the collapse.
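A hypothetical sketch of this target-smoothing idea follows: the exact observation is replaced by several small perturbations whose log-densities are averaged. The noise scale and sample count here are assumptions for illustration, not values from the paper:

```python
import torch

def smoothed_nll(log_prob, y, sigma=0.02, n_samples=8):
    """Average the model log-density over small perturbations of target y."""
    noise = torch.randn(n_samples, *y.shape) * sigma
    y_tilde = (y.unsqueeze(0) + noise).clamp(0.0, 1.0)  # keep targets in [0, 1]
    # several noise samples approximate the smoothing expectation; per the
    # paper's observation, a single sample is not enough to prevent collapse
    return -log_prob(y_tilde).mean()

# Usage with any log-density function, e.g. a uniform stand-in:
loss = smoothed_nll(lambda x: torch.zeros_like(x), torch.rand(16))
print(loss)  # 0 for the uniform stand-in
```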
The proposed approach is evaluated on standard UCI tabular benchmarks such as GAS, MINIBOONE, POWER, and HEPMASS, where ADACAT consistently outperforms the uniform-discretization baselines. For image density estimation on grayscale MNIST, ADACAT performs best at almost all parameter counts, the exception being uniform discretization at 256 parameters. In speech synthesis, the prime benefit of ADACAT is that it captures most of the performance without relying on human-engineered inductive bias. For model-based offline reinforcement learning, the architecture is modified by replacing the one-hot embedding layer with a linear layer, because continuous inputs are more informative; the total number of bins is also halved to match the parameter count of the output layer.
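The input-layer change for offline RL might look like the following sketch, with illustrative dimensions (the sizes and variable names are assumptions):

```python
import torch
import torch.nn as nn

d_model = 128
onehot_embed = nn.Embedding(256, d_model)  # discrete-token input (baseline)
linear_embed = nn.Linear(1, d_model)       # continuous input (modification)

tokens = torch.randint(0, 256, (32,))      # discretized inputs, shape (32,)
x_cont = torch.rand(32, 1)                 # raw continuous inputs, shape (32, 1)
h_base = onehot_embed(tokens)              # (32, 128), loses within-bin detail
h_cont = linear_embed(x_cont)              # (32, 128), retains full precision
```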
In summary, this paper proposes ADACAT, a multimodal parameterization of 1-D conditionals for autoregressive models that is both efficient and flexible. ADACAT outperforms uniform discretization strategies across all evaluated settings, including real-world tabular data, speech synthesis, image generation, and offline reinforcement learning. The results also suggest that the discontinuity in the model conditionals is a consequence of the parameterization, mitigated by analytical target smoothing, rather than a characteristic of the data. ADACAT still has limitations, such as a bound on the number of modes it can represent and a discontinuous parameterization. Overall, the proposed framework outperforms uniform discretization in both density estimation and downstream task performance.
This Article is written as a research summary article by Marktechpost Staff based on the research paper 'ADACAT: Adaptive Categorical Discretization for Autoregressive Models'. All Credit For This Research Goes To Researchers on This Project. Check out the paper.
Priyanka Israni is currently pursuing a PhD at Gujarat Technological University, Ahmedabad, India. Her interest areas include medical image processing, machine learning, deep learning, data analysis, and computer vision. She has 8 years of experience teaching engineering graduates and postgraduates.