If I understand the idea correctly, one of the main concepts behind the reparametrization trick, first presented in Kingma, D. P., & Welling, M. (2013), Auto-Encoding Variational Bayes (arXiv preprint arXiv:1312.6114), is that instead of sampling y directly from a distribution, we write y as a deterministic function of some parameters and a separate random variable, essentially decoupling the parameters we might want to take gradients with respect to from the stochastic nature of y.
I found the following simple example, which I will present here. We start from a y sampled as y ~ N(μ, σ^2). Since we cannot differentiate this sampling operation with respect to μ and σ, we instead write Y = μ + σ*z, with z ~ N(0, 1). This expression is differentiable with respect to the parameters, and that is (at least part of) the reparametrization trick.
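To make the differentiability point concrete, here is a minimal sketch of what I mean, written with PyTorch autograd (the library choice and the toy loss are my own, not part of the original setup): the randomness is confined to z, so gradients can flow from a loss built on Y back to μ and σ.

```python
import torch

# Parameters we want gradients with respect to
mu = torch.tensor(0.5, requires_grad=True)
sigma = torch.tensor(1.2, requires_grad=True)

# Reparametrized sample: the randomness lives in z ~ N(0, 1),
# while Y is a deterministic, differentiable function of mu and sigma.
z = torch.randn(())          # no gradient needed through z
Y = mu + sigma * z

# Any loss built from Y can now be backpropagated to mu and sigma
# (the target value 2.0 is just an arbitrary example).
loss = (Y - 2.0) ** 2
loss.backward()

print(mu.grad)     # dloss/dmu    = 2 * (Y - 2)       for this fixed z
print(sigma.grad)  # dloss/dsigma = 2 * (Y - 2) * z
```

Had we drawn Y directly with a sampling call on N(μ, σ^2), there would be no computational path from the sample back to μ and σ for backward() to follow.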
Now, I have checked that the expectation and the standard deviation of y and Y are equal, but that alone does not mean they follow the same distribution. I think I understand that there is a fundamental difference between sampling y from a distribution and writing it as a function of a random variable, but what do we want from the function we obtain through this reparametrization trick, apart from it being differentiable with respect to the desired parameters (here μ and σ)?
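For reference, the kind of moment check I mention above can be done with a quick Monte Carlo comparison (a NumPy sketch; the particular values of μ and σ are arbitrary examples of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, 1.2          # arbitrary example values
n = 1_000_000

# Sample y directly from N(mu, sigma^2)
y = rng.normal(mu, sigma, size=n)

# Sample Y via the reparametrization Y = mu + sigma * z, with z ~ N(0, 1)
z = rng.standard_normal(n)
Y = mu + sigma * z

print(y.mean(), Y.mean())   # both close to mu
print(y.std(),  Y.std())    # both close to sigma
```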
Put differently, I could probably write other functions Y = .. that are deterministic in μ and σ and have the same expectation and standard deviation as the y given above, so what makes this particular Y a 'good' function to obtain through the reparametrization trick?