Construct a softmax gate distribution structure
D = softmaxfactory(num, datadim)
D = softmaxfactory(num)
D = softmaxfactory(num, datadim) returns a structure representing a softmax gate distribution. num is the number of factors. datadim is the dimensionality of the input space.
D = softmaxfactory(num) is the same as above with datadim = 1.
- W (num-by-datadim matrix): weight matrix whose kth row holds the weight vector of the kth component.
Discrete Probability Density Function Conditioned on Input Space
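The distribution has the following density:

$$p(k \mid \mathbf{x}) = \frac{\exp(\mathbf{w}_k^T \mathbf{x})}{\sum_{j=1}^{K} \exp(\mathbf{w}_j^T \mathbf{x})}, \quad k = 1, \dots, K$$

where $K$ is the number of components (num), $\mathbf{w}_k^T$ is the kth row of W, and $\log p(k \mid \mathbf{x}) = \mathbf{w}_k^T \mathbf{x} - \log \sum_{j=1}^{K} \exp(\mathbf{w}_j^T \mathbf{x})$ is the log-likelihood of the kth conditional distribution.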
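The density above can be evaluated with a short NumPy sketch. It assumes W is stored as a K-by-d array with one row per component; `softmax_gate_prob` is a hypothetical helper for illustration, not the toolbox's own implementation. Subtracting the largest activation before exponentiating avoids overflow and leaves the ratio unchanged.

```python
import numpy as np

def softmax_gate_prob(W, x):
    """Return p(k | x) for k = 1..K, given weights W (K-by-d) and an input x (length d)."""
    a = W @ x              # activations w_k^T x, shape (K,)
    a = a - a.max()        # shift for numerical stability; the softmax ratio is unchanged
    e = np.exp(a)
    return e / e.sum()     # normalize so the K probabilities sum to 1

W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])   # K = 3 components, d = 2 input dimensions
x = np.array([2.0, 1.0])
p = softmax_gate_prob(W, x)
print(p, p.sum())
```

Here the activations are 2, 1 and 0, so the first component receives the largest gate probability.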
Number of components (excluding any fixed components)
num = D.num()
Same as in the distribution structure common members, except that here the log-likelihood of every discrete output value is computed as a function of the input vector.
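To illustrate computing the log-likelihood of every discrete output for a batch of inputs, here is a minimal NumPy sketch. It assumes inputs are stored one per column (a d-by-n array) and returns a K-by-n matrix; `gate_loglik_all` is a hypothetical name, not a toolbox function.

```python
import numpy as np

def gate_loglik_all(W, X):
    """Log-likelihood of every discrete output k for each input column of X.

    W : (K, d) weight matrix, one row per component.
    X : (d, n) inputs, one data point per column (column convention assumed here).
    Returns a (K, n) matrix whose (k, i) entry is log p(k | x_i).
    """
    A = W @ X                                  # (K, n) activations w_k^T x_i
    m = A.max(axis=0, keepdims=True)           # column-wise max for a stable log-sum-exp
    lse = m + np.log(np.exp(A - m).sum(axis=0, keepdims=True))
    return A - lse                             # w_k^T x_i - log sum_j exp(w_j^T x_i)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
X = rng.normal(size=(2, 5))
L = gate_loglik_all(W, X)
print(L.shape)                  # (3, 5)
print(np.exp(L).sum(axis=0))    # each column sums to 1 (up to rounding)
```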
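See distribution structure common members. Here the gradient of the log-likelihood with respect to the parameters is given by

$$\frac{\partial \log p(k \mid \mathbf{x})}{\partial \mathbf{w}_j} = \left(\delta_{jk} - p(j \mid \mathbf{x})\right) \mathbf{x}, \quad j = 1, \dots, K$$

where $\delta_{jk}$ is the Kronecker delta.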
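The per-sample gradient extends to a weighted sum over data points. This sketch assumes soft targets R (a K-by-n matrix, for example EM responsibilities) and the objective $f(W) = \sum_i \sum_k R_{ki} \log p(k \mid \mathbf{x}_i)$, whose gradient is $\sum_i (\mathbf{r}_i - (\sum_k R_{ki})\, \mathbf{p}_i)\, \mathbf{x}_i^T$. All function names are hypothetical, and the analytic gradient is checked against central finite differences.

```python
import numpy as np

def gate_loglik_all(W, X):
    """(K, n) matrix of log p(k | x_i); X holds one input per column."""
    A = W @ X
    m = A.max(axis=0, keepdims=True)
    return A - (m + np.log(np.exp(A - m).sum(axis=0, keepdims=True)))

def objective(W, X, R):
    """f(W) = sum_i sum_k R[k, i] * log p(k | x_i)."""
    return float((R * gate_loglik_all(W, X)).sum())

def gradient(W, X, R):
    """df/dW, shape (K, d): sum_i (r_i - (sum_k R[k, i]) p_i) x_i^T."""
    P = np.exp(gate_loglik_all(W, X))          # (K, n) gate probabilities
    return (R - P * R.sum(axis=0, keepdims=True)) @ X.T

rng = np.random.default_rng(1)
K, d, n = 3, 2, 6
W = rng.normal(size=(K, d))
X = rng.normal(size=(d, n))
R = rng.dirichlet(np.ones(K), size=n).T        # (K, n) soft targets, columns sum to 1

# Central finite differences to check the analytic gradient entry by entry.
G = gradient(W, X, R)
G_fd = np.zeros_like(W)
eps = 1e-6
for k in range(K):
    for j in range(d):
        E = np.zeros_like(W)
        E[k, j] = eps
        G_fd[k, j] = (objective(W + E, X, R) - objective(W - E, X, R)) / (2 * eps)
print(np.max(np.abs(G - G_fd)))   # small, on the order of finite-difference error
```

With one-hot columns in R this reduces to a sum of the per-sample formula above.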
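When the EM algorithm is used for a mixture of experts (MoE), the M-step requires estimating the gate parameters. The cost function here (the negative weighted log-likelihood) is convex in W, so the estimation has no spurious local optima and converges rapidly.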
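A gate M-step can be sketched as plain gradient ascent on the weighted log-likelihood. This is a stand-in optimizer chosen for illustration, not the estimator the toolbox uses; `mstep_gate` and the other names are hypothetical. It assumes each column of the responsibility matrix R sums to 1, in which case the gradient is Lipschitz with constant at most $\tfrac{1}{2}\|X\|_F^2$, so a step of $1/L$ increases the concave objective monotonically.

```python
import numpy as np

def gate_loglik_all(W, X):
    """(K, n) matrix of log p(k | x_i); X holds one input per column."""
    A = W @ X
    m = A.max(axis=0, keepdims=True)
    return A - (m + np.log(np.exp(A - m).sum(axis=0, keepdims=True)))

def objective(W, X, R):
    """Weighted log-likelihood sum_i sum_k R[k, i] log p(k | x_i); concave in W."""
    return float((R * gate_loglik_all(W, X)).sum())

def gradient(W, X, R):
    P = np.exp(gate_loglik_all(W, X))
    return (R - P * R.sum(axis=0, keepdims=True)) @ X.T

def mstep_gate(X, R, num_iter=200):
    """Gradient-ascent M-step for the gate weights (illustrative stand-in optimizer).

    Assumes each column of R sums to 1, so L <= 0.5 * ||X||_F^2 and step 1/L is safe.
    """
    K, d = R.shape[0], X.shape[0]
    W = np.zeros((K, d))
    step = 1.0 / (0.5 * np.sum(X ** 2))
    history = [objective(W, X, R)]
    for _ in range(num_iter):
        W = W + step * gradient(W, X, R)
        history.append(objective(W, X, R))
    return W, history

rng = np.random.default_rng(2)
X = rng.normal(size=(2, 50))
R = rng.dirichlet(np.ones(3), size=50).T   # (3, 50) placeholder responsibilities
W, history = mstep_gate(X, R)
print(history[0], history[-1])             # the objective never decreases
```

Note that adding the same vector to every row of W leaves $p(k \mid \mathbf{x})$ unchanged, so the cost is convex but its minimizer is not unique unless some weights are held fixed.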