Total-split algorithm for estimation of mixture model structure and parameters, with generalized internal estimations.


Syntax

[theta, D] = totalsplit(data, target_num)
[theta, D] = totalsplit(data, target_num, options)
[theta, D, info] = totalsplit(...)
[theta, D, info, options] = totalsplit(...)


Description

This function implements the total-split (split-then-merge) algorithm proposed by Sra et al. (2015), though here we use generalized optimization methods.

[theta, D] = totalsplit(data, target_num) returns the estimated parameters and the mixture distribution D with target_num components, fitted to data.

[theta, D] = totalsplit(data, target_num, options) utilizes applicable options from the options structure in the estimation procedure.

[theta, D, info] = totalsplit(...) also returns info, a structure array containing information about successive iterations performed by iterative estimation functions.

[theta, D, info, options] = totalsplit(...) also returns the effective options used, so you can see which default values the function applied in addition to any options you specified.
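
A minimal sketch of the four-output form, assuming the toolbox that provides totalsplit is on the MATLAB path. The variable names opts, effective and addedFields are chosen here for illustration only, and the comparison looks at top-level field names, not nested ones.

```matlab
% Assumes totalsplit is on the MATLAB path.
data = randn(1,1000) .* 2 + 1;      % synthetic 1-D data, as in the example on this page
opts = struct();
opts.verbosity = 2;                 % the only option set explicitly by the caller
[theta, D, info, effective] = totalsplit(data, 3, opts);
% Top-level fields present in the effective options but not set by the caller
addedFields = setdiff(fieldnames(effective), fieldnames(opts));
disp(addedFields);
```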

For information about the output theta, see Distribution Parameters Structure. The input argument data is described in Data Input Argument to Functions. You may also want to read about the options and info arguments.

Available Options

This function supports all the options described in estimation options. It also accepts the following additional fields in the options structure:

  • tolCostDiff (default 0) : Minimum decrease in cost to accept a split.
  • numMax (default 2*target_num) : Maximum number of mixture components.
  • componentD (default MVN distribution) : distribution structure defining the mixture component distribution type.

options.inner can be used to set specific options for the inner estimations. To set options only for the partial or full inner estimations use options.inner.partial or options.inner.full respectively.
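
As a rough sketch of how these fields might be combined, the snippet below builds an options structure using only the field names listed above. The values (1e-3, 8, 5, 50) are arbitrary illustrations, not recommendations, and maxiter is assumed to be accepted under options.inner.full in the same way it is used under options.inner.partial in the example on this page.

```matlab
% Build an options structure for totalsplit (all values are illustrative).
options = struct();
options.tolCostDiff = 1e-3;             % minimum decrease in cost to accept a split
options.numMax = 8;                     % maximum number of mixture components
options.inner.partial.maxiter = 5;      % applies to the partial inner estimations only
options.inner.full.maxiter = 50;        % applies to the full inner estimations only (assumed field)
disp(options.inner);
```

The resulting structure can then be passed as the third argument to totalsplit.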

Returned info fields

The fields present in the returned info structure array depend on the solver used (options.solver). When a Manopt solver is specified, the info returned by the Manopt solver is returned directly. For the 'default' solver, see the documentation of the 'estimatedefault' function for the mixture distribution. You can read more in our documentation on the estimation statistics structure.
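
Because the fields vary with the solver, one cautious way to explore info is to list its field names before relying on any of them. The sketch below assumes totalsplit is on the MATLAB path; the variable name lastRecord is illustrative.

```matlab
% Assumes totalsplit is on the MATLAB path.
data = randn(1,1000) .* 2 + 1;                   % synthetic 1-D data
[theta, D, info] = totalsplit(data, 3);
disp(fieldnames(info));                          % available fields depend on options.solver
fprintf('info has %d element(s)\n', numel(info));
if ~isempty(info)
    lastRecord = info(end);                      % last element of the structure array
    disp(lastRecord);
end
```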


Example

% generate 1000 random data points
data = randn(1,1000) .* 2 + 1;
% set some options
options.solver = 'cg';
options.verbosity = 2;
options.inner.partial.maxiter = 10;
% fit mixture model to data
[theta, D] = totalsplit(data, 5, options)


References

  1. S. Sra, R. Hosseini, L. Theis, and M. Bethge, “Data modeling with the elliptical gamma distribution,” in Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015, pp. 903–911.