Machine Learning: The EM Algorithm

A probabilistic model sometimes contains both observed variables and latent (hidden) variables. If all of the model's variables are observed, the parameters can be estimated from the given data by maximum likelihood estimation or by Bayesian estimation. When the model contains latent variables, however, these estimation methods cannot be applied directly.

Derivation of the EM Algorithm

Preliminaries: Jensen's Inequality

Let $f$ be a function defined on the real numbers. If the second derivative $f''(x) \ge 0$ for all real $x$, then $f$ is convex. When $x$ is a vector, $f$ is convex if its Hessian matrix $H$ is positive semidefinite. If the second derivative is strictly greater than 0 (never equal to 0), then $f$ is called strictly convex (for vectors: $H$ positive definite).

Jensen's inequality can be stated as follows:

If $f$ is a convex function and $X$ is a random variable, then $E[f(X)] \ge f(E[X])$.
In particular, if $f$ is strictly convex, equality holds if and only if $X$ is a constant.

In the figure, the solid curve $f$ is convex, and $X$ is a random variable that takes the value $a$ with probability 0.5 and the value $b$ with probability 0.5 (like a coin flip). The expectation of $X$ is then the midpoint of $a$ and $b$, and the figure shows that $E[f(X)] \ge f(E[X])$ holds.

A function $f$ is (strictly) concave if and only if $-f$ is (strictly) convex.
When Jensen's inequality is applied to a concave function, the direction of the inequality is reversed.
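For a concrete check, take the convex function $f(x)=x^2$ (so $f''(x)=2>0$) and let $X$ be $a$ or $b$ with probability 0.5 each:

$$E[f(X)] = \frac{a^2+b^2}{2} \;\ge\; \Big(\frac{a+b}{2}\Big)^2 = f(E[X]),$$

which follows from $(a-b)^2 \ge 0$. For the concave function $\log x$ the inequality flips, $E[\log X] \le \log E[X]$, which is exactly the form used in the derivation below.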

The Derivation

Suppose we have a sample set $\{x^{(1)},\dots,x^{(m)}\}$ of $m$ independent samples, but the class $z^{(i)}$ of each sample $i$ is unknown (think of clustering), i.e. it is a latent variable. We want to estimate the parameters $\theta$ of the probability model $p(x,z)$; because of the latent variable $z$, the likelihood is hard to maximize directly, whereas if $z$ were known the problem would be easy.
For parameter estimation we still want the $\theta$ that maximizes the likelihood. The only difference from ordinary maximum likelihood is that the likelihood now contains an extra unknown variable $z$, as in equation (1) below; our goal is to find the $\theta$ and $z$ that make $L(\theta)$ largest. One might think: it is only one extra unknown, so why not take partial derivatives with respect to $\theta$ and $z$ separately, set them to zero, and solve?

Essentially we need to maximize equation (1), the log-likelihood. (For (1), recall how a marginal density is obtained from a joint density; note that $z$ is also a random variable here. Summing the joint density on the right-hand side over all possible classes $z$ of sample $i$ yields the marginal density of the random variable $x$ on the left-hand side.) But (1) contains a "log of a sum", so the derivative becomes very messy (imagine differentiating the composite function $\log(f_1(x)+f_2(x)+f_3(x)+\dots)$), and the unknowns $z$ and $\theta$ are hard to solve for.
Can we transform equation (1) somehow? Look at equation (2): it only multiplies and divides each term inside the sum by the same function, so it is still a "log of a sum" and still intractable. Why do it, then? Hold that thought and look at equation (3): it has become a "sum of logs", which is easy to differentiate. Notice also that the equals sign has turned into an inequality. Why is that allowed? This is exactly where Jensen's inequality shows its power.
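The three equations referred to above are not reproduced in the text; in the standard EM derivation they read (a reconstruction consistent with the fragments quoted below):

$$L(\theta) = \sum_{i=1}^{m} \log p(x^{(i)};\theta) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta) \qquad (1)$$

$$\phantom{L(\theta)} = \sum_{i=1}^{m} \log \sum_{z^{(i)}} Q_i(z^{(i)})\,\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \qquad (2)$$

$$\phantom{L(\theta)} \ge \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)})\,\log\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \qquad (3)$$

where each $Q_i$ is some distribution over the possible values of $z^{(i)}$, i.e. $\sum_{z^{(i)}} Q_i(z^{(i)}) = 1$ and $Q_i(z^{(i)}) \ge 0$.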

Regarding equation (2): $f(x)=\log x$ is concave (its second derivative is $-\frac{1}{x^2}<0$). In (2),
$\sum_{z^{(i)}} Q_i(z^{(i)})\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$ is the expectation of $\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$ under $Q_i$ (recall that $E(X)=\sum x\,p(x)$; if $f(X)$ is a function of $X$, then $E(f(X))=\sum f(x)\,p(x)$). Furthermore, applying Jensen's inequality for a concave function gives
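$$\log\Big(\sum_{z^{(i)}} Q_i(z^{(i)})\,\frac{p(x^{(i)},z^{(i)};\theta)}{Q_i(z^{(i)})}\Big) \;=\; f\big(E[Y]\big) \;\ge\; E[f(Y)] \;=\; \sum_{z^{(i)}} Q_i(z^{(i)})\,\log\frac{p(x^{(i)},z^{(i)};\theta)}{Q_i(z^{(i)})},$$

with $Y = \frac{p(x^{(i)},z^{(i)};\theta)}{Q_i(z^{(i)})}$ and $f = \log$ (a reconstruction of the step omitted here),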

so summing over all samples $i$ yields inequality (3).

Equation (3) is now easy to differentiate.
But (2) and (3) are related by an inequality, so the maximum of (3) is not the maximum of (2), and it is (2) that we actually want to maximize. What now?
Here we need a little imagination. The inequality between (2) and (3) can be written as: likelihood $L(\theta) \ge J(z,Q)$. By repeatedly maximizing this lower bound $J$, we push $L(\theta)$ up step by step until it reaches its maximum.
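Here $J(z,Q)$ denotes the right-hand side of (3) (it also depends on $\theta$, which is suppressed in the notation):

$$J(z,Q) = \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)})\,\log\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}.$$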

As the figure shows: we first fix $\theta$ and adjust $Q(z)$ so that the lower bound $J(z,Q)$ rises until it equals $L(\theta)$ at the current $\theta$ (from the green curve up to the blue curve); then we fix $Q(z)$ and adjust $\theta$ so that the lower bound $J(z,Q)$ is maximized (moving from $\theta_t$ to $\theta_{t+1}$); then we fix $\theta$ again and adjust $Q(z)$ ... and so on, until we converge to the $\theta^*$ at which the likelihood $L(\theta)$ is maximal. Two questions remain:

  1. When does the lower bound $J(z,Q)$ equal $L(\theta)$ at the current $\theta$?
  2. Why does the procedure converge?

When is the bound tight?

As noted for Jensen's inequality, equality holds when the random variable $X$ is a constant. In our setting this means:
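$$\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = c,$$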

where $c$ is a constant that does not depend on $z^{(i)}$. Taking this a step further: since $\sum_{z^{(i)}} Q_i(z^{(i)}) = 1$, summing the numerators gives $\sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta) = c$ (adding up numerators and denominators across equal ratios does not change the ratio, so each term's ratio is the same $c$). We therefore obtain:
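$$Q_i(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)}; \theta)}{\sum_{z} p(x^{(i)}, z; \theta)} = \frac{p(x^{(i)}, z^{(i)}; \theta)}{p(x^{(i)}; \theta)} = p\big(z^{(i)} \mid x^{(i)}; \theta\big).$$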

We have thus derived that, once the parameters $\theta$ are fixed, the $Q(z)$ that pulls the lower bound up to $L(\theta)$ is simply the posterior probability, which settles the question of how to choose $Q(z)$. This is the E-step: it establishes the lower bound of $L(\theta)$.
The M-step then takes $Q(z)$ as given and adjusts $\theta$ to maximize the lower bound $J$ of $L(\theta)$ (with $Q(z)$ fixed, the bound can still be pushed higher). The general EM algorithm therefore runs as follows:

  1. Initialize the distribution parameters $\theta$;
  2. Iterate the E-step and M-step (sketched below) until convergence.

    Repeating this iteration yields the parameters $\theta$ that maximize the likelihood $L(\theta)$.
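In the standard formulation (the update formulas are not shown in the text; this is the usual reconstruction), each iteration consists of:

E-step: for every $i$, set $Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \theta)$.

M-step: update
$$\theta := \arg\max_{\theta} \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)})\,\log\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}.$$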

Why does it converge?

Because the lower bound keeps rising, the log-likelihood increases monotonically, so eventually we reach a (local) maximum of the likelihood. A detailed proof can be found at:
http://www.cnblogs.com/jerrylead/archive/2011/04/06/2006936.html
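In outline (a sketch of the argument given at the link above): if the E-step chooses $Q^{(t)}$ so that the bound is tight at $\theta^{(t)}$, then

$$L(\theta^{(t+1)}) \;\ge\; J(\theta^{(t+1)}, Q^{(t)}) \;\ge\; J(\theta^{(t)}, Q^{(t)}) \;=\; L(\theta^{(t)}),$$

where the first inequality is Jensen's bound, the second holds because the M-step maximizes $J$ over $\theta$, and the final equality is the tightness established by the E-step. The likelihood sequence is therefore non-decreasing, and since it is bounded above for well-behaved models, it converges.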

The EM Algorithm Implementation in OpenCV

class CV_EXPORTS EMImpl : public EM
{
public:
    // Hyperparameters that must be set in advance
    int nclusters; // default is 5
    int covMatType;  // default is COV_MAT_DIAGONAL
    TermCriteria termCrit; // default is DEFAULT_MAX_ITERS=100

    // all inner matrices have type CV_64FC1
    Mat trainSamples; // trainSamples.rows x trainSamples.cols

    Mat trainProbs; // trainSamples.rows x nclusters
    Mat trainLogLikelihoods; // trainSamples.rows x 1
    Mat trainLabels; // trainSamples.rows x 1
    // Parameters to be learned during training
    Mat weights;
    Mat means;
    std::vector<Mat> covs;

    std::vector<Mat> covsEigenValues;
    std::vector<Mat> covsRotateMats;
    std::vector<Mat> invCovsEigenValues;
    Mat logWeightDivDet; // 1 x nclusters, CV_64FC1

    bool train(const Ptr<TrainData>& data, int)
    {
        Mat samples = data->getTrainSamples(), labels;
        return trainEM(samples, labels, noArray(), noArray());
    }
    bool trainEM(InputArray samples,
               OutputArray logLikelihoods,
               OutputArray labels,
               OutputArray probs)
    {
        Mat samplesMat = samples.getMat();
        setTrainData(START_AUTO_STEP, samplesMat, 0, 0, 0, 0);
        return doTrain(START_AUTO_STEP, logLikelihoods, labels, probs);
    }

    void setTrainData(int startStep, const Mat& samples,
                      const Mat* probs0,
                      const Mat* means0,
                      const std::vector<Mat>* covs0,
                      const Mat* weights0)
    {
        clear();
        // Validate the training data
        checkTrainData(startStep, samples, nclusters, covMatType, probs0, means0, covs0, weights0);

        bool isKMeansInit = (startStep == START_AUTO_STEP) || (startStep == START_E_STEP && (covs0 == 0 || weights0 == 0));
        // Set checked data
        // Preprocess the training samples: copy the input samples (with type conversion) into the member trainSamples
        preprocessSampleData(samples, trainSamples, isKMeansInit ? CV_32FC1 : CV_64FC1, false);

        // Different start steps require different initial data

        // set probs
        if(probs0 && startStep == START_M_STEP)
        {
            preprocessSampleData(*probs0, trainProbs, CV_64FC1, true);
            preprocessProbability(trainProbs);
        }

        // set weights
        if(weights0 && (startStep == START_E_STEP && covs0))
        {
            weights0->convertTo(weights, CV_64FC1);
            weights = weights.reshape(1,1);
            preprocessProbability(weights);
        }

        // set means
        if(means0 && (startStep == START_E_STEP/* || startStep == START_AUTO_STEP*/))
            means0->convertTo(means, isKMeansInit ? CV_32FC1 : CV_64FC1);

        // set covs
        if(covs0 && (startStep == START_E_STEP && weights0))
        {
            covs.resize(nclusters);
            for(size_t i = 0; i < covs0->size(); i++)
                (*covs0)[i].convertTo(covs[i], CV_64FC1);
        }
    }

    // Core training routine
    // param startStep: the stage at which training starts
    bool doTrain(int startStep, OutputArray logLikelihoods, OutputArray labels, OutputArray probs)
    {
        int dim = trainSamples.cols;
        // Precompute the empty initial train data in the cases of START_E_STEP and START_AUTO_STEP
        if(startStep != START_M_STEP)
        {
            if(covs.empty())
            {
                CV_Assert(weights.empty());
                clusterTrainSamples();
            }
        }

        if(!covs.empty() && covsEigenValues.empty() )
        {
            CV_Assert(invCovsEigenValues.empty());
            decomposeCovs();
        }

        if(startStep == START_M_STEP)
            mStep();

        double trainLogLikelihood, prevTrainLogLikelihood = 0.;
        int maxIters = (termCrit.type & TermCriteria::MAX_ITER) ?
            termCrit.maxCount : DEFAULT_MAX_ITERS;
        double epsilon = (termCrit.type & TermCriteria::EPS) ? termCrit.epsilon : 0.;

        for(int iter = 0; ; iter++)
        {
            eStep();
            trainLogLikelihood = sum(trainLogLikelihoods)[0];
            // Stopping criterion 1: the number of iterations reaches the threshold
            if(iter >= maxIters - 1) break;

            double trainLogLikelihoodDelta = trainLogLikelihood - prevTrainLogLikelihood;

            // Stopping criterion 2: trainLogLikelihoodDelta is small enough
            if( iter != 0 &&
                (trainLogLikelihoodDelta < -DBL_EPSILON ||
                 trainLogLikelihoodDelta < epsilon * std::fabs(trainLogLikelihood)))
                break;

            mStep();

            prevTrainLogLikelihood = trainLogLikelihood;
        }

        if( trainLogLikelihood <= -DBL_MAX/10000. )
        {
            clear();
            return false;
        }

        // postprocess covs
        covs.resize(nclusters);
        for(int clusterIndex = 0; clusterIndex < nclusters; clusterIndex++)
        {
            if(covMatType == COV_MAT_SPHERICAL)
            {
                covs[clusterIndex].create(dim, dim, CV_64FC1);
                setIdentity(covs[clusterIndex], Scalar(covsEigenValues[clusterIndex].at<double>(0)));
            }
            else if(covMatType == COV_MAT_DIAGONAL)
            {
                covs[clusterIndex] = Mat::diag(covsEigenValues[clusterIndex]);
            }
        }

        if(labels.needed())
            trainLabels.copyTo(labels);
        if(probs.needed())
            trainProbs.copyTo(probs);
        if(logLikelihoods.needed())
            trainLogLikelihoods.copyTo(logLikelihoods);

        trainSamples.release();
        trainProbs.release();
        trainLabels.release();
        trainLogLikelihoods.release();

        return true;
    }

    void decomposeCovs()
    {
        CV_Assert(!covs.empty());
        covsEigenValues.resize(nclusters);
        if(covMatType == COV_MAT_GENERIC)
            covsRotateMats.resize(nclusters);
        invCovsEigenValues.resize(nclusters);

        for(int clusterIndex = 0; clusterIndex < nclusters; clusterIndex++)
        {
            CV_Assert(!covs[clusterIndex].empty());

            SVD svd(covs[clusterIndex], SVD::MODIFY_A + SVD::FULL_UV);

            if(covMatType == COV_MAT_SPHERICAL)
            {
                double maxSingularVal = svd.w.at<double>(0);
                covsEigenValues[clusterIndex] = Mat(1, 1, CV_64FC1, Scalar(maxSingularVal));
            }
            else if(covMatType == COV_MAT_DIAGONAL)
            {
                covsEigenValues[clusterIndex] = covs[clusterIndex].diag().clone(); //Preserve the original order of eigen values.
            }
            else //COV_MAT_GENERIC
            {
                covsEigenValues[clusterIndex] = svd.w;
                covsRotateMats[clusterIndex] = svd.u;
            }
            max(covsEigenValues[clusterIndex], minEigenValue, covsEigenValues[clusterIndex]);
            invCovsEigenValues[clusterIndex] = 1./covsEigenValues[clusterIndex];
        }
    }

    void eStep()
    {
        // Compute probs_ik from means_k, covs_k and weights_k.
        trainProbs.create(trainSamples.rows, nclusters, CV_64FC1);
        trainLabels.create(trainSamples.rows, 1, CV_32SC1);
        trainLogLikelihoods.create(trainSamples.rows, 1, CV_64FC1);

        computeLogWeightDivDet();

        for(int sampleIndex = 0; sampleIndex < trainSamples.rows; sampleIndex++)
        {
            Mat sampleProbs = trainProbs.row(sampleIndex);
            Vec2d res = computeProbabilities(trainSamples.row(sampleIndex), &sampleProbs, CV_64F);
            trainLogLikelihoods.at<double>(sampleIndex) = res[0];
            trainLabels.at<int>(sampleIndex) = static_cast<int>(res[1]);
        }
    }

    void mStep()
    {
        // Update means_k, covs_k and weights_k from probs_ik
        int dim = trainSamples.cols;

        // Update weights
        // not normalized first
        reduce(trainProbs, weights, 0, CV_REDUCE_SUM);

        // Update means
        means.create(nclusters, dim, CV_64FC1);
        means = Scalar(0);

        const double minPosWeight = trainSamples.rows * DBL_EPSILON;
        double minWeight = DBL_MAX;
        int minWeightClusterIndex = -1;
        for(int clusterIndex = 0; clusterIndex < nclusters; clusterIndex++)
        {
            if(weights.at<double>(clusterIndex) <= minPosWeight)
                continue;

            if(weights.at<double>(clusterIndex) < minWeight)
            {
                minWeight = weights.at<double>(clusterIndex);
                minWeightClusterIndex = clusterIndex;
            }

            Mat clusterMean = means.row(clusterIndex);
            for(int sampleIndex = 0; sampleIndex < trainSamples.rows; sampleIndex++)
                clusterMean += trainProbs.at<double>(sampleIndex, clusterIndex) * trainSamples.row(sampleIndex);
            clusterMean /= weights.at<double>(clusterIndex);
        }

        // Update covsEigenValues and invCovsEigenValues
        covs.resize(nclusters);
        covsEigenValues.resize(nclusters);
        if(covMatType == COV_MAT_GENERIC)
            covsRotateMats.resize(nclusters);
        invCovsEigenValues.resize(nclusters);
        for(int clusterIndex = 0; clusterIndex < nclusters; clusterIndex++)
        {
            if(weights.at<double>(clusterIndex) <= minPosWeight)
                continue;

            if(covMatType != COV_MAT_SPHERICAL)
                covsEigenValues[clusterIndex].create(1, dim, CV_64FC1);
            else
                covsEigenValues[clusterIndex].create(1, 1, CV_64FC1);

            if(covMatType == COV_MAT_GENERIC)
                covs[clusterIndex].create(dim, dim, CV_64FC1);

            Mat clusterCov = covMatType != COV_MAT_GENERIC ?
                covsEigenValues[clusterIndex] : covs[clusterIndex];

            clusterCov = Scalar(0);

            Mat centeredSample;
            for(int sampleIndex = 0; sampleIndex < trainSamples.rows; sampleIndex++)
            {
                centeredSample = trainSamples.row(sampleIndex) - means.row(clusterIndex);

                if(covMatType == COV_MAT_GENERIC)
                    clusterCov += trainProbs.at<double>(sampleIndex, clusterIndex) * centeredSample.t() * centeredSample;
                else
                {
                    double p = trainProbs.at<double>(sampleIndex, clusterIndex);
                    for(int di = 0; di < dim; di++ )
                    {
                        double val = centeredSample.at<double>(di);
                        clusterCov.at<double>(covMatType != COV_MAT_SPHERICAL ? di : 0) += p*val*val;
                    }
                }
            }

            if(covMatType == COV_MAT_SPHERICAL)
                clusterCov /= dim;

            clusterCov /= weights.at<double>(clusterIndex);

            // Update covsRotateMats for COV_MAT_GENERIC only
            if(covMatType == COV_MAT_GENERIC)
            {
                SVD svd(covs[clusterIndex], SVD::MODIFY_A + SVD::FULL_UV);
                covsEigenValues[clusterIndex] = svd.w;
                covsRotateMats[clusterIndex] = svd.u;
            }

            max(covsEigenValues[clusterIndex], minEigenValue, covsEigenValues[clusterIndex]);

            // update invCovsEigenValues
            invCovsEigenValues[clusterIndex] = 1./covsEigenValues[clusterIndex];
        }

        for(int clusterIndex = 0; clusterIndex < nclusters; clusterIndex++)
        {
            if(weights.at<double>(clusterIndex) <= minPosWeight)
            {
                Mat clusterMean = means.row(clusterIndex);
                means.row(minWeightClusterIndex).copyTo(clusterMean);
                covs[minWeightClusterIndex].copyTo(covs[clusterIndex]);
                covsEigenValues[minWeightClusterIndex].copyTo(covsEigenValues[clusterIndex]);
                if(covMatType == COV_MAT_GENERIC)
                    covsRotateMats[minWeightClusterIndex].copyTo(covsRotateMats[clusterIndex]);
                invCovsEigenValues[minWeightClusterIndex].copyTo(invCovsEigenValues[clusterIndex]);
            }
        }

        // Normalize weights
        weights /= trainSamples.rows;
    }
};

namespace cv
{
typedef CvStatModel StatModel;
typedef CvKNearest KNearest;
...
typedef CvMLData TrainData;
}

class CV_EXPORTS_W EM : public StatModel
{
public:
    //! Type of the covariance matrices
    enum Types {
        /** A scaled identity matrix \f$\mu_k * I\f$. There is the only
        parameter \f$\mu_k\f$ to be estimated for each matrix. The option may be used in special cases,
        when the constraint is relevant, or as a first step in the optimization (for example in case
        when the data is preprocessed with PCA). The results of such preliminary estimation may be
        passed again to the optimization procedure, this time with
        covMatType=EM::COV_MAT_DIAGONAL. */
        COV_MAT_SPHERICAL=0,
        /** A diagonal matrix with positive diagonal elements. The number of
        free parameters is d for each matrix. This is most commonly used option yielding good
        estimation results. */
        // Positive diagonal elements; the commonly used option that yields good estimates, and the default covariance matrix type for training
        COV_MAT_DIAGONAL=1,
        /** A symmetric positive-definite matrix. The number of free
        parameters in each matrix is about \f$d^2/2\f$. It is not recommended to use this option, unless
        there is pretty accurate initial estimation of the parameters and/or a huge number of
        training samples. */
        COV_MAT_GENERIC=2, // Rarely used, unless the initial parameter estimates are accurate or the training set is large
        COV_MAT_DEFAULT=COV_MAT_DIAGONAL
    };

    //! Default parameters: 5 clusters, at most 100 iterations
    enum {DEFAULT_NCLUSTERS=5, DEFAULT_MAX_ITERS=100};

    //! The initial step
    enum {START_E_STEP=1, START_M_STEP=2, START_AUTO_STEP=0};

    /** The number of mixture components in the Gaussian mixture model.
    Default value of the parameter is EM::DEFAULT_NCLUSTERS=5. Some of %EM implementation could
    determine the optimal number of mixtures within a specified value range, but that is not the
    case in ML yet. */
    /** @see setClustersNumber */
    CV_WRAP virtual int getClustersNumber() const = 0;
    /** @copybrief getClustersNumber @see getClustersNumber */
    CV_WRAP virtual void setClustersNumber(int val) = 0;

    /** Constraint on covariance matrices which defines type of matrices.
    See EM::Types. */
    /** @see setCovarianceMatrixType */
    CV_WRAP virtual int getCovarianceMatrixType() const = 0;
    /** @copybrief getCovarianceMatrixType @see getCovarianceMatrixType */
    CV_WRAP virtual void setCovarianceMatrixType(int val) = 0;

    /** The termination criteria of the %EM algorithm.
    The %EM algorithm can be terminated by the number of iterations termCrit.maxCount (number of
    M-steps) or when relative change of likelihood logarithm is less than termCrit.epsilon. Default
    maximum number of iterations is EM::DEFAULT_MAX_ITERS=100. */
    /** @see setTermCriteria */
    CV_WRAP virtual TermCriteria getTermCriteria() const = 0;
    /** @copybrief getTermCriteria @see getTermCriteria */
    CV_WRAP virtual void setTermCriteria(const TermCriteria &val) = 0;

    /** @brief Returns weights of the mixtures

    Returns vector with the number of elements equal to the number of mixtures.
     */
    CV_WRAP virtual Mat getWeights() const = 0;
    /** @brief Returns the cluster centers (means of the Gaussian mixture)

    Returns matrix with the number of rows equal to the number of mixtures and number of columns
    equal to the space dimensionality.
     */
    CV_WRAP virtual Mat getMeans() const = 0;
    /** @brief Returns covariation matrices

    Returns vector of covariation matrices. Number of matrices is the number of gaussian mixtures,
    each matrix is a square floating-point matrix NxN, where N is the space dimensionality.
     */
    CV_WRAP virtual void getCovs(CV_OUT std::vector<Mat>& covs) const = 0;

    /** @brief Returns posterior probabilities for the provided samples

    @param samples The input samples, floating-point matrix
    @param results The optional output \f$nSamples \times nClusters\f$ matrix of results. It contains
    posterior probabilities for each sample from the input
    @param flags This parameter will be ignored
     */
    CV_WRAP virtual float predict( InputArray samples, OutputArray results=noArray(), int flags=0 ) const = 0;

    /** @brief Returns a likelihood logarithm value and an index of the most probable mixture component
    for the given sample.

    @param sample A sample for classification. It should be a one-channel matrix of
        \f$1 \times dims\f$ or \f$dims \times 1\f$ size.
    @param probs Optional output matrix that contains posterior probabilities of each component
        given the sample. It has \f$1 \times nclusters\f$ size and CV_64FC1 type.

    The method returns a two-element double vector. Zero element is a likelihood logarithm value for
    the sample. First element is an index of the most probable mixture component for the given
    sample.
     */
    CV_WRAP virtual Vec2d predict2(InputArray sample, OutputArray probs) const = 0;

    /** @brief Estimate the Gaussian mixture parameters from a samples set.

    This variation starts with Expectation step. Initial values of the model parameters will be
    estimated by the k-means algorithm.

    Unlike many of the ML models, %EM is an unsupervised learning algorithm and it does not take
    responses (class labels or function values) as input. Instead, it computes the *Maximum
    Likelihood Estimate* of the Gaussian mixture parameters from an input sample set, stores all the
    parameters inside the structure: \f$p_{i,k}\f$ in probs, \f$a_k\f$ in means, \f$S_k\f$ in
    covs[k], \f$\pi_k\f$ in weights, and optionally computes the output "class label" for each
    sample: \f$\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N\f$ (indices of the most
    probable mixture component for each sample).

    The trained model can be used further for prediction, just like any other classifier. The
    trained model is similar to the NormalBayesClassifier.

    @param samples Samples from which the Gaussian mixture model will be estimated. It should be a
        one-channel matrix, each row of which is a sample. If the matrix does not have CV_64F type
        it will be converted to the inner matrix of such type for the further computing.
    @param logLikelihoods The optional output matrix that contains a likelihood logarithm value for
        each sample. It has \f$nsamples \times 1\f$ size and CV_64FC1 type.
    @param labels The optional output "class label" for each sample:
        \f$\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N\f$ (indices of the most probable
        mixture component for each sample). It has \f$nsamples \times 1\f$ size and CV_32SC1 type.
    @param probs The optional output matrix that contains posterior probabilities of each Gaussian
        mixture component given each sample. It has \f$nsamples \times nclusters\f$ size and
        CV_64FC1 type.
     */
    CV_WRAP virtual bool trainEM(InputArray samples,
                         OutputArray logLikelihoods=noArray(),
                         OutputArray labels=noArray(),
                         OutputArray probs=noArray()) = 0;

    /** @brief Estimate the Gaussian mixture parameters from a samples set.

    This variation starts with Expectation step. You need to provide initial means \f$a_k\f$ of
    mixture components. Optionally you can pass initial weights \f$\pi_k\f$ and covariance matrices
    \f$S_k\f$ of mixture components.

    @param samples Samples from which the Gaussian mixture model will be estimated. It should be a
        one-channel matrix, each row of which is a sample. If the matrix does not have CV_64F type
        it will be converted to the inner matrix of such type for the further computing.
    @param means0 Initial means \f$a_k\f$ of mixture components. It is a one-channel matrix of
        \f$nclusters \times dims\f$ size. If the matrix does not have CV_64F type it will be
        converted to the inner matrix of such type for the further computing.
    @param covs0 The vector of initial covariance matrices \f$S_k\f$ of mixture components. Each of
        covariance matrices is a one-channel matrix of \f$dims \times dims\f$ size. If the matrices
        do not have CV_64F type they will be converted to the inner matrices of such type for the
        further computing.
    @param weights0 Initial weights \f$\pi_k\f$ of mixture components. It should be a one-channel
        floating-point matrix with \f$1 \times nclusters\f$ or \f$nclusters \times 1\f$ size.
    @param logLikelihoods The optional output matrix that contains a likelihood logarithm value for
        each sample. It has \f$nsamples \times 1\f$ size and CV_64FC1 type.
    @param labels The optional output "class label" for each sample:
        \f$\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N\f$ (indices of the most probable
        mixture component for each sample). It has \f$nsamples \times 1\f$ size and CV_32SC1 type.
    @param probs The optional output matrix that contains posterior probabilities of each Gaussian
        mixture component given each sample. It has \f$nsamples \times nclusters\f$ size and
        CV_64FC1 type.
    */
    CV_WRAP virtual bool trainE(InputArray samples, InputArray means0,
                        InputArray covs0=noArray(),
                        InputArray weights0=noArray(),
                        OutputArray logLikelihoods=noArray(),
                        OutputArray labels=noArray(),
                        OutputArray probs=noArray()) = 0;

    /** @brief Estimate the Gaussian mixture parameters from a samples set.

    This variation starts with Maximization step. You need to provide initial probabilities
    \f$p_{i,k}\f$ to use this option.

    @param samples Samples from which the Gaussian mixture model will be estimated. It should be a
        one-channel matrix, each row of which is a sample. If the matrix does not have CV_64F type
        it will be converted to the inner matrix of such type for the further computing.
    @param probs0
    @param logLikelihoods The optional output matrix that contains a likelihood logarithm value for
        each sample. It has \f$nsamples \times 1\f$ size and CV_64FC1 type.
    @param labels The optional output "class label" for each sample:
        \f$\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N\f$ (indices of the most probable
        mixture component for each sample). It has \f$nsamples \times 1\f$ size and CV_32SC1 type.
    @param probs The optional output matrix that contains posterior probabilities of each Gaussian
        mixture component given each sample. It has \f$nsamples \times nclusters\f$ size and
        CV_64FC1 type.
    */
    CV_WRAP virtual bool trainM(InputArray samples, InputArray probs0,
                        OutputArray logLikelihoods=noArray(),
                        OutputArray labels=noArray(),
                        OutputArray probs=noArray()) = 0;

    /** Creates empty %EM model.
    The model should be trained then using StatModel::train(traindata, flags) method. Alternatively, you
    can use one of the EM::train* methods or load it from file using Algorithm::load<EM>(filename).
     */
    CV_WRAP static Ptr<EM> create();

    /** @brief Loads and creates a serialized EM from a file
     *
     * Use EM::save to serialize and store an EM to disk.
     * Load the EM from this file again, by calling this function with the path to the file.
     * Optionally specify the node for the file containing the classifier
     *
     * @param filepath path to serialized EM
     * @param nodeName name of node containing the classifier
     */
    CV_WRAP static Ptr<EM> load(const String& filepath , const String& nodeName = String());
};
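
For completeness, here is a minimal usage sketch of this interface (not from the original post; the toy data, cluster count, and termination settings are illustrative assumptions). It fits a 2-component Gaussian mixture with trainEM and then classifies a new sample with predict2:

#include <opencv2/core.hpp>
#include <opencv2/ml.hpp>
#include <iostream>

int main()
{
    using namespace cv;
    using namespace cv::ml;

    // Toy data: 100 two-dimensional samples drawn around two centers (illustrative only).
    Mat samples(100, 2, CV_32FC1);
    RNG rng(12345);
    rng.fill(samples.rowRange(0, 50), RNG::NORMAL, Scalar(0, 0), Scalar(1, 1));
    rng.fill(samples.rowRange(50, 100), RNG::NORMAL, Scalar(5, 5), Scalar(1, 1));

    // Configure the model: 2 mixture components, diagonal covariances,
    // stop after 100 iterations or when the relative log-likelihood change drops below 1e-6.
    Ptr<EM> em = EM::create();
    em->setClustersNumber(2);
    em->setCovarianceMatrixType(EM::COV_MAT_DIAGONAL);
    em->setTermCriteria(TermCriteria(TermCriteria::COUNT + TermCriteria::EPS, 100, 1e-6));

    // trainEM starts from START_AUTO_STEP: k-means initialization, then alternating E- and M-steps.
    Mat logLikelihoods, labels, probs;
    if (!em->trainEM(samples, logLikelihoods, labels, probs))
        return 1;

    std::cout << "means:\n" << em->getMeans() << std::endl;

    // predict2 returns {log-likelihood of the sample, index of the most probable component}.
    Mat sample = (Mat_<double>(1, 2) << 4.8, 5.2);
    Vec2d res = em->predict2(sample, noArray());
    std::cout << "most probable component: " << static_cast<int>(res[1]) << std::endl;
    return 0;
}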

