Machine Learning: The EM Algorithm

A probability model sometimes contains both observed variables and latent (hidden) variables. If all of the model's variables are observed, its parameters can be estimated from the given data by maximum likelihood estimation or by Bayesian estimation. But when the model contains latent variables, these estimation methods cannot be applied so simply.

Deriving the EM Algorithm

Preliminaries: Jensen's Inequality

Let $f$ be a real-valued function defined on the reals. If the second derivative $f''(x) \ge 0$ for all real x, then f is convex. When x is a vector, f is convex if its Hessian matrix H is positive semi-definite. If the second derivative is strictly positive (the Hessian is positive definite), then f is called strictly convex.

Jensen's inequality states:

If f is a convex function and X is a random variable, then $E[f(X)] \ge f(E[X])$.
In particular, if f is strictly convex, equality holds if and only if X is a constant.

In the figure, the solid curve f is convex and X is a random variable that takes the value a with probability 0.5 and the value b with probability 0.5 (just like a coin flip). The expected value of X is then the midpoint of a and b, and the figure shows that $E[f(X)] \ge f(E[X])$ holds.
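Written out for this two-point example: $E[f(X)] = \tfrac{1}{2}f(a) + \tfrac{1}{2}f(b) \ge f\!\left(\tfrac{a+b}{2}\right) = f(E[X])$, and for a strictly convex $f$ the two sides coincide only when $a = b$, i.e. when $X$ is constant.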

f is (strictly) concave if and only if $-f$ is (strictly) convex.
When Jensen's inequality is applied to a concave function, the direction of the inequality is reversed.

The Derivation

Suppose we have a sample set $\{x^{(1)},\dots,x^{(m)}\}$ of m independent samples, but the class $z^{(i)}$ of each sample i is unknown (think of clustering); it is a latent variable. We want to estimate the parameters $\theta$ of the probability model $p(x,z)$, but because the model contains the latent variable z, maximum likelihood is hard to apply directly; if $z$ were known, the estimation would be easy.
For parameter estimation we still want the $\theta$ that maximizes the likelihood; the only difference from plain maximum likelihood is that the likelihood now contains an extra unknown variable $z$, see equation (1) below. In other words, the goal is to find the $\theta$ and $z$ that maximize $L(\theta)$. One might think: it is only one more unknown, so why not take partial derivatives with respect to $\theta$ and $z$ separately, set them to zero, and solve?
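
The likelihood in question, together with the two rewritings discussed below ($Q_i$ denotes an arbitrary distribution over $z^{(i)}$, so $Q_i(z^{(i)}) \ge 0$ and $\sum_{z^{(i)}} Q_i(z^{(i)}) = 1$):

$L(\theta) = \sum_{i=1}^{m} \log p(x^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta) \qquad (1)$

$= \sum_{i=1}^{m} \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \qquad (2)$

$\ge \sum_{i=1}^{m} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \qquad (3)$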

Essentially we need to maximize (1) (for (1), recall how the marginal density of one variable is obtained from a joint density, and note that $z$ is also a random variable here: summing the joint density on the right-hand side over all possible classes $z$ of each sample $i$ gives the marginal density of the random variable x on the left). This is just the likelihood, but it contains a "log of a sum", so its derivative is very messy (imagine differentiating the composite function $\log(f_1(x)+f_2(x)+f_3(x)+\dots)$), and the unknowns $z$ and $\theta$ are hard to solve for.
Can we transform (1) somehow? Look at (2): it only multiplies the numerator and denominator inside the sum by the same function, so it still contains a "log of a sum" and still cannot be solved directly; why do it at all? Hold that thought and look at (3): it has become a "sum of logs", which is easy to differentiate. Notice, too, that the equals sign has turned into an inequality. Why is that allowed? This is exactly where Jensen's inequality shows its power.

Going from (2) to (3): $f(x)=\log x$ is concave (its second derivative is $-\frac{1}{x^2}<0$). In (2), $\sum_{z^{(i)}} Q_i(z^{(i)})\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$ is the expectation of $\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$ under the distribution $Q_i$ (recall that $E[X]=\sum_x x\,p(x)$, and if $f(X)$ is a function of X then $E[f(X)]=\sum_x f(x)\,p(x)$). Applying Jensen's inequality for the concave logarithm, $\log E[Y] \ge E[\log Y]$, to this expectation is exactly what yields inequality (3).

Equation (3) is now easy to differentiate.
But (2) and (3) are joined by an inequality, so the maximum of (3) is not the maximum of (2), and what we actually want is the maximum of (2). What can we do?
Here a little imagination helps: the inequality between (2) and (3) says that the likelihood satisfies $L(\theta) \ge J(z,Q)$. By repeatedly maximizing this lower bound $J$, we keep pushing $L(\theta)$ up until it reaches its maximum.

As illustrated in the figure above, we fix $\theta$ and adjust $Q(z)$ so that the lower bound $J(z,Q)$ rises until it touches $L(\theta)$ at this $\theta$ (from the green curve to the blue curve); we then fix $Q(z)$ and adjust $\theta$ so that the lower bound $J(z,Q)$ reaches its maximum (moving from $\theta_t$ to $\theta_{t+1}$); then we fix $\theta$ again and adjust $Q(z)$ ... and so on, until we converge to the $\theta^*$ that maximizes the likelihood $L(\theta)$. Two questions remain:

  1. When does the lower bound $J(z,Q)$ equal $L(\theta)$ at the current $\theta$?
  2. Why is the procedure guaranteed to converge?

When does equality hold?

Jensen's inequality says that equality holds when the random variable is a constant. Here that means

$\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = c$

where c is a constant that does not depend on $z^{(i)}$. Pushing this a little further: since $\sum_{z^{(i)}} Q_i(z^{(i)}) = 1$, it follows that $\sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta) = c$ (adding up the numerators and the denominators of several equal fractions does not change the ratio, and here every term has the same ratio c), and therefore we obtain the following formula:
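
$Q_i(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)}; \theta)}{\sum_{z} p(x^{(i)}, z; \theta)} = \frac{p(x^{(i)}, z^{(i)}; \theta)}{p(x^{(i)}; \theta)} = p(z^{(i)} \mid x^{(i)}; \theta)$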

At this point we have shown that, with the parameter $\theta$ fixed, the $Q(z)$ that lifts the lower bound up to $L(\theta)$ is simply the posterior probability, which settles the question of how to choose $Q(z)$. This is the E-step: it establishes the lower bound of $L(\theta)$.
The M-step then takes this $Q(z)$ as given and adjusts $\theta$ to maximize the lower bound $J$ of $L(\theta)$ (with $Q(z)$ fixed, the bound can still be pushed higher). The general EM algorithm therefore runs as follows:

  1. Initialize the distribution parameters $\theta$;
  2. Iterate the E-step and M-step (written out right after this list) until convergence.

    Repeating this iteration yields the parameter $\theta$ that maximizes the likelihood $L(\theta)$.
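
Concretely, one round of the iteration is:

E-step: for each i, set $Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \theta)$.

M-step: set $\theta := \arg\max_{\theta} \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$.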

Why it converges

Because the lower bound keeps rising, the log-likelihood increases monotonically, so the iteration converges, in general to a (local) maximum of the likelihood. A more detailed proof can be found here:
http://www.cnblogs.com/jerrylead/archive/2011/04/06/2006936.html
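
The core of the argument in one line: taking $Q^{(t)}_i(z^{(i)}) = p(z^{(i)} \mid x^{(i)}; \theta^{(t)})$ in the E-step,

$L(\theta^{(t+1)}) \ge J(Q^{(t)}, \theta^{(t+1)}) \ge J(Q^{(t)}, \theta^{(t)}) = L(\theta^{(t)})$

where the first inequality is Jensen's bound, the second holds because the M-step maximizes $J$ over $\theta$, and the final equality is the tightness condition established in the E-step.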

Implementation of the EM Algorithm in OpenCV

class CV_EXPORTS EMImpl : public EM
{
public:
    // Hyper-parameters that must be set in advance
    int nclusters; // default is 5
    int covMatType;  // default is COV_MAT_DIAGONAL
    TermCriteria termCrit; // default is DEFAULT_MAX_ITERS=100

    // all inner matrices have type CV_64FC1
    Mat trainSamples; // trainSamples.rows x trainSamples.cols

    Mat trainProbs; // trainSamples.rows x nclusters
    Mat trainLogLikelihoods; // trainSamples.rows x 1
    Mat trainLabels; // trainSamples.rows x 1
    // Parameters learned during training
    Mat weights;
    Mat means;
    std::vector<Mat> covs;

    std::vector<Mat> covsEigenValues;
    std::vector<Mat> covsRotateMats;
    std::vector<Mat> invCovsEigenValues;
    Mat logWeightDivDet; // 1 x nclusters, CV_64FC1

    bool train(const Ptr<TrainData>& data, int)
    {
        Mat samples = data->getTrainSamples(), labels;
        return trainEM(samples, labels, noArray(), noArray());
    }
    bool trainEM(InputArray samples,
               OutputArray logLikelihoods,
               OutputArray labels,
               OutputArray probs)
    {
        Mat samplesMat = samples.getMat();
        setTrainData(START_AUTO_STEP, samplesMat, 0, 0, 0, 0);
        return doTrain(START_AUTO_STEP, logLikelihoods, labels, probs);
    }

    void setTrainData(int startStep, const Mat& samples,
                      const Mat* probs0,
                      const Mat* means0,
                      const std::vector<Mat>* covs0,
                      const Mat* weights0)
    {
        clear();
        // Validate the training data
        checkTrainData(startStep, samples, nclusters, covMatType, probs0, means0, covs0, weights0);

        bool isKMeansInit = (startStep == START_AUTO_STEP) || (startStep == START_E_STEP && (covs0 == 0 || weights0 == 0));
        // Set checked data
        // Preprocess the training samples: copy the input samples and convert their type into the model attribute trainSamples
        preprocessSampleData(samples, trainSamples, isKMeansInit ? CV_32FC1 : CV_64FC1, false);

        // Different start modes require different input data

        // set probs
        if(probs0 && startStep == START_M_STEP)
        {
            preprocessSampleData(*probs0, trainProbs, CV_64FC1, true);
            preprocessProbability(trainProbs);
        }

        // set weights
        if(weights0 && (startStep == START_E_STEP && covs0))
        {
            weights0->convertTo(weights, CV_64FC1);
            weights = weights.reshape(1,1);
            preprocessProbability(weights);
        }

        // set means
        if(means0 && (startStep == START_E_STEP/* || startStep == START_AUTO_STEP*/))
            means0->convertTo(means, isKMeansInit ? CV_32FC1 : CV_64FC1);

        // set covs
        if(covs0 && (startStep == START_E_STEP && weights0))
        {
            covs.resize(nclusters);
            for(size_t i = 0; i < covs0->size(); i++)
                (*covs0)[i].convertTo(covs[i], CV_64FC1);
        }
    }

    // The key driver function of the algorithm
    // param startStep: the stage at which training starts
    bool doTrain(int startStep, OutputArray logLikelihoods, OutputArray labels, OutputArray probs)
    {
        int dim = trainSamples.cols;
        // Precompute the empty initial train data in the cases of START_E_STEP and START_AUTO_STEP
        if(startStep != START_M_STEP)
        {
            if(covs.empty())
            {
                CV_Assert(weights.empty());
                clusterTrainSamples();
            }
        }

        if(!covs.empty() && covsEigenValues.empty() )
        {
            CV_Assert(invCovsEigenValues.empty());
            decomposeCovs();
        }

        if(startStep == START_M_STEP)
            mStep();

        double trainLogLikelihood, prevTrainLogLikelihood = 0.;
        int maxIters = (termCrit.type & TermCriteria::MAX_ITER) ?
            termCrit.maxCount : DEFAULT_MAX_ITERS;
        double epsilon = (termCrit.type & TermCriteria::EPS) ? termCrit.epsilon : 0.;

        for(int iter = 0; ; iter++)
        {
            eStep();
            trainLogLikelihood = sum(trainLogLikelihoods)[0];
            // Stopping criterion 1: the iteration count reaches the maximum
            if(iter >= maxIters - 1) break;

            double trainLogLikelihoodDelta = trainLogLikelihood - prevTrainLogLikelihood;

            // Stopping criterion 2: the change trainLogLikelihoodDelta is small enough
            if( iter != 0 &&
                (trainLogLikelihoodDelta < -DBL_EPSILON ||
                 trainLogLikelihoodDelta < epsilon * std::fabs(trainLogLikelihood)))
                break;

            mStep();

            prevTrainLogLikelihood = trainLogLikelihood;
        }

        if( trainLogLikelihood <= -DBL_MAX/10000. )
        {
            clear();
            return false;
        }

        // postprocess covs
        covs.resize(nclusters);
        for(int clusterIndex = 0; clusterIndex < nclusters; clusterIndex++)
        {
            if(covMatType == COV_MAT_SPHERICAL)
            {
                covs[clusterIndex].create(dim, dim, CV_64FC1);
                setIdentity(covs[clusterIndex], Scalar(covsEigenValues[clusterIndex].at<double>(0)));
            }
            else if(covMatType == COV_MAT_DIAGONAL)
            {
                covs[clusterIndex] = Mat::diag(covsEigenValues[clusterIndex]);
            }
        }

        if(labels.needed())
            trainLabels.copyTo(labels);
        if(probs.needed())
            trainProbs.copyTo(probs);
        if(logLikelihoods.needed())
            trainLogLikelihoods.copyTo(logLikelihoods);

        trainSamples.release();
        trainProbs.release();
        trainLabels.release();
        trainLogLikelihoods.release();

        return true;
    }

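    // Each covariance matrix is stored in factored form: an SVD of the symmetric
    // positive semi-definite Sigma_k gives Sigma_k = U_k * diag(w_k) * U_k^T, so keeping
    // w_k (covsEigenValues), U_k (covsRotateMats) and 1/w_k (invCovsEigenValues) makes
    // log|Sigma_k| = sum_j log w_kj and the Mahalanobis term needed in the E-step cheap to evaluate.
    // Eigenvalues are clamped to minEigenValue so the inverses stay well conditioned.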
    void decomposeCovs()
    {
        CV_Assert(!covs.empty());
        covsEigenValues.resize(nclusters);
        if(covMatType == COV_MAT_GENERIC)
            covsRotateMats.resize(nclusters);
        invCovsEigenValues.resize(nclusters);

        for(int clusterIndex = 0; clusterIndex < nclusters; clusterIndex++)
        {
            CV_Assert(!covs[clusterIndex].empty());

            SVD svd(covs[clusterIndex], SVD::MODIFY_A + SVD::FULL_UV);

            if(covMatType == COV_MAT_SPHERICAL)
            {
                double maxSingularVal = svd.w.at<double>(0);
                covsEigenValues[clusterIndex] = Mat(1, 1, CV_64FC1, Scalar(maxSingularVal));
            }
            else if(covMatType == COV_MAT_DIAGONAL)
            {
                covsEigenValues[clusterIndex] = covs[clusterIndex].diag().clone(); //Preserve the original order of eigen values.
            }
            else //COV_MAT_GENERIC
            {
                covsEigenValues[clusterIndex] = svd.w;
                covsRotateMats[clusterIndex] = svd.u;
            }
            max(covsEigenValues[clusterIndex], minEigenValue, covsEigenValues[clusterIndex]);
            invCovsEigenValues[clusterIndex] = 1./covsEigenValues[clusterIndex];
        }
    }

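    // Per-sample quantities computed by eStep via computeProbabilities
    // (standard GMM expressions, written out here for reference):
    //   p_ik    = pi_k * N(x_i | mu_k, Sigma_k) / sum_j pi_j * N(x_i | mu_j, Sigma_j)  -> trainProbs
    //   logL_i  = log( sum_j pi_j * N(x_i | mu_j, Sigma_j) )                           -> trainLogLikelihoods
    //   label_i = argmax_k p_ik                                                        -> trainLabels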
    void eStep()
    {
        // Compute probs_ik from means_k, covs_k and weights_k.
        trainProbs.create(trainSamples.rows, nclusters, CV_64FC1);
        trainLabels.create(trainSamples.rows, 1, CV_32SC1);
        trainLogLikelihoods.create(trainSamples.rows, 1, CV_64FC1);

        computeLogWeightDivDet();

        for(int sampleIndex = 0; sampleIndex < trainSamples.rows; sampleIndex++)
        {
            Mat sampleProbs = trainProbs.row(sampleIndex);
            Vec2d res = computeProbabilities(trainSamples.row(sampleIndex), &sampleProbs, CV_64F);
            trainLogLikelihoods.at<double>(sampleIndex) = res[0];
            trainLabels.at<int>(sampleIndex) = static_cast<int>(res[1]);
        }
    }

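    // Weighted GMM re-estimation performed by mStep, with p_ik = trainProbs and
    // N = trainSamples.rows (a reading of the code below, not a quote from the docs):
    //   w_k     = sum_i p_ik                      (unnormalized weights)
    //   mu_k    = (1/w_k) * sum_i p_ik * x_i
    //   Sigma_k = (1/w_k) * sum_i p_ik * (x_i - mu_k)(x_i - mu_k)^T
    //             (diagonal only for COV_MAT_DIAGONAL; a single averaged value for COV_MAT_SPHERICAL)
    //   pi_k    = w_k / N                         (the final "weights /= trainSamples.rows")
    // Clusters whose weight falls below minPosWeight are re-seeded from the smallest surviving cluster.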
    void mStep()
    {
        // Update means_k, covs_k and weights_k from probs_ik
        int dim = trainSamples.cols;

        // Update weights
        // not normalized first
        reduce(trainProbs, weights, 0, CV_REDUCE_SUM);

        // Update means
        means.create(nclusters, dim, CV_64FC1);
        means = Scalar(0);

        const double minPosWeight = trainSamples.rows * DBL_EPSILON;
        double minWeight = DBL_MAX;
        int minWeightClusterIndex = -1;
        for(int clusterIndex = 0; clusterIndex < nclusters; clusterIndex++)
        {
            if(weights.at<double>(clusterIndex) <= minPosWeight)
                continue;

            if(weights.at<double>(clusterIndex) < minWeight)
            {
                minWeight = weights.at<double>(clusterIndex);
                minWeightClusterIndex = clusterIndex;
            }

            Mat clusterMean = means.row(clusterIndex);
            for(int sampleIndex = 0; sampleIndex < trainSamples.rows; sampleIndex++)
                clusterMean += trainProbs.at<double>(sampleIndex, clusterIndex) * trainSamples.row(sampleIndex);
            clusterMean /= weights.at<double>(clusterIndex);
        }

        // Update covsEigenValues and invCovsEigenValues
        covs.resize(nclusters);
        covsEigenValues.resize(nclusters);
        if(covMatType == COV_MAT_GENERIC)
            covsRotateMats.resize(nclusters);
        invCovsEigenValues.resize(nclusters);
        for(int clusterIndex = 0; clusterIndex < nclusters; clusterIndex++)
        {
            if(weights.at<double>(clusterIndex) <= minPosWeight)
                continue;

            if(covMatType != COV_MAT_SPHERICAL)
                covsEigenValues[clusterIndex].create(1, dim, CV_64FC1);
            else
                covsEigenValues[clusterIndex].create(1, 1, CV_64FC1);

            if(covMatType == COV_MAT_GENERIC)
                covs[clusterIndex].create(dim, dim, CV_64FC1);

            Mat clusterCov = covMatType != COV_MAT_GENERIC ?
                covsEigenValues[clusterIndex] : covs[clusterIndex];

            clusterCov = Scalar(0);

            Mat centeredSample;
            for(int sampleIndex = 0; sampleIndex < trainSamples.rows; sampleIndex++)
            {
                centeredSample = trainSamples.row(sampleIndex) - means.row(clusterIndex);

                if(covMatType == COV_MAT_GENERIC)
                    clusterCov += trainProbs.at<double>(sampleIndex, clusterIndex) * centeredSample.t() * centeredSample;
                else
                {
                    double p = trainProbs.at<double>(sampleIndex, clusterIndex);
                    for(int di = 0; di < dim; di++ )
                    {
                        double val = centeredSample.at<double>(di);
                        clusterCov.at<double>(covMatType != COV_MAT_SPHERICAL ? di : 0) += p*val*val;
                    }
                }
            }

            if(covMatType == COV_MAT_SPHERICAL)
                clusterCov /= dim;

            clusterCov /= weights.at<double>(clusterIndex);

            // Update covsRotateMats for COV_MAT_GENERIC only
            if(covMatType == COV_MAT_GENERIC)
            {
                SVD svd(covs[clusterIndex], SVD::MODIFY_A + SVD::FULL_UV);
                covsEigenValues[clusterIndex] = svd.w;
                covsRotateMats[clusterIndex] = svd.u;
            }

            max(covsEigenValues[clusterIndex], minEigenValue, covsEigenValues[clusterIndex]);

            // update invCovsEigenValues
            invCovsEigenValues[clusterIndex] = 1./covsEigenValues[clusterIndex];
        }

        for(int clusterIndex = 0; clusterIndex < nclusters; clusterIndex++)
        {
            if(weights.at<double>(clusterIndex) <= minPosWeight)
            {
                Mat clusterMean = means.row(clusterIndex);
                means.row(minWeightClusterIndex).copyTo(clusterMean);
                covs[minWeightClusterIndex].copyTo(covs[clusterIndex]);
                covsEigenValues[minWeightClusterIndex].copyTo(covsEigenValues[clusterIndex]);
                if(covMatType == COV_MAT_GENERIC)
                    covsRotateMats[minWeightClusterIndex].copyTo(covsRotateMats[clusterIndex]);
                invCovsEigenValues[minWeightClusterIndex].copyTo(invCovsEigenValues[clusterIndex]);
            }
        }

        // Normalize weights
        weights /= trainSamples.rows;
    }
}; // end of EMImpl (other member functions are omitted in this excerpt)

namespace cv
{
typedef CvStatModel StatModel;
typedef CvKNearest KNearest;
...
typedef CvMLData TrainData;
}

class CV_EXPORTS_W EM : public StatModel
{
public:
    //! Type of the covariance matrices
    enum Types {
        /** A scaled identity matrix \f$\mu_k * I\f$. There is the only
        parameter \f$\mu_k\f$ to be estimated for each matrix. The option may be used in special cases,
        when the constraint is relevant, or as a first step in the optimization (for example in case
        when the data is preprocessed with PCA). The results of such preliminary estimation may be
        passed again to the optimization procedure, this time with
        covMatType=EM::COV_MAT_DIAGONAL. */
        COV_MAT_SPHERICAL=0,
        /** A diagonal matrix with positive diagonal elements. The number of
        free parameters is d for each matrix. This is most commonly used option yielding good
        estimation results. */
        // Positive diagonal elements; a commonly used option with good estimation results,
        // and the default covariance-matrix type for training
        COV_MAT_DIAGONAL=1,
        /** A symmetric positively defined matrix. The number of free
        parameters in each matrix is about \f$d^2/2\f$. It is not recommended to use this option, unless
        there is pretty accurate initial estimation of the parameters and/or a huge number of
        training samples. */
        COV_MAT_GENERIC=2, // rarely used, unless the initial parameters are fairly accurate or the training set is large
        COV_MAT_DEFAULT=COV_MAT_DIAGONAL
    };

    //! Default parameters: 5 clusters, at most 100 iterations
    enum {DEFAULT_NCLUSTERS=5, DEFAULT_MAX_ITERS=100};

    //! The initial step
    enum {START_E_STEP=1, START_M_STEP=2, START_AUTO_STEP=0};

    /** The number of mixture components in the Gaussian mixture model.
    Default value of the parameter is EM::DEFAULT_NCLUSTERS=5. Some of %EM implementation could
    determine the optimal number of mixtures within a specified value range, but that is not the
    case in ML yet. */
    /** @see setClustersNumber */
    CV_WRAP virtual int getClustersNumber() const = 0;
    /** @copybrief getClustersNumber @see getClustersNumber */
    CV_WRAP virtual void setClustersNumber(int val) = 0;

    /** Constraint on covariance matrices which defines type of matrices.
    See EM::Types. */
    /** @see setCovarianceMatrixType */
    CV_WRAP virtual int getCovarianceMatrixType() const = 0;
    /** @copybrief getCovarianceMatrixType @see getCovarianceMatrixType */
    CV_WRAP virtual void setCovarianceMatrixType(int val) = 0;

    /** The termination criteria of the %EM algorithm.
    The %EM algorithm can be terminated by the number of iterations termCrit.maxCount (number of
    M-steps) or when relative change of likelihood logarithm is less than termCrit.epsilon. Default
    maximum number of iterations is EM::DEFAULT_MAX_ITERS=100. */
    /** @see setTermCriteria */
    CV_WRAP virtual TermCriteria getTermCriteria() const = 0;
    /** @copybrief getTermCriteria @see getTermCriteria */
    CV_WRAP virtual void setTermCriteria(const TermCriteria &val) = 0;

    /** @brief Returns weights of the mixtures

    Returns vector with the number of elements equal to the number of mixtures.
     */
    CV_WRAP virtual Mat getWeights() const = 0;
    /** @brief Returns the cluster centers (means of the Gaussian mixture)

    Returns matrix with the number of rows equal to the number of mixtures and number of columns
    equal to the space dimensionality.
     */
    CV_WRAP virtual Mat getMeans() const = 0;
    /** @brief Returns covariation matrices

    Returns vector of covariation matrices. Number of matrices is the number of gaussian mixtures,
    each matrix is a square floating-point matrix NxN, where N is the space dimensionality.
     */
    CV_WRAP virtual void getCovs(CV_OUT std::vector<Mat>& covs) const = 0;

    /** @brief Returns posterior probabilities for the provided samples

    @param samples The input samples, floating-point matrix
    @param results The optional output \f$nSamples \times nClusters\f$ matrix of results. It contains
    posterior probabilities for each sample from the input
    @param flags This parameter will be ignored
     */
    CV_WRAP virtual float predict( InputArray samples, OutputArray results=noArray(), int flags=0 ) const = 0;

    /** @brief Returns a likelihood logarithm value and an index of the most probable mixture component
    for the given sample.

    @param sample A sample for classification. It should be a one-channel matrix of
        \f$1 \times dims\f$ or \f$dims \times 1\f$ size.
    @param probs Optional output matrix that contains posterior probabilities of each component
        given the sample. It has \f$1 \times nclusters\f$ size and CV_64FC1 type.

    The method returns a two-element double vector. Zero element is a likelihood logarithm value for
    the sample. First element is an index of the most probable mixture component for the given
    sample.
     */
    CV_WRAP virtual Vec2d predict2(InputArray sample, OutputArray probs) const = 0;

    /** @brief Estimate the Gaussian mixture parameters from a samples set.

    This variation starts with Expectation step. Initial values of the model parameters will be
    estimated by the k-means algorithm.

    Unlike many of the ML models, %EM is an unsupervised learning algorithm and it does not take
    responses (class labels or function values) as input. Instead, it computes the *Maximum
    Likelihood Estimate* of the Gaussian mixture parameters from an input sample set, stores all the
    parameters inside the structure: \f$p_{i,k}\f$ in probs, \f$a_k\f$ in means , \f$S_k\f$ in
    covs[k], \f$\pi_k\f$ in weights , and optionally computes the output "class label" for each
    sample: \f$\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N\f$ (indices of the most
    probable mixture component for each sample).

    The trained model can be used further for prediction, just like any other classifier. The
    trained model is similar to the NormalBayesClassifier.

    @param samples Samples from which the Gaussian mixture model will be estimated. It should be a
        one-channel matrix, each row of which is a sample. If the matrix does not have CV_64F type
        it will be converted to the inner matrix of such type for the further computing.
    @param logLikelihoods The optional output matrix that contains a likelihood logarithm value for
        each sample. It has \f$nsamples \times 1\f$ size and CV_64FC1 type.
    @param labels The optional output "class label" for each sample:
        \f$\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N\f$ (indices of the most probable
        mixture component for each sample). It has \f$nsamples \times 1\f$ size and CV_32SC1 type.
    @param probs The optional output matrix that contains posterior probabilities of each Gaussian
        mixture component given each sample. It has \f$nsamples \times nclusters\f$ size and
        CV_64FC1 type.
     */
    CV_WRAP virtual bool trainEM(InputArray samples,
                         OutputArray logLikelihoods=noArray(),
                         OutputArray labels=noArray(),
                         OutputArray probs=noArray()) = 0;

    /** @brief Estimate the Gaussian mixture parameters from a samples set.

    This variation starts with Expectation step. You need to provide initial means \f$a_k\f$ of
    mixture components. Optionally you can pass initial weights \f$\pi_k\f$ and covariance matrices
    \f$S_k\f$ of mixture components.

    @param samples Samples from which the Gaussian mixture model will be estimated. It should be a
        one-channel matrix, each row of which is a sample. If the matrix does not have CV_64F type
        it will be converted to the inner matrix of such type for the further computing.
    @param means0 Initial means \f$a_k\f$ of mixture components. It is a one-channel matrix of
        \f$nclusters \times dims\f$ size. If the matrix does not have CV_64F type it will be
        converted to the inner matrix of such type for the further computing.
    @param covs0 The vector of initial covariance matrices \f$S_k\f$ of mixture components. Each of
        covariance matrices is a one-channel matrix of \f$dims \times dims\f$ size. If the matrices
        do not have CV_64F type they will be converted to the inner matrices of such type for the
        further computing.
    @param weights0 Initial weights \f$\pi_k\f$ of mixture components. It should be a one-channel
        floating-point matrix with \f$1 \times nclusters\f$ or \f$nclusters \times 1\f$ size.
    @param logLikelihoods The optional output matrix that contains a likelihood logarithm value for
        each sample. It has \f$nsamples \times 1\f$ size and CV_64FC1 type.
    @param labels The optional output "class label" for each sample:
        \f$\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N\f$ (indices of the most probable
        mixture component for each sample). It has \f$nsamples \times 1\f$ size and CV_32SC1 type.
    @param probs The optional output matrix that contains posterior probabilities of each Gaussian
        mixture component given each sample. It has \f$nsamples \times nclusters\f$ size and
        CV_64FC1 type.
    */
    CV_WRAP virtual bool trainE(InputArray samples, InputArray means0,
                        InputArray covs0=noArray(),
                        InputArray weights0=noArray(),
                        OutputArray logLikelihoods=noArray(),
                        OutputArray labels=noArray(),
                        OutputArray probs=noArray()) = 0;

    /** @brief Estimate the Gaussian mixture parameters from a samples set.

    This variation starts with Maximization step. You need to provide initial probabilities
    \f$p_{i,k}\f$ to use this option.

    @param samples Samples from which the Gaussian mixture model will be estimated. It should be a
        one-channel matrix, each row of which is a sample. If the matrix does not have CV_64F type
        it will be converted to the inner matrix of such type for the further computing.
    @param probs0
    @param logLikelihoods The optional output matrix that contains a likelihood logarithm value for
        each sample. It has \f$nsamples \times 1\f$ size and CV_64FC1 type.
    @param labels The optional output "class label" for each sample:
        \f$\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N\f$ (indices of the most probable
        mixture component for each sample). It has \f$nsamples \times 1\f$ size and CV_32SC1 type.
    @param probs The optional output matrix that contains posterior probabilities of each Gaussian
        mixture component given each sample. It has \f$nsamples \times nclusters\f$ size and
        CV_64FC1 type.
    */
    CV_WRAP virtual bool trainM(InputArray samples, InputArray probs0,
                        OutputArray logLikelihoods=noArray(),
                        OutputArray labels=noArray(),
                        OutputArray probs=noArray()) = 0;

    /** Creates empty %EM model.
    The model should be trained then using StatModel::train(traindata, flags) method. Alternatively, you
    can use one of the EM::train* methods or load it from file using Algorithm::load<EM>(filename).
     */
    CV_WRAP static Ptr<EM> create();

    /** @brief Loads and creates a serialized EM from a file
     *
     * Use EM::save to serialize and store an EM to disk.
     * Load the EM from this file again, by calling this function with the path to the file.
     * Optionally specify the node for the file containing the classifier
     *
     * @param filepath path to serialized EM
     * @param nodeName name of node containing the classifier
     */
    CV_WRAP static Ptr<EM> load(const String& filepath , const String& nodeName = String());
};
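
A minimal usage sketch of the interface above, assuming OpenCV 3.x/4.x built with the ml module; the toy data, cluster count, and termination values here are made up for illustration:

#include <opencv2/core.hpp>
#include <opencv2/ml.hpp>
#include <iostream>

int main()
{
    using namespace cv;
    using namespace cv::ml;

    // Toy data: 100 two-dimensional points drawn around two centers.
    Mat samples(100, 2, CV_64FC1);
    RNG rng(12345);
    rng.fill(samples.rowRange(0, 50),   RNG::NORMAL, Scalar::all(0.0), Scalar::all(1.0));
    rng.fill(samples.rowRange(50, 100), RNG::NORMAL, Scalar::all(5.0), Scalar::all(1.0));

    // Configure the model: 2 mixture components, diagonal covariances,
    // at most 100 EM iterations or a relative log-likelihood change below 1e-6.
    Ptr<EM> em = EM::create();
    em->setClustersNumber(2);
    em->setCovarianceMatrixType(EM::COV_MAT_DIAGONAL);
    em->setTermCriteria(TermCriteria(TermCriteria::COUNT + TermCriteria::EPS, 100, 1e-6));

    // trainEM: k-means initialization (START_AUTO_STEP), then alternating E/M steps.
    Mat logLikelihoods, labels, probs;
    if (!em->trainEM(samples, logLikelihoods, labels, probs))
    {
        std::cerr << "EM training failed" << std::endl;
        return 1;
    }

    std::cout << "weights: " << em->getWeights() << std::endl;
    std::cout << "means:"    << std::endl << em->getMeans() << std::endl;

    // Posterior of a new point under the fitted mixture.
    Mat sample = (Mat_<double>(1, 2) << 4.8, 5.2);
    Mat post;
    Vec2d res = em->predict2(sample, post);
    std::cout << "log-likelihood: " << res[0]
              << ", most probable component: " << static_cast<int>(res[1]) << std::endl;
    return 0;
}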

