A Study of AP (Affinity Propagation)

To be expanded...

The AP algorithm, short for affinity propagation, is a clustering algorithm that Brendan J. Frey and Delbert Dueck published in Science in 2007 (paper link, Wikipedia).

So far I have only taken a first look at the MATLAB source code provided on the official website: apcluster.m

%APCLUSTER Affinity Propagation Clustering (Frey/Dueck, Science 2007)
% [idx,netsim,dpsim,expref]=APCLUSTER(s,p) clusters data, using a set
% of real-valued pairwise data point similarities as input. Clusters
% are each represented by a cluster center data point (the "exemplar").
% The method is iterative and searches for clusters so as to maximize
% an objective function, called net similarity.
%
% For N data points, there are potentially N^2-N pairwise similarities;
% this can be input as an N-by-N matrix 's', where s(i,k) is the
% similarity of point i to point k (s(i,k) needn't equal s(k,i)).  In
% fact, only a smaller number of relevant similarities are needed; if
% only M similarity values are known (M < N^2-N) they can be input as
% an M-by-3 matrix with each row being an (i,j,s(i,j)) triple.
%
% APCLUSTER automatically determines the number of clusters based on
% the input preference 'p', a real-valued N-vector. p(i) indicates the
% preference that data point i be chosen as an exemplar. Often a good
% choice is to set all preferences to median(s); the number of clusters
% identified can be adjusted by changing this value accordingly. If 'p'
% is a scalar, APCLUSTER assumes all preferences are that shared value.
%
% The clustering solution is returned in idx. idx(j) is the index of
% the exemplar for data point j; idx(j)==j indicates data point j
% is itself an exemplar. The sum of the similarities of the data points to
% their exemplars is returned as dpsim, the sum of the preferences of
% the identified exemplars is returned in expref and the net similarity
% objective function returned is their sum, i.e. netsim=dpsim+expref.
%
%     [ ... ]=apcluster(s,p,'NAME',VALUE,...) allows you to specify
%       optional parameter name/value pairs as follows:
%
%   'maxits'     maximum number of iterations (default: 1000)
%   'convits'    if the estimated exemplars stay fixed for convits
%          iterations, APCLUSTER terminates early (default: 100)
%   'dampfact'   update equation damping level in [0.5, 1).  Higher
%        values correspond to heavy damping, which may be needed
%        if oscillations occur. (default: 0.9)
%   'plot'       (no value needed) Plots netsim after each iteration
%   'details'    (no value needed) Outputs iteration-by-iteration
%      details (greater memory requirements)
%   'nonoise'    (no value needed) APCLUSTER adds a small amount of
%      noise to 's' to prevent degenerate cases; this disables that.
%
% Copyright (c) B.J. Frey & D. Dueck (2006). This software may be
% freely used and distributed for non-commercial purposes.
%          (RUN APCLUSTER WITHOUT ARGUMENTS FOR DEMO CODE)
function [idx,netsim,dpsim,expref]=apcluster(s,p,varargin);
if nargin==0, % display demo
    fprintf('Affinity Propagation (APCLUSTER) sample/demo code\n\n');
    fprintf('N=100; x=rand(N,2); % Create N, 2-D data points\n');
    fprintf('M=N*N-N; s=zeros(M,3); % Make ALL N^2-N similarities\n');
    fprintf('j=1;\n');
    fprintf('for i=1:N\n');
    fprintf('  for k=[1:i-1,i+1:N]\n');
    fprintf('    s(j,1)=i; s(j,2)=k; s(j,3)=-sum((x(i,:)-x(k,:)).^2);\n');
    fprintf('    j=j+1;\n');
    fprintf('  end;\n');
    fprintf('end;\n');
    fprintf('p=median(s(:,3)); % Set preference to median similarity\n');
    fprintf('[idx,netsim,dpsim,expref]=apcluster(s,p,''plot'');\n');
    fprintf('fprintf(''Number of clusters: %%d\\n'',length(unique(idx)));\n');
    fprintf('fprintf(''Fitness (net similarity): %%g\\n'',netsim);\n');
    fprintf('figure; % Make a figures showing the data and the clusters\n');
    fprintf('for i=unique(idx)''\n');
    fprintf('  ii=find(idx==i); h=plot(x(ii,1),x(ii,2),''o''); hold on;\n');
    fprintf('  col=rand(1,3); set(h,''Color'',col,''MarkerFaceColor'',col);\n');
    fprintf('  xi1=x(i,1)*ones(size(ii)); xi2=x(i,2)*ones(size(ii)); \n');
    fprintf('  line([x(ii,1),xi1]'',[x(ii,2),xi2]'',''Color'',col);\n');
    fprintf('end;\n');
    fprintf('axis equal tight;\n\n');
    return;
end;
start = clock;
% Handle arguments to function
if nargin<2 error('Too few input arguments');
else
    maxits=1000; convits=100; lam=0.9; plt=0; details=0; nonoise=0;
    i=1;
    while i<=length(varargin)
        if strcmp(varargin{i},'plot')
            plt=1; i=i+1;
        elseif strcmp(varargin{i},'details')
            details=1; i=i+1;
        elseif strcmp(varargin{i},'sparse')
%             [idx,netsim,dpsim,expref]=apcluster_sparse(s,p,varargin{:});
            fprintf('''sparse'' argument no longer supported; see website for additional software\n\n');
            return;
        elseif strcmp(varargin{i},'nonoise')
            nonoise=1; i=i+1;
        elseif strcmp(varargin{i},'maxits')
            maxits=varargin{i+1};
            i=i+2;
            if maxits<=0 error('maxits must be a positive integer'); end;
        elseif strcmp(varargin{i},'convits')
            convits=varargin{i+1};
            i=i+2;
            if convits<=0 error('convits must be a positive integer'); end;
        elseif strcmp(varargin{i},'dampfact')
            lam=varargin{i+1};
            i=i+2;
            if (lam<0.5)||(lam>=1)
                error('dampfact must be >= 0.5 and < 1');
            end;
        else i=i+1;
        end;
    end;
end;
if lam>0.9
    fprintf('\n*** Warning: Large damping factor in use. Turn on plotting\n');
    fprintf('    to monitor the net similarity. The algorithm will\n');
    fprintf('    change decisions slowly, so consider using a larger value\n');
    fprintf('    of convits.\n\n');
end;

% Check that standard arguments are consistent in size
if length(size(s))~=2 error('s should be a 2D matrix');
elseif length(size(p))>2 error('p should be a vector or a scalar');
elseif size(s,2)==3
    tmp=max(max(s(:,1)),max(s(:,2)));
    if length(p)==1 N=tmp; else N=length(p); end;
    if tmp>N
        error('data point index exceeds number of data points');
    elseif min(min(s(:,1)),min(s(:,2)))<=0
        error('data point indices must be >= 1');
    end;
elseif size(s,1)==size(s,2)
    N=size(s,1);
    if (length(p)~=N)&&(length(p)~=1)
        error('p should be scalar or a vector of size N');
    end;
else error('s must have 3 columns or be square'); end;

% Construct similarity matrix
if N>3000
    fprintf('\n*** Warning: Large memory request. Consider activating\n');
    fprintf('    the sparse version of APCLUSTER.\n\n');
end;
if size(s,2)==3 && size(s,1)~=3,
    S=-Inf*ones(N,N,class(s));
    for j=1:size(s,1), S(s(j,1),s(j,2))=s(j,3); end;
else S=s;
end;

if S==S', symmetric=true; else symmetric=false; end;
realmin_=realmin(class(s)); realmax_=realmax(class(s));

% In case user did not remove degeneracies from the input similarities,
% avoid degenerate solutions by adding a small amount of noise to the
% input similarities
if ~nonoise
    rns=randn('state'); randn('state',0);
    S=S+(eps*S+realmin_*100).*rand(N,N);
    randn('state',rns);
end;

% Place preferences on the diagonal of S
if length(p)==1 for i=1:N S(i,i)=p; end;
else for i=1:N S(i,i)=p(i); end;
end;

% Numerical stability -- replace -INF with -realmax
n=find(S<-realmax_); if ~isempty(n), warning('-INF similarities detected; changing to -REALMAX to ensure numerical stability'); S(n)=-realmax_; end; clear('n');
if ~isempty(find(S>realmax_,1)), error('+INF similarities detected; change to a large positive value (but smaller than +REALMAX)'); end;

% Allocate space for messages, etc
dS=diag(S); A=zeros(N,N,class(s)); R=zeros(N,N,class(s)); t=1;
if plt, netsim=zeros(1,maxits+1); end;
if details
    idx=zeros(N,maxits+1);
    netsim=zeros(1,maxits+1);
    dpsim=zeros(1,maxits+1);
    expref=zeros(1,maxits+1);
end;

% Execute parallel affinity propagation updates
e=zeros(N,convits); dn=0; i=0;
if symmetric, ST=S; else ST=S'; end; % saves memory if it's symmetric
while ~dn
    i=i+1; 

    % Compute responsibilities
    A=A'; R=R';
    for ii=1:N,
        old = R(:,ii);
        AS = A(:,ii) + ST(:,ii); [Y,I]=max(AS); AS(I)=-Inf;
        [Y2,I2]=max(AS);
        R(:,ii)=ST(:,ii)-Y;
        R(I,ii)=ST(I,ii)-Y2;
        R(:,ii)=(1-lam)*R(:,ii)+lam*old; % Damping
        R(R(:,ii)>realmax_,ii)=realmax_;
    end;
    A=A'; R=R';

    % Compute availabilities
    for jj=1:N,
        old = A(:,jj);
        Rp = max(R(:,jj),0); Rp(jj)=R(jj,jj);
        A(:,jj) = sum(Rp)-Rp;
        dA = A(jj,jj); A(:,jj) = min(A(:,jj),0); A(jj,jj) = dA;
        A(:,jj) = (1-lam)*A(:,jj) + lam*old; % Damping
    end;

    % Check for convergence
    E=((diag(A)+diag(R))>0); e(:,mod(i-1,convits)+1)=E; K=sum(E);
    if i>=convits || i>=maxits,
        se=sum(e,2);
        unconverged=(sum((se==convits)+(se==0))~=N);
        if (~unconverged&&(K>0))||(i==maxits) dn=1; end;
    end;

    % Handle plotting and storage of details, if requested
    if plt||details
        if K==0
            tmpnetsim=nan; tmpdpsim=nan; tmpexpref=nan; tmpidx=nan;
        else
            I=find(E); notI=find(~E); [tmp c]=max(S(:,I),[],2); c(I)=1:K; tmpidx=I(c);
            tmpdpsim=sum(S(sub2ind([N N],notI,tmpidx(notI))));
            tmpexpref=sum(dS(I));
            tmpnetsim=tmpdpsim+tmpexpref;
        end;
    end;
    if details
        netsim(i)=tmpnetsim; dpsim(i)=tmpdpsim; expref(i)=tmpexpref;
        idx(:,i)=tmpidx;
    end;
    if plt,
        netsim(i)=tmpnetsim;
        figure(234);
        plot(((netsim(1:i)/10)*100)/10,'r-'); xlim([0 i]); % plot barely-finite stuff as infinite
        xlabel('# Iterations');
        ylabel('Fitness (net similarity) of quantized intermediate solution');
%         drawnow;
    end;
end; % iterations
I=find((diag(A)+diag(R))>0); K=length(I); % Identify exemplars
if K>0
    [tmp c]=max(S(:,I),[],2); c(I)=1:K; % Identify clusters
    % Refine the final set of exemplars and clusters and return results
    for k=1:K ii=find(c==k); [y j]=max(sum(S(ii,ii),1)); I(k)=ii(j(1)); end; notI=reshape(setdiff(1:N,I),[],1);
    [tmp c]=max(S(:,I),[],2); c(I)=1:K; tmpidx=I(c);
    tmpdpsim=sum(S(sub2ind([N N],notI,tmpidx(notI))));
    tmpexpref=sum(dS(I));
    tmpnetsim=tmpdpsim+tmpexpref;
else
    tmpidx=nan*ones(N,1); tmpnetsim=nan; tmpdpsim=nan; tmpexpref=nan;
end;
if details
    netsim(i+1)=tmpnetsim; netsim=netsim(1:i+1);
    dpsim(i+1)=tmpdpsim; dpsim=dpsim(1:i+1);
    expref(i+1)=tmpexpref; expref=expref(1:i+1);
    idx(:,i+1)=tmpidx; idx=idx(:,1:i+1);
else
    netsim=tmpnetsim; dpsim=tmpdpsim; expref=tmpexpref; idx=tmpidx;
end;
if plt||details
    fprintf('\nNumber of exemplars identified: %d  (for %d data points)\n',K,N);
    fprintf('Net similarity: %g\n',tmpnetsim);
    fprintf('  Similarities of data points to exemplars: %g\n',dpsim(end));
    fprintf('  Preferences of selected exemplars: %g\n',tmpexpref);
    fprintf('Number of iterations: %d\n\n',i);
    fprintf('Elapsed time: %g sec\n',etime(clock,start));
end;
if unconverged
    fprintf('\n*** Warning: Algorithm did not converge. Activate plotting\n');
    fprintf('    so that you can monitor the net similarity. Consider\n');
    fprintf('    increasing maxits and convits, and, if oscillations occur\n');
    fprintf('    also increasing dampfact.\n\n');
end;
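
The main loop of the listing updates responsibilities column by column on transposed matrices, which makes it a little hard to follow. Below is a rough vectorized sketch of the same two message updates, not the authors' code. It assumes a dense similarity matrix and leaves out the noise term, the realmax clamping, the convits convergence test, and the final exemplar refinement. The data `x`, the fixed iteration count `nits`, and all variable names other than `S`, `A`, `R` and `lam` are made up for illustration.

```matlab
% Hypothetical 1-D data; similarity is the negative squared distance.
x = [0 0.1 0.2 5 5.1 5.2]';
N = numel(x);
S = -(repmat(x,1,N) - repmat(x',N,1)).^2;
S(1:N+1:end) = median(S(~eye(N)));   % shared preference on the diagonal

lam  = 0.9;                          % damping factor (default in the listing)
nits = 200;                          % assumed fixed number of iterations
A = zeros(N); R = zeros(N);
rowidx = (1:N)';

for it = 1:nits
    % Responsibilities: r(i,k) = s(i,k) - max_{k'~=k} ( a(i,k') + s(i,k') )
    AS = A + S;
    [Y, I] = max(AS, [], 2);             % best candidate per row
    top = sub2ind([N N], rowidx, I);
    AS(top) = -Inf;
    Y2 = max(AS, [], 2);                 % second-best candidate per row
    Rnew = S - repmat(Y, 1, N);
    Rnew(top) = S(top) - Y2;             % the best column competes with the runner-up
    R = (1-lam)*Rnew + lam*R;            % damping

    % Availabilities: column sums of positive responsibilities
    Rp = max(R, 0);
    Rp(1:N+1:end) = diag(R);             % keep r(k,k) unclipped
    Anew = repmat(sum(Rp,1), N, 1) - Rp;
    dA = diag(Anew);                     % self-availability is not clipped
    Anew = min(Anew, 0);
    Anew(1:N+1:end) = dA;
    A = (1-lam)*Anew + lam*A;            % damping
end

% Exemplars are the points with a(k,k) + r(k,k) > 0.
E = find(diag(A) + diag(R) > 0);
K = numel(E);
if K > 0
    [~, c] = max(S(:,E), [], 2); c(E) = 1:K;
    idx = E(c);                          % idx(j) is the exemplar of point j
else
    idx = nan(N,1);                      % no exemplar found
end
disp(idx');
```

Because the noise term is omitted, exact ties in `S` could cause oscillation on some inputs, so treat this as a reading aid for the loop, not a replacement for apcluster.m.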

Sample data actually used:

The matrix s and the value of p:

s=[1 0.85 0.9 0.5 0.45 0.5 0.4 0.4 0.5 0.45;
   0.85 1 0.85 0.6 0.65 0.7 0.6 0.55 0.8 0.7;
   0.9 0.85 1 0.75 0.7 0.65 0.55 0.5 0.6 0.5;
   0.5 0.6 0.75 1 0.9 0.7 0.7 0.85 0.5 0.45;
   0.45 0.65 0.7 0.9 1 0.9 0.9 0.85 0.6 0.65;
   0.5 0.7 0.65 0.7 0.9 1 0.85 0.75 0.75 0.75;
   0.4 0.6 0.55 0.7 0.9 0.85 1 0.85 0.5 0.55;
   0.4 0.55 0.5 0.85 0.85 0.75 0.85 1 0.3 0.25;
   0.5 0.8 0.6 0.5 0.6 0.75 0.5 0.3 1 0.9;
   0.45 0.7 0.5 0.45 0.65 0.75 0.55 0.25 0.9 1;
    ];
p=median(median(s));

Final output of the run:

idx =

     1
     1
     1
     5
     5
     5
     5
     5
     9
     9

netsim =

    8.1875

dpsim =

    6.2000

expref =

    1.9875
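
As a sanity check, the three reported numbers can be recomputed by hand from `idx`, `s` and `p`, following the definitions in the help text (dpsim is the sum of similarities of non-exemplar points to their exemplars, expref is the sum of the exemplars' preferences, and netsim is their sum). The sketch below assumes the `idx` vector printed above; the names `exemplars` and `members` are introduced only for illustration.

```matlab
% Similarity matrix copied from the sample data above.
s=[1 0.85 0.9 0.5 0.45 0.5 0.4 0.4 0.5 0.45;
   0.85 1 0.85 0.6 0.65 0.7 0.6 0.55 0.8 0.7;
   0.9 0.85 1 0.75 0.7 0.65 0.55 0.5 0.6 0.5;
   0.5 0.6 0.75 1 0.9 0.7 0.7 0.85 0.5 0.45;
   0.45 0.65 0.7 0.9 1 0.9 0.9 0.85 0.6 0.65;
   0.5 0.7 0.65 0.7 0.9 1 0.85 0.75 0.75 0.75;
   0.4 0.6 0.55 0.7 0.9 0.85 1 0.85 0.5 0.55;
   0.4 0.55 0.5 0.85 0.85 0.75 0.85 1 0.3 0.25;
   0.5 0.8 0.6 0.5 0.6 0.75 0.5 0.3 1 0.9;
   0.45 0.7 0.5 0.45 0.65 0.75 0.55 0.25 0.9 1];
p = median(median(s));                 % median of the column medians
idx = [1 1 1 5 5 5 5 5 9 9]';          % assignment reported above

exemplars = unique(idx);                            % points 1, 5 and 9
members = setdiff((1:numel(idx))', exemplars);      % all non-exemplar points
dpsim  = sum(s(sub2ind(size(s), members, idx(members))));
expref = p * numel(exemplars);                      % scalar p shared by all points
netsim = dpsim + expref;
fprintf('dpsim=%.4f expref=%.4f netsim=%.4f\n', dpsim, expref, netsim);
% prints: dpsim=6.2000 expref=1.9875 netsim=8.1875
```

Here p works out to 0.6625, so three exemplars contribute 3 × 0.6625 = 1.9875, which matches the reported expref.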
Time: 2024-10-27 10:55:24
