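I spent much of my master's program working on road extraction from high-resolution remote sensing imagery and learned a good deal along the way. Here I am open-sourcing my latest research results.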
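As is well known, traditional machine-learning classification methods usually need both positive and negative samples to extract the target class. Negative samples, however, are often difficult to label and hard to obtain. Following the method proposed in
Elkan, Charles, and Keith Noto. "Learning classifiers from only positive and unlabeled data." Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2008, I built a one-class classifier, that is, a classifier trained from positive and unlabeled samples only. The principle is as follows: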
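As a brief sketch of the key result in Elkan and Noto's paper (the notation follows the paper and is not taken from this post): let $x$ be a feature vector, $y \in \{0,1\}$ its true class, and $s \in \{0,1\}$ indicate whether the sample is labeled. Only positives are ever labeled, and the labeled positives are assumed to be selected completely at random from all positives. Under these assumptions:

$$
\begin{aligned}
p(s=1 \mid x, y=0) &= 0, \qquad p(s=1 \mid x, y=1) = p(s=1 \mid y=1) = c, \\
g(x) &= p(s=1 \mid x) = p(y=1 \mid x)\, p(s=1 \mid x, y=1) = c \, p(y=1 \mid x), \\
p(y=1 \mid x) &= \frac{g(x)}{c}, \qquad c \approx \frac{1}{|P|} \sum_{x \in P} g(x),
\end{aligned}
$$

where $g$ is an ordinary classifier trained to separate labeled from unlabeled samples and $P$ is a held-out set of labeled positives. In the RF_pul function of this post, $g(x)$ corresponds to the fraction of random forest votes and $c$ to the mean of $g$ over the positive samples not used for training.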
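Based on the principle above, I built a one-class classifier on top of the standard random forest algorithm: a random forest positive and unlabeled learning classifier. The MATLAB code is as follows: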
This is the main one-class classifier function:
function Pro = RF_pul(sample, Image, train, r_pul, ntree, depth)
% RF_pul  Random forest positive and unlabeled (PU) learning.
% Written by Wang Zhipan, Southwest Jiaotong University.
%
% sample : positive samples, one per row. Column 1 is the column index in
%          the original image, column 2 is the row index, and columns
%          3:end are the features.
% Image  : the original image (rows x cols x bands)
% train  : fraction of the samples used for training (default 0.75);
%          the remaining samples are held out to estimate c
% r_pul  : ratio of unlabeled to positive training samples (default 1)
% ntree  : number of trees (default 200)
% depth  : fourth argument passed to classRF_train (default 2). If
%          classRF_train comes from the randomforest-matlab package, that
%          argument is mtry (features tried per split), not tree depth.
if nargin < 3, train = 0.75; end
if nargin < 4, r_pul = 1; end
if nargin < 5, ntree = 200; end
if nargin < 6, depth = 2; end

[m_train, n_train] = size(sample);
n_train = n_train - 2;                 % number of features
[m, n, ~] = size(Image);
% one row per pixel, one column per band
predict = reshape(double(Image(:, :, 1:n_train)), m*n, n_train);

%% training matrix
n_pos = floor(m_train * train);        % positives used for training (integer index)
n_unl = floor(n_pos * r_pul);          % unlabeled pixels drawn at random
rand_negtive = randperm(m*n)';
Train_matrix = zeros(n_pos + n_unl, n_train);     % matrix used for training
Label_train = [ones(n_pos, 1); zeros(n_unl, 1)];  % 1 = positive, 0 = unlabeled
Train_matrix(1:n_pos, :) = sample(1:n_pos, 3:end);
Train_matrix(n_pos+1:end, :) = predict(rand_negtive(1:n_unl), :);

%% linear normalization using the image minimum and maximum of each band
min_val = min(predict);
max_val = max(predict);
range_val = max_val - min_val;
range_val(range_val == 0) = 1;         % avoid division by zero for constant bands
for i = 1:n_train
    predict(:, i) = (predict(:, i) - min_val(i)) ./ range_val(i);
    Train_matrix(:, i) = (Train_matrix(:, i) - min_val(i)) ./ range_val(i);
end

model = classRF_train(Train_matrix, Label_train, ntree, depth);
[~, votes] = classRF_predict(predict, model);
g = votes ./ ntree;                    % fraction of tree votes, used as a probability
G = reshape(g(:, 2), m, n);            % column 2 is taken as the votes for label 1
% held-out positive samples, used to estimate c
valid_index = sub2ind([m, n], sample(n_pos+1:end, 2), sample(n_pos+1:end, 1));
c = mean(G(valid_index));              % the c in the paper: mean of g over held-out positives
% mat2gray rescales min..max to [0, 1], so Pro is a relative score
Pro = mat2gray(G ./ c);                % normalized probability
end
Next, write a script to run it:
%% test RF_pul
clear
tic
Text = textread('Sample3.txt');
[Img, ref] = geotiffread('test3.tif');   % read the image and its georeferencing
Img = double(Img);
info = geotiffinfo('test3.tif');
[m, n, ~] = size(Img);
% Columns 1-2 store the sample coordinates in the original image:
% column 1 is the image column, column 2 is the image row.
% Columns 3:end store the features.
Sample = [Text(:, 2:3), Text(:, 8:end)];
clear Text
Pro = RF_pul(Sample, Img);
Proimg = reshape(Pro, m, n);
clear Pro Img
figure, imshow(Proimg), title('Posterior probability');
geotiffwrite('Probility.tif', Proimg, ref, 'GeoKeyDirectoryTag', info.GeoTIFFTags.GeoKeyDirectoryTag);
toc
The final result is shown below:
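Comments and corrections from fellow researchers are welcome. If any part of the code is hard to follow, feel free to get in touch. Phone: 13094445837, QQ: 1044625113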
Time: 2024-10-10 04:28:27