Processing images with k-means

Problem: represent a given image using only its three most dominant colours.

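The idea behind k-means:

  1. Choose k points as the initial centres.

  2. Assign each point to its nearest centre, forming k clusters.

  3. Recompute the centre of each cluster.

  4. If any cluster centre moved noticeably and the maximum number of iterations has not been reached, go back to step 2.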
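The four steps above can be sketched with NumPy as follows. This is a minimal vectorised illustration, not the code used later in this post. The function names `nearest_centre` and `kmeans` and the parameters `tol` and `seed` are my own. Stopping once no centre moves more than `tol`, or after `max_iterations` rounds, is one common choice of stopping rule.

```python
import numpy as np


def nearest_centre(points, centres):
    """Index of the nearest centre (Euclidean distance) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, max_iterations=20, tol=1e-6, seed=0):
    """Plain k-means on an (n, d) array; returns (centres, labels)."""
    pts = np.asarray(points, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Step 1: k distinct random points become the initial centres.
    centres = pts[rng.choice(len(pts), size=k, replace=False)]
    for _ in range(max_iterations):
        # Step 2: assign every point to its nearest centre.
        labels = nearest_centre(pts, centres)
        # Step 3: move each centre to the mean of its points
        # (an empty cluster keeps its previous centre).
        new_centres = np.array([
            pts[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        # Step 4: stop once no centre moved more than tol.
        moved = np.linalg.norm(new_centres - centres, axis=1).max()
        centres = new_centres
        if moved <= tol:
            break
    # Final labels are consistent with the returned centres.
    return centres, nearest_centre(pts, centres)
```

For the colour problem, `points` would be the (n, 3) array of RGB pixels and `k` would be 3. The (n, k) distance matrix grows with the number of pixels, which is one reason to shrink the image first, as the code below does.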

  Problem: when the initial points are chosen poorly, k-means easily converges to a local optimum.

  Remedy: change only step 1 and choose the k initial centres more carefully, for example with Canopy clustering, simulated annealing, or a Bayesian criterion. Steps 2 to 4 stay the same.
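Of the remedies listed, Canopy clustering is the easiest to sketch. Canopy uses two distance thresholds, t1 > t2. Points within t1 of a canopy centre belong to that canopy. Points within t2 can no longer start a new canopy. The function below is a hypothetical helper (`canopy_init` is not from any library). Using the k canopies with the most members as the initial k-means centres is an assumption of this sketch, not part of the original post.

```python
import numpy as np


def canopy_init(points, k, t1, t2, seed=0):
    """Pick k initial k-means centres with Canopy clustering (requires t1 > t2)."""
    if t1 <= t2:
        raise ValueError("Canopy requires t1 > t2")
    pts = np.asarray(points, dtype=np.float64)
    rng = np.random.default_rng(seed)
    pool = pts.copy()              # points that may still start a canopy
    centres, sizes = [], []
    while len(pool) > 0:
        centre = pool[rng.integers(len(pool))]
        # Loose threshold t1: count the points that fall into this canopy.
        sizes.append(int((np.linalg.norm(pts - centre, axis=1) < t1).sum()))
        centres.append(centre)
        # Tight threshold t2: these points can no longer start a canopy
        # (the centre itself, at distance 0, is always removed).
        pool = pool[np.linalg.norm(pool - centre, axis=1) >= t2]
    if len(centres) < k:
        raise ValueError(f"only {len(centres)} canopies found; try a smaller t2")
    # Assumption of this sketch: keep the k most populated canopies.
    order = np.argsort(sizes, kind="stable")[::-1][:k]
    return np.array(centres)[order]
```

The thresholds have to be tuned to the data. For RGB pixels, distances range from 0 to about 442. If t2 is too large, fewer than k canopies are found and the function raises an error.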

  Example: given an image, find its three most dominant colours and redraw the image using only those three colours.
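Once the three dominant colours are known, redrawing the image means replacing every pixel with its nearest dominant colour. The following is a minimal vectorised sketch. It assumes the image has already been loaded as an (h, w, 3) uint8 RGB array. The names `recolour` and `palette` are illustrative and are not part of the original code.

```python
import numpy as np


def recolour(image_array, palette):
    """Replace every pixel of an (h, w, 3) RGB array with its nearest palette colour."""
    img = np.asarray(image_array)
    # Work in float64 so differences of uint8 values cannot wrap around.
    pix = img.reshape(-1, 3).astype(np.float64)
    pal = np.asarray(palette, dtype=np.float64)
    # Squared Euclidean distance from every pixel to every palette colour: shape (n, k).
    d2 = ((pix[:, None, :] - pal[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    out = np.rint(pal[nearest]).clip(0, 255).astype(np.uint8)
    return out.reshape(img.shape)
```

With Pillow, the input array could come from `numpy.asarray(Image.open(filename).convert('RGB'))`. The result could be turned back into an image with `Image.fromarray(...)`. The palette could be the list of centroids returned by `Kmeans.run()` below.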

# -*- coding: utf-8 -*-
# https://github.com/ZeevG/python-dominant-image-colour
# commented by heibanke

from PIL import Image
import random
import numpy

class Cluster(object):
    """
    Each cluster centre is a separate Cluster object with two attributes:
    pixels: the pixels attached to this cluster, from which its dominant colour is computed
    centroid: the RGB value of the dominant colour (the cluster centre)
    """
    def __init__(self):
        self.pixels = []
        self.centroid = None

    def addPoint(self, pixel):
        self.pixels.append(pixel)

    def setNewCentroid(self):
        """
        Recompute the dominant colour as the mean of the assigned pixels.
        """
        # Average in float64: summing uint8 channel values would overflow.
        # An empty cluster keeps its previous centroid instead of dividing by zero.
        if self.pixels:
            mean = numpy.asarray(self.pixels, dtype=numpy.float64).mean(axis=0)
            self.centroid = tuple(float(c) for c in mean)
        self.pixels = []

        return self.centroid

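class Kmeans(object):

    def __init__(self, k=3, max_iterations=5, min_distance=5.0, size=400):
        """
        k: number of dominant colours to find
        max_iterations: maximum number of iterations
        min_distance: stop early once no dominant colour moves this far (or farther) between iterations
        size: the image is shrunk to fit within size x size pixels before clustering
        """
        self.k = k
        self.max_iterations = max_iterations
        self.min_distance = min_distance
        self.size = (size, size)

    def run(self, image):
        # Convert to RGB so every pixel has exactly three channels (RGBA, L and P images would not)
        self.image = image.convert('RGB')
        # Shrink to a thumbnail (in place) to reduce computation
        self.image.thumbnail(self.size)
        self.pixels = numpy.asarray(self.image, dtype=numpy.uint8).reshape(-1, 3)
        self.clusters = [None] * self.k
        self.oldClusters = None
        # Pick k random pixels from the image as the initial dominant colours;
        # sample indices, because random.sample() does not accept a NumPy array in Python 3
        randomPixels = [tuple(float(v) for v in self.pixels[i])
                        for i in random.sample(range(len(self.pixels)), self.k)]

        for idx in range(self.k):
            self.clusters[idx] = Cluster()  # one Cluster object per dominant colour
            self.clusters[idx].centroid = randomPixels[idx]  # each centroid starts as a randomly sampled pixel

        iterations = 0

        # Main loop
        while self.shouldExit(iterations) is False:
            self.oldClusters = [cluster.centroid for cluster in self.clusters]
            print(iterations)

            # Measure the distance from each pixel to every dominant colour and add the pixel to the nearest cluster
            for pixel in self.pixels:
                self.assignClusters(pixel)
            # Recompute each cluster's dominant colour from its pixels
            for cluster in self.clusters:
                cluster.setNewCentroid()

            iterations += 1

        return [cluster.centroid for cluster in self.clusters]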

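    def assignClusters(self, pixel):
        shortest = float('Inf')
        for cluster in self.clusters:
            distance = self.calcDistance(cluster.centroid, pixel)
            if distance < shortest:
                shortest = distance
                nearest = cluster  # nearest is a reference to the cluster, not a copy
        nearest.addPoint(pixel)

    def calcDistance(self, a, b):
        # Convert to float first: subtracting or squaring uint8 pixels would wrap around modulo 256
        diff = numpy.asarray(a, dtype=numpy.float64) - numpy.asarray(b, dtype=numpy.float64)
        return numpy.sqrt(numpy.sum(diff ** 2))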

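    def shouldExit(self, iterations):

        if self.oldClusters is None:
            return False
        # Stop once the maximum number of iterations has been reached
        if iterations >= self.max_iterations:
            return True
        # Compute how far each new centre is from its old position;
        # keep iterating while any centre still moves by min_distance or more
        for idx in range(self.k):
            dist = self.calcDistance(
                numpy.array(self.clusters[idx].centroid),
                numpy.array(self.oldClusters[idx])
            )
            if dist >= self.min_distance:
                return False

        return True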

    # The remaining methods are used for debugging
    def showImage(self):
        """
        Show the (thumbnailed) original image
        """
        self.image.show()

    def showCentroidColours(self):
        """
        Show each dominant colour as a 200x200 swatch
        """
        for cluster in self.clusters:
            # Image.new expects integer channel values
            colour = tuple(int(round(c)) for c in cluster.centroid)
            image = Image.new("RGB", (200, 200), colour)
            image.show()

    def showClustering(self):
        """
        Return the image with every pixel replaced by its nearest dominant colour
        """
        localPixels = [None] * len(self.pixels)

        # enumerate yields both the index and the value of each pixel
        for idx, pixel in enumerate(self.pixels):
            shortest = float('Inf')  # positive infinity
            for cluster in self.clusters:
                distance = self.calcDistance(
                    cluster.centroid,
                    pixel
                )
                if distance < shortest:
                    shortest = distance
                    nearest = cluster

            localPixels[idx] = nearest.centroid

        w, h = self.image.size
        localPixels = (numpy.asarray(localPixels, dtype=numpy.float64)
                       .round()
                       .astype('uint8')
                       .reshape((h, w, 3)))

        colourMap = Image.fromarray(localPixels)
        return colourMap

if __name__ == "__main__":
    import os

    k_image = Kmeans(k=3)  # default parameters
    path = './pics/'
    with open('file_color.txt', 'w') as fp:
        for filename in os.listdir(path):
            print(path + filename)
            try:
                color = k_image.run(Image.open(path + filename))
                # showCentroidColours() only displays swatches and returns None;
                # showClustering() returns the recoloured image, which is what gets saved
                w_image = k_image.showClustering()
                w_image.save(path + 'mean_' + filename, 'jpeg')
                fp.write('The color of ' + filename + ' is ' + str(color) + '\n')
            except OSError:
                # Pillow raises OSError (or a subclass) for files it cannot open or save
                print("This file format is not supported")
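The following illustrates why the distance must not be computed directly on `uint8` values. The pixels are loaded as `numpy.uint8`, and a centroid sampled from the image is also a `uint8` pixel. Subtracting and squaring two such values therefore wraps around modulo 256. In this small NumPy demonstration, `naive_distance` computes `numpy.sqrt(sum((a - b) ** 2))` directly on two uint8 pixels, and `safe_distance` is an illustrative fix. Both names are hypothetical.

```python
import numpy as np

a = np.array([200, 0, 0], dtype=np.uint8)   # a pixel
b = np.array([100, 0, 0], dtype=np.uint8)   # a centre sampled from the same image


def naive_distance(p, q):
    """Distance computed directly on uint8 arrays: both - and ** wrap modulo 256."""
    return float(np.sqrt(np.sum((p - q) ** 2)))


def safe_distance(p, q):
    """Euclidean distance computed in float64, so nothing wraps around."""
    d = np.asarray(p, dtype=np.float64) - np.asarray(q, dtype=np.float64)
    return float(np.sqrt(np.sum(d ** 2)))


print((b - a).tolist())        # [156, 0, 0]: 100 - 200 wraps to 156
print(naive_distance(a, b))    # 4.0: 100 ** 2 = 10000 wraps to 16
print(safe_distance(a, b))     # 100.0
```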

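Image before processing:

  (original image)

  Image after processing:

  (the same image redrawn with its three dominant colours)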

Reference: http://blog.zeevgilovitz.com/detecting-dominant-colours-in-python/

Date: 2024-11-29 13:29:04
