Implementing MVPtree in Java: Building the Core Algorithm Code

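  A project required me to implement MVPtree, a rather obscure data structure, in Java. There is no ready-made Java implementation online, and although I am used to reading C++, the C++ implementations of complex structures like this one left me baffled. Fortunately the client pointed me to a VPtree written in Java by someone on GitHub called "job"-something, and its overall structure can be adapted to hold an MVPtree.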

  For background on MVPtree itself, please look it up on Baidu; this post covers only the implementation of the algorithm.

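  A point-search tree has two main problems to solve: how to avoid searching points that cannot matter, and how to reduce the number of distance computations. Solutions to the first are fairly easy to come up with: split the point set into two symmetric rectangular halves, or, more imaginatively, partition it by distance into a disc and rings (this is very efficient, because every query is based on point distance). The second problem has fewer general answers; the usual optimization is to precompute each point's distance to the vantage points and use those values to filter points at query time.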

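  VPtree is an example of partitioning a point set by distance. Each node holds a point set. Pick an arbitrary point as the vantage point, split the set by distance to the vantage point into 2 subsets of equal size, and pass each subset to one of the node's child nodes. Looking up points for a query follows the same pattern. VPtree does nothing about the second problem (reducing distance computations), however. This structure suits cases where the distance function is Manhattan distance or Euclidean distance.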
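
  The split described above can be sketched as follows. This is a minimal illustration, not the VPtree library's code: the class `VpSplitSketch`, the method `splitByMedian`, and the use of plain `double[]` points with Euclidean distance are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class VpSplitSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Sorts a copy of the points by distance to the vantage point and cuts it
     * in the middle. Element 0 of the result is the inner half (closer to the
     * vantage point), element 1 is the outer half.
     */
    static List<List<double[]>> splitByMedian(List<double[]> points, double[] vantagePoint) {
        List<double[]> sorted = new ArrayList<>(points);
        sorted.sort(Comparator.comparingDouble((double[] p) -> euclidean(p, vantagePoint)));
        int middle = sorted.size() / 2;
        List<List<double[]>> halves = new ArrayList<>();
        halves.add(new ArrayList<>(sorted.subList(0, middle)));
        halves.add(new ArrayList<>(sorted.subList(middle, sorted.size())));
        return halves;
    }
}
```

  Sorting keeps the sketch short. The node code later in this post instead selects a threshold and partitions the list around it, which avoids a full sort.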

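  MVPtree inherits VPtree's distance-based partitioning, except that each node splits its points into 4 sets, and a path array is used to limit how often the distance function runs. Splitting into 4 sets instead of 2 gives a finer partition and leaves fewer irrelevant points. Filtering against a bounded number of vantage points reduces distance computations when queries are frequent, and because those vantage points tend to be spread far apart, large invalid regions are ruled out at once, which works extremely well. This structure suits distance functions that are costly to evaluate (the case I had in mind was Chebyshev-style functions).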
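
  To make the four-way split concrete, the sketch below maps a point to one of the 2×2 regions from its distances to the two vantage points. The names `MvpRegionSketch` and `regionOf` are hypothetical, and the convention that a distance equal to the threshold falls in the inner region is an assumption. The threshold layout (one first-level threshold, two second-level thresholds) mirrors the fields used by the node code later in this post.

```java
final class MvpRegionSketch {

    /**
     * Returns {i, j}. i is 0 when the point lies within firstThreshold of the
     * first vantage point and 1 otherwise; j does the same for the second
     * vantage point, using the second-level threshold that belongs to half i.
     */
    static int[] regionOf(double distanceToFirst, double distanceToSecond,
                          double firstThreshold, double[] secondThreshold) {
        int i = distanceToFirst <= firstThreshold ? 0 : 1;
        int j = distanceToSecond <= secondThreshold[i] ? 0 : 1;
        return new int[] {i, j};
    }
}
```

  For example, with `firstThreshold = 2.0` and `secondThreshold = {3.0, 6.0}`, a point at distances (1.0, 5.0) lands in region [0][1], while a point at distances (4.0, 5.0) lands in region [1][0].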

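  Next comes the code.

  An MVPtree point differs from a VPtree point in one respect: it also carries an array of its distances to the vantage points. That calls for a dedicated point data structure, the MVPtree point.

  The core code is as follows: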

import java.util.ArrayList;

public class MVPTreePoint<P> {

    private ArrayList<Double> path;

    private P point;

    private final int maxLevel;

    public MVPTreePoint(final P point, final int maxLevel) {
        this.point = point;
        this.maxLevel = maxLevel;
        this.path = new ArrayList<>();
    }

    public void addDistanceToSelf(final MVPTreePoint<P> vantagePointElement, final DistanceFunction<P> distanceFunction) {
        if(this.path.size() < this.maxLevel)
            this.path.add(distanceFunction.getDistance(this.point, vantagePointElement.point));
    }

    public void addDistanceToSelf(final P vantagePoint, final DistanceFunction<P> distanceFunction) {
        if(this.path.size() < this.maxLevel)
            this.path.add(distanceFunction.getDistance(this.point, vantagePoint));
    }

    public void addDistanceToSelf(final double distance) {
        if(this.path.size() < this.maxLevel) {
            this.path.add(distance);
        }
    }

    public void removeDistanceToSelf(final int position) {
        if(position < this.path.size()) {
            this.path.remove(position);
        }
    }

    public double getDistanceToSelf(int i) {
        return this.path.get(i);
    }

    public int size() {
        return this.path.size();
    }

    public void clearPath() {
        this.path.clear();
    }

    public P getPoint() {
        return this.point;
    }

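    // Two wrappers are equal when the wrapped points are equal. The type
    // check avoids a ClassCastException or NullPointerException on foreign
    // or null arguments.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof MVPTreePoint)) {
            return false;
        }
        return this.point.equals(((MVPTreePoint<?>) o).point);
    }

    // Kept consistent with equals so the wrapper behaves in hash collections.
    @Override
    public int hashCode() {
        return this.point.hashCode();
    }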
}

MVPTreePoint

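  Putting the distance array on the point class, rather than folding it into the tree node class, keeps the structure cleaner and makes it easy to read the distances from a point.

  MVPtree differs from VPtree in many places, but most of the changes only rename classes and replace P and E with MVPTreePoint<P> and MVPTreePoint<E>. Here I focus on the core algorithms: initializing the tree and querying points.

  Initializing an MVPtree means not only choosing one more vantage point and partitioning the array two more times, but also storing the distance from each vantage point to every point.

  capacity is the capacity of a leaf node. Pick a moderate value, depending on the size of the data.

  The original paper takes the vantage points out of the point set and stores them separately. In this implementation a vantage point serves only as a reference and remains part of the point set during initialization. The data structure ends up holding only about quantityOfPoint/capacity extra points, and the program becomes much easier to write.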

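    // The generic arrays created below cannot be type-checked by the compiler.
    @SuppressWarnings({ "unchecked", "rawtypes" })
    public MVPTreeNode(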
            final Collection<MVPTreePoint<E>> pointNodes,
            final DistanceFunction<P> distanceFunction,
            final MVPThresholdSelectionStrategy<P, E> thresholdSelectionStrategy,
            final int capacity, final int maxLevel) {

        if (capacity < 1) {
            throw new IllegalArgumentException("Capacity must be positive.");
        }

        if (pointNodes.isEmpty()) {
            throw new IllegalArgumentException(
                    "Cannot create a MVPTreeNode with an empty list of points.");
        }

        this.capacity = capacity;
        this.maxLevel = maxLevel;
        this.distanceFunction = distanceFunction;
        this.thresholdSelectionStrategy = thresholdSelectionStrategy;
        this.pointNodes = new ArrayList<>(pointNodes);
        this.children = new MVPTreeNode[2][2];
        this.vantagePoint = (E[]) new Object[2];
        this.secondThreshold = new double[2];

        this.anneal();
    }

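    /**
     * Rebuilds this node. An internal node with an empty child is collapsed
     * back into a point list and re-split; a node holding a point list is
     * split into four children if it exceeds capacity.
     */
    protected void anneal() {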
        if (this.pointNodes == null) {
            int childrenSize[][] = new int[2][2];
            for (int i = 0; i < 2; i++) {
                for (int j = 0; j < 2; j++) {
                    childrenSize[i][j] = this.children[i][j].size();
                }
            }

            if (childrenSize[0][0] == 0 || childrenSize[0][1] == 0
                    || childrenSize[1][0] == 0 || childrenSize[1][1] == 0) {
                // One of the child nodes has become empty, and needs to be
                // pruned.
                this.pointNodes = new ArrayList<>(childrenSize[0][0]
                        + childrenSize[0][1] + childrenSize[1][0]
                        + childrenSize[1][1]);
                this.addAllPointsToCollection(this.pointNodes);
                for (MVPTreePoint<E> pointNode : this.pointNodes) {
                    pointNode.clearPath();
                }
                for (int i = 0; i < 2; i++) {
                    for (int j = 0; j < 2; j++) {
                        this.children[i][j] = null;
                    }
                }
                this.anneal();
            } else {
                for (int i = 0; i < 2; i++) {
                    for (int j = 0; j < 2; j++) {
                        this.children[i][j].anneal();
                    }
                }
            }
        } else {
            int firstVantagePointIndex = new Random().nextInt(this.pointNodes
                    .size());
            this.vantagePoint[0] = this.pointNodes.get(firstVantagePointIndex)
                    .getPoint();
            this.firstThreshold = this.thresholdSelectionStrategy
                    .selectThreshold(this.pointNodes, this.vantagePoint[0],
                            this.distanceFunction);
            int firstIndexPastThreshold;
            try {
                firstIndexPastThreshold = MVPTreeNode.partitionPoints(
                        this.pointNodes, this.vantagePoint[0],
                        this.firstThreshold, this.distanceFunction);

            } catch (final PartitionException e) {
                this.storeInOneNode();
                return;
            }

            if (this.pointNodes.size() > this.capacity) {
                List<MVPTreePoint<E>> subTreeList[] = new List[2];

                subTreeList[0] = this.pointNodes.subList(0,
                        firstIndexPastThreshold);
                subTreeList[1] = this.pointNodes.subList(
                        firstIndexPastThreshold, this.pointNodes.size());

                // if points can be divided into 2 parts, find second vantage
                // point and try to split point array
                int secondVantagePointIndex = new Random()
                        .nextInt(subTreeList[1].size());
                this.vantagePoint[1] = subTreeList[1].get(
                        secondVantagePointIndex).getPoint();
                int splitPosition[] = new int[2];
                for (int i = 0; i < 2; i++) {
                    this.secondThreshold[i] = this.thresholdSelectionStrategy
                            .selectThreshold(subTreeList[i],
                                    this.vantagePoint[1], this.distanceFunction);
                    try {
                        splitPosition[i] = MVPTreeNode.partitionPoints(
                                subTreeList[i], this.vantagePoint[1],
                                this.secondThreshold[i], this.distanceFunction);
                    } catch (final PartitionException e) {
                        this.storeInOneNode();
                        return;
                    }
                }
                for (MVPTreePoint<E> pointNode : this.pointNodes) {
                    pointNode.addDistanceToSelf(this.distanceFunction
                            .getDistance(pointNode.getPoint(),
                                    this.vantagePoint[0]));
                    pointNode.addDistanceToSelf(this.distanceFunction
                            .getDistance(pointNode.getPoint(),
                                    this.vantagePoint[1]));
                }
                for (int i = 0; i < 2; i++) {
                    this.children[i][0] = new MVPTreeNode<>(
                            subTreeList[i].subList(0, splitPosition[i]),
                            this.distanceFunction,
                            this.thresholdSelectionStrategy, this.capacity,
                            this.maxLevel);
                    this.children[i][1] = new MVPTreeNode<>(
                            subTreeList[i].subList(splitPosition[i],
                                    subTreeList[i].size()),
                            this.distanceFunction,
                            this.thresholdSelectionStrategy, this.capacity,
                            this.maxLevel);
                }
                this.pointNodes = null;
            } else {
                this.storeInOneNode();
            }
        }
    }

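    /**
     * Keeps all points in this node as a leaf. The second vantage point is set
     * to the point farthest from the first one.
     */
    private void storeInOneNode() {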
        int maxIndex = 0;
        double maxDistance = this.distanceFunction.getDistance(this.pointNodes
                .get(0).getPoint(), this.vantagePoint[0]);
        for (int i = 1; i < this.pointNodes.size(); i++) {
            double curDistance = this.distanceFunction.getDistance(
                    this.pointNodes.get(i).getPoint(), this.vantagePoint[0]);
            if (maxDistance < curDistance) {
                maxDistance = curDistance;
                maxIndex = i;
            }
        }
        this.vantagePoint[1] = this.pointNodes.get(maxIndex).getPoint();

        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                this.children[i][j] = null;
            }
        }
    }

init MVPtree

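  The original author provides 2 kinds of query: finding the k points nearest to the query point, and finding all points no farther than u from the query point.

  The k-nearest algorithm can reuse the approach used for querying a VPtree: first search the child node that contains the query point, then search the other child nodes. Note the order of the checks. First check whether the collector is full (if it is not, any point is simply put in). Only then check whether the farthest distance from the query point to a collected point (a fixed distance in the second kind of query) is smaller than the nearest possible distance from the query point to the point or point set. This check is useful in both internal nodes and leaf nodes.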

public void collectNearestNeighbors(
            final NearestNeighborCollector<P, E> collector, int depth) {
        if (this.pointNodes == null) {
            // O1-Q
            final double distanceFromFirstVantagePointToQueryPoint = this.distanceFunction
                .getDistance(this.vantagePoint[0],
                    collector.getQueryPoint().getPoint());

            // O2-Q
            final double distanceFromSecondVantagePointToQueryPoint = this.distanceFunction
                .getDistance(this.vantagePoint[1],
                    collector.getQueryPoint().getPoint());

            collector.getQueryPoint().addDistanceToSelf(
                    distanceFromFirstVantagePointToQueryPoint);
            collector.getQueryPoint().addDistanceToSelf(
                    distanceFromSecondVantagePointToQueryPoint);

            final MVPTreeNode<P, E> index = this
                    .getChildNodeForPoint(collector.getQueryPoint().getPoint());
            index.collectNearestNeighbors(collector, depth + 1);

            // O1-Q - O1-S1
            double basicDistance = distanceFromFirstVantagePointToQueryPoint
                    - this.firstThreshold;

            for(int i = 0;i < 2;i ++){
                if (!collector.isFull() || basicDistance <= collector.getRadius()) {
                    // O2-Q - O2-S2
                    double touchDistance = distanceFromSecondVantagePointToQueryPoint
                            - this.secondThreshold[i];

                    for(int j = 0;j < 2;j ++){
                        if (index != this.children[i][j]
                                && (!collector.isFull() || touchDistance <= collector.getRadius())) {
                            this.children[i][j].collectNearestNeighbors(collector, depth + 1);
                        }
                        touchDistance *= -1;
                    }
                }
                basicDistance *= -1;
            }
            collector.getQueryPoint().removeDistanceToSelf(depth + depth + 1);
            collector.getQueryPoint().removeDistanceToSelf(depth + depth);
        } else {
            for (final MVPTreePoint<E> pointNode : this.pointNodes) {
                if(!collector.isFull() || this.isAbleToInsert(collector.getRadius(),
                                collector.getQueryPoint(), pointNode)) {
                    collector.offerPoint(pointNode.getPoint());
                }
            }
        }
    }

collectNearestNeighbors
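
  The collector passed to collectNearestNeighbors is not shown in this post. Below is a minimal sketch of what such a collector might look like, assuming a bounded max-heap. The class name `KNearestCollectorSketch` and the helper `toSortedList` are hypothetical; only the method names `isFull`, `getRadius`, and `offerPoint` come from the calls above, and their behavior here is my assumption rather than the VPtree library's implementation. For brevity, points are `double[]` compared by Euclidean distance.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class KNearestCollectorSketch {
    private final double[] queryPoint;
    private final int capacity;
    // Max-heap ordered by distance to the query: the head is the farthest kept point.
    private final PriorityQueue<double[]> heap;

    KNearestCollectorSketch(double[] queryPoint, int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("Capacity must be positive.");
        }
        this.queryPoint = queryPoint;
        this.capacity = capacity;
        this.heap = new PriorityQueue<>(
                Comparator.comparingDouble((double[] p) -> distance(p, queryPoint)).reversed());
    }

    boolean isFull() {
        return heap.size() >= capacity;
    }

    /** Distance from the query to the farthest point kept so far. */
    double getRadius() {
        return heap.isEmpty() ? Double.POSITIVE_INFINITY : distance(heap.peek(), queryPoint);
    }

    /** Keeps the point if there is room, or if it beats the current farthest point. */
    void offerPoint(double[] point) {
        if (!isFull()) {
            heap.add(point);
        } else if (distance(point, queryPoint) < getRadius()) {
            heap.poll();
            heap.add(point);
        }
    }

    /** Collected points, nearest first. */
    List<double[]> toSortedList() {
        List<double[]> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((double[] p) -> distance(p, queryPoint)));
        return result;
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

  Because the radius shrinks as better points arrive, visiting the child that contains the query point first tends to tighten the radius early and lets more sibling nodes be skipped.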

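  The algorithm for finding all points no farther than u from the query point is the one described in the paper. Its steps resemble those of the k-nearest collection. The differences are that the limiting distance is a fixed value and that the distance check must be performed every time, since there is no "collector not yet full" shortcut.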

public void collectAllWithinDistance(final MVPTreePoint<P> queryPoint,
            final double maxDistance, final Collection<E> collection, int depth) {
        if (this.pointNodes == null) {
            final double distanceFromFirstVantagePointToQueryPoint = this.distanceFunction
                    .getDistance(this.vantagePoint[0], queryPoint.getPoint());
            final double distanceFromSecondVantagePointToQueryPoint = this.distanceFunction
                    .getDistance(this.vantagePoint[1], queryPoint.getPoint());

            queryPoint
                    .addDistanceToSelf(distanceFromFirstVantagePointToQueryPoint);
            queryPoint
                    .addDistanceToSelf(distanceFromSecondVantagePointToQueryPoint);

            // We want to search any of this node's children that intersect with
            // the query region
            if (distanceFromFirstVantagePointToQueryPoint <= this.firstThreshold
                    + maxDistance) {
                if (distanceFromSecondVantagePointToQueryPoint <= this.secondThreshold[0]
                        + maxDistance) {
                    this.children[0][0].collectAllWithinDistance(queryPoint,
                            maxDistance, collection, depth + 1);
                }

                if (distanceFromSecondVantagePointToQueryPoint + maxDistance >= this.secondThreshold[0]) {
                    this.children[0][1].collectAllWithinDistance(queryPoint,
                            maxDistance, collection, depth + 1);
                }
            }

            if (distanceFromFirstVantagePointToQueryPoint + maxDistance >= this.firstThreshold) {
                if (distanceFromSecondVantagePointToQueryPoint <= this.secondThreshold[1]
                        + maxDistance) {
                    this.children[1][0].collectAllWithinDistance(queryPoint,
                            maxDistance, collection, depth + 1);
                }

                if (distanceFromSecondVantagePointToQueryPoint + maxDistance >= this.secondThreshold[1]) {
                    this.children[1][1].collectAllWithinDistance(queryPoint,
                            maxDistance, collection, depth + 1);
                }
            }
            queryPoint.removeDistanceToSelf(depth + depth + 1);
            queryPoint.removeDistanceToSelf(depth + depth);
        } else {
            for (MVPTreePoint<E> pointNode : pointNodes) {
                if (this.isAbleToInsert(maxDistance, queryPoint, pointNode))
                    collection.add(pointNode.getPoint());
            }
        }
    }

collectAllWithinDistance
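
  As a worked example of the pruning in the range query, the sketch below computes which of the four children a query ball can reach. The names `RangePruneSketch` and `childrenToVisit` are hypothetical, and the sketch is a condensed restatement of the conditions in the code above, not a replacement for it.

```java
final class RangePruneSketch {

    /**
     * visit[i][j] is true when child [i][j] may contain a point within
     * maxDistance of the query. dq1 and dq2 are the query's distances to the
     * first and second vantage point.
     */
    static boolean[][] childrenToVisit(double dq1, double dq2, double maxDistance,
                                       double firstThreshold, double[] secondThreshold) {
        boolean[] firstLevel = {
            dq1 - maxDistance <= firstThreshold, // ball reaches inside the first sphere
            dq1 + maxDistance >= firstThreshold  // ball reaches outside the first sphere
        };
        boolean[][] visit = new boolean[2][2];
        for (int i = 0; i < 2; i++) {
            visit[i][0] = firstLevel[i] && dq2 - maxDistance <= secondThreshold[i];
            visit[i][1] = firstLevel[i] && dq2 + maxDistance >= secondThreshold[i];
        }
        return visit;
    }
}
```

  With `dq1 = 10`, `dq2 = 1`, `maxDistance = 2`, `firstThreshold = 5`, and both second thresholds equal to 4, only child [1][0] needs to be visited: the ball lies entirely outside the first sphere and entirely inside the second.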

  Both kinds of query need to compare the precomputed distances, so that computation is factored into a single function:

public boolean isAbleToInsert(double limitDistance,
            MVPTreePoint<P> queryPoint, MVPTreePoint<E> pointNode) {

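        // Cheap rejection first: by the triangle inequality, the difference of
        // the two stored distances to a vantage point is a lower bound on the
        // real distance, so no distance computation is needed to reject.
        for (int i = 0; i < queryPoint.size(); i++) {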
            double disOffset = queryPoint.getDistanceToSelf(i)
                    - pointNode.getDistanceToSelf(i);

            if (Math.abs(disOffset) > limitDistance) {
                return false;
            }
        }

        return this.distanceFunction.getDistance(pointNode.getPoint(),
                queryPoint.getPoint()) <= limitDistance;
    }

isAbleToInsert
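
  To show what the precomputed distances save, the sketch below counts real distance evaluations during a filtered scan. It relies on the triangle inequality, |d(q, v) - d(p, v)| <= d(q, p) for any vantage point v. The class `PathFilterSketch`, its methods, and the `distanceCalls` counter are hypothetical names for this example, and the points are plain `double[]` with Euclidean distance.

```java
import java.util.ArrayList;
import java.util.List;

final class PathFilterSketch {
    /** Number of real distance evaluations made so far. */
    static int distanceCalls = 0;

    static double distance(double[] a, double[] b) {
        distanceCalls++;
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** True when some vantage point proves the candidate is farther than limit. */
    static boolean rejectedByPath(double[] queryPath, double[] candidatePath, double limit) {
        // Both paths are assumed to have the same length and vantage point order.
        for (int i = 0; i < queryPath.length; i++) {
            if (Math.abs(queryPath[i] - candidatePath[i]) > limit) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns the candidates within limit of the query. queryPath[i] and
     * candidatePaths[k][i] are precomputed distances to vantage point i.
     */
    static List<double[]> withinRange(double[] query, double[] queryPath,
                                      double[][] candidates, double[][] candidatePaths,
                                      double limit) {
        List<double[]> hits = new ArrayList<>();
        for (int k = 0; k < candidates.length; k++) {
            // The real distance is evaluated only if the path check cannot reject.
            if (!rejectedByPath(queryPath, candidatePaths[k], limit)
                    && distance(query, candidates[k]) <= limit) {
                hits.add(candidates[k]);
            }
        }
        return hits;
    }
}
```

  The path check can only reject. A candidate that passes it may still be too far away, so the real distance is always evaluated before a point is accepted, just as in isAbleToInsert.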

  Other functions need changes too, but none of them requires structural changes as extensive as these 3 functions.

Time: 2024-10-02 03:44:06
