Just a File Node Class for Our Project's Data Processing

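  I am now in the second semester of my second year of graduate school, and I have been writing code for this retrieval project for nearly two years. I first came into contact with the project in the second semester of my senior undergraduate year. In my first year of graduate school I had no idea how to do research, I was the only person on the project, and I wrote code haphazardly with too much on my mind. One word sums it up: chaos.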

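  But in October of the first semester of my second year, a classmate joined, and you could say that is when the project truly began. After our system was finished, I decided on a whim to tidy up the code I had written earlier. We need to write a paper, so a lot of data processing is required for the experimental comparison section. I had already written similar data-processing code back in my first year, yet I ended up writing it all over again, because rewriting took less time than recalling how the old code worked. That is how I discovered that I had written a lot of code twice, and it was my original motivation for cleaning things up. What I really wanted was to combine a file-tree data structure with a data-processing workflow and turn our data-processing module into a pipeline, so that the code can be reused later. Those who do the grunt work should always look for ways to make themselves more efficient. With that idea in mind I implemented the class below. It is written in Python, because Python is really convenient for data processing, string handling, and batch jobs. This class may well be useful only to me. Why write a blog post about it, then? Perhaps because when I later guide a first-year graduate student through a paper, I will have him go back and read the code we wrote, and use it. I do not have much time to mentor a new student, so I will have him read my blog.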

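  My data structure is simply a multi-way (n-ary) tree that represents the directory structure. Each node is a file, and the algorithms for traversing the tree and adding nodes are implemented with a stack and a queue. Here is the code; I will come back and add comments when I have time: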

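import os
from collections import deque
# strOp and tblOp appear to be the author's own project modules; they are not shown in this post.
from strOp import strExt
from tblOp import tblConcat

class FileNode:
    '''A node of the directory tree; each node represents one file or folder.'''
    def __init__(self, _fileName_s='',
                 _brothers=None,
                 _sons=None,
                 _isDir_b=False,
                 _parent=None
                 ):
        self.fileName_s = _fileName_s
        self.bro = _brothers          # next sibling, or None
        # Use a fresh list per node: a mutable default ([]) would be shared by every node.
        self.sons = _sons if _sons is not None else []
        self.isDir_b = _isDir_b
        self.parent = _parent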

def addNodeUnderPathUnrecur(root, _path_s):
    ''' inputs:
            root -> the root of the directory tree (it must be the tree's root).
            _path_s -> the path whose files are added as sons.
                       If _path_s is 'D:\\CS_DATA\\', every file under it
                       is added as a son of the node named 'CS_DATA'.
        outputs:
            Adds all the files directly under _path_s as sons of its node (non-recursive).
    '''
    if _path_s[-1] != '\\':
        _path_s += '\\'
    node = searchNodeFromGivenFilePath(root, _path_s)
    if node is None:
        print('can not find the path %s' % _path_s)
        return
    for fileName_s in os.listdir(_path_s):
        newNode = FileNode(fileName_s, None, [], os.path.isdir(_path_s + fileName_s), node)
        if len(node.sons) != 0:
            # Link the current last son to the new node as its brother.
            node.sons[-1].bro = newNode
        node.sons.append(newNode)
        # No isSameName check here: the file system ensures names are unique within a folder.

def searchNodeFromGivenFilePath(root, _path_s):
    ''' inputs:
            root -> the root of the directory tree.
            _path_s -> the absolute path of a node. Example: 'D:\\CS_DATA\\'
        output:
            Walks down from root and returns the node whose fileName_s is 'CS_DATA',
            or None if it can not be found. The path must be absolute;
            both 'D:\\CS_DATA\\' and 'D:\\CS_DATA' are accepted.
    '''
    if _path_s[-1] != '\\':
        _path_s += '\\'

    folderStructure = _path_s.split('\\')
    if root.bro is not None:
        print('input root is not root of file tree')
        return None
    if folderStructure[0] != root.fileName_s:
        print('the head of input path is not same as root')
        return None
    stack = []
    stack.append(root)
    # The last element of folderStructure is '' (after the trailing separator), so skip it.
    for i in range(1, len(folderStructure)-1):
        if len(stack) == 0:
            print('stack is empty')
            return None
        node = stack.pop()
        found = False
        for son in node.sons:
            if folderStructure[i] == son.fileName_s:
                stack.append(son)
                found = True
        if not found:
            print('can not find the folder %s' % folderStructure[i])
            return None
    node = stack.pop()
    return node

def addNodeAsSonFromGivenNode(root, _sonPath_s):
    ''' inputs:
            root -> the root of the directory tree that the node is added to.
            _sonPath_s -> the absolute path of the node to add.
            Example: 'D:\\CS_DATA\\tree\\' adds the node named 'tree' under its parent 'CS_DATA'
        outputs:
            The directory tree with the node added.
    '''
    if _sonPath_s[-1] != '\\':
        _sonPath_s += '\\'
    fileStructure = _sonPath_s.split('\\')
    lenOfFileStructure = len(fileStructure)
    if lenOfFileStructure <= 2:
        print('There is no son in the input path %s' % _sonPath_s)
        return

    # fileStructure ends with [..., sonName, ''], so the son is the second-to-last element.
    _sonFileName_s = fileStructure[-2]
    _parentPath_s = '\\'.join(fileStructure[:-2]) + '\\'
    _addNodeAsSonFromGivenNode(root, _parentPath_s, _sonFileName_s)

def _addNodeAsSonFromGivenNode(root, _parentPath_s, _sonFileName_s):
    ''' inputs:
            root -> the root of the directory tree.
            _parentPath_s -> the absolute path of the parent
            _sonFileName_s -> the file name of the node to add
        outputs:
            This is an auxiliary function of addNodeAsSonFromGivenNode.
    '''
    if _parentPath_s[-1] != '\\':
        _parentPath_s += '\\'

    parentNode = searchNodeFromGivenFilePath(root, _parentPath_s)
    if parentNode is None:
        print('can not find the parent folder %s' % _parentPath_s)
        return None
    newNode = FileNode(_sonFileName_s, None, [], os.path.isdir(_parentPath_s+_sonFileName_s), parentNode)
    if isSameName(parentNode, newNode):
        return
    if len(parentNode.sons) != 0:
        # Link the current last son to the new node as its brother.
        parentNode.sons[-1].bro = newNode
    parentNode.sons.append(newNode)

def isSameName(parentNode, sonNode):
    ''' inputs:
            parentNode -> the parent node.
            sonNode -> the son node.
        outputs:
            Returns True if a node with sonNode's name is already in parentNode.sons.
    '''
    for node in parentNode.sons:
        if node.fileName_s == sonNode.fileName_s:
            print('has same node %s\\%s -> %s' % (parentNode.fileName_s, node.fileName_s, sonNode.fileName_s))
            return True
    return False

def addNodeUnderPathRecur(root, _path_s):
    ''' inputs:
            root -> the root of the directory tree.
            _path_s -> the absolute path to add. Example: 'D:\\CS_DATA\\'
        outputs:
            1. Adds all the file nodes under _path_s recursively and returns the node of _path_s.
            2. _path_s must already exist in the tree under root.
        Unsafe:
            1. Some system directories can not be added recursively. Example: 'D:\\System Volume Information'
            2. Files are not checked for duplicate names when they are added.
            3. So this function relies on the operating system to enforce unique names.
    '''
    if _path_s[-1] != '\\':
        _path_s = _path_s + '\\'

    fileStructure = _path_s.split('\\')
    if fileStructure[0] == root.fileName_s and len(fileStructure) == 2:
        print('_path_s can not be the root')
        return

    returnNode = currentNode = searchNodeFromGivenFilePath(root, _path_s)
    if currentNode is None:
        print('can not find the path')
        return
    # Breadth-first: the queue holds nodes that are created but not yet attached to their parent.
    queue = deque([])
    for fileName_s in os.listdir(_path_s):
        file_s = _path_s + fileName_s
        newNode = FileNode(fileName_s, None, [], os.path.isdir(file_s), currentNode)
        queue.append(newNode)
    while len(queue) != 0:
        newNode = queue.popleft()
        currentNode = newNode.parent
        if len(currentNode.sons) != 0:
            currentNode.sons[-1].bro = newNode
        currentNode.sons.append(newNode)

        if newNode.isDir_b:
            # getFullPathOfNode returns the path with a trailing separator.
            fullPathOfNewNode = getFullPathOfNode(newNode)
            for subFileName_s in os.listdir(fullPathOfNewNode):
                subNewNode = FileNode(subFileName_s, None, [], os.path.isdir(fullPathOfNewNode+subFileName_s), newNode)
                queue.append(subNewNode)
    return returnNode

def printBrosOfGivenNode(root, _path_s):
    ''' inputs:
            root -> the root of the directory tree.
            _path_s -> Examples: 'D:\\CS_DATA' , 'D:\\CS_DATA\\'
        outputs:
            prints the bros of 'CS_DATA' for 'D:\\CS_DATA'
            prints the sons of 'CS_DATA' for 'D:\\CS_DATA\\'
    '''
    node = searchNodeFromGivenFilePath(root, _path_s)
    if node is None:
        print('can not find the node')
        return
    if _path_s[-1] != '\\':
        parentOfNode = node.parent
        if parentOfNode is None:
            print('the root has no bros')
            return
        # Start at the first son of the parent and follow the bro links.
        headOfSons = parentOfNode.sons[0]
        printStr = headOfSons.fileName_s + ','
        while headOfSons.bro is not None:
            headOfSons = headOfSons.bro
            printStr = printStr + headOfSons.fileName_s + ','
    else:
        if len(node.sons) == 0:
            print('its sons is empty')
            return
        printStr = ''
        for son in node.sons:
            printStr = printStr + son.fileName_s + ','
    # Drop the trailing comma.
    print(printStr[:-1])

def crtFileTreeFromPath(_path_s):
    ''' inputs:
            _path_s -> Example: 'D:\\sketchDataset\\'
        outputs:
            Creates the root node from 'D:',
            then calls addNodeUnderPathUnrecur to add the files under 'D:\\',
            then calls addNodeUnderPathUnrecur again to add the files under 'D:\\sketchDataset\\'.
            This repeats until the last separator of _path_s. Returns the root.
    '''
    if _path_s[-1] != '\\':
        _path_s += '\\'
    fileStructure = _path_s.split('\\')
    lenOfFileStructure = len(fileStructure)
    root = FileNode(_fileName_s=fileStructure[0], _isDir_b=os.path.isdir(fileStructure[0]))

    fileStr = root.fileName_s + '\\'
    addNodeUnderPathUnrecur(root, fileStr)
    for i in range(1, lenOfFileStructure-1):
        file_s = fileStructure[i]
        fileStr = fileStr + file_s + '\\'
        addNodeUnderPathUnrecur(root, fileStr)
    return root

def searchLeafNodeUnderGivenNode(root, _path_s):
    ''' inputs:
            root -> the root of the directory tree.
            _path_s -> the absolute path of the node whose leafs are wanted.
        outputs:
            Returns all the leafs under the given _path_s.
            A leaf is a node that has no sons and is not a directory.
    '''
    node = searchNodeFromGivenFilePath(root, _path_s)
    leafs = []
    if node is None:
        print('can not find the node in searchLeafNodeUnderGivenNode')
        return None
    # Breadth-first traversal starting at the given node.
    queue = deque([])
    queue.append(node)
    while len(queue) != 0:
        currentNode = queue.popleft()
        if len(currentNode.sons) == 0 and not currentNode.isDir_b:
            leafs.append(currentNode)
        else:
            for son in currentNode.sons:
                queue.append(son)
    return leafs

def getFullPathOfNode(givenNode):
    '''
        Returns the full (absolute) path of the input node, with a trailing separator.
    '''
    tmpNode = givenNode
    fullPathOfNode = tmpNode.fileName_s + '\\'
    # Walk up to the root, prepending each ancestor's name.
    while tmpNode.parent is not None:
        tmpNode = tmpNode.parent
        fullPathOfNode = tmpNode.fileName_s + '\\' + fullPathOfNode
    return fullPathOfNode
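
The code above hard-codes the Windows separator. As a rough illustration of the same idea (a breadth-first build with a queue, followed by leaf collection), here is a minimal portable sketch. The names `Node`, `build_tree`, `leaves` and `demo` are hypothetical and are not part of the class above; the sketch drops the sibling links and works on a throwaway temporary directory, so it is not a replacement for the original code.

```python
import os
import tempfile
from collections import deque


class Node:
    """Hypothetical, simplified stand-in for FileNode (no sibling links)."""

    def __init__(self, name, is_dir, parent=None):
        self.name = name
        self.is_dir = is_dir
        self.parent = parent
        self.children = []

    def full_path(self):
        # Walk up to the root, then join the names from the top down.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return os.path.join(*reversed(parts))


def build_tree(top):
    """Breadth-first build of the tree rooted at the directory `top`."""
    root = Node(top, True)
    queue = deque([root])
    while queue:
        current = queue.popleft()
        current_path = current.full_path()
        for name in sorted(os.listdir(current_path)):
            is_dir = os.path.isdir(os.path.join(current_path, name))
            child = Node(name, is_dir, current)
            current.children.append(child)
            if is_dir:
                queue.append(child)
    return root


def leaves(root):
    """Collect every node that is a file, i.e. not a directory."""
    found = []
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if current.is_dir:
            queue.extend(current.children)
        else:
            found.append(current)
    return found


def demo():
    # Build a small made-up directory layout so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "category", "chair"))
        for name in ("m1.off", "m2.off"):
            open(os.path.join(top, "category", "chair", name), "w").close()
        tree = build_tree(top)
        return sorted((leaf.parent.name, leaf.name) for leaf in leaves(tree))


print(demo())
```

Using `os.path.join` instead of string concatenation with `'\\'` is the main design difference; the queue-based traversal is the same technique as in `addNodeUnderPathRecur` and `searchLeafNodeUnderGivenNode`.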

For example, to compute the validation set for sketch retrieval, I can append the following code after the code above:

if __name__ == '__main__':
    root = crtFileTreeFromPath('D:\\sketchDataset\\')
    categoryNode = addNodeUnderPathRecur(root, 'D:\\sketchDataset\\category\\')
    leafs = searchLeafNodeUnderGivenNode(root, 'D:\\sketchDataset\\category\\')
    # parent folder (category) name -> list of model ids found in that folder
    containModel_t = {}
    for leaf in leafs:
        categoryName = leaf.parent.fileName_s
        if categoryName not in containModel_t:
            containModel_t[categoryName] = []
        containModel_t[categoryName].append(strExt.extractModelIdWithSuffix(leaf.fileName_s, suffix_s='.off'))

    categoryNode = addNodeUnderPathRecur(root, 'D:\\sketchDataset\\all_categorized_sketches\\')
    # sketch name -> name of the category folder that contains it
    sketchToCate_t = {}
    for son in categoryNode.sons:
        sketchNodes = son.sons
        for sketchNode in sketchNodes:
            sketchName = strExt.extractSketchNameWithSuffix(sketchNode.fileName_s, suffix_s='.txt')
            if sketchName not in sketchToCate_t:
                sketchToCate_t[sketchName] = son.fileName_s

    wanted = tblConcat.concatTableByKey_ValAndVal_Vals(sketchToCate_t, containModel_t)
    print(wanted)

The result is shown below; that is, the validation models for sketch No. 165 are 'm1646.off', 'm1647.off', and so on.

{'s165.txt': ['m1646.off', 'm1647.off', 'm1648.off', 'm1649.off', 'm1650.off', 'm1651.off', 'm1652.off', 'm1653.off', 'm1654.off', 'm1655.off', 'm1656.off', 'm1657.off', 'm1658.off', 'm1659.off', 'm1660.off', 'm1661.off', 'm1662.off', 'm1663.off', 'm1664.off', 'm1665.off'] ......}
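
The `tblConcat` module is not shown in this post, so what `concatTableByKey_ValAndVal_Vals` does internally is not known. Judging only from its inputs and the output above, it seems to chain two tables: sketch -> category, then category -> models. A minimal sketch of such a join, under that assumption, could look like this; `concat_tables` is a hypothetical name, and the category names in the sample data are made up.

```python
def concat_tables(key_to_mid, mid_to_values):
    """Map each key to the list stored under its intermediate value.

    Hypothetical stand-in for the join used above: a key whose
    intermediate value is missing from mid_to_values is skipped.
    """
    joined = {}
    for key, mid in key_to_mid.items():
        if mid in mid_to_values:
            # Copy the list so later edits to one entry do not affect another.
            joined[key] = list(mid_to_values[mid])
    return joined


# Made-up sample data: two sketches share a category, one has no models.
sketch_to_category = {
    "s165.txt": "category_a",
    "s166.txt": "category_a",
    "s9.txt": "category_b",
}
category_to_models = {"category_a": ["m1646.off", "m1647.off"]}

wanted = concat_tables(sketch_to_category, category_to_models)
print(wanted)
```

Sketches in the same category end up with equal but independent model lists, which matches the shape of the result printed above.
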
Time: 2024-10-10 10:01:10
