python读取文本数据写入到数据库及查询优化

文本数据格式

ip2int函数用于IP地址转化为整数

int2ip函数用于整数转化为IP地址

insert_row函数用于插入数据库记录

from __future__ import print_function
import torndb

def get_mysql_conn():
    return torndb.Connection(
        host=mysql["host"] + ":" + mysql["port"],
        database=mysql["database"],
        user=mysql["user"],
        password=mysql["password"],
        charset="utf8")

mysql = {
        "host": "127.0.0.1",
        "port": "3306",
        "database": "test",
        "password": "",
        "user": "root",
        "charset": "utf8"
    }

def ip2int(ip):
    try:
        hexn = ‘‘.join(["%02X" % long(i) for i in ip.split(‘.‘)])
    except Exception, e:
        hexn = ‘‘.join(["%02X" % long(i) for i in ‘0.0.0.0‘.split(‘.‘)])
    return long(hexn, 16)

def int2ip(n):
    d = 256 * 256 * 256
    q = []
    while d > 0:
        m,n = divmod(n,d)
        q.append(str(m))
        d = d/256
    return ‘.‘.join(q)

def insert_row():
    with open("./ipdata.csv", ‘r‘) as fr:
        lines = fr.readlines()
    nl_p_list = []
    for l in lines:
        ls = l.strip().split(‘,‘, 4)
        c1, c2, c3, c4, c5 = ls[0], ip2int(ls[1]), ip2int(ls[2]), ls[3], ls[4]
        nl = [c2, c3, c4, c5]
        nl_p_list.append(nl)

    db = get_mysql_conn()
    db.execute("START TRANSACTION")
    for i in range(len(nl_p_list)/1000 + 1):
        tmp_nl_p_list = nl_p_list[i*1000: (i+1)*1000]
        ret = db.insertmany(‘insert into ipdata (startip, endip, country, carrier) values (%s, %s, %s, %s)‘, tmp_nl_p_list)
    db.execute("COMMIT")

if __name__ == ‘__main__‘:
    insert_row()
    # print(ip2int(‘106.39.222.36‘))
    with open("./ipdata.csv", ‘r‘) as fr:
        lines = fr.readlines()
    nl_p_list = []
    for l in lines:
        ls = l.strip().split(‘,‘, 4)
        c1, c2, c3, c4, c5 = ls[0], ip2int(ls[1]), ip2int(ls[2]), ls[3], ls[4]
        nl = [c2, c3, c4, c5]
        nl_p_list.append(nl)
    import random
    import time
    ip_list = map(lambda x: x[1], random.sample(nl_p_list, 100))
    db = get_mysql_conn()
    ret_list = []
    #{0}表名
    sql_tmp = ‘select {0}.* from (SELECT * FROM `test`.ipdata where %s>=startip order by startip Desc limit 1) {0}‘
    sql_list = []
    #拼接一个很长的sql
    for i in range(len(ip_list)):
        sql_list.append(sql_tmp.format(‘t‘ + str(i)) % ip_list[i])
    sql = ‘ union all ‘.join(sql_list)
    t0 = time.time()
    # for row in db.query(sql):
    #     print(row)
    dict(zip(ip_list, db.query(sql)))

    t1 = time.time()
    for ip in ip_list:
        ret = db.get(‘SELECT * FROM `test`.ipdata where %s>=startip order by startip Desc limit 1‘, ip)
        startip, endip = ret.get(‘startip‘), ret.get(‘endip‘)
        if startip <= ip <= endip:
            ret_list.append((ip, ret.get(‘country‘)))
        else:
            ret_list.append((ip, u"unknown"))
    t2 = time.time()
    print(t1-t0)
    print(t2-t1)

格式化输出字符串函数format()

使用字符串的参数使用{NUM}进行表示,0, 表示第一个参数,1, 表示第二个参数, 以后顺次递加;

zip()函数接受任意多个（包括0个和1个）序列作为参数，返回一个tuple列表

dict()函数是从可迭代对象来创建新字典。比如一个元组组成的列表

参考文章：

Python标准库：内置函数dict

http://www.2cto.com/kf/201411/354739.html

优化的途径：

字段加索引效率提高1000倍

使用union all一次查询查出

时间： 2024-12-22 09:07:03

python读取文本数据写入到数据库及查询优化的相关文章

用python在后端将数据写入到数据库并读取

用python在后端将数据写入到数据库: # coding:utf-8 import pandas as pd from sqlalchemy import create_engine # 初始化数据库连接,使用pymysql模块 # MySQL的用户:root, 密码:147369, 端口:3306,数据库:mydb engine = create_engine('mysql+pymysql://root:[email protected]:3306/python1') import nump

python读取文本、配对、插入数据脚本

#-*- coding:UTF-8 -*- #-*- author:Zahoor Wang -*- import codecs, os, sys, platform, string def env(): return platform.system() def read_file(uri, charset = "utf-8"): f = codecs.open(uri, "r", charset) s = f.read() f.close() return s de

python 读取SQLServer数据插入到MongoDB数据库中

# -*- coding: utf-8 -*-import pyodbcimport osimport csvimport pymongofrom pymongo import ASCENDING, DESCENDINGfrom pymongo import MongoClientimport binascii '''连接mongoDB数据库'''client = MongoClient('10.20.4.79', 27017)#client = MongoClient('10.20.66.10

将pandas的DataFrame数据写入MySQL数据库 + sqlalchemy

将pandas的DataFrame数据写入MySQL数据库 + sqlalchemy [python] view plain copy print? import pandas as pd from sqlalchemy import create_engine ##将数据写入mysql的数据库,但需要先通过sqlalchemy.create_engine建立连接,且字符编码设置为utf8,否则有些latin字符不能处理 yconnect = create_engine('mysql+mysql

python读取文本文件数据

本文要点刚要: (一)读文本文件格式的数据函数:read_csv,read_table 1.读不同分隔符的文本文件,用参数sep 2.读无字段名(表头)的文本文件 ,用参数names 3.为文本文件制定索引,用index_col 4.跳行读取文本文件,用skiprows 5.数据太大时需要逐块读取文本数据用chunksize进行分块. (二)将数据写成文本文件格式函数:to_csv 范例如下: (一)读取文本文件格式的数据集 1.read_csv和read_table的区别: #read_cs

PHP如何通过SQL语句将数据写入MySQL数据库呢？

1,php和MySQL建立连接关系 2,打开 3,接受页面数据,PHP录入到指定的表中 1.2两步可直接使用一个数据库链接文件即可:conn.php <?phpmysql_connect("localhost","root","");//连接MySQLmysql_select_db("hello");//选择数据库?> 当然,前提是已经安装WEB服务器.PHP和MySQL,并且建立MySQL表“webjx” mys

Structured Streaming 实战案例读取文本数据

1.1.1.读取文本数据 spark应用可以监听某一个目录,而web服务在这个目录上实时产生日志文件,这样对于spark应用来说,日志文件就是实时数据 Structured Streaming支持的文件类型有text,csv,json,parquet ●准备工作在people.json文件输入如下数据: {"name":"json","age":23,"hobby":"running"} {"n

Flink RichSourceFunction应用，读关系型数据(mysql)数据写入关系型数据库(mysql)

1. 写在前面 Flink被誉为第四代大数据计算引擎组件,即可以用作基于离线分布式计算,也可以应用于实时计算.Flink的核心是转化为流进行计算.Flink三个核心:Source,Transformation,Sink.其中Source即为Flink计算的数据源,Transformation即为进行分布式流式计算的算子,也是计算的核心,Sink即为计算后的数据输出端.Flink Source原生支持包括Kafka,ES,RabbitMQ等一些通用的消息队列组件或基于文本的高性能非关系型数据库.而

Python爬虫爬数据写入到文件

#coding=utf-8 import requests from bs4 import BeautifulSoup import sys reload(sys) sys.setdefaultencoding('utf8') r=requests.get('http://html-color-codes.info/color-names/') html=r.text #print html soup=BeautifulSoup(html,'html.parser') trs=soup.f