Nginx Traffic Statistics and Analysis

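Program overview

  • Analyzes an nginx log and totals the traffic nginx served by summing the $body_bytes_sent field of each log line. The time interval is configurable in minutes and defaults to 5 minutes.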

Sample output

Start time End time Separator Traffic
2019-11-23 03:26:00 2019-11-23 04:26:00 <=========> 2.04M
2019-11-23 04:27:43 2019-11-23 05:27:43 <=========> 895.05K
2019-11-23 05:28:25 2019-11-23 06:28:25 <=========> 1.88M
2019-11-23 06:33:08 2019-11-23 07:33:08 <=========> 1.29M
2019-11-23 07:37:28 2019-11-23 08:37:28 <=========> 1.16M
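The windows in the sample output are not aligned to the hour: each one opens at the first log entry that falls after the previous window has ended, and then spans the configured interval. The grouping can be sketched as follows. This is a simplified illustration, not the script itself; the function name `window_totals` and the sample records are made up for the example, and the records are assumed to be sorted by timestamp.

```python
def window_totals(records, window_seconds):
    """Group (timestamp, bytes) pairs into consecutive windows.

    A window opens at the first record that the previous window does not
    cover and spans window_seconds. Records must be sorted by timestamp.
    Returns a list of (start, end, total_bytes) tuples.
    """
    totals = []
    start = end = None
    total = 0
    for ts, size in records:
        if start is None or ts > end:
            # Close the current window (if any) and open a new one here
            if start is not None:
                totals.append((start, end, total))
            start, end, total = ts, ts + window_seconds, 0
        total += size
    if start is not None:
        # Emit the last, possibly partial, window
        totals.append((start, end, total))
    return totals


# Hypothetical records: (seconds, bytes sent)
records = [(0, 100), (30, 50), (60, 10), (61, 5), (200, 7)]
print(window_totals(records, 60))
# [(0, 60, 160), (61, 121, 5), (200, 260, 7)]
```

A record whose timestamp equals the window end still counts toward that window, which matches the `<=` comparison used in the script.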

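Environment

  • Python 3
  • Uses the argparse module, which is part of the Python standard library since Python 3.2, so no separate installation is needed
  • Only nginx logs are supported at present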

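Requirements

  • Log format: the 4th field must be [$time_local] and the 7th field must be $body_bytes_sent or $bytes_sent, as in the log_format below. The script splits each line on whitespace and reads token index 3 for the time and token index 9 for the byte count.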

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent $request_time "$http_referer" '
                      '$host DIRECT/$upstream_addr $upstream_http_content_type '
                      '"$http_user_agent" "$http_x_forwarded_for"';
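  • body_bytes_sent: the number of bytes sent to the client, not counting the response header
  • bytes_sent: the number of bytes sent to the client
  • Note: the nginx log must not contain blank lines; the program stops reading at the first blank line and ignores everything after it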
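To make the field positions concrete, here is a minimal sketch of how one log line in the format above can be turned into a timestamp and a byte count. The function name `parse_line` and the sample line are invented for illustration. Like the script, the sketch ignores the timezone offset in the log and interprets the time as local time; it also assumes an English or C locale so that month names such as "Nov" parse.

```python
import time


def parse_line(line):
    """Return (epoch_seconds, bytes_sent), or None if the line cannot be parsed."""
    fields = line.split()
    if len(fields) < 10:
        return None  # blank or truncated line
    try:
        # fields[3] looks like "[23/Nov/2019:03:26:00"; fields[4] holds the offset
        parsed = time.strptime(fields[3].lstrip("["), "%d/%b/%Y:%H:%M:%S")
        return int(time.mktime(parsed)), int(fields[9])
    except ValueError:
        return None  # malformed time or non-numeric byte count


# Hypothetical log line matching the log_format shown above
sample = ('203.0.113.7 - - [23/Nov/2019:03:26:00 +0800] "GET /index.html HTTP/1.1" '
          '200 2048 0.005 "-" example.com DIRECT/127.0.0.1:8080 text/html '
          '"curl/7.64.0" "-"')
print(parse_line(sample))
```

Returning None for unusable lines lets a caller skip blank lines instead of stopping at them, which is one way to lift the blank-line restriction noted above.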

Example

# Analyze the nginx access.log in 1-hour windows and report the traffic generated in each window
$ ./nginx_large_file_flow_analysis3.py -f /var/log/nginx/access.log -m 60

The code of nginx_large_file_flow_analysis3.py follows:

#!/usr/bin/python3
#-*-coding=utf-8-*-

#-----------------------------------------------------------------------------
# Note: the log must not contain blank lines; reading stops at the first
# blank line and everything after it is ignored
#-----------------------------------------------------------------------------

import time
import os
import sys
import argparse

class displayFormat():
    def format_size(self, size):
        # Format a byte count with a human-readable unit
        KB = 1024  # bytes per KB
        MB = 1048576  # bytes per MB
        GB = 1073741824  # bytes per GB
        TB = 1099511627776  # bytes per TB
        if size >= TB:
            size = "%.2f" % (size / TB) + 'T'
        elif size < KB:
            size = str(size) + 'B'
        elif size >= GB:
            size = "%.2f" % (size / GB) + 'G'
        elif size >= MB:
            size = "%.2f" % (size / MB) + 'M'
        else:
            size = "%.2f" % (size / KB) + 'K'
        return size

    def execut_time(self):
        # Print the CPU time used by the script
        print('\n')
        # time.clock() was removed in Python 3.8; time.process_time() replaces it
        print("Script Execution Time: %.3f second" % time.process_time())

class input_logfile_sort():
    # __slots__ reduces per-instance memory use
    __slots__ = ['read_logascii_dict', 'key']

    def __init__(self):
        self.read_logascii_dict = {}
        self.key = 1

    def logascii_sortetd(self, logfile):
        with open(logfile, 'r') as f:
            while True:
                list_line = f.readline().split()
                try:
                    if not list_line:
                        # End of file or a blank line: stop reading
                        break
                    # Token 3 is "[dd/Mon/yyyy:HH:MM:SS"; token 9 is the byte count
                    timeArray = time.strptime(list_line[3].strip('['), "%d/%b/%Y:%H:%M:%S")
                    timeStamp_start = int(time.mktime(timeArray))
                    list_line1 = [timeStamp_start, list_line[9]]
                    # Store the entry in the dict under a running key
                    self.read_logascii_dict[self.key] = list_line1
                    self.key += 1
                except (ValueError, IndexError):
                    # Skip lines with a malformed time or missing fields
                    continue
        # Sort by timestamp, e.g.
        # [(4, [1420686592, '1024321222']), (3, [1449544192, '10243211111'])]
        sorted_list_ascii = sorted(self.read_logascii_dict.items(), key=lambda k: (k[1][0]))
        return sorted_list_ascii

class log_partition():
    display_format = displayFormat()
    def __init__(self):
        self.size1 = 0  # bytes accumulated in the current window

    def time_format(self, time_stamps_start, time_stamps_end):
        # Print one result row: window start, window end and formatted traffic
        time_start = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time_stamps_start))
        time_end = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time_stamps_end))
        print(time_start + ' -- ' + time_end + ' ' * 6 + '<=========>' + ' ' * 6 + self.display_format.format_size(self.size1))

    def log_pr(self, stored_log, times):
        if not stored_log:
            print("No parsable log lines found.")
            return
        time_stamps_start = stored_log[0][1][0]
        time_stamps_end = time_stamps_start + times
        for _, (time_stamp, size) in stored_log:
            try:
                size = int(size)
            except ValueError:
                # Skip entries whose byte count is not numeric
                continue
            if time_stamp > time_stamps_end:
                # Entry is past the current window: print it and open a new one here
                self.time_format(time_stamps_start, time_stamps_end)
                self.size1 = 0
                time_stamps_start = time_stamp
                time_stamps_end = time_stamps_start + times
            self.size1 += size
        # Print the last window
        self.time_format(time_stamps_start, time_stamps_end)

class Main():
    # Entry point
    def main(self):
        parser = argparse.ArgumentParser(
            description="Nginx flow analysis, Supported file types ascii text.")
        parser.add_argument('-f', '--file',
                            dest='file',
                            nargs='?',
                            help="log file input.")
        parser.add_argument('-m', '--minute',
                            dest='minute',
                            default=5,
                            nargs='?',
                            type=int,
                            help="Nginx separation time,Default 5 min.")
        args = parser.parse_args()

        Input_sorted_log = input_logfile_sort()
        display_format1 = displayFormat()
        times = args.minute * 60  # window length in seconds
        # Ask the `file` command for the file type, e.g. "access.log: ASCII text"
        file_type = ''
        for output_line in os.popen("""file {}""".format(args.file)):
            file_type = output_line.strip().split()[1]
        if file_type.lower() == 'ascii':
            logascii_analysis = log_partition()
            logascii_analysis.log_pr(Input_sorted_log.logascii_sortetd(args.file), times)
            print('\033[1;32;40m')
            display_format1.execut_time()
            print('\033[0m')
        else:
            print('\033[1;32;40m')
            print("Supported file types ascii text.")
            print("Example: python3 {} -f nginxlogfile -m time".format(sys.argv[0]))
            print('\033[0m')

if __name__ == '__main__':
    main_obj = Main()
    main_obj.main()
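The unit conversion in format_size can also be written as a loop over the unit suffixes. The following is a minimal sketch under the hypothetical name `human_size`, intended to produce the same strings as the if/elif chain in the script; it is an alternative formulation, not part of the original program.

```python
def human_size(size):
    """Format a byte count as B, K, M, G or T with two decimals (1024-based)."""
    if size < 1024:
        return f"{size}B"
    value = float(size)
    for suffix in "KMGT":
        value /= 1024
        # Stop at the first unit that fits, or at T for anything larger
        if value < 1024 or suffix == "T":
            return f"{value:.2f}{suffix}"


for n in (512, 1536, 2139095, 1073741824):
    print(n, human_size(n))
```

Adding a further unit then only means extending the suffix string, at the cost of the explicit constants being less visible than in the original.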

Reposted from https://www.yp14.cn/2019/11/23/Nginx-%E6%B5%81%E9%87%8F%E7%BB%9F%E8%AE%A1%E5%88%86%E6%9E%90/

Original article: https://www.cnblogs.com/cangqinglang/p/12202508.html

Time: 2024-11-05 22:34:59
