How to restrict the value range when computing difference statistics in Python

2012-05-04

When computing difference statistics in Python, how do I restrict the range of values that get counted?
2012-04-18 12:33:33 192.168.13.106 218.16.121.240 80
2012-04-18 12:33:43 192.168.13.106 110.75.187.22 80
2012-04-18 12:34:13 192.168.65.27 192.168.0.188 443
2012-04-18 12:34:27 192.168.40.117 192.168.0.174 80
2012-04-18 12:35:39 192.168.20.109 119.147.113.98 80
2012-04-18 12:35:59 192.168.20.109 119.147.113.98 80
2012-04-18 12:36:13 192.168.65.27 192.168.0.189 443
2012-04-18 12:36:20 192.168.13.106 113.11.195.106 80
2012-04-18 12:36:26 192.168.50.112 192.168.0.174 80
2012-04-18 12:36:33 192.168.50.146 118.186.66.51 80
2012-04-18 12:36:43 192.168.30.105 192.168.0.174 80
2012-04-18 12:36:53 192.168.50.145 119.147.194.250 80
2012-04-18 12:37:01 192.168.40.105 192.168.0.174 80
2012-04-18 12:37:12 192.168.13.106 182.50.0.106 80
2012-04-18 12:37:33 192.168.13.106 182.50.0.106 80
2012-04-19 12:34:13 192.168.65.27 192.168.0.188 443

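The text format is as shown above.
I want to find out whether records that share the same last three fields (source IP, destination IP, port) recur at a regular interval.
My approach:
1. Extract the records whose last three fields are identical and group them together.
2. Compute the differences between the timestamps within each group (records can span days here; how does Python subtract across days?).
3. Compute the percentage taken by the most frequent time interval. If it is greater than 90%, write the entry to high.txt and return 1.
   Greater than 60% and less than 90%: write to middle.txt and return 0.
   Less than 60%: write to low.txt.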
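
On the cross-day question in step 2: a minimal sketch, assuming each timestamp is parsed together with its date (as in the sample lines) rather than as a bare time of day. Subtracting two `datetime` objects yields a `timedelta`, so a gap that crosses midnight needs no special handling. The helper name `seconds_between` and the midnight example are illustrative only, not taken from the original post.

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'  # matches the first two fields of each log line

def seconds_between(earlier, later, fmt=FMT):
    """Return the gap in seconds between two timestamp strings."""
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return delta.total_seconds()

# Two records from the sample data that are exactly one day apart.
one_day = seconds_between('2012-04-18 12:34:13', '2012-04-19 12:34:13')

# A made-up pair that straddles midnight.
over_midnight = seconds_between('2012-04-18 23:59:50', '2012-04-19 00:00:10')

print(one_day, over_midnight)
```

Because the date is part of the parsed value, the only requirement is that each group's timestamps are sorted before consecutive gaps are taken.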

In other words, the three output files should end up looking like this (made-up examples):
high.txt
192.168.13.106 182.50.0.106 80,95%
middle.txt
192.168.65.27 192.168.0.188 443,70%
low.txt
192.168.65.27 192.168.0.188 443,30%
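
The file selection and line format above could be sketched roughly as follows. The post does not say what should happen at exactly 90% or 60%, so this sketch assumes the same boundaries as the asker's code (above 0.9 is high, above 0.6 up to 0.9 is middle, everything else is low). The function names `bucket_for` and `format_line` are hypothetical.

```python
def bucket_for(ratio):
    """Pick the output file for a ratio given as a fraction in [0, 1]."""
    if ratio > 0.9:
        return 'high.txt'
    if ratio > 0.6:      # assumption: exactly 0.9 falls into middle
        return 'middle.txt'
    return 'low.txt'     # assumption: exactly 0.6 falls into low

def format_line(src, dst, port, ratio):
    """Build one output line such as '192.168.13.106 182.50.0.106 80,95%'."""
    return '%s %s %s,%d%%' % (src, dst, port, round(ratio * 100))

print(bucket_for(0.95), format_line('192.168.13.106', '182.50.0.106', '80', 0.95))
```

Keeping the threshold logic apart from the file writing makes it easy to adjust the boundaries once the intended behaviour at 60% and 90% is settled.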

The code I have so far:

Python code

import re
from datetime import datetime

# read data from file
s = open(r'/home/test').read()
print(s)

# format data
# srcIP->destIP:port = date time
dateDict = {}
# srcIP->destIP:port = number of slots
slotDict = {}
# total number of slots
totalNum = 0

# loop
for line in s.split('\n'):
    items = line.split(' ')
    if len(items) == 5:
        # total time slot
        totalNum += 1
        # new key
        newkey = items[-3] + '->' + items[-2] + ':' + items[-1]
        # dateDict
        if newkey in dateDict:
            dateDict[newkey].append(items[0] + ' ' + items[1])
        else:
            dateDict[newkey] = [items[0] + ' ' + items[1]]
        # slotDict
        if newkey in slotDict:
            slotDict[newkey] += 1
        else:
            slotDict[newkey] = 0

# write files
for k in slotDict.keys():
    # ratio
    ratio = slotDict[k] * 1.0 / totalNum
    # line string
    newline = k + ', ' + str(int(ratio * 100)) + '%\n'
    # open file
    if ratio > 0.9:
        fid = open('/home/susy/work/data/high1.txt', 'a+')
        print(1)
    elif 0.6 < ratio <= 0.9:
        fid = open('/home/susy/work/data/middle1.txt', 'a+')
        print(0)
    elif ratio <= 0.6:
        fid = open('/home/susy/work/data/low.txt1', 'a+')
    # write, close
    fid.write(newline)
    fid.close()

print('DONE!')

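The problem now:
when collecting statistics on the time differences, I don't want to count differences of less than 10 seconds.
I only want differences of 10 seconds or more to be counted and included in the percentage.
How can I do that?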

[Solution]

Python code

#!/usr/bin/python
# encoding: utf-8
import re
import datetime

# One record: "YYYY-MM-DD HH:MM:SS src_ip target_ip port"
patt = re.compile(r'''
  (?P<dt>\d{4}\-\d{2}\-\d{2}\s\d{2}:\d{2}:\d{2})\s
  (?P<src>\d+(\.\d+){3})\s
  (?P<tag>\d+(\.\d+){3})\s
  (?P<port>\d+)
  ''', re.I|re.U|re.X)

def dataReader(filename):
    # Yield the named fields of every line that matches the record pattern.
    with open(filename, 'rt') as handle:
        for ln in handle:
            m = patt.match(ln.strip())
            if m:
                yield m.groupdict()

def s2dt(s, fmt='%Y-%m-%d %H:%M:%S'):
    return datetime.datetime.strptime(s, fmt)

def dataCollector(filename):
    # Group the timestamps by (src, target, port).
    collector = {}
    for d in dataReader(filename):
        collector.setdefault(
            (d['src'], d['tag'], d['port']), []
        ).append(s2dt(d['dt']))
    return collector

def delta(timelist):
    # Gaps in seconds between consecutive records. Full datetimes are
    # subtracted, so records on different days need no special handling.
    timelist.sort()
    dlist = []
    t0 = timelist.pop(0)
    for t in timelist:
        d = (t - t0).total_seconds()
        t0 = t
        # Skip gaps shorter than 10 seconds.
        if d < 10:
            continue
        dlist.append(d)
    return countdlist(dlist)

def countdlist(dlist):
    # Return (most frequent gap, its share of the counted gaps), or None
    # when no gap of at least 10 seconds exists.
    dd, totalcnt = {}, 0
    for d in dlist:
        totalcnt += 1
        dd.setdefault(d, []).append(d)
    lst = [(len(dd[d]), d) for d in dd]
    if not lst:
        return None
    lst.sort()
    cnt, dur = lst[-1]
    # Scale by 100 so the number agrees with the '%' sign.
    return dur, '%.2f%%' % (100. * cnt / totalcnt)

for category, timelist in dataCollector(r'test').items():
    print('%s %s' % (category, delta(timelist)))
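
As a cross-check of the filtering step, here is a compact sketch of the same idea using `collections.Counter`: take the gaps between consecutive records, drop those under the minimum, then report the most frequent remaining gap and its share. The name `dominant_gap`, the `min_gap` parameter and the sample timestamps are all hypothetical. The share is returned as a fraction so it can be compared directly against thresholds such as 0.9 and 0.6.

```python
from collections import Counter
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'

def dominant_gap(timestamps, min_gap=10):
    """Return (most common gap in seconds, its share) or None if no gap qualifies."""
    times = sorted(datetime.strptime(t, FMT) for t in timestamps)
    # Gaps between consecutive records, in seconds.
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    # Keep only gaps of at least min_gap seconds.
    gaps = [g for g in gaps if g >= min_gap]
    if not gaps:
        return None
    gap, count = Counter(gaps).most_common(1)[0]
    return gap, count / float(len(gaps))

# Made-up timestamps: gaps are 5, 30, 30 and 60 seconds; the 5 is dropped.
sample = [
    '2012-04-18 12:00:00',
    '2012-04-18 12:00:05',
    '2012-04-18 12:00:35',
    '2012-04-18 12:01:05',
    '2012-04-18 12:02:05',
]
result = dominant_gap(sample)
print(result)
```

Note that the short gap is excluded from both the numerator and the denominator, which is what the question asks for; whether a dropped record should also stop acting as the start of the next gap is a design choice the post leaves open.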
