Using AWK to filter specific values out of nginx logs
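2010-01-26 14:46:52 Tags: AWK, nginx
Original work. Reposting is allowed, provided that the original source, the author information, and this statement are indicated with a hyperlink; otherwise legal action will be pursued. http://storysky.blog.51cto.com/628458/270671
Although this post is marked as original, it owes a great deal to help from friends, and I would like to thank them here!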
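The day before yesterday a colleague from development asked me to help analyze the nginx access log. I used AWStats to turn it into charts, only to be told that he did not want charts; he just wanted four values from the access log (he could have said so earlier). I took a look at the nginx log format. Here is an excerpt: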
124.227.66.162 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" "uid=355288&cuid=355287&timestamp=1264484517&check=68230e418e28a9d05b8cf1e2f7cbf392&action=plantInfo" 200 1019 "http://www.ime.com/design/flash/main.swf?v=439/[[DYNAMIC]]/1" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" -
124.240.39.49 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" "cid=2&lid=4&oid=2&action=researchLayer&cuid=496990&timestamp=1264398138&check=b50cd4ade18c0797df24cb1a8828ae18" 200 219 "http://www.ime.com/design/flash/main.swf?v=439/[[DYNAMIC]]/1" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.21022; .NET CLR 3.5.30729; .NET CLR 3.0.30618)" -
121.236.118.126 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" "check=8ec1521fc3df9c03d83af9a4d933dbb0&cuid=509590&timestamp=1264398703&oid=2&action=oreInfo" 200 261 "http://www.ime.com/design/flash/main.swf?v=439/[[DYNAMIC]]/1" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 1.7; .NET CLR 2.0.50727)" -
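My colleague wanted the IP address, the time, and the values of cuid= and action=.
The log looks messy, but there is a pattern to it. Many lines contain neither action nor cuid, so I filtered those out first: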
awk '/action/{print $0}' access.log > action.log
Any line that has action is certain to have cuid as well, so filtering on action alone is enough.
Every remaining line now has both cuid and action.
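If you would rather not rely on that guarantee, a stricter filter is easy to sketch. The version below keeps a line only when both parameter names appear, each directly after a double quote or an "&". The function name filter_params is made up for this example, and the pattern assumes the POST parameters are logged inside a quoted string, as in the excerpt above.

```shell
# Hypothetical helper: keep only log lines that carry both parameters.
# Requiring a leading " or & stops "uid=" from being mistaken for "cuid="
# and ignores the word "action" when it merely appears in a URL path.
filter_params() {
    awk '/[&"]action=/ && /[&"]cuid=/' "$@"
}
```

It could be called as filter_params access.log > action.log in place of the command above.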
Next I reworked the format to make it easier to read:
awk -F "[ '&''[']" '{print $1"\t"$5"\t"$10"\t"$11"\t"$12"\t"$13"\t"$14"\t"$15}' action.log > newlog
This is a bit cumbersome, but it does make the log clearer. Here is the result:
117.83.131.36    25/Jan/2010:14:31:34    "uid=438824    cuid=511252    timestamp=1264401079    check=fbb9ad922f01888e6c0757d117bf304e    action=plantInfo"    200
221.9.32.181    25/Jan/2010:14:31:34    "cuid=506517    action=plantInfo    timestamp=1264401075    check=01661377f346538eba790e856dd3713a    uid=539860"    200
221.178.128.146    25/Jan/2010:14:31:34    "timestamp=1264401105    check=7d5e41feeb3ae0482e1fe990f27ddc67    cuid=303367    display=1    action=plantInfuid=303367"
124.131.80.68    25/Jan/2010:14:31:34    "cuid=393678    timestamp=1264401093    action=checkResearchLayer    check=2f2cc50cc99aa9e05f02b6f6a47cbef6"200    765
125.107.199.28    25/Jan/2010:14:31:34    "timestamp=1264401094    oid=4    uid=350003    action=oreInfo    check=5d835e252b841c86da041b8b63b4b67e    cuid=356549"
111.167.145.209    25/Jan/2010:14:31:34    "action=plantInfo    cuid=154228    timestamp=1264401094    check=5d835e252b841c86da041b8b63b4b67e    uid=372981"    200
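In output like this, cuid= and action= do not stay in the same column from row to row. A small sketch to confirm that on your own file: it walks every whitespace-separated column of each line and reports where the two parameters sit. The function name param_positions is hypothetical, and a reported column of 0 means the parameter was not found.

```shell
# Hypothetical helper: report which column holds cuid= and action=.
param_positions() {
    awk '{
        c = a = 0
        for (i = 1; i <= NF; i++) {
            f = $i
            sub(/^"/, "", f)              # ignore the opening quote
            if (f ~ /^cuid=/)   c = i
            if (f ~ /^action=/) a = i
        }
        print "line " NR ": cuid in column " c ", action in column " a
    }' "$@"
}
```

For example, param_positions newlog | head prints the positions for the first ten rows.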
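At this point I was a bit stuck. The columns that hold cuid and action are not fixed, so simple AWK column selection will not work; it needs AWK loops and conditionals, which I had never used. So I posted a request for help in a chat group, and two friends replied: 辉太郎 and jeremy.zhang.
Their solutions differ: one is a Perl script, the other uses awk directly.
Let's start with Perl. I do not really know Perl, so I will simply paste his script: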
#!/usr/bin/perl -w
open(MYFILE, "/mnt/disk/newlog") || die "$!";
while (<MYFILE>)
{
    $str = $_;
    # IP: everything before the first "["
    if ($str =~ m/(.*?)\[/s)
    {
        $var1 = $1;
        print $var1;
    }
    # time: from "[" up to the next double quote
    if ($str =~ m/\[(.*?)"/s)
    {
        $var4 = $1;
        print $var4;
    }
    # cuid value (digits only)
    if ($str =~ m/cuid=(\d+)/s)
    {
        $var2 = $1;
        print "cuid=", $var2, "\t";
    }
    # action value; the newline is printed only when this pattern matches
    if ($str =~ m/action=(\w+)/s)
    {
        $var3 = $1;
        print "action=", $var3, "\n";
    }
}
/mnt/disk/newlog is the file I filtered out a moment ago. Run the script with perl:
perl 1.sh > newlog1
When I ran it, though, the output format came out slightly off:
124.197.61.124  25/Jan/2010:14:42:17    cuid=430334     action=plantInfo
124.79.7.236    25/Jan/2010:14:42:17    cuid=318701     action=petsInfo
122.230.66.90   25/Jan/2010:14:42:17    cuid=223422     action=compQuest
113.128.147.225 25/Jan/2010:14:42:17    cuid=362043     action=plantInfo
220.184.20.99   25/Jan/2010:14:42:17    cuid=484582     action=wordInfo
222.161.49.201  25/Jan/2010:14:42:17    cuid=304167     218.95.48.90    25/Jan/2010:14:42:17    cuid=476480     action=plantInfo
218.106.242.20  25/Jan/2010:14:42:17    cuid=501942     action=oreInfo
221.137.223.58  25/Jan/2010:14:42:17    cuid=445595     action=takeQuest
124.126.155.202 25/Jan/2010:14:42:17    cuid=0  action=initData
113.224.227.68  25/Jan/2010:14:42:17    cuid=529218     action=editName
121.4.66.146    25/Jan/2010:14:42:17    cuid=187626     action=researchLayer
220.190.82.170  25/Jan/2010:14:42:17    cuid=62789      action=steal
218.5.38.250    25/Jan/2010:14:42:17    cuid=456212     124.90.203.86   25/Jan/2010:14:42:17    cuid=492016     action=oreInfo
On the whole it is still acceptable, though. Thanks, 辉太郎!
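To see how many rows were affected, a quick check is to list every row that does not have exactly four whitespace-separated columns (IP, time, cuid=, action=). This is only a sketch: the function name find_bad_rows is made up here, and the four-column rule is an assumption based on the well-formed rows shown above.

```shell
# Hypothetical helper: print the row number and content of every row
# whose column count differs from the four columns a good row has.
find_bad_rows() {
    awk 'NF != 4 { print NR ": " $0 }' "$@"
}
```

Running find_bad_rows newlog1 lists the rows where two records ran together; piping the result to wc -l gives the count.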
Now let's look at jeremy's awk commands.
Step 1: keep only the lines that contain action
awk '/action/{print $0}' access.log > tmp.log
Step 2
awk '{print $1"\t"$4"\t"$9}' tmp.log > action.log
This drops the columns that are not needed.
Step 3
Filter and print the IP, the time, cuid= and action=
awk -F"[ '['\"'&''=']+" '{printf "%s\t%s\t",$1,$2;for(i=3;i<=NF;i++){if($i=="cuid" || $i=="action")printf "%s",$i"="$(i+1)"\t"};printf "\n"}' action.log > cuid_action.log
Here is the final result:
202.113.30.144        25/Jan/2010:13:42:07        cuid=181188    action=compound
124.227.66.162        25/Jan/2010:13:42:07        cuid=355287    action=plantInfo
124.240.39.49        25/Jan/2010:13:42:07        action=researchLayer    cuid=496990
121.236.118.126        25/Jan/2010:13:42:07        cuid=509590    action=oreInfo
113.139.18.82        25/Jan/2010:13:42:07        cuid=512461    action=oreInfo
222.184.232.183        25/Jan/2010:13:42:07        cuid=520595    action=oreInfo
218.59.80.95        25/Jan/2010:13:42:07        cuid=293339    action=questInfo
221.6.38.37        25/Jan/2010:13:42:07        action=plantInfo    cuid=518015
125.39.143.96        25/Jan/2010:13:42:07        cuid=133987    action=pkResult
119.180.17.218        25/Jan/2010:13:42:07        cuid=452667    action=wordInfo
The three steps above could actually be merged into one, but doing them separately is clearer.
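For reference, here is one way the merged version might look. It is a sketch rather than jeremy's own command: the function name extract_cuid_action is made up, and it assumes that awk's default whitespace splitting on the raw access log puts the IP in $1, the "[time" in $4 and the quoted parameter string in $9, as in the excerpt at the top. Unlike the three-step output, it separates columns with a single tab and leaves no trailing tab.

```shell
# Hypothetical helper: the three steps in a single awk pass over the raw log.
extract_cuid_action() {
    awk '/action/ {
        time = $4
        sub(/^\[/, "", time)                 # drop the leading "["
        line = $1 "\t" time
        # Break the parameter string into alternating names and values.
        n = split($9, tok, "[\"&=]+")
        for (i = 1; i < n; i++)
            if (tok[i] == "cuid" || tok[i] == "action")
                line = line "\t" tok[i] "=" tok[i + 1]
        print line
    }' "$@"
}
```

It could be run as extract_cuid_action access.log > cuid_action.log. Like the three-step version, it treats names and values as one alternating list, so a parameter with an empty value would shift the pairs.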
You can adapt the commands above to extract whichever fields you need. I hope this is useful.
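As one possible starting point for that kind of customization, the sketch below takes the wanted parameter names as a comma-separated list and prints them in that order, with "-" where a parameter is missing. The function name pick_fields, the keys variable and the "-" placeholder are all choices made for this example. It assumes the raw access log format shown at the top, where splitting a line on double quotes leaves the parameter string in the fourth piece.

```shell
# Hypothetical helper: pick_fields "cuid,action" access.log
pick_fields() {
    wanted=$1
    shift
    awk -F'"' -v keys="$wanted" '
    BEGIN { nk = split(keys, want, ",") }
    {
        split($1, head, " ")                 # head[1] = IP, head[4] = "[time"
        time = head[4]
        sub(/^\[/, "", time)
        split("", val)                       # clear values from the previous line
        n = split($4, kv, "&")
        for (i = 1; i <= n; i++) {
            eq = index(kv[i], "=")
            if (eq > 0)
                val[substr(kv[i], 1, eq - 1)] = substr(kv[i], eq + 1)
        }
        line = head[1] "\t" time
        for (j = 1; j <= nk; j++)
            line = line "\t" want[j] "=" ((want[j] in val) ? val[want[j]] : "-")
        print line
    }' "$@"
}
```

Splitting on "&" first and then on the first "=" keeps each name with its own value, and printing in the requested order gives every row the same column layout.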
Thanks again, jeremy!
This article comes from the "story的天空" blog. Please be sure to keep this source: http://storysky.blog.51cto.com/628458/270671