Tomcat log extraction

The problem

We expected the xx influencer's livestream to funnel a lot of traffic into the mall, so we did plenty of preparation and clustered the servers; each server keeps its own Tomcat access log. Time was too tight beforehand to write any user-behavior analytics code, so for now we just want to do a quick extraction by hand.

Extracting with Linux tools

The logs sit in the following directories:

/home/ch/logs/lwcmall_A001.logs/localhost_access_log.2017-01-03.txt
/home/ch/logs/lwcmall_A002.logs/localhost_access_log.2017-01-03.txt  
/home/ch/logs/lwcmall_A003.logs/localhost_access_log.2017-01-03.txt  
/home/ch/logs/lwcmall_A004.logs/localhost_access_log.2017-01-03.txt  
/home/ch/logs/lwcmall_B001.logs/localhost_access_log.2017-01-03.txt    
/home/ch/logs/lwcmall_B002.logs/localhost_access_log.2017-01-03.txt    
/home/ch/logs/lwcmall_B003.logs/localhost_access_log.2017-01-03.txt    

We only need the GET requests, so filter those out first:

find ./ -name "localhost_access_log*" | xargs grep "GET"  

which yields the matching log lines (screenshot: 2.png)
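A minimal sketch of this step over a throwaway sample file (the `/tmp/lwc_demo` path and the log lines are made up here; the line format assumes Tomcat's default AccessLogValve pattern). It also shows a slightly tighter filter: `grep '"GET'` anchors on the quote that precedes the method, so a URL that merely contains the text `GET` is not matched.

```shell
# Create a hypothetical sample log to demonstrate the filter on.
mkdir -p /tmp/lwc_demo
cat > /tmp/lwc_demo/localhost_access_log.2017-01-03.txt <<'EOF'
10.0.0.1 - - [03/Jan/2017:20:15:01 +0800] "GET /weixinShopController.do?show&page=index HTTP/1.1" 200 5120
10.0.0.2 - - [03/Jan/2017:20:15:02 +0800] "POST /weixinShopController.do?submit HTTP/1.1" 200 64
EOF

# Same find | xargs shape as above; the quote before GET keeps POST lines
# (and URLs containing "GET") out of the output.
find /tmp/lwc_demo -name "localhost_access_log*" | xargs grep '"GET'
```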

We only need traffic from 8 p.m. onward, so filter on the timestamp as well. Since the timestamp field reads `[03/Jan/2017:HH:MM:SS +0800]`, matching `2017:2` keeps exactly hours 20–23:

find ./ -name "localhost_access_log*" | xargs grep "GET" | grep "2017:2"  

which yields the evening-only lines (screenshot: 3.png)
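A tiny sketch of why the hour filter works (the three sample lines are made up): only timestamps whose hour begins with 2, i.e. 20–23, survive.

```shell
# "2017:2" can only match the hour digits right after the year,
# so 19:59:59 is dropped while 20:00:00 and 23:30:00 are kept.
printf '%s\n' \
  '[03/Jan/2017:19:59:59 +0800] "GET /a"' \
  '[03/Jan/2017:20:00:00 +0800] "GET /b"' \
  '[03/Jan/2017:23:30:00 +0800] "GET /c"' | grep "2017:2"
```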

We only care about the timestamp and the request, so print just those fields:

find ./ -name "localhost_access_log*" | xargs grep "GET" | grep "2017:2" | awk '{print $4" "$5" "$6" "$7}'

which yields timestamp-and-request pairs (screenshot: 4.png)
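With awk's default whitespace splitting, `$4`/`$5` are the two halves of the bracketed timestamp and `$6`/`$7` are the method and path. A one-line sketch over a made-up log line:

```shell
# Field mapping under default whitespace splitting:
#   $4 = [dd/Mon/yyyy:HH:MM:SS   $5 = +0800]   $6 = "GET   $7 = request path
echo '10.0.0.1 - - [03/Jan/2017:20:15:01 +0800] "GET /weixinShopController.do?show&page=index HTTP/1.1" 200 5120' \
  | awk '{print $4" "$5" "$6" "$7}'
# → [03/Jan/2017:20:15:01 +0800] "GET /weixinShopController.do?show&page=index
```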

The output shows that static resources live under /template/ or /plug-in/.
Continuing along the same lines:

# List controller requests that carry no page parameter
find ./ -name "localhost_access_log*" | xargs grep "GET" | grep "2017:2" | awk '{print $4" "$5" "$6" "$7}' | grep weixinShopController | grep -v "page="

# Keep only one line per distinct page value
find ./ -name "localhost_access_log*" | xargs grep "GET" | grep "2017:2" | awk '{print $4" "$5" "$6" "$7}' | grep weixinShopController | awk -F '&' '{print $2}' | awk '!a[$0]++'
which gives
page=index  
page=goodsdetail  
page=addresslist
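The deduplication step relies on the awk idiom `!a[$0]++`, which prints each distinct line only the first time it appears, preserving input order (unlike `sort -u`, which reorders). A sketch with made-up input:

```shell
# a[$0]++ is 0 (falsy) on first sight of a line and nonzero afterwards,
# so only first occurrences pass the filter.
printf 'page=index\npage=goodsdetail\npage=index\npage=addresslist\npage=goodsdetail\n' \
  | awk '!a[$0]++'
# → page=index
#   page=goodsdetail
#   page=addresslist
```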

# Number of visits to the index page
find ./ -name "localhost_access_log*" | xargs grep "GET" | grep "2017:2" | awk '{print $4" "$5" "$6" "$7}' | grep weixinShopController | grep page=index | wc -l

# Number of visits to the goods-detail page
find ./ -name "localhost_access_log*" | xargs grep "GET" | grep "2017:2" | awk '{print $4" "$5" "$6" "$7}' | grep weixinShopController | grep page=goodsdetail | wc -l

# Number of visits to the address-list page
find ./ -name "localhost_access_log*" | xargs grep "GET" | grep "2017:2" | awk '{print $4" "$5" "$6" "$7}' | grep weixinShopController | grep page=addresslist | wc -l
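Rather than re-running the whole pipeline once per page value, a single pass can count them all. This is a sketch of the idea over made-up request strings; it assumes page names are lowercase letters, so widen the `grep -o` pattern if yours differ.

```shell
# Sketch: count every page value in one pass.
# Against the real logs, the same idea would look like:
#   find ./ -name "localhost_access_log*" | xargs grep "GET" | grep "2017:2" \
#     | grep -o 'page=[a-z]*' | sort | uniq -c | sort -rn
printf '%s\n' \
  'GET /weixinShopController.do?show&page=index' \
  'GET /weixinShopController.do?show&page=goodsdetail' \
  'GET /weixinShopController.do?show&page=index' \
  | grep -o 'page=[a-z]*' | sort | uniq -c | sort -rn
# → 2 page=index
#   1 page=goodsdetail
```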