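Problem
We expected a live stream by internet celebrity xx to drive a lot of traffic to our online store, so we prepared a cluster of servers in advance. Each server keeps its own Tomcat logs. Time was too tight beforehand to write any code for analyzing user behavior, so for now we just want to extract some simple figures from the logs.
Extracting with Linux tools
The logs are at the following paths: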
/home/ch/logs/lwcmall_A001.logs/localhost_access_log.2017-01-03.txt
/home/ch/logs/lwcmall_A002.logs/localhost_access_log.2017-01-03.txt
/home/ch/logs/lwcmall_A003.logs/localhost_access_log.2017-01-03.txt
/home/ch/logs/lwcmall_A004.logs/localhost_access_log.2017-01-03.txt
/home/ch/logs/lwcmall_B001.logs/localhost_access_log.2017-01-03.txt
/home/ch/logs/lwcmall_B002.logs/localhost_access_log.2017-01-03.txt
/home/ch/logs/lwcmall_B003.logs/localhost_access_log.2017-01-03.txt
We only need the GET requests, so select those first:
find ./ -name "localhost_access_log*" | xargs grep "GET"
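This prints the matching lines.
We only want requests from 8 p.m. onward, so add a filter on the time as well. In the timestamp the hour follows the year directly, so the pattern "2017:2" selects hours 20 through 23: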
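One caveat with a bare "GET" pattern is that it matches the string anywhere in the line, including inside a URL or query string. Below is a minimal sketch of a stricter match, assuming the default Tomcat access log layout in which the request is written as "GET /path HTTP/1.1". The sample log lines and the get_requests helper name are made up for illustration.

```shell
# Hypothetical sample data standing in for one Tomcat access log.
logdir=$(mktemp -d)
printf '%s\n' \
    '10.0.0.1 - - [03/Jan/2017:20:15:32 +0800] "GET /weixinShopController.do?goods&page=index HTTP/1.1" 200 512' \
    '10.0.0.2 - - [03/Jan/2017:20:16:01 +0800] "POST /order.do?next=GET HTTP/1.1" 200 87' \
    > "$logdir/localhost_access_log.2017-01-03.txt"

# Match the opening quote plus the method, so "GET" elsewhere in the line is ignored.
# -print0 / -0 keep paths containing spaces intact; -h drops the file name prefix.
get_requests() {
    find "$1" -name "localhost_access_log*" -print0 | xargs -0 grep -h '"GET '
}

get_requests "$logdir"
```

With the sample data, only the first line is printed, because the second line contains GET only inside its query string.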
find ./ -name "localhost_access_log*" | xargs grep "GET" | grep "2017:2"
This prints the matching lines.
We only care about the time and the request (method and URL), so keep just those fields:
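The "2017:2" pattern would also match the same characters appearing anywhere else in a line. Here is a hedged sketch of a tighter filter, assuming timestamps of the form [03/Jan/2017:20:15:32 +0800]. The after_8pm name and the sample lines are illustrative only.

```shell
# Keep only lines whose bracketed timestamp has an hour of 20 to 23.
after_8pm() {
    grep -E '\[[0-9]{2}/[A-Za-z]{3}/2017:2[0-3]:[0-9]{2}:[0-9]{2} '
}

# Hypothetical sample lines: one before 8 p.m. (with "2017:2" in its URL), two after.
printf '%s\n' \
    '10.0.0.1 - - [03/Jan/2017:19:59:59 +0800] "GET /a.do?t=2017:2 HTTP/1.1" 200 10' \
    '10.0.0.2 - - [03/Jan/2017:20:00:00 +0800] "GET /b.do HTTP/1.1" 200 10' \
    '10.0.0.3 - - [03/Jan/2017:23:59:59 +0800] "GET /c.do HTTP/1.1" 200 10' \
    | after_8pm
```

The function reads from standard input, so it can take the place of the grep "2017:2" step in the pipeline.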
find ./ -name "localhost_access_log*" | xargs grep "GET" | grep "2017:2" | awk '{print $4" "$5 " " $6 " " $7}'
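This prints the trimmed lines.
From the output we can see that static resources live under /template/ or /plug-in/.
Continuing in the same way:
# List the controller requests that have no page parameter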
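Since the static resources sit under /template/ or /plug-in/, they can be excluded before looking at the controller requests. The following is a small sketch, assuming those two path segments appear only in static-resource URLs. drop_static is a hypothetical helper name and the sample lines are invented.

```shell
# Remove requests whose URL contains /template/ or /plug-in/.
drop_static() {
    grep -v -E '/(template|plug-in)/'
}

# Hypothetical lines in the shape produced by the awk step above.
printf '%s\n' \
    '[03/Jan/2017:20:15:32 +0800] "GET /template/css/main.css' \
    '[03/Jan/2017:20:15:33 +0800] "GET /plug-in/jquery/jquery.js' \
    '[03/Jan/2017:20:15:34 +0800] "GET /weixinShopController.do?goods&page=index' \
    | drop_static
```

It reads from standard input, so it can be appended to the pipeline right after the awk step.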
find ./ -name "localhost_access_log*" | xargs grep "GET" | grep "2017:2" | awk '{print $4" "$5 " " $6 " " $7}' | grep weixinShopController | grep -v "page="
# Print each distinct page value once (drop lines that repeat a page value)
find ./ -name "localhost_access_log*" | xargs grep "GET" | grep "2017:2" | awk '{print $4" "$5 " " $6 " " $7}' | grep weixinShopController | awk -F '&' '{print $2}' | awk '!seen[$0]++'
The result is:
page=index
page=goodsdetail
page=addresslist
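Once the distinct page values are known, the per-page counts can also be gathered in a single pass instead of one command per page. This sketch relies on the same assumption the command above makes, namely that page=... is the second &-separated field of each line. count_pages and the sample URLs are hypothetical.

```shell
# Count how many lines carry each page value, most frequent first.
count_pages() {
    awk -F '&' '$2 ~ /^page=/ {count[$2]++} END {for (p in count) print count[p], p}' \
        | sort -rn
}

# Hypothetical lines in the shape produced by the earlier awk step.
printf '%s\n' \
    '"GET /weixinShopController.do?goods&page=index' \
    '"GET /weixinShopController.do?goods&page=goodsdetail&id=7' \
    '"GET /weixinShopController.do?goods&page=index' \
    '"GET /weixinShopController.do?goods&page=addresslist' \
    '"GET /weixinShopController.do?goods&page=goodsdetail&id=9' \
    '"GET /weixinShopController.do?goods&page=index' \
    | count_pages
```

Each output line holds a count followed by the page value. Lines whose second field is not a page parameter are skipped.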
# Number of visits to the home page (index)
find ./ -name "localhost_access_log*" | xargs grep "GET" | grep "2017:2" | awk '{print $4" "$5 " " $6 " " $7}' | grep weixinShopController | grep page=index | wc -l
# Number of visits to the product detail page
find ./ -name "localhost_access_log*" | xargs grep "GET" | grep "2017:2" | awk '{print $4" "$5 " " $6 " " $7}' | grep weixinShopController | grep page=goodsdetail | wc -l
# Number of visits to the address list page
find ./ -name "localhost_access_log*" | xargs grep "GET" | grep "2017:2" | awk '{print $4" "$5 " " $6 " " $7}' | grep weixinShopController | grep page=addresslist | wc -l
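Every command above repeats the same long prefix. One possible way to cut the repetition is to wrap that prefix in a shell function and count matches with grep -c. The function names shop_requests and count_page are made up here, and the log root directory is passed in as the first argument.

```shell
# Shared prefix of the commands above: GET requests after 8 p.m. that hit
# weixinShopController, reduced to time, method, and URL.
shop_requests() {
    find "$1" -name "localhost_access_log*" -print0 \
        | xargs -0 grep -h "GET" \
        | grep "2017:2" \
        | awk '{print $4" "$5" "$6" "$7}' \
        | grep weixinShopController
}

# Print how many of those requests carry the given page value.
# $1 is the log root directory, $2 is the page name (for example index).
count_page() {
    shop_requests "$1" | grep -c "page=$2"
}
```

For example, count_page /home/ch/logs index would print the number of home page visits across all servers, and the same call with goodsdetail or addresslist covers the other two pages.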