Scraping Aizhan keywords with Shell

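    #!/bin/bash
    # Replace china.makepolo.com with the site you want to scrape; leave everything else unchanged.
    url='china.makepolo.com'
    # Stage 1: take lines 838-854 of the first result page, extract the page links, and save them to fy.txt.
    curl "www.aizhan.com/baidu/$url/0/position/" | sed -n '838,854p' | grep -oP "\/\w+[\\/]*[\\/]\D+[\\/]\w+[\\/]\w+[\\/]" | grep -a "baidu" | sed 's/^/www.aizhan.com&/g' > fy.txt
    # Stage 2: fetch each page listed in fy.txt, keep the lines containing "zhishu, and append the extracted fields to aizhancizhishu.txt.
    while read -r line; do curl "$line" | grep -a "\"zhishu" | sed -n '4,50p' | awk -F "\/|>" '{print $4,$6}' | awk -F "<" 'BEGIN{RS="<";ORS=""}{print $0}' >> aizhancizhishu.txt; done < fy.txt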
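
The link-extraction stage of the script can be tried offline before running it against the live site. The following is a minimal sketch, not the original author's code: the function name extract_page_links and the sample markup are assumptions made for illustration, and the real page layout may differ. It requires GNU grep, because -P is a GNU extension.

```shell
#!/bin/bash
# Sketch only: extract_page_links is a hypothetical helper name, not part of the original script.
# It reads HTML on stdin and prints one page URL per line, prefixed with the host.
extract_page_links() {
    # Same pattern as the script: a path such as /baidu/<site>/<page>/position/
    grep -oP "\/\w+[\\/]*[\\/]\D+[\\/]\w+[\\/]\w+[\\/]" \
        | grep -a "baidu" \
        | sed 's/^/www.aizhan.com/'
}

# Made-up sample markup standing in for the pagination block of a result page.
sample_html='<a href="/baidu/china.makepolo.com/1/position/">1</a>
<a href="/baidu/china.makepolo.com/2/position/">2</a>'

printf '%s\n' "$sample_html" | extract_page_links
```

With the function defined, the first stage could be written as `curl -s "www.aizhan.com/baidu/$url/0/position/" | sed -n '838,854p' | extract_page_links > fy.txt`. Note that the pattern uses \D+ for the site name, so a domain that contains digits would probably not match.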

 

About the "binary file" error

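-a, --text: treat a binary file as if it were text. The script passes -a to grep so that matching lines are still printed when grep would otherwise classify the page as binary.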
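
The effect of -a can be seen on a small local file. This is a minimal sketch assuming GNU grep; the sample content, including the class="zhishu" cell, is made up for the demonstration and is not taken from the real page.

```shell
#!/bin/bash
# Build a small sample containing a NUL byte, which makes GNU grep classify the file as binary.
# The second line is invented sample markup, not real Aizhan output.
sample="$(mktemp)"
printf 'header\0bytes\n<td class="zhishu">120</td>\n' > "$sample"

# Without -a, grep only reports that a binary file matches instead of printing the line.
grep 'zhishu' "$sample"

# With -a (--text), the matching line itself is printed.
matched="$(grep -a 'zhishu' "$sample")"
echo "$matched"

rm -f "$sample"
```

The exact wording of the "binary file matches" notice, and whether it goes to standard output or standard error, depends on the grep version.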

http://www.howsoftworks.net/linux/command/grep.html

(Reposted from 老狼)
