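Symptom: the hourly data in the Hive table was missing one hour every few days. It turned out that the cat step that aggregates the 5-minute data into hourly data was failing, which caused the gap.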
Changing the script as follows fixed the problem:
## merge 5min data into hour data
cat $datapath/news_5min_$xhour* > $localpath/data/channelnews_$hour.txt

## check: wait up to 10 x 5 seconds for the merged file to reach 1024 bytes
tmppath="${localpath}/data/channelnews_${hour}.txt"
i=0
while (( i < 10 ))
do
    # size of the merged file in bytes
    m=$(du -b "$tmppath" | awk '{print int($1)}')
    if [ "$m" -lt 1024 ]; then
        echo "${tmppath} is small, is $m"
        sleep 5
    else
        break
    fi
    let "i++"
done
echo "i is:$i"
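The check above only waits for the merged file to grow; it does not run cat again. If the cat itself fails, a possible variant is to repeat the merge inside the loop. The following is a minimal sketch under that assumption, not the script that was actually deployed: the function name merge_with_retry and its parameters (minimum size, number of attempts, wait time) are hypothetical, and it assumes the source paths contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): merge the files
# matching a glob into one file, re-running cat until the result reaches a
# minimum size or the attempts run out.
# Usage: merge_with_retry 'GLOB' DEST [MIN_BYTES] [MAX_TRIES] [WAIT_SECS]
merge_with_retry() {
    local src_glob=$1 dest=$2
    local min_bytes=${3:-1024} max_tries=${4:-10} wait_secs=${5:-5}
    local i=0 size

    while (( i < max_tries )); do
        # src_glob is left unquoted on purpose so the shell expands the wildcard.
        if cat $src_glob > "$dest" 2>/dev/null; then
            size=$(( $(wc -c < "$dest") ))
            if (( size >= min_bytes )); then
                return 0    # merged file looks complete
            fi
        fi
        i=$(( i + 1 ))
        # Sleep only if another attempt remains.
        if (( i < max_tries )); then
            sleep "$wait_secs"
        fi
    done
    return 1                # still too small (or cat kept failing)
}
```

It could be called as merge_with_retry "$datapath/news_5min_$xhour*" "$localpath/data/channelnews_$hour.txt", with the load into Hive skipped or an alert raised when it returns non-zero.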