Python: Lexical Diversity Analysis of Sina Weibo Elements (Word, Screen Name)

CODE:

#!/usr/bin/python
# -*- coding: utf-8 -*-

'''
Created on 2014-7-10
@author: guaguastd
@name: weiboLexicalDiversity.py
'''

if __name__ == '__main__':

    # log in and get an API client for the Sina Weibo API
    from sinaWeiboLogin import sinaWeiboLogin
    sinaWeiboApi = sinaWeiboLogin()

    # helper that extracts status texts, screen names and words
    from sinaWeibo import extractWeiboEntities

    # helper that fetches the public timeline
    from sinaWeiboStatuses import publicTimeline

    # lexical diversity and average-words helpers
    from sinaWeiboLexicalDiversity import weibo_lexical_diversity, weibo_average_words

    # fetch the 5 newest public weibo
    weiboNum = 5
    statuses = publicTimeline(sinaWeiboApi, weiboNum)
    status_texts, screen_names, words = extractWeiboEntities(statuses)

    # lexical diversity of the words and of the screen names
    for token in (words, screen_names):
        print '\rLexical diversity of %s: ' % token
        print weibo_lexical_diversity(token)

    # average number of words per status text
    for status in (status_texts,):
        print '\rAverage words of %s: ' % status
        print weibo_average_words(status)

RESULT:

Lexical diversity of [u'[moc\u8f6c\u53d1]2014\u65b0\u6b3e\u590f\u88c5\u5370\u82b1\u77ed\u8896\u8fde\u8863\u88d9\u9ad8\u7aef\u5927\u7801\u4e2d\u5e74\u5973\u88c5\u4fee\u8eab\u663e\u7626\u857e\u4e1d\u8fde\u8863\u88d9', u'http://t.cn/RvCLdgN', u'[\u795e\u9a6c]\u963f\u4f9d\u83b2\u8fde\u8863\u88d9', u'ccdd\u5973\u88c52014\u590f\u88c5\u65b0\u6b3e', u'\u97e9\u7248', u'\u5c0f\u9999\u98ce\u857e\u4e1d\u516c\u4e3b\u88d9', u'\u6b63\u54c1', u'http://t.cn/RvCyo4X', u'\u590f\u65e5\u5ea6\u5047\u6e05\u51c9\u88c5~~>>>>>>\u559c\u6b22\u70b9\u8fd9\u91cc\uff1ahttp://t.cn/RvEqd5R', u'\u6211\u6b63\u5728\u6b66\u4fa0\u5361\u724c\u624b\u6e38\u201c\u5927\u638c\u95e8\u201d\u4e2d\u51b2\u51fb\u8840\u6218\u699c\u5355\uff0c\u613f\u5404\u4f4d\u5927\u4fa0\u62d4\u5200\u76f8\u52a9\uff01\u6ce8\u518c\u5927\u638c\u95e8\uff0c\u586b\u5199\u6211\u7684\u9080\u8bf7\u7801\u30102zr7\u3011\uff0c\u5171\u540c\u83b7\u53d6\u4e30\u539a\u5956\u52b1\u3002http://t.cn/8FUZSTe', u'@\u5927\u638c\u95e8\u6e38\u620f', u'\u8f7b\u8f68\u65e9\u4e0a\u7684\u7a7a\u8c03\u5f00\u5f97\u7565\u5927']:
1.0

Lexical diversity of [u'kathyisangel', u'wangbinrona', u'\u5168\u7403\u6d41\u884c\u670d\u9970\u6f6e\u7f8e\u98ce\u5c1a\u63a7', u'\u624b\u673a\u7528\u62372454403221', u'\u6b63\u76f4\u4f60\u4e00\u8138\u7684\u52c7\u6562\u541b']:
1.0

Average words of [u'[moc\u8f6c\u53d1]2014\u65b0\u6b3e\u590f\u88c5\u5370\u82b1\u77ed\u8896\u8fde\u8863\u88d9\u9ad8\u7aef\u5927\u7801\u4e2d\u5e74\u5973\u88c5\u4fee\u8eab\u663e\u7626\u857e\u4e1d\u8fde\u8863\u88d9  http://t.cn/RvCLdgN', u'[\u795e\u9a6c]\u963f\u4f9d\u83b2\u8fde\u8863\u88d9 ccdd\u5973\u88c52014\u590f\u88c5\u65b0\u6b3e \u97e9\u7248 \u5c0f\u9999\u98ce\u857e\u4e1d\u516c\u4e3b\u88d9 \u6b63\u54c1  http://t.cn/RvCyo4X', u'\u590f\u65e5\u5ea6\u5047\u6e05\u51c9\u88c5~~>>>>>>\u559c\u6b22\u70b9\u8fd9\u91cc\uff1ahttp://t.cn/RvEqd5R', u'\u6211\u6b63\u5728\u6b66\u4fa0\u5361\u724c\u624b\u6e38\u201c\u5927\u638c\u95e8\u201d\u4e2d\u51b2\u51fb\u8840\u6218\u699c\u5355\uff0c\u613f\u5404\u4f4d\u5927\u4fa0\u62d4\u5200\u76f8\u52a9\uff01\u6ce8\u518c\u5927\u638c\u95e8\uff0c\u586b\u5199\u6211\u7684\u9080\u8bf7\u7801\u30102zr7\u3011\uff0c\u5171\u540c\u83b7\u53d6\u4e30\u539a\u5956\u52b1\u3002http://t.cn/8FUZSTe @\u5927\u638c\u95e8\u6e38\u620f ', u'\u8f7b\u8f68\u65e9\u4e0a\u7684\u7a7a\u8c03\u5f00\u5f97\u7565\u5927']:
2.4
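
The module sinaWeiboLexicalDiversity is not shown in this post, so the following is only a minimal sketch of what the two helpers might compute, not the author's actual implementation. It assumes lexical diversity is the ratio of unique tokens to total tokens, and that a "word" is any whitespace-separated token of a status text. The function names and the sample data are hypothetical.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-ins for weibo_lexical_diversity and weibo_average_words.
# Assumption: tokens are compared as-is, and words are split on whitespace.


def lexical_diversity(tokens):
    """Return unique tokens / total tokens (1.0 means no token repeats)."""
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty list
    return 1.0 * len(set(tokens)) / len(tokens)


def average_words(status_texts):
    """Return the mean number of whitespace-separated words per status."""
    if not status_texts:
        return 0.0
    total_words = sum(len(text.split()) for text in status_texts)
    return 1.0 * total_words / len(status_texts)


# Made-up sample data, for illustration only.
sample_texts = ["new summer dress", "hello weibo", "summer sale"]
sample_words = [word for text in sample_texts for word in text.split()]

diversity = lexical_diversity(sample_words)  # 6 unique words out of 7
average = average_words(sample_texts)        # 7 words across 3 statuses
```

Under these assumptions the numbers in RESULT are consistent: the five status texts split into 2, 6, 1, 2 and 1 whitespace-separated tokens, which gives 12 / 5 = 2.4 average words, and since none of the 12 words or 5 screen names repeats, both lexical diversity values are 1.0. Because Chinese text is not separated by spaces, a whole sentence counts as one "word" here; a word segmenter would be needed for a finer-grained measure.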

Time: 2024-10-12 10:48:04
