Python: Normalizing the company-name suffixes of a LinkedIn user's connections (data normalization)

CODE:

#!/usr/bin/python
# -*- coding: utf-8 -*-

'''
Created on 2014-8-19
@author: guaguastd
@name: company_suffix_normalize.py
'''

# import json
import os
import csv
from collections import Counter
from operator import itemgetter
from prettytable import PrettyTable

# path to the exported LinkedIn connections CSV file
CSV_FILE = os.path.join(r"E:", "\\", "eclipse", "LinkedIn", "dfile", "my_connections.csv")

# define a set of transforms that converts the first item
# to the second item
transforms = [(', Inc.', ''), (', Inc', ''), (', LLC', ''), (', LLP', ''), (' LLC', ''), (' Inc.', ''), (' Inc', '')]

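# Read the exported connections. newline='' is what the csv module expects
# in Python 3; utf-8 is an assumption about how the export was saved.
with open(CSV_FILE, newline='', encoding='utf-8') as f:
    contacts = list(csv.DictReader(f, delimiter=',', quotechar='"'))

# Keep only non-empty company names
companies = [c['Company'].strip() for c in contacts if c['Company'].strip() != '']

# Apply every transform, in order, to each company name
for i, _ in enumerate(companies):
    for transform in transforms:
        companies[i] = companies[i].replace(*transform)

# Tabulate companies by frequency, most frequent first
pt = PrettyTable(field_names=['Company', 'Freq'])
pt.align = 'l'
c = Counter(companies)
for company, freq in sorted(c.items(), key=itemgetter(1), reverse=True):
    pt.add_row([company, freq])
print(pt)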

RESULT:

+---------------------------------------+------+
| Company                               | Freq |
+---------------------------------------+------+
| ??????????                            | 1    |
| ??                                    | 1    |
| SoftTalent Consulting ??????????????? | 1    |
| SJTU                                  | 1    |
| WatchGuard Technologies               | 1    |
| Hebei Meishen Chemical Group CO.,Ltd  | 1    |
| Bloomberg LP                          | 1    |
| DiHao trading Co.,Ltd                 | 1    |
| CET                                   | 1    |
| Pica8                                 | 1    |
| Microsoft                             | 1    |
+---------------------------------------+------+
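
The result above still contains entries such as "Co.,Ltd" and "LP", because the transform list in the script only covers the Inc/LLC/LLP variants. As a rough sketch of one way to widen the coverage, the suffix stripping could be done with a single regular expression anchored at the end of the name. The suffix list, the pattern, and the helper names `normalize_company` and `count_companies` are assumptions made for illustration and are not part of the original script. Note also that, unlike `str.replace`, this version only removes a suffix at the very end of the name.

```python
import re
from collections import Counter

# Hypothetical suffix list (an assumption); extend it to match your own data.
# The pattern requires at least one space or comma before the suffix, so a
# name that merely ends in these letters (e.g. "Help") is left alone.
SUFFIX_PATTERN = re.compile(
    r"[\s,]+(?:Inc\.?|LLC|LLP|LP|Co\.?,?\s*Ltd\.?|Ltd\.?)$",
    re.IGNORECASE,
)


def normalize_company(name):
    """Strip surrounding whitespace and one trailing legal suffix."""
    return SUFFIX_PATTERN.sub("", name.strip())


def count_companies(names):
    """Count normalized company names, skipping empty ones."""
    cleaned = (normalize_company(n) for n in names)
    return Counter(c for c in cleaned if c)
```

With this sketch, `count_companies(companies)` could stand in for the transform loop and `Counter(companies)` in the script, and the resulting counter can be fed to the same PrettyTable code.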

Time: 2024-12-12 16:09:16