Natural Language 18.1: Named Entity Recognition with NLTK

https://www.pythonprogramming.net/named-entity-recognition-nltk-tutorial/?completed=/chinking-nltk-tutorial/

Named Entity Recognition with NLTK

One of the major forms of chunking in natural language
processing is called "Named Entity Recognition." The idea is to have the
machine automatically pull out "entities" such as people, places,
organizations, monetary figures, and more.

This can be a bit of a challenge, but NLTK has this built in for
us. There are two major options with NLTK's named entity recognition:
either recognize all named entities without distinguishing between them,
or recognize named entities as their respective type, like people,
places, organizations, etc.

Here's an example:

import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer

# Train the Punkt sentence tokenizer on one speech, then apply it to another.
train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt")

custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

tokenized = custom_sent_tokenizer.tokenize(sample_text)

def process_content():
    try:
        # Skip the first five sentences of the sample speech.
        for i in tokenized[5:]:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)
            # binary=True marks each entity simply as NE, with no type.
            namedEnt = nltk.ne_chunk(tagged, binary=True)
            # Opens a window showing the chunk tree for this sentence.
            namedEnt.draw()
    except Exception as e:
        print(str(e))

process_content()

Here, with the option binary=True, something is either a named entity or it is not; there is no further detail. In the drawn tree, every recognized entity is grouped under a node labeled NE.

If you set binary=False, each recognized entity is instead labeled with its type, such as PERSON, ORGANIZATION, or GPE.

Comparing the two, you can see a few things. With binary=False, it picked up the same things, but wound up splitting terms like White House into "White" and "House" as if they were different entities, whereas with binary=True the named entity recognition correctly treated White House as a single named entity.
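The comparison above can also be made in text form, without the drawing window. Here is a minimal sketch, assuming the tree returned by nltk.ne_chunk exposes label() and leaves() on its chunk subtrees (as NLTK's Tree class does) and holds ordinary tokens as (word, tag) tuples. The helper name extract_entities is hypothetical and not part of NLTK.

```python
def extract_entities(tree):
    """Return (label, text) pairs for each labeled chunk directly under the root.

    Assumption: plain tokens are (word, tag) tuples, while chunks are
    tree objects that provide label() and leaves().
    """
    entities = []
    for node in tree:
        # Tuples have no label() method, so this test picks out the chunks only.
        if hasattr(node, "label"):
            text = " ".join(word for word, tag in node.leaves())
            entities.append((node.label(), text))
    return entities


# Possible use inside the tutorial's loop (requires the NLTK models and corpora):
#     namedEnt = nltk.ne_chunk(tagged, binary=True)
#     print(extract_entities(namedEnt))
```

With binary=True every pair carries the label NE; with binary=False the label is the entity type, so running the helper on both trees for the same sentence shows where the two modes group words differently.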

Depending on your goals, you can use the binary option as you see fit. Here are the types of named entities that you can get with binary=False:

NE Type and Examples

ORGANIZATION - Georgia-Pacific Corp., WHO

PERSON - Eddy Bonte, President Obama

LOCATION - Murray River, Mount Everest

DATE - June, 2008-06-29

TIME - two fifty a m, 1:30 p.m.

MONEY - 175 million Canadian Dollars, GBP 10.40

PERCENT - twenty pct, 18.75 %

FACILITY - Washington Monument, Stonehenge

GPE - South East Asia, Midlothian
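Once entities carry a type, a natural next step is to group them by that type. The sketch below assumes the entities have already been collected as (label, text) pairs, for example from the label and leaves of each chunk in a binary=False tree. The function name group_by_type and the sample list are illustrative only; the sample reuses entries from the table above and is not actual chunker output.

```python
from collections import defaultdict


def group_by_type(entities):
    """Map each entity label to the distinct entity strings seen under it."""
    grouped = defaultdict(list)
    for label, text in entities:
        # Keep first-seen order and skip repeats of the same entity text.
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# Hypothetical input built from the examples in the table above.
sample = [
    ("GPE", "Midlothian"),
    ("PERSON", "Eddy Bonte"),
    ("GPE", "Midlothian"),
    ("ORGANIZATION", "WHO"),
]

for label, names in group_by_type(sample).items():
    print(label, names)
```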

Either way, you will probably find that you need to do a bit more
work to get it just right, but this is pretty powerful right out of the
box.
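As one example of that extra work, entities that were split into neighboring chunks (such as "White" and "House") can be glued back together after the fact. This is only a heuristic sketch, not something NLTK does for you: it assumes chunks expose label() and leaves(), that plain tokens are (word, tag) tuples, and that directly adjacent chunks belong to the same entity, which will not always be true. The merged entity keeps the label of its first chunk, and the name merge_adjacent_entities is hypothetical.

```python
def merge_adjacent_entities(tree):
    """Join chunks that sit directly next to each other into one entity.

    Returns (label, text) pairs. A merged entity keeps the label of its
    first chunk, which is a simplifying assumption.
    """
    merged = []  # each item is [label, list_of_words]
    previous_was_chunk = False
    for node in tree:
        if hasattr(node, "label"):
            words = [word for word, tag in node.leaves()]
            if previous_was_chunk:
                # No ordinary token in between: extend the previous entity.
                merged[-1][1].extend(words)
            else:
                merged.append([node.label(), words])
            previous_was_chunk = True
        else:
            previous_was_chunk = False
    return [(label, " ".join(words)) for label, words in merged]


# Possible use inside the tutorial's loop (requires the NLTK models and corpora):
#     namedEnt = nltk.ne_chunk(tagged, binary=False)
#     print(merge_adjacent_entities(namedEnt))
```

Because the rule is so blunt, it is worth checking its output on a few sentences before relying on it; two genuinely separate entities that happen to be adjacent would be merged as well.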

In the next tutorial, we're going to talk about something similar to stemming, called lemmatizing.

Time: 2024-10-13 01:29:27
