CoreNLP – Named Entity Recognition

by Open Source on January 17, 2018

This annotator recognizes named entities (person and company names, etc.) in text. It principally uses one or more machine learning sequence models to label entities, but it may also call specialist rule-based components, such as those for labeling and interpreting times and dates. Numerical entities that require normalization, e.g., dates, have their normalized value stored in NormalizedNamedEntityTagAnnotation. For more extensive support for rule-based NER, you may also want to look at the RegexNER annotator. The set of entities recognized is language-dependent, and it is frequently more limited for other languages than what is described below for English. As the name “NERClassifierCombiner” implies, this annotator commonly runs several named entity recognizers and then combines their results, but it can also run just a single annotator or only rule-based quantity NER.

For English, by default, this annotator recognizes named (PERSON, LOCATION, ORGANIZATION, MISC), numerical (MONEY, NUMBER, ORDINAL, PERCENT), and temporal (DATE, TIME, DURATION, SET) entities (12 classes). Adding the regexner annotator and using the supplied RegexNER pattern files adds support for the fine-grained and additional entity classes EMAIL, URL, CITY, STATE_OR_PROVINCE, COUNTRY, NATIONALITY, RELIGION, (job) TITLE, IDEOLOGY, CRIMINAL_CHARGE, CAUSE_OF_DEATH (11 classes) for a total of 23 classes. Named entities are recognized using a combination of three CRF sequence taggers trained on various corpora, including CoNLL, ACE, MUC, and ERE corpora. Numerical entities are recognized using a rule-based system.
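The class inventory above can be written down as a small lookup table, which is handy when filtering or grouping tags returned by the annotator. This is only an illustrative sketch: the names `ENTITY_CLASSES` and `category_of` are hypothetical helpers, not part of CoreNLP, and the tag `TITLE` stands for the "(job) TITLE" class mentioned above.

```python
# Hypothetical lookup table built from the English class list described above.
ENTITY_CLASSES = {
    "named": ["PERSON", "LOCATION", "ORGANIZATION", "MISC"],
    "numerical": ["MONEY", "NUMBER", "ORDINAL", "PERCENT"],
    "temporal": ["DATE", "TIME", "DURATION", "SET"],
    # Classes added by the regexner annotator with the supplied pattern files.
    "regexner": [
        "EMAIL", "URL", "CITY", "STATE_OR_PROVINCE", "COUNTRY",
        "NATIONALITY", "RELIGION", "TITLE", "IDEOLOGY",
        "CRIMINAL_CHARGE", "CAUSE_OF_DEATH",
    ],
}


def category_of(tag):
    """Return the category name for a tag, or None if the tag is not listed."""
    for category, tags in ENTITY_CLASSES.items():
        if tag in tags:
            return category
    return None


# Counts matching the text: 12 default classes plus 11 regexner classes.
default_count = sum(
    len(tags) for name, tags in ENTITY_CLASSES.items() if name != "regexner"
)
regexner_count = len(ENTITY_CLASSES["regexner"])
total_count = default_count + regexner_count
```

For example, `category_of("DATE")` returns `"temporal"`, and `category_of("O")` returns `None` because `O` is not an entity class.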

How to Run

Docker is the simplest way to get started with this model. We have created a pre-built Docker image that includes all dependencies needed to run the Stanford CoreNLP library.

Be sure you have Docker installed!

Start the container. Be sure to use the -p parameter to map a local port to the container's internal HTTP port 9000:

docker run -p 9000:9000 algohub/corenlp

Once the container is running, you can POST any text to port 9000 to get the annotated result. The annotators and output format are passed in the properties query parameter:

localhost:9000/?properties={"annotators": "ner,regexner", "outputFormat": "json"}
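The request above could be sent from Python roughly as follows. This is a minimal sketch, not an official client: it assumes the container is reachable at http://localhost:9000 and that the JSON response contains a `sentences` list whose `tokens` carry `word` and `ner` fields, with `O` marking tokens outside any entity. The function names `build_url`, `annotate`, and `extract_entities` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_url(base="http://localhost:9000", annotators="ner,regexner"):
    """Build the server URL with the properties parameter URL-encoded as JSON."""
    props = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return f"{base}/?{query}"


def annotate(text, url=None, timeout=30):
    """POST raw text to the server and return the decoded JSON response."""
    request = urllib.request.Request(
        url or build_url(), data=text.encode("utf-8")
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


def extract_entities(response):
    """Collect (word, tag) pairs for tokens whose tag is not 'O'.

    Assumes the response layout described in the lead-in; missing keys
    are tolerated and simply yield no entities.
    """
    found = []
    for sentence in response.get("sentences", []):
        for token in sentence.get("tokens", []):
            tag = token.get("ner", "O")
            if tag != "O":
                found.append((token.get("word"), tag))
    return found
```

With the container running, `extract_entities(annotate("Barack Obama was born in Hawaii."))` would return the tagged tokens. Encoding the properties with `urllib.parse.urlencode` avoids problems with the braces, quotes, and commas in the JSON value.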

Details

  • Released: January 17, 2018
  • Last Updated: June 1, 2020
