Parsey McParseface

by Open Source on January 17, 2018

We are happy to release Parsey McParseface, an English parser that we have trained for you, and that you can use to analyze English text, along with trained models for 40 languages and support for text segmentation and morphological analysis.

Once you have successfully built SyntaxNet, you can start parsing text right away with Parsey McParseface, located under syntaxnet/models. The easiest thing is to use or modify the included script syntaxnet/demo.sh, which shows a basic setup to parse English taking plain text as input.

You can also skip right away to the detailed SyntaxNet tutorial.

How accurate is Parsey McParseface? For the initial release, we tried to balance speed and accuracy: the model runs fast enough to be useful on a single machine (e.g. ~600 words/second on a modern desktop) while still being the most accurate parser available. Here's how Parsey McParseface compares to the academic literature on several different English domains (all numbers are the percentage of correct head assignments in the tree, i.e. the unlabelled attachment score):

Model                        News    Web     Questions
Martins et al. (2013)        93.10   88.23   94.21
Zhang and McDonald (2014)    93.32   88.65   93.37
Weiss et al. (2015)          93.91   89.29   94.17
Andor et al. (2016)*         94.44   90.17   95.40
Parsey McParseface           94.15   89.08   94.77
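To make the metric concrete, here is a minimal sketch of how an unlabelled attachment score could be computed for one sentence. The function name and the toy head lists are illustrative assumptions, not part of SyntaxNet, and the evaluation behind the published numbers may differ in details such as how punctuation is treated.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Return the percentage of tokens whose predicted head equals the gold head.

    Hypothetical helper for illustration only; it is not a SyntaxNet function.
    """
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted head lists must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(1 for g, p in zip(gold_heads, predicted_heads) if g == p)
    return 100.0 * correct / len(gold_heads)


# Toy data: one head index per token, with 0 marking the root.
gold = [2, 4, 4, 0]
predicted = [2, 4, 2, 0]  # the third token is attached to the wrong head
score = unlabeled_attachment_score(gold, predicted)
print(f"{score:.2f}")
```

Three of the four heads agree here, so the sentence scores 75%. A corpus-level score would pool the counts over all tokens rather than average per-sentence scores.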

We see that Parsey McParseface is state-of-the-art; more importantly, with SyntaxNet you can train larger networks with more hidden units and bigger beam sizes if you want to push the accuracy even further: Andor et al. (2016)* is simply a SyntaxNet model with a larger beam and network. For further information on the datasets, see the section Treebank Union in that paper.

Parsey McParseface is also state-of-the-art for part-of-speech (POS) tagging (numbers below are per-token accuracy):

Model                   News    Web     Questions
Ling et al. (2015)      97.44   94.03   96.18
Andor et al. (2016)*    97.77   94.80   96.86
Parsey McParseface      97.52   94.24   96.45

To Run:

Run on port 5000:

$ docker run -d -it --rm -p 5000:5000 --name parseyserver algohub/syntaxnet-server

The default model is English. To select other models, set the PARSEY_MODELS environment variable to one or more comma-separated model names from the list of available models (NOTE: each name must be written exactly as it appears in that list):

$ docker run -d -it --rm -p 5000:5000 --name parseyserver -e PARSEY_MODELS=Latin,English,French algohub/syntaxnet-server

You can also set the batch size if necessary using the PARSEY_BATCH_SIZE environment variable (default 1)

To Use:

Post plain text to it, one sentence per line:

$ curl -H "Content-Type:text/plain" --data-binary "Alea iacta est" http://localhost:5000/

The server returns a list of sentences, each a list of words, in what is essentially the CoNLL-U format expressed as JSON:

[
  [
    {
      "id": 1,
      "form": "Alea",
      "upostag": "NOUN",
      "xpostag": "n-s---fn-",
      "feats": {
        "Case": "Nom",
        "Gender": "Fem",
        "fPOS": "NOUN++n-s---fn-",
        "Number": "Sing"
      },
      "head": 2,
      "deprel": "nsubjpass"
    },
    {
      "id": 2,
      "form": "iacta",
      "upostag": "VERB",
      "xpostag": "v-srppfn-",
      "feats": {
        "Case": "Nom",
        "VerbForm": "Part",
        "Gender": "Fem",
        "fPOS": "VERB++v-srppfn-",
        "Number": "Sing",
        "Tense": "Past",
        "Aspect": "Perf",
        "Voice": "Pass"
      },
      "head": 0,
      "deprel": "ROOT"
    },
    {
      "id": 3,
      "form": "est",
      "upostag": "VERB",
      "xpostag": "v3spia---",
      "feats": {
        "VerbForm": "Fin",
        "fPOS": "VERB++v3spia---",
        "Number": "Sing",
        "Person": "3",
        "Tense": "Pres",
        "Voice": "Act",
        "Mood": "Ind"
      },
      "head": 2,
      "deprel": "auxpass"
    }
  ]
]
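Because the response mirrors CoNLL-U, it can be turned back into tab-separated CoNLL-U text for tools that expect that format. The sketch below is one way to do this; the helper names are hypothetical, the sample token is abbreviated, and it assumes the server emits standard JSON with the field names shown above. Fields the server does not return (lemma, deps, misc) are written as "_".

```python
import json

# The ten CoNLL-U columns, in order, named to match the JSON keys above.
CONLLU_FIELDS = ["id", "form", "lemma", "upostag", "xpostag",
                 "feats", "head", "deprel", "deps", "misc"]


def token_to_conllu(token):
    """Render one token dict as a tab-separated CoNLL-U line (hypothetical helper)."""
    columns = []
    for field in CONLLU_FIELDS:
        value = token.get(field)
        if field == "feats" and isinstance(value, dict):
            # CoNLL-U writes features as Name=Value pairs joined by "|", sorted by name.
            pairs = sorted(value.items(), key=lambda kv: kv[0].lower())
            value = "|".join(f"{k}={v}" for k, v in pairs) or None
        columns.append("_" if value is None else str(value))
    return "\t".join(columns)


def response_to_conllu(sentences):
    """Render the whole response; each sentence is followed by a blank line."""
    blocks = ["\n".join(token_to_conllu(t) for t in sentence) for sentence in sentences]
    return "".join(block + "\n\n" for block in blocks)


# Abbreviated sample: the first token only, with a subset of its features.
sample = json.loads("""
[[{"id": 1, "form": "Alea", "upostag": "NOUN", "xpostag": "n-s---fn-",
   "feats": {"Case": "Nom", "Gender": "Fem", "Number": "Sing"},
   "head": 2, "deprel": "nsubjpass"}]]
""")
print(response_to_conllu(sample), end="")
```

Sorting the features case-insensitively follows the CoNLL-U convention, so the output order may differ from the order in the JSON object.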

The default model is the first one in the PARSEY_MODELS list (in this case Latin). To use another, pass the language query parameter, which must also match the model name exactly:

$ curl -H "Content-Type:text/plain" --data-binary "The die is cast" "http://localhost:5000/?language=English"

Returns:

[
  [
    {
      "id": 1,
      "form": "The",
      "upostag": "DET",
      "xpostag": "DT",
      "feats": {
        "Definite": "Def",
        "fPOS": "DET++DT",
        "PronType": "Art"
      },
      "head": 2,
      "deprel": "det"
    },
    {
      "id": 2,
      "form": "die",
      "upostag": "NOUN",
      "xpostag": "NN",
      "feats": {
        "fPOS": "NOUN++NN",
        "Number": "Sing"
      },
      "head": 4,
      "deprel": "nsubj"
    },
    {
      "id": 3,
      "form": "is",
      "upostag": "VERB",
      "xpostag": "VBZ",
      "feats": {
        "Mood": "Ind",
        "fPOS": "VERB++VBZ",
        "Number": "Sing",
        "Person": "3",
        "Tense": "Pres",
        "VerbForm": "Fin"
      },
      "head": 4,
      "deprel": "cop"
    },
    {
      "id": 4,
      "form": "cast",
      "upostag": "ADJ",
      "xpostag": "JJ",
      "feats": {
        "fPOS": "ADJ++JJ",
        "Degree": "Pos"
      },
      "head": 0,
      "deprel": "ROOT"
    }
  ]
]
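The same requests can be made from a script. The following is a rough sketch of a client using only the Python standard library; the function names are hypothetical, and it assumes the server is reachable at http://localhost:5000/ and behaves as described above (plain-text POST body, one sentence per line, optional language query parameter).

```python
import json
import urllib.parse
import urllib.request


def build_parse_request(sentences, base_url="http://localhost:5000/", language=None):
    """Build (but do not send) a POST request carrying one sentence per line."""
    body = "\n".join(sentences).encode("utf-8")
    url = base_url
    if language is not None:
        # The value must match the model name exactly, e.g. "English".
        url += "?" + urllib.parse.urlencode({"language": language})
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/plain"}, method="POST"
    )


def parse(sentences, **kwargs):
    """Send the request and decode the JSON reply; needs a running server."""
    with urllib.request.urlopen(build_parse_request(sentences, **kwargs)) as response:
        return json.load(response)


def root_forms(parsed):
    """Return the word form of every token whose head is 0 (the root)."""
    return [tok["form"] for sentence in parsed for tok in sentence
            if str(tok["head"]) == "0"]


request = build_parse_request(["The die is cast"], language="English")
print(request.full_url)
# With the container running, one could then call:
#   parsed = parse(["The die is cast"], language="English")
#   print(root_forms(parsed))
```

Building the request separately from sending it keeps the URL and body easy to inspect without a server, and comparing the head as a string avoids depending on whether the server encodes it as a number.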

Details

  • Released: January 17, 2018
  • Last Updated: June 1, 2020
