Our goal is to develop a very simple English grammar checker in the form of a boolean function, isGrammatical(wordList). To begin, we define a sentence to be grammatical if it consists of exactly three words of the form determiner noun verb. Thus [“the”, “sheep”, “run”] is grammatical, but neither [“run”, “the”, “sheep”] nor [“sheep”, “run”] is. By tagging convention, determiners are tagged as DT, nouns as NN, verbs as VB, and adjectives (which we discuss later) as JJ. The tagSentence function from the tagger module will help you convert your word list to tags. One challenge is that some words have multiple meanings and thus multiple parts of speech; for example, swing can be either a noun or a verb. Carefully examine the results returned by tagSentence when defining the logic of isGrammatical.
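As a starting point, here is a minimal sketch of the three-word check. It assumes that tagSentence returns, for each word, a list of every tag the lexicon allows for it (so swing would yield ["NN", "VB"]); that format is an assumption, so confirm what the real tagger module returns. To keep the sketch runnable on its own, a tiny hypothetical stand-in replaces the real tagSentence.

```python
# HYPOTHETICAL stand-in for tagger.tagSentence. We ASSUME it returns one list of
# candidate tags per word; check the real return format before relying on this.
STUB_TAGS = {
    "the": ["DT"],
    "sheep": ["NN"],
    "cat": ["NN"],
    "run": ["VB"],
    "swing": ["NN", "VB"],  # ambiguous: noun or verb
}


def tagSentence(wordList):
    """Stub: return the candidate tags for each word ([] for unknown words)."""
    return [STUB_TAGS.get(word, []) for word in wordList]


def isGrammatical(wordList):
    """Accept exactly three words of the form determiner noun verb."""
    tags = tagSentence(wordList)
    if len(tags) != 3:
        return False
    # Each word only needs ONE of its candidate tags to fit its slot.
    return "DT" in tags[0] and "NN" in tags[1] and "VB" in tags[2]


print(isGrammatical(["the", "sheep", "run"]))  # True
print(isGrammatical(["run", "the", "sheep"]))  # False
print(isGrammatical(["sheep", "run"]))         # False
```

Testing membership per position (rather than comparing against a single tag) is what lets an ambiguous word like swing serve as the noun in “the swing run” and as the verb in “the cat swing”.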

Note that by our current definition, we are not concerned with other aspects of grammar such as agreement between word forms, so “an sheep run” would be considered grammatical for now. (We will revisit this concern later.)


[3 points]
Add the words “thief”, “crisis”, “foot”, and “calf” to the lexicon.txt file, and modify wordforms.py so that it generates their plural forms correctly (“thieves”, “crises”, “feet”, “calves”). In terms of lexicon.txt categories, thief is a person, while crisis and foot are inanimates. Interestingly, “calf” might be an animal (a young cow) or an inanimate (the part of the body).

We are not yet using plurals in our sentences, but you can add tests within the wordforms.py file to validate that these new words and forms are handled properly.
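One way to organize the plural rules is to check a table of whole-word exceptions first and then fall back on suffix rules, as in the sketch below. The function name pluralize and the exact rule set are hypothetical; wordforms.py may already structure this differently, so adapt the idea to the existing code rather than replacing it.

```python
# HYPOTHETICAL helper; the real function in wordforms.py may have another name.
IRREGULAR_PLURALS = {
    "foot": "feet",
    "sheep": "sheep",
}

VOWELS = "aeiou"


def pluralize(noun):
    """Return the plural of a singular English noun (rule-based sketch)."""
    if noun in IRREGULAR_PLURALS:  # whole-word exceptions take priority
        return IRREGULAR_PLURALS[noun]
    if noun.endswith("is"):  # crisis -> crises
        return noun[:-2] + "es"
    if noun.endswith("fe"):  # knife -> knives
        return noun[:-2] + "ves"
    if noun.endswith("f"):  # thief -> thieves, calf -> calves
        return noun[:-1] + "ves"
    if noun.endswith(("s", "x", "z", "ch", "sh")):  # box -> boxes
        return noun + "es"
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"  # city -> cities (but day -> days)
    return noun + "s"  # cat -> cats


for word in ["thief", "crisis", "foot", "calf", "cat"]:
    print(word, "->", pluralize(word))
```

The -f rule is only a tendency: nouns such as roof and chief take a plain -s, so they would belong in the exception table if they are ever added to the lexicon.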


[3 points]
The original lexicon includes singular nouns such as cat but not plural forms of those nouns such as cats, and thus a sentence such as “the cats run” is not yet deemed grammatical. However, the wordforms module does include functionality for taking a singular noun and constructing its plural form.

Modify the code in tagger.py so that it recognizes plural forms of nouns and tags them with NNS. We recommend the following approach:


Update the loadLexicon function as follows. Every time you encounter a noun in the original lexicon.txt file, in addition to appending the usual (word, label) pair, add an extra entry containing the plural form of that noun together with a label formed by adding an ‘s’ to the original label. For example, when encountering ‘cat’ you should add not only (‘cat’, ‘animal’) to the lexicon but also (‘cats’, ‘animals’).


Then update the labelToTag function to return the new tag NNS when it encounters one of those plural labels, such as ‘animals’.
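The two recommended changes might look roughly like the following sketch. It makes several assumptions that you should check against the real tagger.py: that each lexicon.txt line is a word and a label separated by whitespace, that the noun labels are exactly person, animal, and inanimate, and that the non-noun labels are named as in the hypothetical OTHER_TAGS table. The small pluralize function is only a stand-in for the helper in wordforms.

```python
# HYPOTHETICAL label names; check lexicon.txt for the real set.
NOUN_LABELS = {"person", "animal", "inanimate"}
OTHER_TAGS = {"determiner": "DT", "verb": "VB", "adjective": "JJ"}


def pluralize(noun):
    """Stand-in for the wordforms plural helper (covers only a few cases)."""
    irregular = {"sheep": "sheep", "thief": "thieves", "foot": "feet"}
    return irregular.get(noun, noun + "s")


def loadLexicon(lines):
    """Build (word, label) pairs, adding a plural entry for every noun.

    ASSUMPTION: each non-blank line is "word label"; the real format may differ,
    and the real function probably reads lexicon.txt itself.
    """
    lexicon = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, label = parts
        lexicon.append((word, label))
        if label in NOUN_LABELS:
            # e.g. ('cat', 'animal') also yields ('cats', 'animals')
            lexicon.append((pluralize(word), label + "s"))
    return lexicon


def labelToTag(label):
    """Map a lexicon label to a part-of-speech tag."""
    if label in NOUN_LABELS:
        return "NN"
    if label.endswith("s") and label[:-1] in NOUN_LABELS:
        return "NNS"  # a pluralized noun label such as 'animals'
    return OTHER_TAGS.get(label)  # None for labels this sketch does not know


lex = loadLexicon(["cat animal", "thief person", "the determiner", "run verb"])
print(lex)
print(labelToTag("animals"))  # NNS
```

Deriving the plural label mechanically (label + "s") keeps labelToTag simple: it can recognize any plural label by stripping the final ‘s’ and checking the result against the known noun labels.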


Test your new functionality on sentences such as “the cats run”, as well as on sentences that use your new plurals, such as “the thieves run”.

[2 points]
The original lexicon includes verbs such as run but not their 3rd person singular forms ending in -s, such as runs, and thus a sentence such as “the cat runs” is not yet deemed grammatical. As you did with nouns in the previous step, modify the software so that 3rd person singular verbs are added to the lexicon and tagged as VBZ. Test this new functionality on sentences such as “the cat runs” and “the sheep runs”.
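Generating the 3rd person singular form follows spelling rules similar to those for noun plurals. The sketch below is one possible version; the function name thirdPersonSingular and the exception table are hypothetical and cover only the common cases.

```python
# HYPOTHETICAL helper; name and exception list are illustrative only.
IRREGULAR_3SG = {"have": "has"}

VOWELS = "aeiou"


def thirdPersonSingular(verb):
    """Return the 3rd person singular present form of a base-form verb."""
    if verb in IRREGULAR_3SG:
        return IRREGULAR_3SG[verb]
    if verb.endswith(("s", "x", "z", "ch", "sh", "o")):  # watch -> watches, go -> goes
        return verb + "es"
    if len(verb) > 1 and verb.endswith("y") and verb[-2] not in VOWELS:
        return verb[:-1] + "ies"  # fly -> flies (but play -> plays)
    return verb + "s"  # run -> runs


for verb in ["run", "watch", "go", "fly", "play", "have"]:
    print(verb, "->", thirdPersonSingular(verb))
```

In loadLexicon, the verb branch can mirror the noun branch: pair each -s form with a distinct label of your choosing, and have labelToTag map that label to VBZ.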


[2 points]
Now we have both plural nouns and 3rd person singular verbs; however, the grammar checker does not yet require them to agree, and thus accepts sentences like “the cat run” and “the cats runs”, neither of which is truly grammatical. Update the rules so that they require the tag VBZ on the verb if the noun has the singular tag NN, and the tag VB on the verb if the noun has the plural tag NNS. Add some test cases to ensure the program is working as intended. Include tests with unusual forms such as the noun “sheep”, which can be singular or plural; thus both “the sheep runs” and “the sheep run” are grammatical. Make sure to update any previous tests to conform to these new expectations.
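Because a word like sheep can carry both NN and NNS, agreement is best checked over pairs: the sentence is accepted if some noun tag and some verb tag form an allowed combination. The sketch below assumes, as before, that tagSentence returns a list of candidate tags per word, and uses a small hypothetical stand-in for it so the example runs on its own.

```python
# HYPOTHETICAL stand-in for tagger.tagSentence (one candidate-tag list per word).
STUB_TAGS = {
    "the": ["DT"],
    "cat": ["NN"],
    "cats": ["NNS"],
    "sheep": ["NN", "NNS"],  # singular or plural
    "run": ["VB"],
    "runs": ["VBZ"],
}

# Allowed (noun tag, verb tag) combinations.
AGREEING_PAIRS = {("NN", "VBZ"), ("NNS", "VB")}


def tagSentence(wordList):
    return [STUB_TAGS.get(word, []) for word in wordList]


def isGrammatical(wordList):
    """Accept DT N V where the noun and verb agree in number."""
    tags = tagSentence(wordList)
    if len(tags) != 3 or "DT" not in tags[0]:
        return False
    # Accept if ANY noun reading agrees with ANY verb reading.
    return any((n, v) in AGREEING_PAIRS for n in tags[1] for v in tags[2])


for sentence in ["the cat runs", "the cat run", "the sheep run", "the sheep runs"]:
    print(sentence, isGrammatical(sentence.split()))
```

Listing the legal pairs in one set keeps the agreement rule in a single place, which makes it easy to extend if more verb forms are added later.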


[2 points]
Update your grammar to allow any number of optional adjectives (tagged JJ) between the determiner and the noun, thereby allowing sentences such as “the big sheep run” and “the big white sheep run” while disallowing sentences such as “the cat sheep run”.
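With adjectives, the pattern becomes DT JJ* N V: the first word is the determiner, the last two are the noun and verb, and every word in between must be able to act as an adjective. Since each adjective slot is independent and only the noun and verb constrain each other, a per-position check suffices. The sketch again assumes a list of candidate tags per word and uses a hypothetical stand-in tagger; treating white as possibly NN is just an illustration of ambiguity.

```python
# HYPOTHETICAL stand-in for tagger.tagSentence (one candidate-tag list per word).
STUB_TAGS = {
    "the": ["DT"],
    "big": ["JJ"],
    "white": ["JJ", "NN"],  # illustrative ambiguity
    "cat": ["NN"],
    "cats": ["NNS"],
    "sheep": ["NN", "NNS"],
    "run": ["VB"],
    "runs": ["VBZ"],
}

AGREEING_PAIRS = {("NN", "VBZ"), ("NNS", "VB")}


def tagSentence(wordList):
    return [STUB_TAGS.get(word, []) for word in wordList]


def isGrammatical(wordList):
    """Accept DT JJ* N V, with number agreement between noun and verb."""
    tags = tagSentence(wordList)
    if len(tags) < 3 or "DT" not in tags[0]:
        return False
    # Every word between the determiner and the noun must allow the JJ tag.
    if not all("JJ" in wordTags for wordTags in tags[1:-2]):
        return False
    nounTags, verbTags = tags[-2], tags[-1]
    return any((n, v) in AGREEING_PAIRS for n in nounTags for v in verbTags)


for sentence in ["the big sheep run", "the big white sheep run", "the cat sheep run"]:
    print(sentence, isGrammatical(sentence.split()))
```

When there are no adjectives, tags[1:-2] is empty and all() returns True, so the plain three-word sentences from the earlier steps are still accepted.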