How can I train NLTK on the entire Penn Treebank corpus?
Collecting a corpus annotated with chunk tags by hand is laborious, so such corpora are mostly obtained by transforming an existing treebank.
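As a rough illustration of the treebank-to-chunk transformation mentioned above, here is a minimal sketch in plain Python. It assumes Penn-style bracketed parses and a simplified rule that is not taken from any particular paper: a phrase counts as a chunk only if its label is in `chunk_labels` and all of its children are word-level (POS) nodes. The names `parse_brackets` and `to_iob` are hypothetical helpers invented for this example.

```python
import re


def parse_brackets(text):
    """Parse a bracketed parse string into nested (label, children) tuples.

    A word-level node looks like ("DT", ["the"]).
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a plain word
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return node()


def is_preterminal(node):
    """True for a POS node that directly dominates a single word."""
    _, children = node
    return len(children) == 1 and isinstance(children[0], str)


def to_iob(node, chunk_labels=("NP",)):
    """Flatten a tree into (word, pos_tag, iob_tag) triples.

    Assumption for this sketch: only phrases made up entirely of
    word-level nodes are treated as chunks; everything else is "O".
    """
    label, children = node
    if is_preterminal(node):
        return [(children[0], label, "O")]
    if label in chunk_labels and all(is_preterminal(c) for c in children):
        triples = []
        for i, (tag, words) in enumerate(children):
            prefix = "B-" if i == 0 else "I-"
            triples.append((words[0], tag, prefix + label))
        return triples
    triples = []
    for child in children:
        triples.extend(to_iob(child, chunk_labels))
    return triples


if __name__ == "__main__":
    tree = parse_brackets(
        "(S (NP (DT the) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))"
    )
    for word, tag, iob in to_iob(tree):
        print(word, tag, iob)
```

The sketch matches labels exactly, so function-tagged labels such as `NP-SBJ`, empty elements, and the empty-label outer bracket found in Penn Treebank `.mrg` files would need extra handling before this could be applied to the real corpus.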
Experiments were carried out on the Chinese TreeBank with training sets of different sizes. The results show that our approach improves POS tagging accuracy on all four training sets.