PorterStemmer

class cuml.preprocessing.text.stem.PorterStemmer(mode='NLTK_EXTENSIONS')

A word stemmer based on the Porter stemming algorithm.

Porter, M. “An algorithm for suffix stripping.” Program 14.3 (1980): 130-137.

See http://www.tartarus.org/~martin/PorterStemmer/ for the homepage of the algorithm.
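Most of the algorithm's suffix rules are gated on the "measure" m of the candidate stem: writing the stem as a run of consonants C and vowels V in the form [C](VC)^m[V], m counts the VC pairs. The sketch below follows the definitions in Porter's 1980 paper, including the rule that "y" is a vowel only when it follows a consonant. It is plain Python, independent of cuML, and the function names are hypothetical.

```python
VOWELS = frozenset("aeiou")


def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: a letter other than a, e, i, o, u, and other
    than a 'y' that is preceded by a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # A leading 'y' is a consonant; otherwise 'y' is a vowel after a
        # consonant and a consonant after a vowel.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Return m, where the stem has the form [C](VC)^m[V]."""
    m = 0
    prev_is_vowel = False
    for i in range(len(stem)):
        if is_consonant(stem, i):
            # A consonant right after a vowel closes one VC pair.
            if prev_is_vowel:
                m += 1
            prev_is_vowel = False
        else:
            prev_is_vowel = True
    return m


# Words used as m=0, m=1 and m=2 examples in Porter's paper.
for w in ("tr", "tree", "by", "trouble", "oats", "ivy", "private", "oaten"):
    print(w, measure(w))
```

A rule such as "(m>1) AL ->" then fires only when measure() of the word minus the suffix exceeds 1.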

Martin Porter has endorsed several modifications to the Porter algorithm since writing his original paper, and those extensions are included in the implementations on his website. Others, including NLTK contributors, have proposed further improvements to the algorithm. Currently, only the following mode is supported:

PorterStemmer.NLTK_EXTENSIONS

  • Implementation that includes further improvements devised by NLTK contributors or taken from other modified implementations found on the web.
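As an illustration of what an extension mode changes, the sketch below is modeled on NLTK's own PorterStemmer (nltk.stem.porter) in its NLTK_EXTENSIONS mode and shows two of its departures from the original step 1a: words of one or two letters are returned unchanged, and a four-letter word ending in "ies" keeps "ie". It assumes, without confirming, that cuML's mode of the same name behaves like NLTK here; `step1a_nltk` is a hypothetical name, and NLTK's small table of irregular forms is omitted.

```python
def step1a_nltk(word: str) -> str:
    """Porter step 1a plus two NLTK_EXTENSIONS tweaks (hypothetical helper)."""
    # Extension: one- and two-letter words are not stemmed at all.
    # (NLTK performs this check in stem(), before any step runs.)
    if len(word) <= 2:
        return word
    # Extension: "dies" -> "die", "ties" -> "tie", instead of "di"/"ti".
    if len(word) == 4 and word.endswith("ies"):
        return word[:-1]
    # Original step 1a rules, tried in order; the first matching suffix wins.
    for suffix, replacement in (("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")):
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word


for w in ("is", "dies", "flies", "caresses", "caress", "cats"):
    print(w, step1a_nltk(w))
```

Under the original rules alone, "is" would lose its final "s" and "dies" would become "di"; the two guards are what distinguish the extension mode in this sketch.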

Parameters:
mode : str, default "NLTK_EXTENSIONS"

Stemming mode. Only "NLTK_EXTENSIONS" is currently supported.

Methods

stem(word_str_ser)

Stem words using the Porter stemmer.

Examples

>>> import cudf
>>> from cuml.preprocessing.text.stem import PorterStemmer
>>> stemmer = PorterStemmer()
>>> word_str_ser = cudf.Series(['revival', 'singing', 'adjustable'])
>>> print(stemmer.stem(word_str_ser))
0     reviv
1      sing
2    adjust
dtype: object
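The three outputs in the example above can be accounted for by two of the published rules: "singing" loses "ing" in step 1b because the remaining stem contains a vowel, and "revival" and "adjustable" lose "al" and "able" in step 4 because the remaining stem has measure m > 1. The sketch below implements only those rules in plain Python (no cuML or cuDF), computing m by collapsing a consonant/vowel pattern; the function names are hypothetical and this is a hand check on the example, not a full stemmer.

```python
import re


def cv_pattern(word: str) -> str:
    """Map each letter to 'C' or 'V' using Porter's consonant definition."""
    out = []
    for i, ch in enumerate(word):
        if ch in "aeiou":
            out.append("V")
        elif ch == "y" and i > 0 and out[i - 1] == "C":
            out.append("V")  # 'y' after a consonant acts as a vowel
        else:
            out.append("C")
    return "".join(out)


def measure(stem: str) -> int:
    # Collapse runs (e.g. "VCCVCC" -> "VCVC"), then count VC pairs.
    collapsed = re.sub(r"(.)\1+", r"\1", cv_pattern(stem))
    return collapsed.count("VC")


def step1b_ing(word: str) -> str:
    # (*v*) ING -> ''  (the follow-up fix-ups of step 1b are omitted)
    if word.endswith("ing") and "V" in cv_pattern(word[:-3]):
        return word[:-3]
    return word


def step4_subset(word: str) -> str:
    # (m>1) ABLE -> '' and (m>1) AL -> '': two of the step 4 suffixes.
    for suffix in ("able", "al"):
        if word.endswith(suffix) and measure(word[: -len(suffix)]) > 1:
            return word[: -len(suffix)]
    return word


print(step1b_ing("singing"), step4_subset("revival"), step4_subset("adjustable"))
```

The m > 1 condition is what keeps a short word such as "rival" intact: its stem "riv" has m = 1, so step4_subset leaves it unchanged.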
stem(word_str_ser)

Stem words using the Porter stemmer.

Parameters:
word_str_ser : cudf.Series

A string series of words to stem.

Returns:
stemmed_ser : cudf.Series

Series of stemmed word strings.