Preprocess Text

Functions to clean text in a pandas DataFrame column.
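Here is a minimal sketch of the pattern these functions follow, assuming each one takes a pandas Series of strings and returns a new, cleaned Series. Two of the utilities listed below can be written with vectorized .str methods. The bodies and regular expressions are hypothetical reimplementations, not this module's actual code.

```python
import pandas as pd


def remove_newline_chars(df_col: pd.Series) -> pd.Series:
    # Hypothetical sketch: replace each run of \r and/or \n with one space
    # so words on adjacent lines are not glued together.
    return df_col.str.replace(r"[\r\n]+", " ", regex=True)


def remove_digits(df_col: pd.Series) -> pd.Series:
    # Hypothetical sketch: delete every run of decimal digit characters.
    return df_col.str.replace(r"\d+", "", regex=True)


df = pd.DataFrame({"text": ["line one\r\nline 2", "room 101\n"]})
df["text"] = remove_digits(remove_newline_chars(df["text"]))
print(df["text"].tolist())  # ['line one line ', 'room  ']
```

Replacing newlines with a space, rather than deleting them, keeps words apart. The stray spaces left behind are the kind of output remove_consecutive_spaces is meant to tidy up.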

Preprocess Text Utils

remove_newline_chars(df_col) Remove newline and/or carriage return characters from a DataFrame column.
remove_digits(df_col) Remove digits.
remove_non_char(df_col) Remove non-alphabetic characters: [#<>=.,;:$&*|?'"-()%]
custom_replace(df_col[, change_from, change_to]) Replace occurrences of change_from with change_to.
remove_url(df_col) Remove hyperlinks/URLs.
remove_email(df_col) Remove email addresses.
remove_consecutive_spaces(df_col) Remove consecutive whitespace.
remove_stopwords(df_col) Remove stopwords.
remove_accented_chars(df_col) Remove accented characters.
remove_punctuation(df_col) Remove punctuation.
remove_repeating_letters(df_col) Remove repeating letters with a minimum threshold of 2.
clean_text(df_col) Combine all of the text preprocessing steps above in a single call.
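The steps in this list compose naturally into a pipeline. Below is a hypothetical reimplementation of a subset of them, plus a clean_text that chains that subset. It assumes each step maps a Series of strings to a Series of strings. The URL and email patterns are deliberately simplified assumptions, not the module's actual expressions.

```python
import pandas as pd

# Simplified, assumed patterns; real URL and email grammars are far more permissive.
URL_PATTERN = r"(?:https?://|www\.)\S+"
EMAIL_PATTERN = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"


def remove_email(df_col: pd.Series) -> pd.Series:
    return df_col.str.replace(EMAIL_PATTERN, "", regex=True)


def remove_url(df_col: pd.Series) -> pd.Series:
    return df_col.str.replace(URL_PATTERN, "", regex=True)


def remove_accented_chars(df_col: pd.Series) -> pd.Series:
    # NFKD splits "é" into "e" plus a combining accent; encoding to ASCII
    # with errors="ignore" then drops the accent (and any other non-ASCII char).
    return (
        df_col.str.normalize("NFKD")
        .str.encode("ascii", errors="ignore")
        .str.decode("ascii")
    )


def remove_consecutive_spaces(df_col: pd.Series) -> pd.Series:
    # Collapse every whitespace run to a single space and trim both ends.
    return df_col.str.replace(r"\s+", " ", regex=True).str.strip()


def clean_text(df_col: pd.Series) -> pd.Series:
    # Emails go before URLs so a "www." domain inside an address is not
    # stripped on its own; whitespace is collapsed last because every
    # earlier removal can leave gaps behind.
    for step in (remove_email, remove_url, remove_accented_chars, remove_consecutive_spaces):
        df_col = step(df_col)
    return df_col


s = pd.Series(["Café  menu: https://example.com/x", "Mail  bob@example.org\nnow"])
print(clean_text(s).tolist())  # ['Cafe menu:', 'Mail now']
```

Encoding to ASCII with errors="ignore" removes any character that has no ASCII base form after NFKD normalization, such as emoji or Cyrillic and CJK text, not just accents. A step like this is only safe for mostly Latin-script text.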