Preprocess Text
Functions to clean text in a pandas DataFrame column.
Preprocess Text Utils
remove_newline_chars(df_col) | Remove newline and/or carriage-return characters from a DataFrame column. |
remove_digits(df_col) | Remove digits. |
remove_non_char(df_col) | Remove non-alphabetic tokens: [#<>=.,;:$&*|?'" -()%] |
custom_replace(df_col[, change_from, change_to]) | Replace tokens. |
remove_url(df_col) | Remove hyperlinks / URLs. |
remove_email(df_col) | Remove email addresses. |
remove_consecutive_spaces(df_col) | Remove consecutive whitespace characters. |
remove_stopwords(df_col) | Remove stopwords. |
remove_accented_chars(df_col) | Remove accented characters. |
remove_punctuation(df_col) | Remove punctuation. |
remove_repeating_letters(df_col) | Remove repeating letters with a minimum threshold of 2. |
clean_text(df_col) | Function that combines all text pre-processing tasks. |