Preprocess Text
Functions for cleaning text in a pandas DataFrame column.
Preprocess Text Utils
| remove_newline_chars(df_col) | Remove new line and/or carriage return from dataframe column. |
| remove_digits(df_col) | Remove digits. |
| remove_non_char(df_col) | Remove non-alphabetic tokens: `[#<>=.,;:$&*\|?'" -()%]` |
| custom_replace(df_col[, change_from, change_to]) | Replace tokens. |
| remove_url(df_col) | Remove hyperlink / url. |
| remove_email(df_col) | Remove email address. |
| remove_consecutive_spaces(df_col) | Remove consecutive white spaces. |
| remove_stopwords(df_col) | Remove stopwords. |
| remove_accented_chars(df_col) | Remove accented characters. |
| remove_punctuation(df_col) | Remove punctuation. |
| remove_repeating_letters(df_col) | Remove repeating letters with a minimum threshold of 2. |
| clean_text(df_col) | Function that combines all text pre-processing tasks. |
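
The table above lists only names and one-line summaries, so the following is a minimal sketch of how a few of these column-wise cleaners could be written with pandas string methods and chained together. The function names mirror the table, but every body, every regular expression, and the order of steps in `clean_text` are assumptions made for illustration, not the library's actual implementation. In particular, this sketch replaces accented characters with their unaccented base letters rather than dropping them.

```python
import unicodedata

import pandas as pd

# Hypothetical stand-ins: the names come from the table above, but the
# bodies and regex patterns are assumptions, not the library's code.


def remove_newline_chars(df_col: pd.Series) -> pd.Series:
    # Replace each run of line feeds / carriage returns with one space.
    return df_col.str.replace(r"[\r\n]+", " ", regex=True)


def remove_url(df_col: pd.Series) -> pd.Series:
    # Drop anything starting with http://, https:// or www. up to whitespace.
    return df_col.str.replace(r"(?:https?://|www\.)\S+", "", regex=True)


def remove_email(df_col: pd.Series) -> pd.Series:
    # Drop whitespace-delimited tokens that contain an "@".
    return df_col.str.replace(r"\S+@\S+", "", regex=True)


def remove_digits(df_col: pd.Series) -> pd.Series:
    # Drop every run of decimal digits.
    return df_col.str.replace(r"\d+", "", regex=True)


def remove_accented_chars(df_col: pd.Series) -> pd.Series:
    # Decompose characters (NFKD) and drop the combining marks,
    # so an accented letter is reduced to its base letter.
    def strip_accents(text: str) -> str:
        decomposed = unicodedata.normalize("NFKD", text)
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    # na_action="ignore" leaves missing values untouched.
    return df_col.map(strip_accents, na_action="ignore")


def remove_consecutive_spaces(df_col: pd.Series) -> pd.Series:
    # Collapse runs of two or more whitespace characters, then trim the ends.
    return df_col.str.replace(r"\s{2,}", " ", regex=True).str.strip()


def clean_text(df_col: pd.Series) -> pd.Series:
    # Assumed ordering: strip URLs and emails before digits, and collapse
    # whitespace last so gaps left by earlier removals are cleaned up.
    steps = (
        remove_newline_chars,
        remove_url,
        remove_email,
        remove_digits,
        remove_accented_chars,
        remove_consecutive_spaces,
    )
    for step in steps:
        df_col = step(df_col)
    return df_col


df = pd.DataFrame(
    {
        "text": [
            "Visit https://example.com\r\nor mail me@example.com  in 2024",
            "Café   déjà vu 42",
        ]
    }
)
df["clean"] = clean_text(df["text"])
print(df["clean"].tolist())
```

Each helper takes a Series and returns a new Series, so the steps can be applied individually or composed in a different order. Whitespace collapsing runs last here because removing URLs, emails, and digits tends to leave double spaces behind.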