Natural Language Processing Archives

Efficient Keyword Extraction and Replacement with FlashText

Natural Language Processing / Khuyen Tran

If you want to perform fast keyword extraction and replacement in text, use FlashText.

Convert Numbers to Words

Natural Language Processing / Khuyen Tran

When data contains both a numerical value (2019) and a written expression (‘two thousand and nineteen’) for the same quantity, the two forms need to be normalized to match so that they are interpreted consistently in NLP tasks.

This can be achieved with num2words, which converts numbers to their word equivalents. The library can also generate ordinal numbers and supports multiple languages.

Galactic: Clean and Analyze Massive Text Datasets

Analyze Data, Feature Engineer, Natural Language Processing / Khuyen Tran

If you want to clean, gain insights from, and create embeddings for massive unstructured text datasets, use Galactic.

txtai: All-in-one open-source embeddings database for semantic search

LLM Tools, Machine Learning Tools, Natural Language Processing / Khuyen Tran

Traditional search systems rely on keywords to retrieve data, whereas semantic search uses natural language understanding to identify results with similar meanings.

txtai is an all-in-one embeddings database for semantic search that enables vector search with SQL, topic modeling, retrieval-augmented generation, and more.

Simplify LLM Integration with Magentic’s @prompt Decorator

LLM Tools, Natural Language Processing / Khuyen Tran

To add LLM-powered natural language capabilities to your code with minimal effort, try magentic.

With magentic, you can use the @prompt decorator to create functions that return structured LLM output, keeping your code neat and easy to read.

Preprocess Text in One Line of Code with Texthero

Feature Engineer, Natural Language Processing / Khuyen Tran

Processing text in a DataFrame often involves writing lengthy code. Texthero simplifies this by enabling one-line preprocessing, including:
– filling missing values
– converting upper case to lower case
– removing digits
– removing punctuation
– removing stopwords
– removing whitespace
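
To make the steps above concrete, here is a rough sketch of the same pipeline written by hand in plain pandas. This is not Texthero's implementation: `clean_text`, the tiny `STOPWORDS` set, and the sample data are assumptions made for illustration. It shows the kind of code that a single call such as Texthero's `hero.clean(df["text"])` is meant to replace.

```python
import re
import string

import pandas as pd

# Tiny illustrative stopword set (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "is", "of", "and"}
PUNCTUATION = f"[{re.escape(string.punctuation)}]"


def clean_text(series: pd.Series) -> pd.Series:
    """Hand-written version of the preprocessing steps listed above."""
    s = series.fillna("")                             # fill missing values
    s = s.str.lower()                                 # upper case to lower case
    s = s.str.replace(r"\d+", " ", regex=True)        # remove digits
    s = s.str.replace(PUNCTUATION, " ", regex=True)   # remove punctuation
    # Drop stopwords; splitting and re-joining also collapses extra whitespace.
    return s.apply(lambda t: " ".join(w for w in t.split() if w not in STOPWORDS))


df = pd.DataFrame({"text": ["The price of 3 Apples is $5!", None, "  Hello,   World  "]})
df["clean"] = clean_text(df["text"])
print(df["clean"].tolist())  # ['price apples', '', 'hello world']
```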

newspaper3k: Extract Information From an Article in Two Lines of Code

Natural Language Processing / Khuyen Tran

If you want to quickly extract meaningful information from an article in a few lines of code, try newspaper3k.

Visualize the Frequency of Tokens in a Text Corpus

Natural Language Processing / Khuyen Tran

If you want to quickly visualize the frequency of tokens in a collection of text documents, use the combination of scikit-learn’s CountVectorizer and Yellowbrick’s FreqDistVisualizer.
