Beyond Keywords: Building a Semantic Recipe Search Engine

Semantic search finds content based on meaning rather than exact keyword matches. It relies on vector embeddings: numerical representations of text that capture what the text means.

By converting text to vector embeddings, we can quantify the semantic similarity between pieces of content as the closeness of their vectors in a high-dimensional space, commonly measured with cosine similarity. This lets us compare and search by underlying meaning rather than by literal keyword overlap.
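To make "similarity in a vector space" concrete, the sketch below computes cosine similarity directly with NumPy: the dot product of two vectors divided by the product of their lengths. This is the same measure the recipe search obtains from scikit-learn's cosine_similarity. The three-dimensional vectors are invented toy values, not real embeddings (all-MiniLM-L6-v2 produces 384-dimensional vectors), and cosine_sim is a hypothetical helper name.

```python
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: dot product over the product of their norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Invented toy "embeddings"; real sentence embeddings have hundreds of dimensions.
dessert_query = np.array([0.9, 0.1, 0.2])
chia_pudding = np.array([0.8, 0.2, 0.1])
cheeseburger_pizza = np.array([0.1, 0.9, 0.3])

print(round(cosine_sim(dessert_query, chia_pudding), 3))
print(round(cosine_sim(dessert_query, cheeseburger_pizza), 3))
```

A score near 1 means the two vectors point in nearly the same direction. A score near 0 means they are unrelated under this measure. Because only direction matters, the score is unaffected by the overall length of a vector.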

Here’s a Python implementation of semantic search for recipe recommendations using sentence-transformers:

  1. Import necessary libraries for creating sentence embeddings and calculating similarity:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
  2. Create a list of recipe titles that we'll use for our search:
recipes = [
    "Banana and Date Sweetened Oatmeal Cookies",
    "No-Bake Berry Chia Seed Pudding",
    "Deep-Fried Oreo Sundae with Caramel Sauce",
    "Loaded Bacon Cheeseburger Pizza",
]
  3. Load a pre-trained model for creating sentence embeddings:
model = SentenceTransformer('all-MiniLM-L6-v2')
  4. Create vector representations (embeddings) for all the recipe titles:
recipe_embeddings = model.encode(recipes)
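  5. Define a search function that takes a query and the number of results to return. It embeds the query, calculates the cosine similarity between the query and every recipe, and returns the top_k most similar recipes with their scores:
def find_similar_recipes(query, top_k=2):
    # Wrap the query in a list so encode returns a 2-D array with one row
    query_embedding = model.encode([query])
    # Similarity of the query to each recipe; [0] takes the single query row
    similarities = cosine_similarity(query_embedding, recipe_embeddings)[0]
    # Indices of the top_k highest scores, best match first
    top_indices = similarities.argsort()[-top_k:][::-1]
    return [(recipes[i], similarities[i]) for i in top_indices]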
  6. Set up a test query and call the function to find similar recipes:
query = "healthy dessert without sugar"
results = find_similar_recipes(query)
  7. Print the query and the most similar recipes with their similarity scores:
print(f"Query: {query}")
print("Most similar recipes:")
for recipe, score in results:
    print(f"- {recipe} (Similarity: {score:.2f})")

Output:

Query: healthy dessert without sugar
Most similar recipes:
- No-Bake Berry Chia Seed Pudding (Similarity: 0.55)
- Banana and Date Sweetened Oatmeal Cookies (Similarity: 0.43)
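The expression similarities.argsort()[-top_k:][::-1] in find_similar_recipes sorts every score, keeps the last top_k indices, and reverses them so the best match comes first. That is fine for four recipes. For a large collection, NumPy's np.argpartition can find the top candidates without a full sort. The sketch below compares the two approaches on made-up scores. top_k_indices is a hypothetical helper, not part of the original code.

```python
import numpy as np


def top_k_indices(similarities: np.ndarray, top_k: int) -> np.ndarray:
    """Indices of the top_k highest scores, best first, using a partial sort."""
    top_k = min(top_k, len(similarities))
    if top_k <= 0:
        return np.array([], dtype=np.intp)
    # argpartition moves the top_k largest scores into the last top_k slots (unordered)
    candidates = np.argpartition(similarities, -top_k)[-top_k:]
    # Order only those candidates by score, highest first
    return candidates[np.argsort(similarities[candidates])[::-1]]


# Made-up similarity scores for four items (not model output)
scores = np.array([0.30, 0.80, 0.10, 0.50])

print(scores.argsort()[-2:][::-1])  # full-sort approach used in find_similar_recipes
print(top_k_indices(scores, 2))     # partial-sort alternative
```

One edge case is worth knowing. With top_k=0, the slice [-0:] is the same as [0:], so the original expression returns every recipe rather than none. The helper guards against that explicitly.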

The search returns the two healthier dessert options even though neither title contains the words "healthy," "dessert," or "sugar." Ingredients such as berries, chia seeds, bananas, and dates are commonly associated with healthy, naturally sweetened desserts, so the model places these titles closer to the query. The Deep-Fried Oreo Sundae and the Loaded Bacon Cheeseburger Pizza score lower and fall outside the top two results. The score difference (0.55 vs 0.43) indicates that the model considers the chia seed pudding a closer match than the oatmeal cookies to the concept of a healthy dessert without added sugar.
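Because find_similar_recipes always returns exactly top_k results, the two less healthy recipes are left out by the result count, not by a score limit. If you would rather drop weak matches by score, one option is a minimum-similarity cutoff. The sketch below is a hypothetical variant. The name filter_by_threshold, the placeholder titles, the scores, and the 0.4 cutoff are all illustrative. It takes precomputed scores so it runs without downloading a model.

```python
import numpy as np


def filter_by_threshold(items, similarities, min_score=0.4):
    """Return (item, score) pairs scoring at least min_score, best first (hypothetical helper)."""
    order = np.argsort(similarities)[::-1]  # indices sorted by descending score
    return [(items[i], float(similarities[i])) for i in order if similarities[i] >= min_score]


# Hypothetical inputs: placeholder titles and made-up scores, not model output
items = ["recipe A", "recipe B", "recipe C"]
scores = np.array([0.20, 0.70, 0.45])

for title, score in filter_by_threshold(items, scores):
    print(f"- {title} (Similarity: {score:.2f})")
```

A sensible cutoff depends on the model and the data. It is worth checking the scores for a few real queries before choosing one. This approach can also be combined with top_k to cap the number of results.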

Work with Khuyen Tran
