2025-09-29T00:00:00.000Z

Building a Semantic Emoji Search That Actually Works

Funnily enough, when I searched for 'Donald Trump' it showed 'Person Golfing Emoji' 😂

(Figure: Emoji Search Interface, showing semantic search results)

Why?

All emoji pickers I use suck. I never get the emojis I want in time because they rely on exact keyword matches rather than understanding meaning and purpose.

In the video above, I wanted 🥀 (wilted flower), but even typing "flower" didn't work in most emoji pickers. They just show you 🌸🌺🌻 and call it a day.

My workaround? Search on Google and click the top result.

How does Google know which emoji I wanted, but dedicated emoji pickers don't?

Because it (somewhat) understands the meaning of the emoji!

The Solution: Semantic Search

This project is based on the principle of semantic retrieval: grabbing the best-matching result based on conceptual meaning rather than exact keywords.

When EmbeddingGemma-300M was released, this project jumped to the top of my wishlist. I also have an urge to run things locally, which made this even more appealing.

How It Works

  1. Data preparation: I created a JSON file mapping emojis to descriptions using publicly available datasets
  2. Embedding generation: Converted each description into a vector embedding
  3. Storage: Stored embeddings in a FAISS vector database
  4. Query processing: When a user types a query, the app:
    • Converts the query into a vector embedding
    • Finds the top-k best matches using cosine similarity
    • Retrieves the corresponding emojis
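
The steps above can be sketched in a few lines of Python. This is a minimal illustration of the retrieval mechanics, not the project's actual code. The `EMOJI_DESCRIPTIONS` sample, the `embed` function, and the `search` helper are all hypothetical names. In particular, `embed` is a bag-of-words stand-in so the example runs with the standard library only; it matches words, not meaning, and a real setup would call a sentence-embedding model there. The linear scan in `search` plays the role that a vector index would play at scale (for example, an inner-product index such as FAISS's `IndexFlatIP` over normalized vectors).

```python
import math
from collections import Counter

# Step 1 (hypothetical sample): emoji -> description mapping.
EMOJI_DESCRIPTIONS = {
    "🥀": "wilted flower drooping dead rose sadness",
    "🌸": "cherry blossom pink flower spring",
    "🏌️": "person golfing golf sport club",
    "🙏": "folded hands thank you please prayer",
}


def embed(text: str) -> dict[str, float]:
    """Stand-in embedding (assumption): an L2-normalized bag-of-words vector.

    A real system would return a dense vector from an embedding model here.
    """
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(value * b.get(word, 0.0) for word, value in a.items())


# Steps 2 and 3: embed every description once and keep the vectors around.
INDEX = [(emoji, embed(desc)) for emoji, desc in EMOJI_DESCRIPTIONS.items()]


def search(query: str, k: int = 3) -> list[str]:
    """Step 4: embed the query, score every emoji, return the top-k emojis."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, vec), emoji) for emoji, vec in INDEX]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [emoji for _, emoji in scored[:k]]


if __name__ == "__main__":
    print(search("wilted flower", k=2))
```

Normalizing once at indexing time means each query only needs dot products, which is why cosine similarity and inner-product search are interchangeable in this kind of setup.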

The Model Journey

Initially, I tried EmbeddingGemma-300M and other models, but they weren't great at the job.

When I switched to all-mpnet-base-v2, the results were dramatically better. This model excels at semantic similarity tasks and captures emoji meanings with much more nuance.

Try It Yourself

The repo includes support for experimenting with other embedding models too.

Current Limitations

  1. Coverage: Only 1,373 emojis are supported, while Unicode defines 3,953 emojis (just under 35% coverage)

  2. Cultural context: Some emojis lack proper semantic meaning because their descriptions don't capture how people actually use them. For example, 🙏 is described as "thank you" or "please," so searching for "namaste" won't find it

  3. Ranking precision: The desired emoji might not always be #1, but typically appears in the top 5 results
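
One way to put a number on the ranking point is a hit rate at k: the fraction of test queries whose expected emoji shows up in the top k results. The sketch below assumes a small hand-labeled set of queries; the `hit_rate_at_k` helper and all of the data in it are made up for illustration and are not measurements from this project.

```python
def hit_rate_at_k(ranked_results: dict[str, list[str]],
                  expected: dict[str, str],
                  k: int = 5) -> float:
    """Fraction of labeled queries whose expected emoji is in the top k."""
    if not expected:
        return 0.0
    hits = sum(
        1
        for query, wanted in expected.items()
        if wanted in ranked_results.get(query, [])[:k]
    )
    return hits / len(expected)


# Hypothetical labels: the emoji a person would want for each query.
expected = {"wilted flower": "🥀", "thank you": "🙏", "golf": "🏌️"}

# Hypothetical ranked output from a search function, best match first.
ranked = {
    "wilted flower": ["🌸", "🥀", "🌻"],
    "thank you": ["🙏", "🤝"],
    "golf": ["⛳", "🏑", "🎯"],
}

if __name__ == "__main__":
    print(hit_rate_at_k(ranked, expected, k=1))
    print(hit_rate_at_k(ranked, expected, k=5))
```

Comparing the value at k=1 with the value at k=5 shows how often the right emoji is present but not ranked first, which is the gap described in the ranking limitation.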

Suggestions

If you have suggestions or want to contribute, check out the GitHub repo!