2025-09-29T00:00:00.000Z
Building a Semantic Emoji Search That Actually Works
Funnily enough, searching for 'Donald Trump' returns the 'Person Golfing' emoji.
Why?
All the emoji pickers I use suck. I never find the emoji I want in time, because they rely on exact keyword matches instead of understanding meaning and purpose.
In the video above, I wanted 🥀 (wilted flower), but even typing "flower" didn't surface it in most emoji pickers. They just show you 🌸🌺🌻 and call it a day.
My workaround? Search on Google and click the top result.
How does Google know which emoji I wanted, but dedicated emoji pickers don't?
Because it (somewhat) understands the meaning of the emoji!
The Solution: Semantic Search
This project is based on the principle of semantic retrieval: grabbing the best-matching result based on conceptual meaning rather than exact keywords.
When EmbeddingGemma-300M was released, this project jumped to the top of my wishlist. I also have an urge to run things locally, which made this even more appealing.
How It Works
- Data preparation: I created a JSON file mapping each emoji to a description, using publicly available datasets
- Embedding generation: Converted each description into a vector embedding
- Storage: Stored the embeddings in a FAISS vector index
- Query processing: When a user types a query, the app:
  - Converts the query into a vector embedding
  - Finds the top-k best matches using cosine similarity
  - Retrieves the corresponding emojis
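The pipeline above can be sketched in plain Python. This is a minimal, self-contained illustration, not the project's actual code. The JSON entries are made-up samples, and all function names are hypothetical. `toy_embed` is a bag-of-words stand-in for a real embedding model: it matches words, not meaning. A brute-force list also takes the place of FAISS. Because every vector is scaled to unit length, the dot product equals cosine similarity.

```python
import json
import math
import re

# Hypothetical sample of the emoji -> description JSON from the data-preparation step.
# The real project builds this from public datasets; these entries are made up.
EMOJI_JSON = """
{
  "🥀": "wilted flower, sad, dying rose",
  "🌸": "cherry blossom, pink flower, spring",
  "🙏": "folded hands, thank you, please",
  "⛳": "flag in hole, golf, sport"
}
"""


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(texts):
    """Assign every distinct token in the corpus its own vector dimension."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def toy_embed(text, vocab):
    """Hypothetical stand-in embedding: L2-normalized bag-of-words counts.

    It only captures word overlap, not meaning. A real system would call an
    embedding model here instead.
    """
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:  # words never seen in the descriptions are ignored
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_index(mapping):
    """Embedding generation and storage: keep one (emoji, unit vector) pair per entry."""
    vocab = build_vocab(mapping.values())
    index = [(emoji, toy_embed(desc, vocab)) for emoji, desc in mapping.items()]
    return vocab, index


def search(query, vocab, index, k=5):
    """Query processing: embed the query and return the k most similar emojis.

    Both vectors are unit length, so their dot product is the cosine similarity.
    """
    q = toy_embed(query, vocab)
    scored = [(sum(a * b for a, b in zip(q, vec)), emoji) for emoji, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [emoji for _, emoji in scored[:k]]


mapping = json.loads(EMOJI_JSON)
vocab, index = build_index(mapping)
results = search("wilted flower", vocab, index, k=2)  # -> ['🥀', '🌸']
```

The same normalization trick carries over to FAISS. An exact inner-product index (`IndexFlatIP`) built over unit-length vectors returns cosine-similarity rankings.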
The Model Journey
Initially, I tried EmbeddingGemma-300M and other models, but they weren't great at the job.
When I switched to all-mpnet-base-v2, the results were dramatically better. This model excels at semantic similarity tasks and captures the meaning of emojis with much more nuance.
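With the sentence-transformers and FAISS libraries, swapping models can come down to changing one string. The sketch below rests on a few assumptions. It assumes `sentence-transformers` and `faiss-cpu` are installed. It is not the repo's code, and the function names are hypothetical. The imports sit inside the functions, so the module loads even when those packages are missing.

```python
# Assumes: pip install sentence-transformers faiss-cpu
# Hypothetical helpers, not the project's actual code.

DEFAULT_MODEL = "sentence-transformers/all-mpnet-base-v2"


def build_faiss_index(descriptions, model_name=DEFAULT_MODEL):
    """Embed emoji descriptions with a sentence-transformers model and index them in FAISS."""
    import faiss
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # normalize_embeddings=True returns unit vectors, so inner product == cosine similarity
    embeddings = model.encode(descriptions, normalize_embeddings=True)
    index = faiss.IndexFlatIP(embeddings.shape[1])  # exact (brute-force) inner-product search
    index.add(embeddings)
    return model, index


def faiss_search(query, model, index, emojis, k=5):
    """Return the k emojis whose descriptions are closest to the query."""
    q = model.encode([query], normalize_embeddings=True)
    _, ids = index.search(q, k)
    # FAISS pads with -1 when k exceeds the number of indexed vectors
    return [emojis[i] for i in ids[0] if i != -1]
```

To compare models, call `build_faiss_index(descriptions, model_name=...)` with a different Hugging Face model id and run the same queries against each index.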
Try It Yourself
- Live demo: https://huggingface.co/spaces/TanishkB/emojisearch
- Source code: github.com/TanishkBansode/emojisearch
The repo includes support for experimenting with other embedding models too.
Current Limitations
- Coverage: Only 1,373 emojis are supported, while the Unicode standard defines 3,953 emojis (under 35% coverage)
- Cultural context: Some emojis lack proper semantic meaning because their descriptions don't capture how people actually use them. For example, 🙏 is described as "thank you" or "please", so searching for "namaste" won't find it
- Ranking precision: The desired emoji isn't always #1, but it typically appears in the top 5 results
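One way to put a number on "typically appears in the top 5" is hit rate at k. You take a small set of hand-labeled queries and count how often the wanted emoji lands in the top k results. Below is a minimal sketch. The function name and the sample rankings are hypothetical illustrations, not measured output from the demo.

```python
def hit_rate_at_k(ranked_results, expected, k=5):
    """Fraction of labeled queries whose expected emoji appears in the top k results.

    ranked_results: dict mapping query -> list of emojis, best match first
    expected: dict mapping query -> the emoji the user actually wanted
    """
    if not expected:
        return 0.0
    hits = sum(
        1
        for query, target in expected.items()
        if target in ranked_results.get(query, [])[:k]
    )
    return hits / len(expected)


# Made-up rankings for illustration only
ranked = {"sad flower": ["🌸", "🥀"], "namaste": ["🧘"], "golf": ["⛳"]}
expected = {"sad flower": "🥀", "namaste": "🙏", "golf": "⛳"}

top1 = hit_rate_at_k(ranked, expected, k=1)  # only "golf" hits -> 1/3
top5 = hit_rate_at_k(ranked, expected, k=5)  # "sad flower" and "golf" hit -> 2/3
```

Tracking both hit@1 and hit@5 on the same labeled set shows whether a model change moves the right emoji to the top, or only into the first few results.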
Suggestions
If you have suggestions or want to contribute, check out the GitHub repo!