I recently logged into my Vantage sales tracking tool and noticed that I had surpassed 85 customer-built collections. Working daily with dozens of customers, I've spent almost a year loading data and demoing the power of Vantage and semantic search. Along the way, I've learned a lot and wanted to share the most important lessons from those 85 search indexes.
These five insights will be extremely helpful for Vantage customers and for anyone building semantic search and using generative AI to create delightful e-commerce search experiences.
My customers are primarily B2C-focused, with the vast majority being consumer brands, retailers, or marketplaces. The rest are still consumer-centric but work in publishing or media. The overarching theme is that consumer experiences (search pages, product recommendations) need a significant upgrade in the new AI world; keyword search is dead, and semantic, AI-driven search is the new king.
So, what are the five things that anyone using Vantage to build their next-generation search needs to know?
Building embeddings, loading them into a vector store, and doing cosine search is relatively easy. However, building a well-performing AI-powered search from the ground up is hard. Managing packages, embedding models, consistency, performance, rate limiting—well, you get the idea! The prototype can easily demonstrate powerful semantic similarity for searching articles, products, recipes, cars, experiences, media, etc. The difficult part is getting multiple technologies and providers to work in unison so your search just works, fast.
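To make the "easy part" concrete, here's a minimal sketch of that prototype, assuming the sentence-transformers library, an off-the-shelf model, and a toy catalog. It's not how Vantage works under the hood, just the weekend-demo version of semantic search.

```python
# Minimal prototype: embed a tiny catalog and rank it by cosine similarity.
# Model name and product texts are illustrative assumptions.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # any text embedding model works here

catalog = [
    "Cotton maxi dress with a floral print, ideal for summer garden parties.",
    "Leather hiking boot with waterproof lining and aggressive tread.",
    "Wide-brim straw sunhat with an adjustable chin strap.",
]

doc_vecs = model.encode(catalog, normalize_embeddings=True)
query_vec = model.encode(["floral sundress"], normalize_embeddings=True)[0]

# With normalized vectors, cosine similarity is just a dot product.
scores = doc_vecs @ query_vec
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.3f}  {catalog[idx]}")
```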
Vantage manages the tedious and complex parts, allowing you to focus on your data and build delightful solutions.
There is a saying from a US presidential campaign that epitomizes focus and importance: "It's the economy, stupid" (stupid is used affectionately). For semantic search, it's the text. Filtering, how embeddings are stored and searched, and embedding models all matter, of course! However, a clear, concise, salient set of text that describes the style, use, and attributes in real human-understandable terms is the number one predictor of great results out of the box. Extraneous and copious data droning on about quality and shipping time in every listing? Not good. Using the same words in every product description in your catalog (great, excellent, outstanding, perfect)? Also not good.
Clear, concise text (3-4 paragraphs is a good rule of thumb) describing the salient aspects of the product, its context of use, and latent style attributes (patterns, influences, etc.) is essential.
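As an illustration of that rule of thumb, here's a hypothetical sketch of how the text you embed might be assembled: keep the salient attributes and context of use, drop the filler that appears in every listing. The field names and boilerplate phrases are made up for the example, not a Vantage schema.

```python
# Hypothetical: compose the text that actually gets embedded from a product record.
BOILERPLATE = ("free shipping", "highest quality", "satisfaction guaranteed")

def embedding_text(product: dict) -> str:
    sentences = [
        f"{product['name']}: {product['description']}",
        f"Style: {', '.join(product['style_tags'])}.",
        f"Best for: {product['use_context']}.",
    ]
    # Drop sentences that are generic marketing filler rather than product signal.
    return " ".join(
        s for s in sentences
        if not any(phrase in s.lower() for phrase in BOILERPLATE)
    )

print(embedding_text({
    "name": "Marigold Maxi Dress",
    "description": "Lightweight cotton maxi dress with a hand-drawn floral print.",
    "style_tags": ["bohemian", "summer", "garden party"],
    "use_context": "weddings, graduations, and outdoor brunches",
}))
```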
The images we see on product pages are the highest-bandwidth interaction between computers and humans today. When you see a small set of images of a shoe, you understand a vast amount of information about it almost instantly. Imagery is a huge part of online discovery, and it can convey more about a product, more quickly and effectively, than the product description can. For customers trying to discover products, imagery is crucial, but it has only recently become reasonably approachable for most retailers.
You must process images, either with clever vision LLM prompts or with a trained image+text embedding model, and include that signal in the embedding that gets searched. It's crucial that the text and image are combined into a single embedding (or at least handled by a single model).
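One hedged way to do the "single embedding" part, assuming an off-the-shelf image+text model (CLIP via sentence-transformers) and an illustrative 50/50 blend you would tune for your own catalog:

```python
# Sketch: embed the product photo and the description with a shared image+text
# model and blend them into one searchable vector. The weighting is an assumption.
from sentence_transformers import SentenceTransformer
from PIL import Image
import numpy as np

clip = SentenceTransformer("clip-ViT-B-32")  # maps images and text into one space

def product_vector(image_path: str, description: str, image_weight: float = 0.5) -> np.ndarray:
    img_vec = clip.encode(Image.open(image_path), normalize_embeddings=True)
    txt_vec = clip.encode(description, normalize_embeddings=True)
    combined = image_weight * img_vec + (1 - image_weight) * txt_vec
    return combined / np.linalg.norm(combined)  # renormalize for cosine search
```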
Even without tuning, most customers love typing in the items they want and getting results comparable to their typical keyword searches. For example, typing "maxi dress cotton" yields an entire page of maxi dresses, all cotton.
Small semantic touches, such as searches for slang terms, find and delight in ways that keywords never did. For example, searching for a gym chain and a color can yield spot-on results. However, the models can sometimes make jumps that are reasonable but confusing. For instance, searching for "church dress" might surface a brand whose name looks like a misspelling of Gospel (Ghospell), which semantically matches "church."
Adjustments like keyword boosting, fine-tuned embedding models, and query augmentation can rein in these creative jumps. However, don't overdo it: a little variety and the occasional non-intuitive jump can actually add to the diversity of your results. Variety in the results, even non-obvious variety, may benefit and delight your users. With keywords, you might have shown ZERO-ZILCH-NADA results before; now you show some variety and the best, if sometimes creative, results your catalog can offer!
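To show what keyword boosting can look like, here's a hypothetical re-ranking sketch that nudges results containing the literal query terms without throwing away the semantic variety. The boost weight is an illustrative knob, not a recommendation.

```python
# Hypothetical keyword boost on top of semantic scores.
def boosted_rank(query: str, results: list[dict], boost: float = 0.2) -> list[dict]:
    terms = set(query.lower().split())

    def score(item: dict) -> float:
        text = item["text"].lower()
        overlap = sum(1 for t in terms if t in text) / max(len(terms), 1)
        return item["semantic_score"] + boost * overlap

    return sorted(results, key=score, reverse=True)

results = [
    {"text": "Ghospell pleated midi dress", "semantic_score": 0.81},
    {"text": "Modest long-sleeve church dress in navy", "semantic_score": 0.79},
]
# The literal "church dress" match now outranks the creative Ghospell jump.
print(boosted_rank("church dress", results)[0]["text"])
```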
The search box is boring. Sure, we all use it, and at Vantage, we start with it. However, the future will look different from a box you type keywords into with boxes of product images below it. The keyword search box could only handle text, and only in simple ways. My customers regularly make significant leaps forward in their experiences when they broaden the lens and break out of the box (and the boxy results).
Semantic text search is relatively easy. Combining images and text is harder. Using products you like and dislike to find items is harder. Nudging results based on products you've bought and reviewed positively, in entirely separate categories, is harder. Vantage has made all of this "harder stuff" API-driven and easy for any developer.
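For intuition on the like/dislike and "nudge by purchase history" ideas, here's a hedged sketch using plain vector arithmetic on normalized embeddings. The weights are assumptions to tune; Vantage exposes this kind of steering through its API so you don't have to hand-roll the math.

```python
# Sketch: steer a query vector toward liked items and away from disliked ones.
import numpy as np

def steered_query(query_vec, liked_vecs, disliked_vecs, like_w=0.3, dislike_w=0.3):
    # liked_vecs / disliked_vecs are (possibly empty) lists of embedding vectors.
    vec = np.array(query_vec, dtype=float)
    if liked_vecs:
        vec += like_w * np.mean(liked_vecs, axis=0)
    if disliked_vecs:
        vec -= dislike_w * np.mean(disliked_vecs, axis=0)
    return vec / np.linalg.norm(vec)  # renormalize before cosine search
```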
From Pinterest-style visual vibe queries to suggested semantic queries to entire outfits put together by a shopping assistant, success at a business level is achieved with breakthrough and delightful experiences surrounding the search box/results.
Your boss will think you are a rockstar not when they type "floral sundress" on your site and get a bunch of floral sundresses (duh), but when floral sundresses are suggested as part of an outfit for a "graduation garden party," along with a wide-brim sunhat (upsell and increased ACS, anyone?). Rockstar achievements, unlocked!