The New Frontier of Data Engineering

In early 2023, I wrote an article titled "Embeddings Pipelines are the new ETL". For those outside the data space, ETL (Extract, Transform, Load) was once the common term for the work that is now usually called Data Engineering. I argued that building these pipelines would be a major undertaking, but still a solvable data problem, and that an organization's private data would be more valuable than ever.

A year later, that sentiment hasn't changed, but I now realize I missed a few major steps between 'data' and 'value'. Building a 'Library of Alexandria' from unstructured data is a challenging undertaking for most organizations. Yet using that data in new, creative ways to create value and drive business outcomes is something few are doing, even though the barrier to entry is lower than ever.

Enter the world of AI-powered data engineering, where Large Language Models (LLMs) and prompting techniques are revolutionizing how we process and enrich data.

The Operational Focus of Traditional Product Catalogs

Let's start with a common scenario: a traditional product catalog. Many organizations have product data that's optimized for operational efficiency rather than search or customer experience.

It's worth noting that I generated this example using Claude.ai and formatted it to resemble data you might find in an AS/400 DB2 iSeries database, a system still widely used in many industries for its reliability and performance in handling these operational tasks. While this is a simplified example, it's based on experiences with many retailers and represents a common data structure in the industry.

Here's what such a product catalog might look like:

This data may look basic, and it wasn't designed for an era of ecommerce and digital-first economies. It was designed to support a multitude of critical business functions like EDI (Electronic Data Interchange), supply chain management, and distribution. That structure is ingrained in well-established processes and workflows across the global economy, ensuring smooth operations at scale.

However, this type of catalog wasn't designed to support advanced features like semantic search or personalized product recommendations. The descriptions are concise and functional, which is perfect for inventory management but may fall short when it comes to powering more nuanced search experiences or catering to diverse audience needs in a digital marketplace.

But here's the exciting part: with the advent of synthetic data and AI-powered enrichment, we can transform this operationally-focused data into a rich resource for advanced search and personalization - without disrupting the existing systems that rely on its current format.

Unlocking the Power of Untapped Data Sources

Your organization likely has a wealth of untapped data sources that can dramatically enhance your product information. One often-overlooked goldmine? Internal and promotional videos.

Consider the following types of videos your organization might already have:

  1. Marketing and promotional videos: These often showcase products in action, highlighting key features and benefits in a way that's engaging and memorable.
  2. Sales training videos: These are treasure troves of detailed product information. They often include:
    • In-depth explanations of how products work
    • Demonstrations of product use in various scenarios
    • Comparisons with competing products
    • Tips for upselling or cross-selling related items
  3. Customer testimonial videos: These can provide real-world use cases and benefits from the customer's perspective.
  4. Installation or assembly tutorials: For complex products, these videos often contain detailed information about components, compatibility, and best practices.
  5. Maintenance and care instructions: These videos can provide valuable information about product longevity, care requirements, and potential accessories.

Let's look at some specific examples of how these videos could enrich your product data:

Example 1: Custom product labels for craft beer

  • Traditional catalog data: "Pressure-sensitive labels, 12 oz bottle size, waterproof material"
  • Data from promotional video: "Vibrant color printing with special finishes available, resists ice bucket condensation, options for removable adhesives for easy recycling"
  • Data from sales training video: "Ideal for microbreweries looking to stand out on shelves, can handle intricate designs and small text, outperforms competitor X in durability during shipping and handling"

Example 2: All-season tires

  • Traditional catalog data: "205/55R16, all-season tread, 50,000 mile warranty"
  • Data from installation tutorial: "Balanced ride for both wet and light snow conditions, compatible with most sedans and small SUVs, recommended rotation every 5,000 miles"
  • Data from customer testimonial: "Great value for year-round use in Minnesota, noticeable improvement in fuel efficiency compared to previous tires"

Example 3: Cast iron cookware set

  • Traditional catalog data: "5-piece set, pre-seasoned, oven safe up to 500°F"
  • Data from promotional video: "Naturally non-stick when properly maintained, great for stovetop-to-oven recipes, ideal for searing meats and baking cornbread"
  • Data from care instructions video: "Rinse with hot water and brush to clean, dry thoroughly to prevent rust, pairs well with our wooden utensil set to protect seasoning"

These videos contain a treasure trove of detailed, audience-focused information that can significantly enrich your basic product data. By extracting and structuring this information, you can create much richer, more informative product descriptions that not only improve search relevance but also enhance the overall customer experience.

For instance, a small brewery searching for "durable beer bottle labels" would be more likely to find the custom product labels in Example 1 due to the information extracted from the promotional and sales training videos. The additional details about color printing, special finishes, and performance during shipping could be crucial decision-making factors that aren't captured in the basic catalog data.

Similarly, a customer looking for "winter tires for sedan" might find the all-season tires in Example 2 as a relevant option, thanks to the additional context provided by the installation tutorial and customer testimonial.
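
One way to picture the enrichment in these examples is as a search document assembled from the catalog record plus the video-derived snippets. The sketch below is illustrative only: `CatalogRecord`, `VideoSnippet`, and `build_search_document` are hypothetical names, the product ID is made up, and the step that pulls statements out of a video is assumed to have already happened. The catalog record itself stays untouched; the combined text is what you would hand to a search index or an embedding model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRecord:
    """Operational record, kept exactly as the source system stores it."""
    product_id: str
    description: str


@dataclass(frozen=True)
class VideoSnippet:
    """A statement extracted from a video, tagged with where it came from."""
    source: str
    text: str


def build_search_document(record: CatalogRecord, snippets: list[VideoSnippet]) -> str:
    """Combine catalog text and video snippets into one searchable text blob."""
    lines = [f"Catalog: {record.description}"]
    for snippet in snippets:
        lines.append(f"{snippet.source}: {snippet.text}")
    return "\n".join(lines)


# Data taken from Example 1; "LBL-001" is a made-up ID for illustration.
labels = CatalogRecord(
    product_id="LBL-001",
    description="Pressure-sensitive labels, 12 oz bottle size, waterproof material",
)
snippets = [
    VideoSnippet(
        "Promotional video",
        "Vibrant color printing with special finishes available, resists ice "
        "bucket condensation, options for removable adhesives for easy recycling",
    ),
    VideoSnippet(
        "Sales training video",
        "Ideal for microbreweries looking to stand out on shelves, can handle "
        "intricate designs and small text, outperforms competitor X in "
        "durability during shipping and handling",
    ),
]

document = build_search_document(labels, snippets)
print(document)
```

Keeping the source label on each line makes it possible to trace any claim in the enriched text back to the video it came from.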

Turning Videos into Rich Product Descriptions

Let's walk through a practical example. Imagine you're preparing for a major event like Lollapalooza, a music festival held in Chicago every August. You want to tailor your product descriptions to appeal to festival-goers.

Here's a technique to leverage a Lollapalooza promotional video to enhance your product descriptions:

  1. Start with your basic product catalog, such as the product name and description - simpler is better to start.
  2. Feed this catalog and the video into an AI model (in this case, we used Google's Gemini).
  3. Instruct the model to create event-specific descriptions based on the video content.
  4. Iterate. Iterate. Iterate. You will find that some fields in your catalog help ground the model's output better than others, but you won't know which until you try it repeatedly, making tweaks along the way. There's an art to this.
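
The steps above can be sketched roughly as follows. This is a minimal outline under several assumptions, not the exact prompt or code we used: `generate` is a stand-in for whatever model client you call (attaching the video is specific to each SDK and is not shown), the JSON keys are one possible output shape, and the catalog descriptions and canned response are invented for the dry run.

```python
import json
from typing import Callable

# Assumed output shape; adjust to whatever fields you want back.
REQUIRED_KEYS = {
    "product_id", "event_description", "keywords",
    "related_products", "usage_tips",
}


def build_prompt(products: list[dict], event: str) -> str:
    """Compose the instruction text that accompanies the video."""
    catalog_json = json.dumps(products, indent=2)
    return (
        f"You are enriching a product catalog for {event}.\n"
        "Using the attached promotional video, rewrite each product "
        "description so it speaks to people attending the event.\n"
        "Use only facts found in the catalog or the video.\n"
        "Return a JSON list of objects with the keys: "
        + ", ".join(sorted(REQUIRED_KEYS))
        + f".\n\nCatalog:\n{catalog_json}"
    )


def enrich_catalog(
    products: list[dict], event: str, generate: Callable[[str], str]
) -> list[dict]:
    """Call the model and keep only complete rows for products we actually sell."""
    rows = json.loads(generate(build_prompt(products, event)))
    known_ids = {p["product_id"] for p in products}
    return [
        row for row in rows
        if isinstance(row, dict)
        and row.get("product_id") in known_ids
        and REQUIRED_KEYS.issubset(row)
    ]


# Dry run with a canned response standing in for a real model call.
def fake_generate(prompt: str) -> str:
    return json.dumps([
        {
            "product_id": "PRD001",
            "event_description": "Sunscreen for long afternoons in front of the stage",
            "keywords": ["sunscreen", "festival"],
            "related_products": ["PRD002"],
            "usage_tips": ["Reapply between sets"],
        },
        {"product_id": "PRD999", "event_description": "Not in our catalog"},
    ])


catalog = [
    {"product_id": "PRD001", "description": "SUNSCREEN SPF50 3OZ"},
    {"product_id": "PRD002", "description": "HOT DOG KIT CHI STYLE"},
]
enriched = enrich_catalog(catalog, "Lollapalooza", fake_generate)
print(enriched[0]["event_description"])
```

Filtering on known product IDs is a cheap guard against the model inventing products, and it gives you something concrete to measure as you iterate on the prompt.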

The result? Rich, contextual product descriptions that resonate with your target audience. Here's a snippet of what we got:

These enhanced product descriptions now include:

  1. More detailed and evocative product information
  2. Specific references to the festival experience
  3. Extracted keywords for improved searchability
  4. Related product suggestions for cross-selling opportunities
  5. Usage tips that add value for the customer

This level of detail dramatically improves the potential for semantic search. For example:

  • A search for "skin protection for outdoor concert" would now easily surface PRD001, the sunscreen.
  • Someone looking for "authentic Chicago food for music festival" would find PRD002, the hot dog kit.
  • The extracted keywords allow for more nuanced categorization and tagging.
  • The related products and usage tips provide additional context that could match with various user intents and queries.
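
To make the search side concrete, here is a small sketch of ranking products by similarity between a query and the enriched descriptions. It rests on stated assumptions: `toy_embed` is a keyword-counting stand-in so the example runs without any external service, the descriptions are invented, and a real system would swap in an actual embedding model. Only the ranking logic (cosine similarity over vectors) carries over.

```python
import math
from typing import Callable

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_products(
    query: str, descriptions: dict[str, str], embed: Callable[[str], Vector]
) -> list[tuple[str, float]]:
    """Return (product_id, score) pairs, best match first."""
    query_vec = embed(query)
    scored = [
        (product_id, cosine_similarity(query_vec, embed(text)))
        for product_id, text in descriptions.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy stand-in for an embedding model: one dimension per hand-picked concept.
CONCEPTS = {
    "sun_care": ("sun", "skin", "spf"),
    "food": ("food", "hot dog", "snack"),
}


def toy_embed(text: str) -> Vector:
    lowered = text.lower()
    return [
        float(sum(term in lowered for term in terms))
        for terms in CONCEPTS.values()
    ]


# Hypothetical enriched descriptions for the two products mentioned above.
descriptions = {
    "PRD001": "Sunscreen that keeps skin protected through a full day of outdoor sets",
    "PRD002": "Chicago-style hot dog kit, festival food you can make at home",
}

results = rank_products("skin protection for outdoor concert", descriptions, toy_embed)
print(results[0][0])
```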

By leveraging video content to create these rich, contextual descriptions, we've transformed basic product data into a powerful resource for both search algorithms and potential customers. This approach not only improves search relevance but also enhances the overall customer experience by providing comprehensive, situation-specific information about each product.

Implementing This Approach in Your Organization

The beauty of this technique lies in its scalability and adaptability. Here's how you can apply this in your organization:

  1. Identify Your Data Sources: Look beyond traditional databases. Consider marketing materials, training videos, customer testimonials, and even social media content.
  2. Build Automated Pipelines: Create workflows that can extract information from various media types (video, audio, text) and structure it for use.
  3. Leverage AI for Enrichment: Use LLMs to generate context-rich descriptions, tailoring content for different audiences or events.
  4. Integrate with Existing Systems: Ensure your enriched data can be seamlessly incorporated into your existing product databases and search systems.
  5. Iterate and Refine: Continuously improve your process based on user feedback and search performance metrics.
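
For step 4, one possible pattern is to store enriched text in a side table keyed by product ID, so the operational table is never modified. The sketch below uses SQLite purely as a stand-in for whatever database you run; the table names, column names, and sample rows are all hypothetical.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Operational table plus a separate enrichment table."""
    conn.executescript(
        """
        CREATE TABLE products (
            product_id  TEXT PRIMARY KEY,
            description TEXT NOT NULL
        );
        CREATE TABLE product_enrichment (
            product_id  TEXT NOT NULL REFERENCES products(product_id),
            context     TEXT NOT NULL,  -- e.g. an event or audience
            description TEXT NOT NULL,
            PRIMARY KEY (product_id, context)
        );
        """
    )


def save_enrichment(
    conn: sqlite3.Connection, product_id: str, context: str, description: str
) -> None:
    """Insert or overwrite the enriched description for one product and context."""
    conn.execute(
        "INSERT OR REPLACE INTO product_enrichment "
        "(product_id, context, description) VALUES (?, ?, ?)",
        (product_id, context, description),
    )


def search_descriptions(conn: sqlite3.Connection, context: str) -> list[tuple[str, str]]:
    """Enriched text where it exists, the original description otherwise."""
    cursor = conn.execute(
        """
        SELECT p.product_id, COALESCE(e.description, p.description)
        FROM products AS p
        LEFT JOIN product_enrichment AS e
          ON e.product_id = p.product_id AND e.context = ?
        ORDER BY p.product_id
        """,
        (context,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("PRD001", "SUNSCREEN SPF50 3OZ"), ("PRD002", "HOT DOG KIT CHI STYLE")],
)
save_enrichment(
    conn, "PRD001", "lollapalooza",
    "Sunscreen for long afternoons at an outdoor festival",
)
rows = search_descriptions(conn, "lollapalooza")
print(rows)
```

Because each context gets its own rows, the same product can carry different descriptions for different events or audiences, and rerunning the enrichment simply overwrites the previous version.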

By implementing this approach, you're not just improving search results - you're creating a more dynamic, responsive product catalog that can adapt to various marketing needs and customer contexts.

The future of data is bright, and more than ever, we're entering an era that can empower business analysts and domain experts to make innovative use of data within the organization. There's a treasure trove of information out there - it's now only a matter of deciding where to start and how creative you can be in connecting the dots.

Light up your catalog with Vantage Discovery

Vantage Discovery is a generative AI-powered SaaS platform that is transforming how users interact with digital content. Founded by the visionary team behind Pinterest's renowned search and discovery engines, Vantage Discovery empowers retailers and publishers to offer their customers unparalleled, intuitive search experiences. By seamlessly integrating with your existing catalog, our platform leverages state-of-the-art language models to deliver highly relevant, context-aware results.

With Vantage Discovery, you can effortlessly enhance your website with semantic search, personalized recommendations, and engaging discovery features - all through an easy-to-use API. Unlock the true potential of your content and captivate your audience with Vantage Discovery, the ultimate AI-driven search and discovery solution.
