In early 2023, I wrote an article titled "Embeddings Pipelines are the new ETL." For those outside the data space, ETL (Extract, Transform, Load) was once the common term for the work that is now generally called Data Engineering. I argued that building these pipelines would be a major undertaking, but still a solvable data problem, and that an organization's private data would be more valuable than ever.
A year later, that sentiment hasn't changed, but I now realize I missed a few major steps between 'data' and 'value'. While building a 'library of Alexandria' with unstructured data is a challenging undertaking for most organizations, using your data in new creative ways to create value and drive business outcomes is something that few people are doing, despite the barrier to entry being lower than ever.
Enter the world of AI-powered data engineering, where Large Language Models (LLMs) and prompting techniques are revolutionizing how we process and enrich data.
Let's start with a common scenario: a traditional product catalog. Many organizations have product data that's optimized for operational efficiency rather than search or customer experience.
It's worth noting that I generated this example using Claude.ai and formatted it to resemble data you might find in a DB2 database on an IBM AS/400 (iSeries) system, a platform still widely used in many industries for its reliability and performance in handling operational workloads. While this is a simplified example, it's based on experiences with many retailers and represents a common data structure in the industry.
Here's what such a product catalog might look like:
This data may look basic, and it wasn't designed for an era of ecommerce and digital-first economies. It's designed to support a multitude of critical business functions like EDI, supply chain management, and distribution. The structure is ingrained in workflows across the global economy in well-established processes, ensuring smooth operations at scale.
However, this type of catalog wasn't initially intended or designed to support advanced features like semantic search or personalized product recommendations. The descriptions are concise and functional, which is perfect for inventory management but may fall short when it comes to powering more nuanced search experiences or catering to diverse audience needs in a digital marketplace.
But here's the exciting part: with the advent of synthetic data and AI-powered enrichment, we can transform this operationally-focused data into a rich resource for advanced search and personalization - without disrupting the existing systems that rely on its current format.
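One way to picture "without disrupting the existing systems" is to keep the enrichment in a separate sidecar record keyed by the item number, and to combine the two only when building search text. The sketch below is a minimal illustration under that assumption; the class names, field names, and sample values are hypothetical and not taken from any real catalog.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CatalogRecord:
    """Operational record; frozen so enrichment code cannot modify it."""
    item_no: str
    short_desc: str


@dataclass
class Enrichment:
    """Sidecar record stored separately and joined on item_no."""
    item_no: str
    long_description: str = ""
    keywords: list[str] = field(default_factory=list)


def search_text(record: CatalogRecord, enrichment: Optional[Enrichment]) -> str:
    """Combine the original description with any matching enrichment."""
    parts = [record.short_desc]
    if enrichment is not None and enrichment.item_no == record.item_no:
        parts.append(enrichment.long_description)
        parts.extend(enrichment.keywords)
    return " ".join(p for p in parts if p)


# Made-up values, for illustration only.
record = CatalogRecord(item_no="0001", short_desc="LBL CUST 4X6")
extra = Enrichment("0001", "Custom product labels.", ["labels", "custom"])
print(search_text(record, extra))
```

The operational record is never written to, so anything downstream that depends on its current format keeps working, while the search index sees the longer text.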
Your organization likely has a wealth of untapped data sources that can dramatically enhance your product information. One often-overlooked goldmine? Internal and promotional videos.
Consider the following types of videos your organization might already have:
Let's look at some specific examples of how these videos could enrich your product data:
These videos contain a treasure trove of detailed, audience-focused information that can significantly enrich your basic product data. By extracting and structuring this information, you can create much richer, more informative product descriptions that not only improve search relevance but also enhance the overall customer experience.
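As a rough sketch of the "extracting and structuring" step, the code below assumes the video has already been transcribed to text and that you have some function that sends a prompt to an LLM and returns its reply as a string. That function is passed in as `call_llm`; it, the prompt wording, and the JSON keys are all assumptions for illustration, and `fake_llm` is a stand-in so the example runs without any external service.

```python
import json
from typing import Callable

PROMPT_TEMPLATE = (
    "You are enriching a product catalog.\n"
    "Product: {product}\n"
    "Video transcript:\n{transcript}\n\n"
    "Return a JSON object with keys 'features' and 'use_cases', "
    "each a list of strings. Use only facts stated in the transcript."
)


def extract_product_facts(
    product: str, transcript: str, call_llm: Callable[[str], str]
) -> dict:
    """Ask the model for structured facts and normalize the reply."""
    prompt = PROMPT_TEMPLATE.format(product=product, transcript=transcript)
    data = json.loads(call_llm(prompt))  # raises ValueError on malformed JSON
    return {
        "features": [str(x) for x in data.get("features", [])],
        "use_cases": [str(x) for x in data.get("use_cases", [])],
    }


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned, made-up reply."""
    return json.dumps(
        {"features": ["full-color printing"], "use_cases": ["bottle labels"]}
    )


facts = extract_product_facts("Custom product labels", "(transcript text)", fake_llm)
print(facts)
```

Passing the model call in as a parameter keeps the extraction logic independent of any particular provider and makes it easy to test with canned replies.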
For instance, a small brewery searching for "durable beer bottle labels" would be more likely to find the custom product labels in Example 1 due to the information extracted from the promotional and sales training videos. The additional details about color printing, special finishes, and performance during shipping could be crucial decision-making factors that aren't captured in the basic catalog data.
Similarly, a customer looking for "winter tires for sedan" might find the all-season tires in Example 2 as a relevant option, thanks to the additional context provided by the installation tutorial and customer testimonial.
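To make the effect on search concrete, here is a toy comparison. Real semantic search would typically use embeddings; this sketch swaps in plain word-overlap cosine similarity so it runs with the standard library alone. The catalog strings are invented for illustration and are not the article's actual data.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lowercase the text and count alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query: str, docs: dict) -> list:
    """Return document names ordered from most to least similar."""
    q = tokens(query)
    return sorted(docs, key=lambda name: cosine(q, tokens(docs[name])), reverse=True)


# Hypothetical descriptions: terse operational text vs. enriched text.
basic = {"labels": "LBL CUST 4X6", "tires": "TIRE AS 205/55R16"}
enriched = {
    "labels": "LBL CUST 4X6 durable custom labels for beer bottle and jar packaging",
    "tires": "TIRE AS 205/55R16 all-season tires for sedan drivers",
}

query = "durable beer bottle labels"
print(cosine(tokens(query), tokens(basic["labels"])))     # no shared words
print(cosine(tokens(query), tokens(enriched["labels"])))  # shared words
print(rank(query, enriched))
```

The terse description shares no words with the query, so it cannot be matched at all, while the enriched text gives the ranking something to work with. An embedding model would widen that gap further by matching on meaning and not just on shared words.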
Let's walk through a practical example. Imagine you're preparing for a major event like Lollapalooza, a music festival held in Chicago every summer. You want to tailor your product descriptions to appeal to festival-goers.
Here's a technique to leverage a Lollapalooza promotional video to enhance your product descriptions:
The result? Rich, contextual product descriptions that resonate with your target audience. Here's a snippet of what we got:
These enhanced product descriptions now include:
This level of detail dramatically improves the potential for semantic search. For example:
By leveraging video content to create these rich, contextual descriptions, we've transformed basic product data into a powerful resource for both search algorithms and potential customers. This approach not only improves search relevance but also enhances the overall customer experience by providing comprehensive, situation-specific information about each product.
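One possible shape for the prompt behind this kind of event-specific rewrite is sketched below. It assumes the promotional video has already been transcribed; the `EventContext` type, the `build_event_prompt` function, the instruction wording, and the sample values are hypothetical, and the resulting string would be sent to whichever LLM you use.

```python
from dataclasses import dataclass


@dataclass
class EventContext:
    name: str
    transcript: str  # text transcribed from the promotional video


def build_event_prompt(
    short_desc: str, event: EventContext, max_words: int = 80
) -> str:
    """Assemble a prompt that pairs event context with one catalog entry."""
    return "\n".join(
        [
            f"Event: {event.name}",
            "Event video transcript:",
            event.transcript.strip(),
            "",
            f"Catalog description: {short_desc}",
            "",
            "Rewrite the catalog description for attendees of this event "
            f"in at most {max_words} words.",
            "Keep every factual attribute from the catalog description "
            "and do not invent specifications.",
        ]
    )


# Made-up inputs, for illustration only.
event = EventContext("Lollapalooza", "  (transcript of the promotional video)  ")
prompt = build_event_prompt("LBL CUST 4X6", event)
print(prompt)
```

Keeping the original catalog text in the prompt, along with an explicit instruction not to invent specifications, is one way to reduce the chance that the generated description drifts from what the product actually is.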
The beauty of this technique lies in its scalability and adaptability. Here's how you can apply this in your organization:
By implementing this approach, you're not just improving search results - you're creating a more dynamic, responsive product catalog that can adapt to various marketing needs and customer contexts.
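To illustrate what "adapt to various marketing needs and customer contexts" could look like in practice, the sketch below stores one default description per item plus optional context-specific variants, and falls back to the default when no variant exists. The `DescriptionStore` class, the context labels, and the sample text are assumptions for illustration.

```python
from typing import Optional


class DescriptionStore:
    """Holds a default description per item plus per-context variants."""

    def __init__(self) -> None:
        self._default: dict = {}
        self._variants: dict = {}

    def set_default(self, item_no: str, text: str) -> None:
        self._default[item_no] = text

    def set_variant(self, item_no: str, context: str, text: str) -> None:
        self._variants[(item_no, context)] = text

    def get(self, item_no: str, context: Optional[str] = None) -> str:
        """Return the variant for this context, else the default text."""
        if context is not None:
            variant = self._variants.get((item_no, context))
            if variant is not None:
                return variant
        return self._default[item_no]


# Made-up values, for illustration only.
store = DescriptionStore()
store.set_default("0001", "Custom product labels.")
store.set_variant("0001", "festival", "Custom labels for festival merchandise.")

print(store.get("0001"))               # default text
print(store.get("0001", "festival"))   # festival variant
print(store.get("0001", "holiday"))    # no variant, falls back to default
```

Because variants are additive, a new campaign only needs new entries for the products it touches, and everything else keeps serving the default description.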
The future of data is bright, and more than ever, we're entering an era that can empower business analysts and domain experts to make innovative use of data within the organization. There's a treasure trove of information out there - it's now only a matter of deciding where to start and how creative you can be in connecting the dots.