Digital asset management (DAM) vendors have started integrating their systems with one or more of the currently available artificial intelligence (AI) image-recognition API components. Vendors typically advertise these tools as complete automation solutions, which free users from ever having to manually enter image tags again.
When tested with generic stock images, the results appear superficially reasonable. Issues arise, however, not only when the algorithms fail, but also when the material being catalogued is more specialized and requires metadata specific to the organization — a frequent occurrence within businesses.
Let's take a look at why this problem arises and examine a series of techniques which can produce superior results.
Metadata = Context = Meaning = Value
Metadata is traditionally defined as "data about data." This definition doesn't go far enough in explaining its importance. I contend metadata is better described as "contextual data." Context gives perspective or meaning, and that provides value to users. Objects stored inside DAM systems are called "digital assets" in part because of this — they have a value which is implicit in the description. Meaning and value are synonymous, so a reductionist approach to cataloguing asset metadata, such as AI image recognition tools use, will likely result in a corresponding diminution in the value of digital assets.
This happens frequently with DAM software: vendors think they have simplified a problem when they have only simplified their understanding of it. I would argue this is part of the reason why this segment has underperformed relative to other enterprise technologies for so long.
Cataloguing digital assets with metadata is difficult and labor-intensive. And proving a correlation between expertly catalogued digital assets and an improvement in an enterprise's bottom line is a challenge. Although enhancing the findability of digital assets will probably help generate higher ROI from DAM eventually, this is virtually impossible to demonstrate in financial terms. Businesses often assign asset cataloguing to lower-value employees like interns for this reason. Interns and junior employees are the people who know the least about the context of digital assets, yet they are assigned the hardest task and the one where most of the extrinsic value in digital assets gets created.
Cataloguing images is highly complex and far more interpretive than other forms of recognition (e.g. text analysis). What different people "see" in an image can vary due to a wide range of factors, from their knowledge of the subject matter through to cultural differences, age, politics and ethnicity, to name just a few.
As long as we continue to radically underestimate this complexity, the issue will never entirely be resolved. Further, the reductionist approaches that software engineers employ to help them build systems act as a hindrance when it comes to properly understanding the nature of the metadata problem for visual media. This isn't to say, however, that the results these systems produce, though imperfect, will not be sufficient in many cases.
What we, as an industry, must do now is set more reasonable expectations and create some practical approaches to solving the issue.
The Image Recognition Paradox
Current AI image recognition tools focus entirely on the asset's binary data itself (or the intrinsic value). This creates a paradoxical situation. On the one hand, it increases the likelihood of relevance, because in order to be suggested, a relationship with the asset has to have been established in the first place. On the other hand, there is huge scope for either misinterpretation or a failure to grasp why one group of users might be interested in a digital asset over another.
When I review AI systems with real DAM users (as opposed to the test materials that vendors use to demonstrate them) all kinds of unexpected issues arise. For example, not only do you need to know an image is a "tall glass and steel building," you need to recognize the critical detail that it is your firm's headquarters.
Similarly, for those organizations whose operations deal in a particular specialist area (e.g. they sell shoes), having thousands of digital assets tagged with a very obvious keyword (e.g. "shoe") is worse than useless, yet most visual recognition systems will still propose this keyword.
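One simple way to handle the "shoe" problem is to discard any proposed tag that appears on too large a share of the collection, since such a tag no longer distinguishes one asset from another. The following is a minimal sketch under stated assumptions: the asset ids, the tag sets and the `max_share` threshold are all hypothetical, and the tags are taken to have already been returned by some image-recognition service.

```python
from collections import Counter


def filter_common_tags(asset_tags, max_share=0.5):
    """Drop tags that appear on more than max_share of all assets.

    asset_tags: dict mapping a (hypothetical) asset id to the set of tags
    proposed for it by an image-recognition service.
    """
    total = len(asset_tags)
    # Count how many assets each tag was proposed for.
    doc_freq = Counter(tag for tags in asset_tags.values() for tag in set(tags))
    too_common = {tag for tag, n in doc_freq.items() if n / total > max_share}
    return {asset_id: set(tags) - too_common for asset_id, tags in asset_tags.items()}


# Illustrative data only: a retailer whose library is mostly shoes.
proposed = {
    "img-001": {"shoe", "leather", "brogue"},
    "img-002": {"shoe", "trainer", "red"},
    "img-003": {"shoe", "sandal"},
    "img-004": {"building", "glass"},
}
filtered = filter_common_tags(proposed, max_share=0.5)
```

Here "shoe" is proposed for three of the four assets, so it is removed, while the more specific tags survive. The right threshold would depend on the collection and would need tuning with real users.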
To resolve the image recognition paradox, it is essential to capture the extrinsic value (metadata) of a digital asset, both the metadata created before the asset is given an identifier and the metadata created afterwards, while it is being used. This adds the missing context (or meaning) and can also be used to improve the quality of the metadata that AI image recognition tools generate.
Combining Digital Asset Supply Chain Metadata With AI Tools
An increasing number of enterprise DAM users have begun to grasp that they operate a digital asset supply chain (whether they intended to or not). In a digital asset supply chain, digital assets pass through a number of points or nodes which add value.
AI tools can interface with the data generated by digital asset supply chains to derive better metadata and improve results. AI text analysis is far more developed than image recognition because it is an easier problem to solve and the range of interpretations, while still considerable, is narrower. This will likely remain the case for the foreseeable future.
When a problem domain is more tightly defined, AI's results improve markedly. The reason for this is easy to grasp: the narrower the range of variables, the better the odds of accurately enumerating and interpreting them. DAM vendors should use this to improve the results from available AI technology.
Accomplishing this requires a continuous improvement approach (which is a key principle of all supply chain management initiatives). Rather than pretending the problem is "solved" and moving on to the next, a continuous improvement approach admits there is always an opportunity to incrementally enhance results. The process itself becomes an asset which will develop and grow into something that enhances value for the business.
Upstream and Downstream Metadata Sources
In supply chain terminology, processes that add value can be differentiated as upstream or downstream.
In the context of digital asset supply chains, upstream means sources of metadata that are generated before the asset is created, frequently before the binary data (intrinsic value) even exists. Downstream means metadata that gets added after the asset is given an identifier (i.e. the point at which it becomes tangible to end users).
Here are some examples of upstream and downstream metadata:
Project Metadata — Upstream
Digital assets are frequently associated with some organizational initiative such as a new product launch, marketing campaign, exhibition, sporting event or other case-specific example. These initiatives carry core metadata, such as descriptions, categories and keywords, which applies to every digital asset associated with them. This metadata can be defined before the assets exist and then offered to users (or even inserted automatically) to provide a base set of metadata. It can also remain dynamic after cataloguing, so that it can be changed and updated centrally across all relevant digital assets at a later point. This is less AI and more simple automation, and often that is all that is required.
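The idea of project metadata that stays dynamic after cataloguing can be sketched as assets that reference a project record rather than copying its keywords. This is an illustrative sketch only: the `Project` and `Asset` classes, their fields and the sample values are assumptions made for the example, not the data model of any particular DAM system.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical project record defined before any assets exist."""
    name: str
    description: str = ""
    keywords: set = field(default_factory=set)


@dataclass
class Asset:
    """Hypothetical asset that links to its project instead of copying from it."""
    asset_id: str
    project: Project
    own_keywords: set = field(default_factory=set)

    def keywords(self):
        # Resolved at read time, so a central edit to the project shows up
        # on every linked asset without re-cataloguing each one.
        return self.project.keywords | self.own_keywords


launch = Project("Spring launch", keywords={"spring launch", "campaign"})
first = Asset("img-001", launch, {"brogue"})
second = Asset("img-002", launch)

# A later, central change to the project metadata.
launch.keywords.add("approved")
```

After the central change, both `first.keywords()` and `second.keywords()` include "approved", while "brogue" remains specific to the first asset. A real system would more likely store a project identifier and resolve it with a query, but the principle is the same.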
Briefing Materials — Upstream
Before a digital asset is created, a briefing stage takes place, where the team decides what types of digital assets need to be commissioned or purchased. If these discussions are digitally recorded and marked up with an identifier, it is relatively simple for AI text analysis tools to analyze the material and extract key concepts to use as digital asset metadata.
Workflow Metadata — Downstream
AI text analysis can use data generated from discussions and approval documentation as source material. This can generate suggestions to enhance metadata, for example, concepts or subjects that have become relevant after an asset has been uploaded and catalogued.
Asset Usage — Downstream
User requests to access restricted assets can provide a source to optimize cataloguing metadata. If users are constantly mentioning a given project or use-case and the asset is not tagged with that term, text analysis can count the frequency of occurrences and make suggestions.
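The frequency count described above needs very little machinery. The sketch below assumes the access requests for one asset are available as plain text strings; the stopword list, the `min_count` threshold and the sample requests are all invented for illustration, and a production system would likely use a proper text analysis library rather than a regular expression.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real deployment would need a fuller one.
STOPWORDS = {"the", "a", "an", "for", "of", "to", "and", "in", "on", "i", "we",
             "need", "this", "image", "please", "use", "our"}


def suggest_tags(request_texts, existing_tags, min_count=2):
    """Suggest terms that recur in access requests but are not yet tags."""
    existing = {t.lower() for t in existing_tags}
    counts = Counter()
    for text in request_texts:
        # Count each term once per request so one long message cannot dominate.
        words = set(re.findall(r"[a-z0-9][a-z0-9-]*", text.lower()))
        counts.update(words - STOPWORDS - existing)
    return [term for term, n in counts.most_common() if n >= min_count]


# Hypothetical access requests submitted for a single restricted asset.
requests = [
    "Need this image for the Helsinki roadshow deck",
    "Please approve for roadshow banner",
    "Use in the roadshow invitation and the annual report",
]
suggestions = suggest_tags(requests, existing_tags={"headquarters", "building"})
```

In this sample only "roadshow" appears in at least two requests, so it is the single suggestion. The output is meant to be offered to whoever catalogues the asset, not applied automatically.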
Lightboxes/Collections — Downstream
Nearly all DAM systems have some kind of user collection or lightbox feature where users can store arbitrary selections. Since this is effectively quantitative data, analyzing these selections can provide insight into potential correlations between assets which previously went unnoticed. The system can then cross-fertilize the metadata of correlated assets by suggesting the tags of one to those cataloguing the other.
There is a wide range of other potential usage scenarios for exploiting AI technology to enhance digital asset metadata.
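As a rough sketch of how lightbox data could feed suggestions, the example below counts how often pairs of assets share a lightbox and then proposes tags from frequent neighbours. The function names, the `min_shared` threshold and the sample data are hypothetical, and simple co-occurrence counting stands in for whatever correlation analysis a real system would use.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(lightboxes):
    """Count how often each pair of assets appears in the same lightbox."""
    pairs = Counter()
    for assets in lightboxes:
        for a, b in combinations(sorted(set(assets)), 2):
            pairs[(a, b)] += 1
    return pairs


def suggest_from_neighbours(asset_id, lightboxes, tags, min_shared=2):
    """Suggest tags carried by assets that often share a lightbox with asset_id."""
    suggestions = Counter()
    for (a, b), n in cooccurrence(lightboxes).items():
        if n < min_shared or asset_id not in (a, b):
            continue
        neighbour = b if a == asset_id else a
        # Only propose tags the target asset does not already carry.
        for tag in tags.get(neighbour, set()) - tags.get(asset_id, set()):
            suggestions[tag] += n
    return [tag for tag, _ in suggestions.most_common()]


# Illustrative lightboxes and existing tags.
lightboxes = [
    ["img-001", "img-002", "img-003"],
    ["img-001", "img-002"],
    ["img-002", "img-004"],
]
tags = {
    "img-001": {"headquarters", "exterior"},
    "img-002": {"exterior"},
    "img-003": {"lobby"},
    "img-004": {"archive"},
}
result = suggest_from_neighbours("img-002", lightboxes, tags)
```

Because img-001 and img-002 share two lightboxes, "headquarters" is suggested for img-002, while tags from assets that appeared alongside it only once are ignored.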
Efficient and Effective Use of AI for DAM
Several points become clear about the current use of AI in DAM:
- While image recognition offers a source of some literal descriptive metadata (i.e. what something looks like in universal terms), it is a poor source of the kind of metadata most enterprise users require to find the relevant digital assets for their needs.
- A superior and mostly untapped source of contextual awareness which could generate credible metadata is the digital asset supply chain and the text or quantitative metadata it generates.
- Many of the methods currently in use are not particularly sophisticated. They simply depend on the participants in the supply chain being prepared to be better organized. This is a key principle of all effective supply chain management initiatives: get the culture and planning right and potentially cut the complexity and cost of the technology.
- Effective use of AI tech requires more than simply plugging in third-party APIs. To get the most value, you will need to spend on consulting and custom implementation work. Otherwise, the results will not be optimized sufficiently to generate value. That point is still not widely understood by DAM users (nor often mentioned by DAM vendors).
A Final Note on Interoperability
One subject that should be mentioned is that, when properly executed, interoperability alone can produce many of the same benefits as AI. Further, the risk and complexity of implementation are significantly lower if people, systems and processes are aligned using some simple standardized protocols and methods.
DAM vendors should reflect on this as it speaks to some fundamental issues within the wider DAM industry. Many vendors would prefer to invest in highly complex solutions that keep them at the front and center of the process, rather than collaborating with peers to achieve the same result for far less effort and expense.
Imagine how much more effective AI technology could become for DAM if interoperability standards both existed and were enforced as a prerequisite of commercial participation in the delivery of solutions to end users.