In a world full of information, how much of it is repeated or duplicated? More importantly, how can you guarantee that the information is accurate and up-to-date? With ForNova, an aggregation platform, you can identify duplicates and make updates from a single platform, reducing the number of incorrectly identified listings and sparing you the embarrassment they cause.
Whether it’s a classified ad or a store discount posted on multiple online platforms, incorrectly identified duplicates can misinform consumers, leaving them with conflicting or incomplete information when searching online. ForNova’s identification process has three main components, each designed to improve its ability to identify duplicates.
The first of these is term normalization. Terms that differ from site to site but share the same meaning are associated with a single value for each information field used by the aggregating solution. The ForNova platform gives customers a simple tool for continuously updating the normalizing database, and it also generates periodic reports of all terms that the normalizing mechanism did not recognize.
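The normalization step can be pictured as a lookup table from site-specific terms to one canonical value per field. The sketch below is only an illustration under stated assumptions: the table contents, the field name `property_type`, and the helpers `normalize` and `add_term` are hypothetical and are not taken from ForNova.

```python
from collections import Counter

# Hypothetical synonym table: field -> {site-specific term: canonical value}.
# The entries are illustrative only, not ForNova's actual data.
SYNONYMS = {
    "property_type": {
        "apartment": "apartment",
        "apt": "apartment",
        "flat": "apartment",
        "house": "house",
        "detached home": "house",
    }
}

# Counts terms that could not be normalized; this could feed a periodic report.
unrecognized = Counter()


def normalize(field, raw_value):
    """Return the canonical value for a field, or None if the term is unknown."""
    key = raw_value.strip().lower()
    canonical = SYNONYMS.get(field, {}).get(key)
    if canonical is None:
        unrecognized[(field, key)] += 1
    return canonical


def add_term(field, term, canonical):
    """Extend the normalizing table, as a customer update might."""
    SYNONYMS.setdefault(field, {})[term.strip().lower()] = canonical
```

In this sketch, a term such as "Flat" from one site and "apt" from another both resolve to "apartment", while an unknown term such as "Condo" is counted in `unrecognized` until someone adds it with `add_term`.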
Data Items Comparison
The data items comparison applies a matching policy to each listing information field. Available policies include:
- The data in the fields must be the same
- The data in the fields must not contradict
- The data in the fields can differ by a certain value or percentage
Based on the data comparison, the listing duplication logic can then be defined as one of the following:
- All fields must match
- Only some of the fields must match, or only part of a field must match
- At least N of M fields must match
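The field policies and the duplication logic listed above can be sketched as small functions combined by a threshold. This is a minimal sketch, not ForNova's implementation: the policy names, the example fields (`title`, `city`, `price`), the 5 percent tolerance, and the choice to treat a missing value as non-contradicting are all assumptions made for illustration.

```python
def must_equal(a, b):
    """Policy: both values are present and identical."""
    return a is not None and a == b


def must_not_contradict(a, b):
    """Policy: a missing value on either side is acceptable (an assumption)."""
    return a is None or b is None or a == b


def within_percent(pct):
    """Policy factory: numeric values may differ by at most pct percent."""
    def policy(a, b):
        if a is None or b is None:
            return False
        largest = max(abs(a), abs(b))
        if largest == 0:
            return True  # both values are zero
        return abs(a - b) / largest * 100 <= pct
    return policy


# Hypothetical per-field policy assignment.
POLICIES = {
    "title": must_equal,
    "city": must_not_contradict,
    "price": within_percent(5),
}


def matching_fields(x, y, policies):
    """Return the names of the fields whose policy is satisfied."""
    return [f for f, policy in policies.items() if policy(x.get(f), y.get(f))]


def is_duplicate(x, y, policies, min_matches=None):
    """All fields must match by default; pass min_matches for 'N of M'."""
    required = len(policies) if min_matches is None else min_matches
    return len(matching_fields(x, y, policies)) >= required
```

Calling `is_duplicate(a, b, POLICIES)` corresponds to "all fields must match", while `is_duplicate(a, b, POLICIES, min_matches=2)` corresponds to "at least 2 of 3 fields must match".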
Image Comparison Mechanism
ForNova also employs an image comparison mechanism, which identifies duplicate images even when they have been slightly altered. Designed to support large numbers of listings without significantly increasing the processing requirements, the image comparison is unaffected by differences in image size or resolution.
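The article does not say which technique ForNova uses for image comparison. As one common way to get similar properties (tolerance to resizing and small alterations at low cost), the sketch below uses an average hash, a simple perceptual hashing technique. The function names, the 8x8 hash size, and the distance threshold are assumptions, and images are represented as plain 2D lists of grayscale values to keep the example self-contained.

```python
def average_hash(pixels, size=8):
    """Compute an average hash from a 2D list of grayscale values.

    Returns a list of size*size bits (1 = brighter than the mean).
    """
    height, width = len(pixels), len(pixels[0])
    # Nearest-neighbour downsample so that images of any resolution
    # produce hashes of the same length.
    small = [
        pixels[r * height // size][c * width // size]
        for r in range(size)
        for c in range(size)
    ]
    mean = sum(small) / len(small)
    return [1 if value > mean else 0 for value in small]


def hamming(h1, h2):
    """Count the bit positions at which two hashes differ."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))


def looks_like_duplicate(img_a, img_b, max_distance=5):
    """Treat two images as duplicates when their hashes are close.

    max_distance is an illustrative threshold, not a tuned value.
    """
    return hamming(average_hash(img_a), average_hash(img_b)) <= max_distance
```

Because each hash is a short, fixed-length bit list regardless of the original resolution, comparing two images costs only a few dozen bit comparisons, which is one reason hashes of this kind scale to large numbers of listings.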
ForNova uses artificial intelligence analysis technology to imitate the way humans understand web pages, so that information can be analyzed appropriately. It filters out irrelevant data from result pages, so that the correct listings and product structures are identified and the information from each listing is harvested.
Designed to reduce reliance on semantic analysis, ForNova’s technology can search deep into the web to uncover the long tail of topics, languages and websites.
Once a specific vertical has been set up, the ForNova platform handles an unlimited list of URLs, automatically scraping and aggregating the relevant information. The platform automatically compiles the collected information from detailed listing pages. The information is then available in an open, searchable MySQL database.
Overall, ForNova provides the scalability to cover large volumes of information across the web from one international platform, while delivering aggregation results from all sources in one unified, structured format. Regardless of location or language, ForNova makes updating your information efficient and effective.