Gnoesis Develops Universal Markup Solution For Web Content
A Montreal-based software and research development company has created a markup solution and language-neutral asset descriptor that, when fully developed, could result in a universal computer language for representing information in databases, web and document content, and business objects.

Although its work is still at an early stage, the company, Gnoesis, aims to address the problem of data fragmentation caused by semantic differences between developers and users from different linguistic backgrounds.


Gnoesis has developed a data and information representation language called KODAXIL (Knowledge, Object, Data, Action, and eXtensible Interoperable Language). The company says the new language will take over XML's role of consolidating semantically identical data streams from different languages by providing a common language for the task.

According to Gnoesis, the extensible semantic markup associated with this language will be understood worldwide and is about one-third the length of the equivalent XML.

Principal Problem Is Naming Conventions

The problem, Gnoesis says, is that software developers from different companies in different linguistic areas, and even developers in the same linguistic area, use different names to describe the same business functions, processes and even applications.

While the problems this causes for interoperability between platforms and applications are obvious, it also has major repercussions for non-technical users.

Databases, for example, are cluttered with numerous business objects, or functions, that often have two or more different naming conventions.

Multiple vendors offer numerous representation schemes with different names for the same business objects, or textual data in different languages, resulting in a substantial decrease in the size of datasets used for data mining.

The result is that when a search is initiated in a given database, it can only interpret and assemble the information that it semantically understands, resulting in massive information fragmentation, and often unsatisfactory search results.

The problem becomes more serious when companies have to combine operations and IT applications, e.g. in the case of a merger, or commercial alliance.

KODAXIL To Act As ‘Super-Translator’

The purpose of KODAXIL is to act as a ‘super-translator’: no matter what language a user uses to input queries or instructions, the input will be universally understood by way of a newly developed lexicon.

The new lexicon, which the developers say is constantly evolving, currently consists of 450,000 words in neutral and variant forms that form the core of KODAXIL (when all variant forms are encoded, the base lexicon will comprise 8 million words).

Practical Application

Using this core, web developers world-wide will only have to concern themselves with creating their own particular websites in their own particular language, as a set of common tools used universally will convert it into KODAXIL text.

The KODAXIL text, in turn, will be ‘translated’ automatically for users with different naming conventions, or for users who work in different languages.

Already, Gnoesis is working on the development of a search engine that will interpret queries that have been translated into KODAXIL from a number of different languages.

In such an engine, because all search requests, no matter what their origin, are turned into a common language, the results from the search will be better targeted and produce more accurate and substantial information about the same search topic.

However, Avner Levy, creator of KODAXIL, says that while this ability to jump between different languages is a huge achievement, the principal advantage will be the elimination of different names for the same objects or processes.

“Composing documents using Kodaxil words for use in any language helps abolish linguistic barriers, thus creating a global village,” he said.

“But these important benefits are nowhere near as vital as eliminating the use of different names for the same business objects or processes, an ailment common to IT departments in most corporations or federal agencies; ultimately, applications may use ‘FirstName’, ‘FIRST_NAME’, or ‘姓氏’, but they all point to the same interoperable business object . . .”
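
The idea in the quotation, that several field names can resolve to one business object, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alias table, the identifier `person.given_name` and the function `normalize_record` are hypothetical and are not KODAXIL syntax, which the article does not show. The three names are grouped exactly as the quotation groups them.

```python
# Hypothetical alias table mapping application-specific field names to one
# shared identifier. Neither the identifier nor the table comes from KODAXIL;
# the grouping simply mirrors the three names given in the quotation.
CANONICAL_FIELDS = {
    "FirstName": "person.given_name",
    "FIRST_NAME": "person.given_name",
    "姓氏": "person.given_name",
}


def normalize_record(record: dict) -> dict:
    """Rename known aliases to their shared identifier; keep unknown keys as-is."""
    return {CANONICAL_FIELDS.get(key, key): value for key, value in record.items()}


# Three records from three imaginary applications, each using its own name.
crm_row = {"FirstName": "Ada"}
hr_row = {"FIRST_NAME": "Ada"}
intl_row = {"姓氏": "Ada"}

normalized = [normalize_record(row) for row in (crm_row, hr_row, intl_row)]

# After normalization the three records are identical, so a search or merge
# keyed on the shared identifier sees them as the same kind of data.
print(normalized[0] == normalized[1] == normalized[2])
```

A lookup table like this only works for names someone has already registered; the lexicon Gnoesis describes is meant to play that role at a much larger scale.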

Much Done, Much To Do

The development of KODAXIL is ongoing, and Gnoesis says it is already working on a method for converting multi-language information captured from the Internet, books and databases, integrating them into a self-learning conceptual machine.

The company is optimistic that it will succeed in developing a technology parallel to the semantic web without relying on XML-based technologies, but a considerable amount of work remains to be done.

With only 450,000 words out of eight million words and variants translated, as well as substantial work yet to be done translating and describing objects, the only thing that is certain about KODAXIL is that it will not be ready soon.

The next step, Gnoesis says, is to take KODAXIL open source. Details will follow as soon as they come out, so watch this space.