Overview of the DELI validation process

Mark H. Butler, HP Labs Bristol, 19 January 2006

1. Load schema information

1.1 Load namespace definition file

First DELI loads a namespace definition file (namespaceConfig.xml) that points to several local UAProf schemas. These schemas are corrected versions of the schemas published by the OMA.

 

In some cases the namespace definition file contains namespace alias information. This is because some of the schemas were first published at URIs that differ from those specified in the UAProf specifications, which confused UAProf profile authors about which URI was correct.

 

Therefore, if a profile uses a namespace specified in a specification (a known namespace), DELI regards it as correct. If it uses a namespace that a schema was published at but that was not used in a specification (an aliased namespace), DELI prints a warning but accepts the profile. If the profile uses properties from neither a known nor an aliased namespace, DELI prints an error when it encounters each such property.
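The known / aliased / unknown policy above can be sketched in Java as follows. This is a minimal illustration, not DELI's code: the class NamespaceClassifier and its methods are hypothetical, and it simply prints warnings and errors to standard output.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the namespace policy: known namespaces are accepted,
// aliased namespaces are accepted with a warning, anything else is an error.
class NamespaceClassifier {

    enum Status { KNOWN, ALIASED, UNKNOWN }

    // Maps every accepted namespace URI (known or alias) to its known URI.
    private final Map<String, String> knownUriFor = new HashMap<>();

    void register(String knownUri, List<String> aliasUris) {
        knownUriFor.put(knownUri, knownUri);
        for (String alias : aliasUris) {
            knownUriFor.put(alias, knownUri);
        }
    }

    Status classify(String namespaceUri) {
        String known = knownUriFor.get(namespaceUri);
        if (known == null) {
            return Status.UNKNOWN;
        }
        return known.equals(namespaceUri) ? Status.KNOWN : Status.ALIASED;
    }

    // Returns whether the property is accepted, printing a warning for an
    // aliased namespace and an error for an unknown one.
    boolean check(String namespaceUri, String property) {
        switch (classify(namespaceUri)) {
            case KNOWN:
                return true;
            case ALIASED:
                System.out.println("Warning: " + property + " uses aliased namespace "
                        + namespaceUri + " instead of " + knownUriFor.get(namespaceUri));
                return true;
            default:
                System.out.println("Error: " + property + " uses unknown namespace "
                        + namespaceUri);
                return false;
        }
    }
}
```

Mapping each alias to its known URI lets the warning name the namespace the author should have used.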

 

For example, a typical entry in the namespaceConfig.xml file is:

 

  <namespace>
    <uri>http://www.wapforum.org/UAPROF/ccppschema-20000405#</uri>
    <aliasUri>http://www.wapforum.org/profiles/ccppschema-20000405#</aliasUri>
    <aliasUri>http://www.wapforum.org/profiles/UAPROF/ccppschema-20000405#</aliasUri>
    <schemaVocabularyFile>config/vocab/ccppschema-20000405.rdfs</schemaVocabularyFile>
  </namespace>

 

This entry tells DELI to load the schema file at the location given by <schemaVocabularyFile>, and to associate it with the known namespace given by <uri> and the alias namespaces given by <aliasUri>.
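A minimal sketch of reading such entries with the JDK's standard DOM parser. The element names namespace, uri, aliasUri and schemaVocabularyFile come from the example above. The class and record names are hypothetical. The sketch assumes the entries sit under some root element whose name does not matter here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for <namespace> entries in a namespace definition file.
class NamespaceConfigReader {

    record NamespaceEntry(String uri, List<String> aliasUris, String schemaVocabularyFile) {}

    static List<NamespaceEntry> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<NamespaceEntry> entries = new ArrayList<>();
            NodeList namespaces = doc.getElementsByTagName("namespace");
            for (int i = 0; i < namespaces.getLength(); i++) {
                Element ns = (Element) namespaces.item(i);
                // Collect every <aliasUri> inside this <namespace>.
                List<String> aliases = new ArrayList<>();
                NodeList aliasNodes = ns.getElementsByTagName("aliasUri");
                for (int j = 0; j < aliasNodes.getLength(); j++) {
                    aliases.add(aliasNodes.item(j).getTextContent().trim());
                }
                entries.add(new NamespaceEntry(
                        text(ns, "uri"), aliases, text(ns, "schemaVocabularyFile")));
            }
            return entries;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse namespace configuration", e);
        }
    }

    // Returns the trimmed text of the first descendant element with the given name.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}
```

Each entry then gives a schema file to load and the namespaces to register for it.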

 

1.2 Load schemas

Next, DELI loads each schema file and extracts information about:

 

- the namespace associated with a particular vocabulary,
- the component types specified in the vocabulary,
- the properties available in each component type,
- the data types associated with each property,
- the resolution rule associated with each property,
- and whether each property is single valued or can contain multiple values, which are either unordered (a Bag) or ordered (a Seq).
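One way to hold this information in memory is sketched below. The record and enum names are hypothetical and do not correspond to DELI's own classes. Properties are keyed on their full URI, which is the namespace followed by the local name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory model of the information extracted from a schema.
class Vocabulary {

    // Single valued, unordered multi-valued (Bag) or ordered multi-valued (Seq).
    enum Collection { SIMPLE, BAG, SEQ }

    record PropertyDef(String namespace, String name, String componentType,
                       String datatype, String resolutionRule, Collection collection) {}

    // Keyed by full property URI (namespace + local name).
    private final Map<String, PropertyDef> properties = new HashMap<>();

    void add(PropertyDef def) {
        properties.put(def.namespace() + def.name(), def);
    }

    // Looks up a property by namespace URI and local name.
    Optional<PropertyDef> lookup(String namespace, String name) {
        return Optional.ofNullable(properties.get(namespace + name));
    }
}
```

Keying on the full URI means two vocabularies can define properties with the same local name without clashing.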

 

Prior to UAProf 2, not all of this information is contained in the RDF Schema, so DELI must also process the comments field in the schema. There have also been instances where the information in the comments is inconsistent with the information in the RDF Schema. So even when the information is available in the RDF Schema, DELI still parses the comments and prints a warning if the two disagree.
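The comment parsing and consistency check might look like the sketch below. It assumes, as a hypothesis and not something stated here, that pre-UAProf 2 schema comments contain lines of the form "Type: ..." and "Resolution: ...". The class and method names are hypothetical. The choice to prefer the RDF Schema value when the two disagree belongs to the sketch and is not necessarily DELI's.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: assumes schema comments contain lines such as
// "Type: Dimension" and "Resolution: Locked".
class CommentParser {

    private static final Pattern TYPE = Pattern.compile("Type:\\s*(\\w+)");
    private static final Pattern RESOLUTION = Pattern.compile("Resolution:\\s*(\\w+)");

    private static Optional<String> field(Pattern pattern, String comment) {
        Matcher m = pattern.matcher(comment);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    static Optional<String> type(String comment) {
        return field(TYPE, comment);
    }

    static Optional<String> resolution(String comment) {
        return field(RESOLUTION, comment);
    }

    // Uses the RDF Schema value when there is one, warning if the comment disagrees;
    // falls back to the comment value when the RDF Schema says nothing.
    static String reconcile(String property, String fromSchema, Optional<String> fromComment) {
        if (fromSchema == null) {
            return fromComment.orElse(null);
        }
        fromComment.filter(c -> !c.equalsIgnoreCase(fromSchema))
                .ifPresent(c -> System.out.println("Warning: " + property + ": schema says "
                        + fromSchema + " but comment says " + c));
        return fromSchema;
    }
}
```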

 

1.3 Load datatype definitions

DELI also loads information about which values each data type may contain. For UAProf 2, this information is stored in an XML Schema file provided by the OMA (currently xmlschema-20030226.xsd).

 

For vocabularies prior to UAProf 2, this information is stored in an internal DELI configuration file called uaprofValidatorConfig.xml. This file contains regular expressions, based on the UAProf specification, that define which values are legal for each property datatype, e.g.:

  Datatype     Regular expression
  Literal      [A-Za-z0-9/.\\\-;:_ ()=*+]+
  Dimension    [0-9,.]+x[0-9,.]+
  Number       [0-9]+
  Boolean      (Yes)|(No)|(yes)|(no)

 

(While writing this review I compared the two files and noticed that xmlschema-20030226.xsd is considerably stricter about what it will accept than uaprofValidatorConfig.xml. This is because the configuration file for profiles prior to UAProf 2 has been relaxed to accept certain characters that the UAProf specifications do not permit but that are in common use in profiles. An example is the "," that many devices use in Dimension values. This may be the wrong behavior.)
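The regular expressions listed above can be applied with java.util.regex as sketched below. The class name is hypothetical. The patterns are copied from the list with backslashes doubled for Java string literals. The sketch uses matches() so that the whole value must conform, not just part of it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical validator for the pre-UAProf 2 datatype regular expressions.
class DatatypeValidator {

    private final Map<String, Pattern> patterns = new HashMap<>();

    DatatypeValidator() {
        // Backslashes are doubled for Java string literals.
        patterns.put("Literal", Pattern.compile("[A-Za-z0-9/.\\\\\\-;:_ ()=*+]+"));
        patterns.put("Dimension", Pattern.compile("[0-9,.]+x[0-9,.]+"));
        patterns.put("Number", Pattern.compile("[0-9]+"));
        patterns.put("Boolean", Pattern.compile("(Yes)|(No)|(yes)|(no)"));
    }

    // The entire value must match the pattern, not just a substring of it.
    boolean isValid(String datatype, String value) {
        Pattern p = patterns.get(datatype);
        if (p == null) {
            throw new IllegalArgumentException("No pattern for datatype " + datatype);
        }
        return p.matcher(value).matches();
    }
}
```

With these patterns "176x208" is a valid Dimension but "176 x 208" is not, and "YES" is not a valid Boolean.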

 

2. Process profile

2.1 Parsing RDF/XML

Next, DELI starts to check the profile. It first uses ARP [1], the parser used in Jena [2] and also in the W3C RDF validation service [3], to convert the UAProf document to an RDF model. This checks that the document conforms to the RDF/XML syntax specification [4], just as the W3C validator does, and DELI prints errors if it does not. Some errors are common because some (now obsolete) UAProf specifications were written before [4], so they use an older, obsolete version of RDF. Typical examples include using the id attribute instead of rdf:ID, or about instead of rdf:about.
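DELI relies on ARP for this check. The sketch below illustrates only the two legacy attributes mentioned above. It uses the JDK's namespace-aware DOM parser instead of ARP to find unprefixed id and about attributes. It is a hypothetical pre-check, not a substitute for RDF/XML validation, and the class name is invented.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical pre-check (not ARP): reports unprefixed "id" and "about"
// attributes that older documents used instead of rdf:ID and rdf:about.
class LegacyAttributeCheck {

    static List<String> findLegacyAttributes(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> problems = new ArrayList<>();
            NodeList elements = doc.getElementsByTagName("*");
            for (int i = 0; i < elements.getLength(); i++) {
                Element e = (Element) elements.item(i);
                NamedNodeMap attrs = e.getAttributes();
                for (int j = 0; j < attrs.getLength(); j++) {
                    Node a = attrs.item(j);
                    String name = a.getLocalName();
                    // Unprefixed attributes have no namespace URI.
                    if (a.getNamespaceURI() == null && ("id".equals(name) || "about".equals(name))) {
                        String expected = "id".equals(name) ? "rdf:ID" : "rdf:about";
                        problems.add(e.getTagName() + ": '" + name + "' should be " + expected);
                    }
                }
            }
            return problems;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Document is not well-formed XML", ex);
        }
    }
}
```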

 

2.2 Processing RDF model

Next, DELI checks the RDF model derived from the profile. First, it locates the root of the profile. It expects the root node to have either uaprof:component properties, or uaprof:default properties that in turn have uaprof:component properties. In either case, each uaprof:component property points to an anonymous node that has an rdf:type property and a number of UAProf properties. DELI checks that this type corresponds to a component type defined in one of the schemas loaded in stage 1.2 (i.e. one with the same local name and namespace). It then checks each UAProf property attached to the node.
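The traversal described above is sketched below over a deliberately simple list of triples rather than a Jena model. The class, record and method names are hypothetical. The caller passes in the full URIs of the component and default properties. The sketch assumes the root is a subject with one of those properties that is never itself the object of a triple.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: locate the profile root, collect its components
// (directly or via default nodes) and check each component's rdf:type.
class ComponentLocator {

    record Triple(String subject, String predicate, String object) {}

    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    private final List<Triple> triples;
    private final String componentProp; // full URI of the component property
    private final String defaultProp;   // full URI of the default property

    ComponentLocator(List<Triple> triples, String componentProp, String defaultProp) {
        this.triples = triples;
        this.componentProp = componentProp;
        this.defaultProp = defaultProp;
    }

    private List<String> objects(String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (Triple t : triples) {
            if (t.subject().equals(subject) && t.predicate().equals(predicate)) {
                result.add(t.object());
            }
        }
        return result;
    }

    // Root: a subject with a component or default property that is never an object.
    String findRoot() {
        for (Triple t : triples) {
            boolean structural = t.predicate().equals(componentProp)
                    || t.predicate().equals(defaultProp);
            boolean referenced = triples.stream().anyMatch(o -> o.object().equals(t.subject()));
            if (structural && !referenced) {
                return t.subject();
            }
        }
        throw new IllegalStateException("No profile root found");
    }

    // Components hang off the root directly or off its default nodes.
    List<String> components(String root) {
        List<String> result = new ArrayList<>(objects(root, componentProp));
        for (String d : objects(root, defaultProp)) {
            result.addAll(objects(d, componentProp));
        }
        return result;
    }

    // Returns components whose rdf:type is missing or not a known component type.
    List<String> unknownComponents(Set<String> knownTypes) {
        List<String> bad = new ArrayList<>();
        for (String c : components(findRoot())) {
            List<String> types = objects(c, RDF_TYPE);
            if (types.isEmpty() || !knownTypes.containsAll(types)) {
                bad.add(c);
            }
        }
        return bad;
    }
}
```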

 

For all profiles, DELI checks that the property corresponds to a property defined by a schema loaded in stage 1.2; if not, it prints an error. It checks that the property is attached to the correct component; if not, it prints an error. The schema also defines whether the property should be single valued, a Bag or a Seq, so DELI checks that the profile agrees; if not, it prints an error.
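The three per-property checks could be expressed as follows. This is a hypothetical sketch. The schema is reduced to a map from full property URI to its component type and collection kind. Errors are returned as strings rather than printed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the per-property checks: known property,
// correct component, and single/Bag/Seq as the schema defines.
class PropertyChecker {

    enum Collection { SIMPLE, BAG, SEQ }

    record PropertyDef(String componentType, Collection collection) {}

    private final Map<String, PropertyDef> schema; // keyed by full property URI

    PropertyChecker(Map<String, PropertyDef> schema) {
        this.schema = schema;
    }

    // 'found' describes how the value appears in the profile:
    // a single value, an rdf:Bag or an rdf:Seq.
    List<String> check(String property, String componentType, Collection found) {
        List<String> errors = new ArrayList<>();
        PropertyDef def = schema.get(property);
        if (def == null) {
            errors.add("Unknown property " + property);
            return errors; // nothing else can be checked
        }
        if (!def.componentType().equals(componentType)) {
            errors.add(property + " belongs in component " + def.componentType()
                    + ", not " + componentType);
        }
        if (def.collection() != found) {
            errors.add(property + " should be " + def.collection() + " but is " + found);
        }
        return errors;
    }
}
```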

 

For profiles prior to UAProf 2, if data type validation is turned on (it is by default), DELI also checks that the property value conforms to the regular expression defined in the uaprofValidatorConfig.xml file.

 

For UAProf 2 profiles, DELI checks that the property has an rdf:datatype attribute and that this attribute corresponds to the datatype defined in the schema. It also checks that the property value conforms to the regular expression defined in the XML Schema file associated with the vocabulary, currently xmlschema-20030226.xsd.
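A minimal sketch of the two UAProf 2 checks. The caller supplies three things: the datatype URI found on the property, the datatype the schema expects, and the pattern for that datatype. In DELI these would come from the loaded schemas and xmlschema-20030226.xsd. The class name is hypothetical, and any datatype URIs or patterns used with it here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the UAProf 2 checks: rdf:datatype present and
// matching the schema, and the value matching the datatype's pattern.
class Uaprof2DatatypeCheck {

    static List<String> check(String property, String value, String rdfDatatype,
                              String schemaDatatype, Pattern valuePattern) {
        List<String> errors = new ArrayList<>();
        if (rdfDatatype == null) {
            errors.add(property + " has no rdf:datatype attribute");
        } else if (!rdfDatatype.equals(schemaDatatype)) {
            errors.add(property + " has rdf:datatype " + rdfDatatype
                    + " but the schema expects " + schemaDatatype);
        }
        // The value is checked independently of the datatype attribute.
        if (!valuePattern.matcher(value).matches()) {
            errors.add(property + " value '" + value + "' does not match "
                    + valuePattern.pattern());
        }
        return errors;
    }
}
```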

 

Because DELI checks the RDF model derived from the profile rather than the document itself, the order in which it processes components and properties does not correspond to their order in the original UAProf document. This is an unfortunate side-effect of using RDF. However, DELI users would like line numbers to be printed with errors. I therefore intend to investigate whether DELI can re-analyze the UAProf document once an error has been found, in order to determine a line number.
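One possible approach to recovering line numbers is sketched below as a hypothetical illustration of the idea. Once an error is found, it re-scans the original document text for opening tags of the offending property. It is a heuristic: it returns every line that opens an element with the given local name under any prefix. A property that appears in several components therefore yields several candidates. The class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical heuristic: find the lines of the original document that open
// an element with the given local name, e.g. <prf:ScreenSize>.
class LineLocator {

    // Returns 1-based line numbers of candidate lines.
    static List<Integer> findLines(String document, String localName) {
        // "<", an optional prefix, the exact local name, then whitespace, ">", "/" or end of line.
        Pattern open = Pattern.compile("<(\\w+:)?" + Pattern.quote(localName) + "(?=[\\s>/]|$)");
        List<Integer> lines = new ArrayList<>();
        String[] rows = document.split("\\R", -1); // handles \n and \r\n
        for (int i = 0; i < rows.length; i++) {
            if (open.matcher(rows[i]).find()) {
                lines.add(i + 1);
            }
        }
        return lines;
    }
}
```

For an error reported against ScreenSize, findLines(document, "ScreenSize") would return the candidate lines to print alongside the error. The lookahead stops ScreenSize from also matching a longer name such as ScreenSizeChar.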

 

Summary

In summary, DELI checks that:

 

i)    a profile is well-formed RDF/XML
ii)   the correct RDF namespace is used
iii)  a profile only uses known components. This is based on the assumption that unknown components are misspelt components
iv)   a profile only uses known properties. This is based on the assumption that unknown properties are misspelt properties
v)    each property is associated with the correct component
vi)   each property is single valued, Seq valued or Bag valued as indicated by the schema
vii)  each property value conforms to the regular expression for the data type associated with the property in the schema
viii) in the case of UAProf 2 profiles, each property has an rdf:datatype attribute containing the correct datatype for the property.

 

Note that the assumption used in (iii) and (iv) is known as the closed-world assumption. It fails if vendors publish profiles that use new properties but do not publish schemas defining those properties.

 

DELI also has the limitation that schemas must be configured manually rather than loaded automatically based on information in the profile. This is because of the errors in some published schemas mentioned in section 1.1. This limitation will be removed in future versions of DELI.

 

References

 

[1] http://www.hpl.hp.com/personal/jjc/arp/

 

[2] http://jena.sourceforge.net/

 

[3] http://www.w3.org/RDF/Validator/

 

[4] http://www.w3.org/TR/rdf-syntax-grammar/