RDF Result sets

[This post was prompted by a discussion with Jeni Tennison and will probably end up cross-posted to our company blog sometime.]

In some recent semantic web applications, where we’ve been creating user interfaces over REST style interfaces over RDF data sets, we found a common pattern emerging – ResultSets. The approach we took has been documented but it’s buried in other details so I’d like to pull out the essential pattern in this post.

Situation

The situation is that your UI (or other client) wants to find all resources that match some criteria and get a description of them. Typically the client wants to see those resources ordered (e.g. in terms of relevance to some original query, or by name, or whatever).

This is not just a SPARQL SELECT. SELECT allows you to find the matching resources and to sort them but it can only extract a fixed set of values from the resources. A key value of RDF is its ability to handle schema-less information and not require resource descriptions to be of uniform shape. If we only pull back descriptions via SELECT we lose that.

This is not a simple subgraph of the RDF dataset (e.g. as you would get from a DESCRIBE) since then you lose the information on which resources are the top level matches and how they are ordered.

Specifying the query

Abstractly we specify the query using the template:

query(select, var, description)

Where select is a SPARQL select query which extracts the resources we want, possibly ordering them; var is the name of a variable in the select which corresponds to the retrieved resources and description is either the single keyword “DESCRIBE” (meaning that each resource should be returned via a SPARQL DESCRIBE operation) or it is a SPARQL ConstructTemplate which refers to other variables in the select.
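
To make this concrete, here is a sketch using a purely illustrative eg: vocabulary. The select finds and orders the matching documents, var is “item”, and the description is a ConstructTemplate over the same variables (the alternative being the single keyword DESCRIBE):

SELECT ?item ?title WHERE {
    ?item a eg:Document ;
          dc:title ?title ;
          dc:subject eg:SemanticWeb .
} ORDER BY ?title

with the ConstructTemplate:

{ ?item dc:title ?title ; dc:subject eg:SemanticWeb }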

In fact there’s a lot of separate machinery for how to build up the query as a series of query refinement operations, but that’s not relevant here.

Returning the results

To return results we provide two abstractions – a ResultSet and a ResultSetWindow.

A ResultSet:

  • is identified by a URI
  • has RDF metadata to describe that URI (the dataset operated on, the query run, when it ran etc)
  • can be used to open a ResultSetWindow
    openWindows(ResultSet, {start, {end}})

A ResultSetWindow:

  • is also identified by a URI and identifies:
    • an ordered list of resources
    • an RDF graph containing at least the descriptions of the resources within the window
    • a flag to indicate if the window reaches to the end of the ResultSet

Having a first class representation of the whole result set allows us to pass it around, annotate it, share it, without having to copy the actual results. It is up to the server to decide how eager/lazy to be on evaluation and what caching (if any) to do.

Having a window allows us to probe and page through inconveniently large result sets. If a client opens a window over the whole set, or pages through in order, then the server can simply stream results; but it has to be prepared to reissue the query with a LIMIT/OFFSET, or rewind the results, if the client opens windows out of order.
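
For instance, if a client skipped straight to a window over results 40-59 then one plausible (though not mandated) server strategy is to re-run the underlying select from the sketch above with an offset:

SELECT ?item ?title WHERE {
    ?item a eg:Document ;
          dc:title ?title ;
          dc:subject eg:SemanticWeb .
} ORDER BY ?title OFFSET 40 LIMIT 20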

Packaging up as a RESTful API

So far we’ve been talking abstractly but as well as a Java API for this query interface we want to use it in a RESTful web service setting.

The query endpoint is simple, supporting GET (or POST for large queries):

http://example.com/dataset?query={qstring}&var={name}&description={descstring}

The returned document representation is RDF (in RDF/XML, Turtle or a JSON encoding) describing the result set:

<http://example.com/dataset/resultsetNNN> a rs:ResultSet;
    dc:date "2009-12-15T09:32:42Z"^^xsd:dateTime;
    rs:query qstring;
    rs:var name;
    rs:description descstring;
    rs:firstWindow <http://example.com/dataset/resultsetNNN/0-20>;
    ... optional statistics or other metadata ... .

If the client requests HTML then it gets a rendering of this, which includes clickable links for browsing to the first window of results.

The client can then open a window onto the ResultSet by appending a window description to the returned results set URI:

http://example.com/dataset/resultsetNNN/{start{-end}}

or can follow its nose down the rs:firstWindow reference.

A GET on this window URI requesting the JSON encoding is easy: you get a wrapper something like:

{
    "id" : "http://example.com/dataset/resultsetNNN/0-20",
    "resultset" : "http://example.com/dataset/resultsetNNN",
    "windowStart" : 0,
    "windowEnd" : 15,
    "complete"  : true,
    "results" : [
         {
             "id" : "http://example.com/dataset/someresourceURI",
             "http://example.com/somepropertyURI" : "some property value",
             ...
         }
         ...
    ]
}

So the array of results provides the required ordering and top level resource list; the RDF descriptions are rendered inline as JSON structures.

If you request one of the RDF encodings then you get a graph back which contains the metadata trail, allowing the client to unpick the result list:

<http://example.com/dataset/resultsetNNN> a rs:ResultSet;
    rs:window <http://example.com/dataset/resultsetNNN/0-20> .

<http://example.com/dataset/resultsetNNN/0-20> a rs:ResultSetWindow;
    rs:windowStart 0;
    rs:windowEnd 15;
    rs:complete true;
    rs:results (
        <http://example.com/dataset/someresourceURI> ... ) .
<http://example.com/dataset/someresourceURI> ... .

So there you go. Some quite minor wrappers round existing technology but it’s a pattern that worked for us.

Twitter spam

Had a brief dabble with Twitter again over the last few days.

Still not sure I really get it but will try sticking with it for a bit longer this time.

However, I did start to think twice once I got hit by Twitter spam. I happened to mention on Twitter that I had done the last tax return for my late father’s estate and so had finally wrapped up the probate stuff. Within an hour I had a UK “probate advisory company” following me until I blocked them. I suppose to some people that might be helpful active targeted marketing. To me it was ugly unwanted opportunistic intrusion. Though impressively efficient of them to spot the opportunity so quickly.

Maybe I just don’t have the right temperament for the modern conduct-your-social-life-in-public style.

OWL 2 for RDF vocabularies

I was at the Bristol Vocamp recently (a fun event) and there was a lot of discussion around issues of validating RDF and checking conformance with lightweight RDF vocabularies. There was some talk about constraint expression languages but I suggested that OWL, and especially OWL 2, already does quite a lot of that. Couple that to closed-world validation tools in the style of Eyeball and we already have quite a lot of what is needed. In the end I did a brief summary of what’s new in OWL 2 and how it might affect people who focus mostly on RDF/RDFS. This went down well enough that I promised to write up the talk. Here’s a first stab at that.

Warning: I’m not an expert on OWL 2, though I have done some work with implementing the OWL 2 RL profile. If you want the real story then go look at the specs :-)

OWL issues for RDF vocabularies

OWL is a compromise. It’s a compromise between the people who want to write down complex ontologies (and so want maximum expressivity) and those who want tractable, even efficient, reasoners who need some constraints on the language. It’s also a compromise between the Description Logic community who have great reasoning technology, but need “neat” constraints on the language to be able to employ it, and the RDF community used to “scruffy” freedom. The compromise means that there are multiple flavours of OWL. There is the Description Logic flavour (DL) that constrains how you use the OWL vocabulary, but allows for complete reasoning support, and the unconstrained RDF semantics (Full).

Early in the lifetime of OWL some owners of RDF vocabularies did try to use OWL to express additional constraints on their vocabularies. For example Dan and Libby added some OWL constructs to foaf. However, people found they tended to end up in OWL Full instead of OWL DL and/or couldn’t express what they wanted. So what were the limitations on using OWL for RDF vocabularies? The main ones were:

  • Strict separation of ObjectProperties (properties that point to other resources) and DatatypeProperties (properties that point to literal values). For example, Dublin Core allows dc:creator to denote the name of an author as a string or point to a resource such as a FOAF description. That’s not allowed within OWL DL.
  • Strict separation of meta-levels. In some RDF vocabularies you want to annotate your classes and properties with other information such as hints to a UI or data generator. In OWL DL you can have annotations but they have no semantics, which means that you aren’t allowed to add things like range axioms to your annotation properties. Whereas within Full you could say “this annotation should be an integer”.
  • Some key limits on expressivity. In particular you can’t define a key. In foaf you can state things like your IM Chat ID or the SHA1 hash of your mailbox, which ought to be enough to uniquely identify you. In OWL there is the notion of an InverseFunctionalProperty which looks just like it should allow you to say that, for example, foaf:aimChatID is a unique key. Except that within OWL DL it is only applicable to ObjectProperties; you can’t use it on a literal-valued property.
  • Complex to understand. One cost of the compromise is that people found the whole hierarchy of OWL DL, OWL Lite, OWL Full and its relationship to RDFS confusing. The number of non-specialists prepared to read the model theories might have been a bit limited too. Which in turn means there’s a pretty high barrier to implementation.

OWL 2 is a major set of extensions and, mostly, improvements to OWL which solve at least some of these problems and introduce additional features that are useful to people from the RDF side of the house.

What’s new in OWL 2

First let’s be clear – the complexity problem hasn’t been solved!  This is a language with four different syntaxes (not counting the different RDF syntax versions) specified across 13 different documents. So don’t expect anything I say here to be complete or definitive. Though if some part is actually wrong then please tell me about it.

However, buried in this complexity is some stuff I think is useful, for the types of applications I work on, which I’ll pick out.

Syntaxes

OWL had an abstract syntax to help with writing the specs but all OWL ontologies were expressed via RDF. In OWL 2 life is more complicated. There is still the moral equivalent of the abstract syntax, now called the functional syntax. There is still an RDF syntax, and the good news is that it is fully backward compatible. However, there is also another textual syntax called the Manchester syntax and an XML-but-not-RDF/XML syntax called OWL/XML. Since OWL can be used for writing down RDF facts as well as all this ontology stuff, that means the world now has at least two new RDF syntax forms. Though in practice I doubt if they’ll get much tool support or uptake for pure RDF usage.

Expressivity

There are lots of new axioms you can express in OWL 2, all motivated by some applications, but several of them are especially useful in an RDF setting …

Qualified Cardinality Restrictions. In OWL you could say that a Person has four limbs, that the value of limb is either of type Arm or type Leg, and even that a Person has some limbs that are Arms and some limbs that are Legs. However, you couldn’t say that they have precisely two limbs which are type Leg and two limbs which are type Arm. In OWL 2 you can. You can combine cardinality restrictions with local range types. This is a big deal for things like medical ontologies. The QCR axioms were at one point in the OWL (1) drafts but got left out on complexity grounds; this was seen in some quarters as a bit of a mistake and getting them back in again was quite a high priority for the OWL 2 group.
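
In the RDF mapping this uses owl:onClass together with owl:qualifiedCardinality. A rough sketch, using a made-up anatomy vocabulary, might be:

    eg:Person rdfs:subClassOf
        [ a owl:Restriction ;
          owl:onProperty           eg:limb ;
          owl:onClass              eg:Leg ;
          owl:qualifiedCardinality "2"^^xsd:nonNegativeInteger ] ,
        [ a owl:Restriction ;
          owl:onProperty           eg:limb ;
          owl:onClass              eg:Arm ;
          owl:qualifiedCardinality "2"^^xsd:nonNegativeInteger ] .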

Keys. These solve the problem of using literal-valued properties to identify resources. With OWL 2’s owl:hasKey you can give a list of properties, both object- and literal-valued properties, that together identify resources of a given type.
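
So the foaf:aimChatID example above can now be expressed, roughly, as follows (a list of several properties would give a compound key):

    foaf:Person owl:hasKey ( foaf:aimChatID ) .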

Chain properties. This is an interesting extension which allows OWL 2 to derive uncle from the combination of parent and brother. You can say that a chain of properties, when composed together, implies another property.
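
In the RDF mapping the uncle example looks something like this (with hypothetical family properties):

    eg:hasUncle owl:propertyChainAxiom ( eg:hasParent eg:hasBrother ) .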

Property axioms. OWL 1 had several property types and property axioms – Functional, Symmetric, inverseOf etc. In OWL 2 there are more – Reflexive (so that x R x is always true), Irreflexive (so that x R x is never true), Asymmetric (so that you can’t have both x R y and y R x).
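
In the RDF mapping these are just additional types on the property; a sketch with illustrative properties:

    eg:properPartOf a owl:IrreflexiveProperty , owl:AsymmetricProperty .
    eg:connectedTo  a owl:ReflexiveProperty , owl:SymmetricProperty .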

Negative assertions. This is a strange one from an RDF point of view. In OWL 2 you can assert that a certain fact does not hold – that triples like (foo prop val) or (foo rdf:type Bar) are not true. From an RDF point of view this is a big step. It implies that RDF toolkits ought to implement a full three valued logic – yes, no, don’t know – so you can distinguish between triple not present and triple asserted to not be true. From an OWL point of view this is just syntactic sugar for something that was already expressible in OWL 1 if you knew the tricks.
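
In the RDF mapping a negative assertion is written as a small blank node structure, roughly as below (use owl:targetValue instead of owl:targetIndividual for a literal value):

    [] a owl:NegativePropertyAssertion ;
       owl:sourceIndividual  eg:foo ;
       owl:assertionProperty eg:prop ;
       owl:targetIndividual  eg:bar .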

Syntactic sugar for disjointness. There is also some syntactic sugar to make it easier to say a set of classes are mutually disjoint or that they are both disjoint and together cover some specific larger set.
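
For example (again with made-up classes):

    [] a owl:AllDisjointClasses ;
       owl:members ( eg:Arm eg:Leg eg:Head ) .

    eg:Limb owl:disjointUnionOf ( eg:Arm eg:Leg ) .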

Punning and annotations

So there are a few interesting new things that you can state with OWL 2. What about all the issues of having to cleanly separate object- and literal- valued properties, to separate meta-levels, to be careful with annotations?

Here there is good news and bad news.

There is a notion in OWL 2 of punning. Roughly this means that the same identifier can be used as if it denoted, for example, both an individual and a class. However, the way this works is that it is as if there were really two different entities; entities that are not really connected but just happen to have the same looking name. So things you say about the individual-nature of some X don’t affect the class-nature of that X, and vice versa. In terms of the OWL semantics this punning works. In terms of web architecture and using URIs to denote things it makes some people uncomfortable.

This punning means that you can do things like treat :Eagle as separately denoting both a class of Birds and an individual of the class :Species. This can be quite useful for some sorts of modelling. It also means that you can say more about annotations. In particular there are new constructs for declaring the domain/range of annotation properties.
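
A sketch of both ideas, using purely illustrative URIs:

    eg:Eagle a owl:Class ;
             rdfs:subClassOf eg:Bird ;
             a eg:Species .                  # punning: the same URI also used as an individual

    eg:uiDisplayOrder a owl:AnnotationProperty ;
                      rdfs:range xsd:integer .    # a range declared on an annotation property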

The bad news is that this punning doesn’t extend as far as allowing you to mix Object- and Datatype- properties. So one of the biggest limitations on OWL (1) DL for working with RDF vocabularies is still there. I think in part this is due to problems with having a backward compatible syntax; certainly the early proposals for allowing some property type punning did involve a non-monotonic RDF syntax which would have been an absolute nightmare. So if this is the cost of having a backward compatible and stable syntax then it seems to me like a price worth paying. Shame though.

Datatype constraints

This bit is potentially pretty significant for RDF validators.

With OWL 1, and indeed RDFS, when you wanted to validate some literal values you could always use XSD constraints “out of band”. That is, use XSD to define some new datatype like integers between 1 and 42 and then declare that new datatype as the range of your property. The tie up between the URI you use for that datatype in your RDFS declaration and the file where you specified the datatype restrictions was a matter of convention rather than part of the specs. Nevertheless at least Jena supported it, so long as you explicitly loaded in the schema definitions you wanted.

OWL 2 has added a whole lot of machinery which, at least to me, seems to move the XSD data restriction machinery wholesale into OWL. Specifically each of the OWL datatypes now has a set of facets, which correspond to the XSD notion of facets, and allow you to constrain the allowed values to a subset of the datatype’s value space. So you can now define the range of a property as being only integers between 1 and 3 without having to step outside OWL at all. You can also create unions, complements and intersections of such dataranges – so you can have a value which is either between 1 and 3 or greater than 13 but not 42.
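
The 1-to-3 range, in the RDF mapping, looks roughly like this (the property name is invented for the example):

    eg:rating rdfs:range
        [ a rdfs:Datatype ;
          owl:onDatatype       xsd:integer ;
          owl:withRestrictions ( [ xsd:minInclusive 1 ] [ xsd:maxInclusive 3 ] ) ] .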

[You might think these datatype combinators weren't needed. OWL already lets you define unions of classes after all. However, the OWL 2 punning also doesn't extend as far as allowing punning between classes and datatypes.]

Profiles

The final innovation in OWL 2 that I want to mention is the notion of Profiles.

With all the new axiom types in OWL 2, implementations are even more complex than for OWL 1 (and of a higher computational complexity class), whereas for some usages people would happily forgo some expressivity in return for inference engines that are easier to implement or faster to run. OWL 2 profiles do this for you. There are three defined profiles of OWL 2 which each conform to the same semantics but limit what parts of the syntax you can use in return for better performance. These are EL, QL and RL.

EL is particularly good for cases where you have very big ontologies (lots of classes and properties) but they are not too complex.

QL is particularly suited to cases where you have lots of instance data (large “ABox” in the jargon). It allows the instance data to be stored in an RDBMS and accessed via a query rewrite that doesn’t require recursive queries.

RL is suited to implementation by rule based reasoners, including databases with deduction rule support such as datalog engines.

From an RDF point of view it is RL that is particularly interesting because it supports the RDF-based semantics.

The what?

OK I’ve skipped over something up to now. It is still the case that there is a two way split in OWL 2. There is the direct semantics, Description Logic friendly, an extension of OWL (1) DL. There is also an RDF-based semantics, much like in OWL 1. The correspondence between the two is a little less direct than in OWL 1 but it’s there. The OWL 2 RDF-based semantics is then an upward-compatible extension of RDF and RDFS and means you can take an RDFS vocabulary and add a few useful bits from OWL 2 and know the exact semantics of what you’ve got. The question is, will that require a full blown theorem prover to reason with? What OWL RL does is provide a profile of OWL which can be implemented via simple entailment rules at the RDF level and so gives you an RDFS-compatible fragment of OWL 2 which is likely to be pretty widely supported.
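
To give a flavour, a typical RL entailment rule – the one for symmetric properties – can be written as a simple SPARQL CONSTRUCT over the RDF (a sketch, not the normative rule table):

    CONSTRUCT { ?y ?p ?x }
    WHERE     { ?p a owl:SymmetricProperty .
                ?x ?p ?y . }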

At least that’s the marketing story and it’s largely true but the story on OWL RL implementation and conformance is a bit more complicated than it seems. Subject for a future post!

I should admit that, as far as I can tell, the RL profile doesn’t include DatarangeRestriction support for making use of all those datatype facets and only supports intersection of Dataranges. But I’m sure that can be worked round eventually.

So …

So going back to where we started. The addition of keys plus datatype facets makes a noticeable difference to what OWL 2 can do for RDF vocabularies. The big limitation that remains is the need for strict separation of Datatype- and Object- properties in OWL 2 DL. However, with OWL 2 RL under the RDF-based semantics you can lift that restriction (just as existing OWL (1) reasoners like Jena’s have always been OWL Full reasoners and can mix freely with unconstrained RDFS).

When it comes to validation there is still the issue that the open world and the lack of a Unique Name Assumption make it hard to make use of cardinality constraints. This is the thing that most often trips up people new to OWL. They expect that if they say p has cardinality 1 on some class Foo and their data has a Foo with a missing p then the OWL validators will complain. They won’t. The declaration just means that semantically the Foo instance does in fact have a p value; you just don’t know what it is yet. However, there is no problem at all with creating tools which make a closed world and unique name assumption for the purposes of data validation. They aren’t violating the OWL semantics, so long as they don’t purport to be doing OWL consistency checking; they are doing a different job but a useful one. We’ve had Jena’s Eyeball for a long time now and since it is openly extensible there’s nothing to stop someone adding some additional checkers to, for example, implement the OWL DatarangeRestrictions.

Learning Aikido

Aikido can be a challenging and frustrating art to learn.

When you first start it seems as if there are hundreds of techniques, scores of attacks and thousands of combinations – each with a ten syllable Japanese name. Though, actually, it is not the sheer number of moves that makes Aikido difficult. Once you get into it you soon realise that there are only a dozen or so “important” techniques and a similar number of common attacks. While there are indeed a vast number of combinations and variations the basic starting points fit together like a jigsaw. For any attack there are only a few useful starts (a small tenkan to avoid a moving ai katate dori attack, a “back triangular” foot movement for gyaku and so on) and once you’ve done the start there are only a few techniques that make much sense. Later on you find that small changes of angle and posture can create a seemingly unbounded number of variations but you don’t need to worry about that so much to begin with.

So there may be lots of combinations but the number of basic moves to get the hang of is not that great. Pretty soon you start to feel that you’ve seen most of the main moves already. Throw in a bit of blending and remember to breathe, and you are sorted.

Then it starts to get hard.

Watch someone who is fairly new to Aikido perform a common technique – say ai katate dori ikkyo. Then watch a senior do the same move. It’s clear the senior is much better at it (I hope!). They’ll look more relaxed, more in control, with better posture – the technique just works. Then look at how their hands and feet move – to an untrained eye the moves the two aikidoka are making might look pretty similar. So, why the difference in effectiveness? That’s the challenge in Aikido. It is what is sometimes called an inner art. I’ve no idea what that means officially, but it certainly seems to me that the difference between a technique that works, and one that doesn’t, is mostly about how it feels on the inside. As you progress you rapidly understand that the nuances of how you move your hips, how you hold your arms, how relaxed you are, breathing, all make a big difference to how your Aikido works yet only a small difference to what, from the outside, your body seems to be doing.

When I had been training in Aikido for about a year we had a senior guest instructor come to our club. He showed ai katate dori ikkyo. I had done ikkyo many times by then but he said something very useful. He said “Ikkyo, first technique we meet, last technique we learn”. That was a great relief to me. I’d been working hard for a year. I had a shiny yellow belt (well, actually, a pale lemon coloured belt, but that’s another story), but I still didn’t feel I could make the most basic technique like ikkyo work well. That lesson helped me to understand that it wasn’t the details of the angles and how I moved my feet I needed to learn next – it was how to get the feeling of the technique right.

There are a few basic principles that describe the nuances of how Aikido techniques work. We could just write (most of) them down:

Relax. Move from the hips. Lead. Blend. Keep your hands in front of your centre. Use both hands. Relax. Keep weight underside. Maintain ‘one point’. Torifune. Extend (or extend Ki depending on your style). Breathe out. Triangle, circle, square. Avoid, balance, control. Join, connect, catch. Keep your head up. Relax.

The trouble is that just writing them down doesn’t help much. Ironically it was a fiction story, whose lead character studied karate, that helped me to understand the problem here. In this story our hero was trying to learn a particular form of snap kick. His sensei told him “lift your leg, relax your knee, let your leg flip up”. He practised for weeks focussing on this move. He didn’t get any better. Then at the end of a particularly long session he was so tired he didn’t think about it and suddenly executed a perfect snap kick. A fellow student saw how good it was and asked how he had done it. “Well, I just sort of relaxed my knee and let my leg flip up”!

The point is that the descriptions like “relax your knee” do capture what you are striving for, and do describe the principle quite nicely once you have learnt it, but don’t really tell you how to achieve that learning. For that the only thing is practice, and then more practice.

The fact that these principles run so deep, make such a big difference, and yet take so much to develop, is what makes Aikido so endlessly rewarding, as well as frustrating.

So just relax, extend, keep weight-underside and remember to breathe …

Midori at Thornbury Aikido

[Photo: Midori and the Thornbury group, 2007]

For several years now we have been lucky enough to be visited fairly regularly by Midori Sensei (Midori Kajihara). From being a senior Dan grade within Kobayashi dojos Midori has moved on to study Aiki in further depth and each year we marvel at the way her Aikido has developed.

While my normal Aikido is traditional, I have practised Ki style a little and am used to notions like weight underside, in which the way you think about your arm moving affects how hard the movement is to resist. Midori’s Aiki takes this several stages further. Notions of join and contact affect how you think about your connection to Uke (your partner) and subtle internal shifts in tension and weight are enough to make the techniques work with little effort and little visible from the outside. This subtlety makes Midori’s style challenging to learn but immensely rewarding when you begin to get a glimmer of what’s going on – especially for someone like me who normally relies on big flowing movements to make it all work!

[Photo: A grab with a touch of Yonkyo]

[Photo: Untwisting her body throws me off with no apparent effort]

Using RDFS or OWL as a schema language for validating RDF

[This post is rescued from an ancient SWAD-E FAQ list because I want to update it.]

Many software applications need the ability to test that some input data is complete and correct enough to be processed, e.g. to check the data once so that access functions will not later on break due to missing items. This is commonly done by using a schema language to define what “complete and correct” means in this, syntactic, sense and a schema processor to validate data against the schema.

Developers new to RDF can easily mistake RDFS for a schema language (perhaps because the ‘S’ stands for schema!); they then get referred to OWL as providing the solution, and are surprised by the results of trying to use OWL this way.

This is a big topic which we’ll just touch on here. In this FAQ entry I just want to illustrate a few of the pitfalls and hint at why this is harder than it looks, in the hope that it might reduce the “unpleasant surprise” for developers new to OWL.

To spoil the punch line, there isn’t yet a really good schema solution for semantic web applications but one is needed. OWL does allow you to express some (though not all) of the constraints you might like. However, to use it you may need an OWL processor which makes additional assumptions relevant to your application – a generic processor will not do the sort of validation a schema-language user is expecting.

The problems arise from fundamental features of the semantic web:
- open world assumption
- no unique name assumption
- multiple typing
- support for inference

Let’s look at a few examples of schema-like constraints you might want to express:

1. Required property

Suppose you want to express a constraint something like “every document must have an author”. You might say something like:

eg:Document rdf:type owl:Class;
    rdfs:subClassOf [ a owl:Restriction;
        owl:onProperty     dc:author;
        owl:minCardinality "1"^^xsd:integer ].

 eg:myDoc rdf:type eg:Document .

You might think that if you asked a general OWL processor to validate this it would say “invalid” because eg:myDoc doesn’t have an author. Not so. The OWL restriction is saying something that is supposed to be “true of the world” rather than true of any given data document. So seeing an instance of a Document an OWL processor will conclude that it must have an author (because every Document does) just not one we know about yet. So in fact if you now ask an OWL aware processor for the author of myDoc you might, for example, get back a bNode – an example of the inferential, as opposed to constraint checking, nature of OWL processing. This also fits in with the open world assumption – there may be another triple giving an author for myDoc “out there” somewhere.

Of course, the fact that general OWL processors behave this way doesn’t prevent one from creating a specialist validator which treats a document as a complete closed description and flags any such missing properties – it is just that a generic OWL reasoner probably won’t do this by default.
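
As a sketch of the sort of check such a validator might run (this is SPARQL 1.1, and emphatically not something a generic OWL reasoner will do for you):

  SELECT ?doc WHERE {
      ?doc rdf:type eg:Document .
      FILTER NOT EXISTS { ?doc dc:author ?anyAuthor }
  }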

2. Limiting the number of properties

A related example is expressing the constraint that “every document can have at most one copyright holder”.

  eg:Document rdf:type owl:Class;
              rdfs:subClassOf [ a owl:Restriction;
               owl:onProperty     eg:copyrightHolder;
               owl:maxCardinality "1"^^xsd:integer ].

  eg:myDoc rdf:type eg:Document ;
           eg:copyrightHolder eg:institute1 ;
           eg:copyrightHolder eg:institute2 .

Again if you ask a general OWL processor to validate this set of statements you might expect it to complain that there are two values for eg:copyrightHolder. Not so. In this case, the problem is the lack of a unique name assumption. On the web two different URIs could refer to the same resource and there is no defined way to tell this. Unless there is an explicit declaration that eg:institute1 and eg:institute2 are owl:differentFrom each other then there is no violation.
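
So to actually provoke a violation you would have to add an explicit statement such as:

       eg:institute1 owl:differentFrom eg:institute2 .

With that triple present the data really is inconsistent and a consistency-checking OWL processor will report it.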

Indeed, without that declaration, just like in the first example, what an OWL processor does is the reverse. Instead of noticing a violation it infers additional facts which must be true if the data is consistent; in this case it would infer:

       eg:institute1 owl:sameAs  eg:institute2 .

Again, a specialist OWL processor could be told to make an additional unique name assumption to handle such cases but that is not a good thing to do in general. In fact, using such cardinality constraints (e.g. in the guise of owl:InverseFunctionalProperty or owl:FunctionalProperty) to detect aliases is a powerful and much used feature of OWL.

Life is a little easier if one is dealing with DatatypeProperties because you can tell when two literals are distinct (well even this is hard when you are looking at different xsd number classes but at least strings are easy!).

3. Type constraints

The third common schema requirement is to limit the types of values a given property can take. For example:

  eg:Document rdf:type owl:Class;
              owl:equivalentClass [ a owl:Restriction;
               owl:onProperty     eg:author ;
               owl:allValuesFrom  eg:Person ].

  eg:myDoc rdf:type eg:Document ;
           eg:author eg:Daffy .
  eg:Daffy rdf:type eg:Duck.

  eg:myDoc2 eg:author eg:Dave .
  eg:Dave rdf:type eg:Person .

Does the myDoc example cause a constraint violation? No. In RDF an instance can be a member of many classes. Unless we are explicitly told that the classes eg:Duck and eg:Person are disjoint then all that happens with the myDoc example is that we infer that eg:Daffy must be a Person as well. Again a specialist processor could be developed to flag a warning in cases where an object is inferred to have a type which is not a known supertype of its declared types; again this would be making additional assumptions not warranted in the general case but useful for input validation purposes.
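
As in the previous example, an explicit declaration changes the picture:

  eg:Duck owl:disjointWith eg:Person .

With that statement added the myDoc data really is inconsistent and a general OWL processor can detect it.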

Having got the hang of the idea that OWL is more about inference than constraint checking, what about myDoc2? Should the OWL processor infer that myDoc2 is a Document? After all, we defined a Document this time using a complete, rather than partial, definition – so that anything for which all authors are Persons should be a Document – and the author of myDoc2 is a person. The answer, again, is “no”. Just because all the authors we see happen to be people doesn’t mean there aren’t more authors for myDoc2 that we don’t know about.

4. Value ranges

Another common schema requirement is to limit the range of a value. For example to say that an integer representing a day-of-the-month should be between 1 and 31.

Data ranges are not part of OWL at all.

You can express them within XML Schema Datatypes. You could declare a user defined XSD datatype which is an xsd:integer restricted to the range 1 to 31.

There is a problem that XML Schema doesn’t define a standard way of determining the URI for a user defined datatype and the RDF datatyping mechanism requires all datatypes to have a URI. This will hopefully get “clarified” and in any case there is a de facto convention which is straightforward, used by DAML and supported by toolkits, so in the meantime we can be non-standard but get work done.

It is also slightly less useful than it seems since the RDF datatyping machinery requires that each literal value have an explicit datatype URI – you can’t just give a lexical value and use range constraints to apply the type.

These caveats aside, the xsd user defined datatype machinery is useful and this is the one place where RDFS on its own, without OWL, can do some validation. An RDFS processor should detect if the lexical form of a typed literal does not match the declared datatype.
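
For example, a statement like this (with an invented property) should be flagged, because the lexical form is not a legal xsd:integer:

  eg:myDoc eg:dayOfMonth "thirty-one"^^xsd:integer .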

5. Complex constraints

The final forms of constraints that come up are ones which involve constraints between values. For example, that a pair of properties should form a unique value pair, or that the value of one datatype property must be less than that of another property of the same resource, or of a related resource.

No such cross-property constraints can be expressed in OWL at all.

FAQ: Why do rdfs:domain and :range work backwards?

[This post is rescued from an ancient SWAD-E FAQ list to make it easy to point to, since it's a problem that comes up on jena-dev fairly frequently.]

Q. Why do rdfs:domain and rdfs:range seem to work back-to-front when it comes to thinking about the class hierarchy?

A. Because RDFS is a logic-based system. The way rdfs:range and rdfs:domain declarations work is alien to anyone who thinks of RDFS and OWL as being a bit like a type system for a programming language, especially an object oriented language.

To expand on the problem, suppose we have three classes:
eg:Animal eg:Human eg:Man

And suppose they are linked into the simple class hierarchy:
eg:Man rdfs:subClassOf eg:Human .
eg:Human rdfs:subClassOf eg:Animal .

Now suppose we have property eg:personalName with:
eg:personalName rdfs:domain eg:Human .

The question to ask is this: “can we deduce:
eg:personalName rdfs:domain eg:Man ?”

The answer is “no” the correct such deduction is:
eg:personalName rdfs:domain eg:Animal .

This is completely obvious to anyone who thinks about RDFS as a logic system, however it can be surprising if you are thinking in terms of objects.

A common line of thought is this: “surely [P rdfs:domain C] means roughly that P ‘can be applied to’ objects of type C, just like a type constraint in a programming language. Now all instances of eg:Man are also eg:Human so we can always apply eg:personalName to eg:Man things, so doesn’t that mean eg:Man is in the domain of eg:personalName?”

There are two flaws in this line of thought. First, rdfs:domain isn’t really a constraint and doesn’t mean ‘can be applied to’. It means more or less the opposite: it enables an inference rather than imposing a constraint. [P rdfs:domain C] means that if you see a triple [X P foo] then you are licensed to deduce that X must be of type C. So we can see that if we make the illegal deduction [eg:personalName rdfs:domain eg:Man] then everything we applied eg:personalName to would become an eg:Man and we could no longer have things of type eg:Human which aren’t of type eg:Man. Whereas the correct deduction [eg:personalName rdfs:domain eg:Animal] is safe because every eg:Human is an eg:Animal so the domain deductions don’t tell us anything that wasn’t already true, so to speak!
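
To see the inference direction concretely (the instance data is invented for the example):

eg:dogbert eg:personalName "Dogbert" .
# given [eg:personalName rdfs:domain eg:Human] an RDFS reasoner will infer:
eg:dogbert rdf:type eg:Human .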

The second flaw is in the phrasing “is in the domain of”. It is true that eg:Man is, in some sense, “in the domain of” eg:personalName but the correct translation of this loose phrase is that “eg:Man is a subclass of the domain of eg:personalName” which is quite different from saying “eg:Man is the domain of eg:personalName.”