Garbasail project, part 3

Finally! After some final work on our garbasail we made it to the beach and let it fly. :-)

The last things we had to do were taping the diagonal lines, reinforcing the corners with more duct tape, preparing the handlebar (we used one from a bike :-) ), cutting the strings and tying them to the handlebar and the garbasail.

Yesterday we finally tried it out on the beach in Zandvoort, Netherlands. The weather was perfect: there was lots of space and a strong wind, and the garbasail did exactly what we had hoped for. It was strong enough to drag us across the beach but not too strong to handle or to break.
After the first two "flights", though, it got all tangled up. Untangling it wasn't easy since we didn't want to get caught in the strings while the wind was filling it. We would have needed more people to handle this. Nonetheless, we had about 4 to 5 really good rides and a lot of fun.

I put the photos of our flights and some construction details up on Flickr. This is the first time I've used Flickr - I didn't know its interface was such a mess. But all in all it's easier than just having a few pictures in the blog entries.


Semantic Web security considerations

Apart from the problem of trust and how a Semantic Web agent might rate new information it receives, there are some quite concrete security concerns to be aware of when engineering a Semantic Web application.

Always be careful when you allow external data to be loaded into your RDF store. Sure, this applies everywhere, but people might not be as aware of it for RDF stores, and so far I haven't seen any discussion of it.

It is generally a good idea to keep the external data you load separate from the data created especially for the store. You could put it into another repository, for example. This should keep the external data out of your inferencing, rule application, SPARQL queries and any other kind of deduction that influences the behaviour of your application. If your application requires the external data to get into the mix, then you really have to deal with context, trust and rating.
In the same way, you should of course never put confidential data (user passwords and email addresses) in the same repository as publicly accessible data.

In your store you might use quads to associate triples with a graph URI. Then, when retrieving RDF data from external documents, you could store it using the URI of the document as the graph URI. That way you will always know where the data came from and can treat it in this context. Be careful though: some RDF formats allow you to define named graphs inside documents. When parsing those documents, your API may already associate the triples with a named graph and then, when you store them, store several quads: one for each named graph they're attached to (since you provide a graph URI yourself). As far as I'm aware, the Sesame API would in this case overwrite any graph URIs that the document attached to the triples - your mileage may vary. :-)
So, the problem here is: if in any way the graph URIs in the documents get into your store, then the documents can inject foreign-document data. That is: document A contains triples which are associated with a graph URI which is the URI of document B, thus some data in document A will end up in your store as being from document B (if that's how you interpret the graph URIs).
Doesn't sound very threatening? Why would someone do that? Well, an example: there's a new semantic search website for travel stuff, called UpTake. Read about it on ReadWriteWeb or Paul Miller's blog post on ZDNet. They pull in information about hotels and places to stay, reviews and so on from lots of sites. Now imagine one of their sources wanted to say something bad about a competing source or about a hotel. How could they do that? Just say "they suck"? No, not effective enough. If other reviews are mostly positive, it will just be regarded as noise. So instead they could publish false facts about their competitor as if the competitor had stated them. If UpTake read in an RDF document from them which contains a named graph, and that graph is given the URI of one of the competitor's documents, then there's a danger that they store the data in the graph as coming from the competitor's document.
You could call this vulnerability named graph spoofing or context spoofing. There might be good use cases for accepting named graphs inside externally loaded documents but be careful when you treat graph URIs to mean source documents.
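One way to defend against this, as a minimal sketch: throw away whatever graph URI the document claims and file every triple under the URI you actually fetched the document from. The `Quad` record and the `GraphSpoofingGuard` class are made up for this illustration and are not part of Sesame or any other RDF library; a real store would have its own statement and context types.

```java
import java.util.ArrayList;
import java.util.List;

/** A triple plus the graph (context) it is filed under. Hypothetical type. */
record Quad(String subject, String predicate, String object, String graph) {}

final class GraphSpoofingGuard {
    /**
     * Re-labels every parsed quad with the URI of the document it was
     * actually fetched from, discarding the graph URI the document
     * itself attached to the triple.
     */
    static List<Quad> pinToSource(String sourceDocumentUri, List<Quad> parsed) {
        List<Quad> safe = new ArrayList<>();
        for (Quad q : parsed) {
            safe.add(new Quad(q.subject(), q.predicate(), q.object(), sourceDocumentUri));
        }
        return safe;
    }
}
```

The price is that you lose any legitimate named graphs inside the document. If you need those, you could store the claimed graph URI as ordinary data about the triple instead of using it as the provenance.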

Another danger comes from the power of the N3 format. Note that N3 can do much more than just express the RDF model. In fact, N3 is said to be a Turing-complete language. So if your library can understand all of N3, then it might not only use its formulae (rules) for inferencing (which alone can bring enough trouble) but really execute the N3. Thus, you could get N3 injections.

Apart from security holes, there are other possible attacks. You might want to put limits on SPARQL access: size limits for the requests, size limits for the replies, time limits for the replies, maximum number of requests per day per IP address, etc. There's a reason people don't offer direct SQL read access to their public data on their database servers. It could provide the users with endless possibilities but it might just as well let them crash your server.
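Two of these limits (request size and requests per day per IP address) can be checked before a query ever reaches the store. Here is a rough sketch; `SparqlGate` and its method names are invented for this example, and the limits on reply size and reply time are not covered because they would have to be enforced by the store or the query engine itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical gatekeeper in front of a public SPARQL endpoint. */
final class SparqlGate {
    private final int maxQueryChars;
    private final int maxRequestsPerDay;
    // Number of accepted requests per client IP since the last reset.
    private final Map<String, Integer> requestsToday = new HashMap<>();

    SparqlGate(int maxQueryChars, int maxRequestsPerDay) {
        this.maxQueryChars = maxQueryChars;
        this.maxRequestsPerDay = maxRequestsPerDay;
    }

    /** Returns true if the query may be passed on to the store. */
    synchronized boolean allow(String clientIp, String query) {
        if (query.length() > maxQueryChars) {
            return false; // request too large
        }
        int used = requestsToday.getOrDefault(clientIp, 0);
        if (used >= maxRequestsPerDay) {
            return false; // daily quota used up
        }
        requestsToday.put(clientIp, used + 1);
        return true;
    }

    /** Call once a day, e.g. from a scheduler, to reset the counters. */
    synchronized void resetDay() {
        requestsToday.clear();
    }
}
```

Note that in this sketch an oversized query is rejected without counting against the client's quota; whether that is what you want is a design decision.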

Exporting Miranda's contacts as FOAF

Storing all the contacts in my various address books and networks in one address book file in the FOAF format is part of my vision for organising my digital life better with Semantic Web technologies. I haven't tried Beatnik yet but it might help with that. This is also part of a plan to integrate semantic stuff a bit deeper into my Web site, ultimately showing that we don't need the walled gardens and data silos that social network platforms are today. But more on that as soon as I've made some progress with my ideas.

For now we will just try to convert some contact data. I'm not a member of many social network sites because I don't like them. I don't like giving them my data. So if I want to manage my social network myself using FOAF, where could I start getting my contacts from? Most of my friends use ICQ, so IM contacts make for a good start. Fortunately the messenger I use, Miranda IM, can export all my contacts into an easy-to-handle .ini file. To do that you click on its menu, go to "contact list import/export", then "export contact list", confirm the warning, choose where to store the file and say you don't want to export your history.

This file is kept quite simple: for each contact you have a section with a number. In each section you have several attributes. The ones we will use are "ID", "CListNick", "CListGroup", "Proto" and "Hidden". Now we need something to parse the .ini file. I work with Java and used the ini4j library for that. With it we will read in the file and then have access to the sections and the attributes of a section as Java Maps. From there on we just open another file and write the RDF/XML we want to generate as plain text into it (note that we need our URI to relate other people to ourselves).

So, we go through the sections (contacts) and whenever we see one with the attribute "Hidden" we skip it (these are people who wanted to reach you or people you ignored, I assume). We use the attribute "CListGroup" to create memberships in foaf:Groups. "CListNick" serves as the foaf:nick of a person. To be able to use "ID" as an identifier we first have to determine the appropriate property to put it into. Therefore we map the value in "Proto" (protocol) to the name of a FOAF property: "JABBER" to foaf:jabberID, "ICQ" to foaf:icqChatID and so on. After we have finished writing out all the contacts we write the groups and then finish the file. That's it.
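The core of those steps could look roughly like the sketch below. It is not the actual Java file: it uses a tiny hand-written .ini reader instead of ini4j so that it runs on its own, it leaves out the groups and the link back to your own URI, and the protocol names other than "JABBER" and "ICQ" are my guesses at what Miranda writes into "Proto".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MirandaToFoaf {
    // Value of "Proto" -> FOAF property for the account ID.
    // Only ICQ and JABBER are confirmed; the other keys are assumptions.
    private static final Map<String, String> PROTO_TO_FOAF = Map.of(
        "ICQ", "foaf:icqChatID",
        "JABBER", "foaf:jabberID",
        "AIM", "foaf:aimChatID",
        "MSN", "foaf:msnChatID",
        "YAHOO", "foaf:yahooChatID");

    /** Splits .ini text into one attribute map per [section]. */
    static List<Map<String, String>> parseSections(String ini) {
        List<Map<String, String>> sections = new ArrayList<>();
        Map<String, String> current = null;
        for (String raw : ini.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("[") && line.endsWith("]")) {
                current = new LinkedHashMap<>();
                sections.add(current);
            } else if (current != null && line.contains("=")) {
                int eq = line.indexOf('=');
                current.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return sections;
    }

    /** Returns the foaf:Person element for a contact, or "" if it is skipped. */
    static String toPerson(Map<String, String> contact) {
        String property = PROTO_TO_FOAF.get(contact.getOrDefault("Proto", ""));
        // Skip hidden contacts and those we cannot identify.
        if (contact.containsKey("Hidden") || property == null || !contact.containsKey("ID")) {
            return "";
        }
        StringBuilder xml = new StringBuilder("<foaf:Person>\n");
        if (contact.containsKey("CListNick")) {
            xml.append("  <foaf:nick>").append(escape(contact.get("CListNick")))
               .append("</foaf:nick>\n");
        }
        xml.append("  <").append(property).append('>').append(escape(contact.get("ID")))
           .append("</").append(property).append(">\n");
        return xml.append("</foaf:Person>\n").toString();
    }

    /** Escapes the characters that would break XML element content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

To get a complete document you would still wrap the generated elements in an rdf:RDF root element that declares the rdf and foaf namespaces.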

Now you have a valid FOAF file with your IM contact list data. This is not perfect yet - the people in it are uniquely identified by their IM IDs but they don't have URIs, and neither do the groups we created. Also, when you want to map those entries to people from other sources you will probably have to do it manually. LiveJournal has IM IDs in their FOAF data but I don't think you can search for them, so you could only do it if you already know the LiveJournal handles of the people you want to integrate into your address book.

Take a look at the Java file for more detail.

Blogging, the Semantic Web, languages and the garbasail

I don't really blog much, and there is a reason for that: I have planned a lot of blog entries about many different things and they all build on top of each other, because I want to explain the things I do to my readers. But I never get to finish any of the entries, so I figure blogging doesn't work this way. I just have to write. Whenever I feel the need to explain something I can still do it later on. And there are other people to explain things anyway.

Well, I have read up on a lot of Semantic Web stuff over the last year and it has given me a vision of what I want to do with computers. I will explain this step by step. But first come some random posts about parts of the technology around it, which you might not understand immediately if you don't know what it's all about.

By the way: as you can see I actually have two blogs, one in English and one in German. Normally I will try to write posts in both languages and then interlink them. But sometimes there are topics which are specific for one language audience (e.g. German politics and press) and sometimes I'm just lazy. :-)

Oh and: the garbasail project hasn't died! We had some bad luck with finding dates for trying it out last year, and every time we wanted to, rain was forecast (just like right now, it's raining cats and dogs). I hope we'll have more luck this year.

Take care!

dada album cover meme

Via Danny Ayers ...

  1. The first article title on the Wikipedia Random Articles page is the name of your band.
  2. The last four words of the very last quotation on the Random Quotations page are the title of your album.
  3. The third picture in Flickr's Interesting Photos From The Last 7 Days will be your album cover.
  4. Use your graphics programme of choice to throw them together, and post the result.

I got this:

I don't know what it is, but it is dodgy. :-) Hardcore in disguise? I like how the ISO code makes for a good electronic artist name though. :-) My sources were: 1, 2 and 3 (copyrighted :-o ) and I used Jasc Paint Shop Pro 8 (apparently bought by Corel by now).


Ontology design patterns?

I have played around with Semantic Web stuff a lot lately, and while trying to build ontologies I came across problems which seemed like they could be solved with general design patterns. (Note: it gets a bit technical from here on, so you need to know the background.)

[…]