Metadata Madness

August 3, 2008

ISO19139 – hello? hello?

Filed under: inspire, iso19139 — metadatamadness @ 4:24 pm

Hello? What were the designers of ISO19139 thinking? Were any of them, in fact, thinking?

Consider this, my fellow sufferers:

gmd:identificationInfo/gmd:MD_DataIdentification/gmd:pointOfContact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:electronicMailAddress/gco:CharacterString

This is the hoop-jumping necessary to extricate the email address of a contact person for a data set, from a piece of ISO19139 XML. That’s 10, count ’em, 10 XML element levels deep.
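For fellow sufferers, here is a minimal sketch of that hoop-jumping in Python's standard-library ElementTree. The sample record is made up and far from schema-valid; it assumes the path hangs off a gmd:MD_Metadata root and that the record uses the usual gmd and gco namespace URIs. The names EMAIL_PATH, SAMPLE and contact_emails are mine, for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as commonly used in ISO19139 records (assumed here).
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# The ten-step path from the post, relative to the gmd:MD_Metadata root.
EMAIL_PATH = (
    "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:pointOfContact/"
    "gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/"
    "gmd:CI_Address/gmd:electronicMailAddress/gco:CharacterString"
)

# A made-up, stripped-down record: just enough nesting to reach one email.
SAMPLE = """\
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:pointOfContact>
        <gmd:CI_ResponsibleParty>
          <gmd:contactInfo>
            <gmd:CI_Contact>
              <gmd:address>
                <gmd:CI_Address>
                  <gmd:electronicMailAddress>
                    <gco:CharacterString>someone@example.org</gco:CharacterString>
                  </gmd:electronicMailAddress>
                </gmd:CI_Address>
              </gmd:address>
            </gmd:CI_Contact>
          </gmd:contactInfo>
        </gmd:CI_ResponsibleParty>
      </gmd:pointOfContact>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>
"""


def contact_emails(root):
    """Return every non-empty contact email found under the metadata root."""
    found = root.findall(EMAIL_PATH, NS)
    return [el.text.strip() for el in found if el.text and el.text.strip()]


root = ET.fromstring(SAMPLE)
emails = contact_emails(root)
depth = len(EMAIL_PATH.split("/"))  # number of element steps in the path
print(emails, depth)
```

Real records can carry several points of contact, each with several addresses, which is why the helper returns a list rather than a single string.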

This is metadata madness!

  • Why are so many elements repeated, with a short name then a long one?
  • Why is every text string wrapped in a gco:CharacterString element – just in case an XML parser doesn’t realise that it’s looking at text?

I suspect there’s a tyranny of the toolset at work, an expectation that everyone’s working with W3C XML Schema and not with a more relaxed schema language or, imagine, none at all.

19139 is simply grotesque. It makes a joke of the word “standard”. The INSPIRE expert group on Metadata skirted round the issue, but it’s not being recommended for use in European projects – the dog’s not barking, though ISO may be.

I wouldn’t care, except that it is starting to affect me now. Organisations buy a proprietary toolkit, read some dodgy abstract reference documents which say that 19115 is The Standard Way To Do Geo-Metadata and 19139 is The Standard Way To Put It In XML. Then some poor muggins has to write code to actually re-use the information.

I am replacing an old, minimal elementtree based 19139 parser with an XSLT stylesheet which transforms the data into RDF/XML. The verbosity of XSLT and of 19139 complement each other beautifully, leading to a baroque intertwining which would make great net.art wallpaper but is hopeless for information management purposes.
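To give a flavour of the target, here is a rough sketch in Python rather than XSLT of the kind of flattening involved. This is not my stylesheet. It assumes a mapping of the 19139 citation title onto Dublin Core's dc:title, and the sample record, the subject URI and the to_rdf helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
NS = {"gmd": GMD, "gco": GCO}

# Where a dataset title lives in a 19139 record, relative to the root.
TITLE_PATH = (
    "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
    "gmd:CI_Citation/gmd:title/gco:CharacterString"
)

# A made-up, stripped-down record (not schema-valid).
SAMPLE = """\
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title>
            <gco:CharacterString>Example tileset</gco:CharacterString>
          </gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>
"""


def to_rdf(record, about):
    """Build a tiny RDF/XML tree describing `about` with a dc:title."""
    # Ask ElementTree to serialise with readable prefixes.
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("dc", DC)
    rdf_root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(
        rdf_root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": about}
    )
    title = record.findtext(TITLE_PATH, default="", namespaces=NS).strip()
    if title:  # only emit the property when the record has a title
        ET.SubElement(desc, f"{{{DC}}}title").text = title
    return rdf_root


# The subject URI is a placeholder, not a real identifier scheme.
rdf_doc = to_rdf(ET.fromstring(SAMPLE), "http://example.org/dataset/1")
print(ET.tostring(rdf_doc, encoding="unicode"))
```

Five levels of 19139 nesting collapse into one property on one resource, which is roughly the point of the exercise.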

I am not even going near the topic of model overdesign issues with ISO19115 itself because people with influence and vested interest quite sensibly do not care.

July 16, 2007

Metadata on the European SDI

Filed under: Uncategorized — metadatamadness @ 6:59 pm

“Metadata” was the only specific technology theme to have a dedicated session at the 13th EC-GIS workshop in Porto. I was glad of the opportunity to present results from the first version of Terradue’s data distribution system, a “GeoPortal” interface to tilesets available via BitTorrent. Discover the Difference, Share the Load – slides in .pdf or in OpenOffice .odp format.

It was a chance to talk on favourite topics; foremost, how the GIS community exists in a bubble when it comes to moving spatial data around on networks. It’s worth looking outside the OGC and ISO standards base, doing more in systems design than playing it safe. GeoJSON, for example, is getting traction on the internet for a lot of lightweight web services. Interlis 2 is no longer Switzerland’s best kept secret; FME’s recent open source release of Interlis libraries (though of course OGR has support too) is helping with that.

(And these are just different data modelling ecologies within GIS, not outside it – in library science, physics simulation, or warehouse logistics. It was refreshing to hear over the 3 days, from so many different angles, “we would benefit from sharing systems design decisions with other domains.”)

Nick Land, representing the consortium of Europe’s National Mapping Agencies, asked in an earlier session essentially, “why are you all still arguing about metadata standards? we sorted all this out [and it became concrete as ISO19115] many years ago.”

“Internet time” may be a truism, but consensus changes; technologies change and become commoditised; the norms of business process alter. For me, getting metadata right is a springboard to a “next-generation” kind of spatial data search.

I wanted time to do more than glance at the different search strategies which minimal, structured descriptions of data can support.

  • Less reliance on text description of data
  • Networks of data users
  • Spatial proximity and scale
  • Similarity of geometries and of properties
  • Reuse in applications!

Some of this narrative, I thought, would be too “far out” for this gathering. I blushed when Thomas Vogele stood up to present his “somewhat more conventional” but extremely functional project, PortalU, a metadata collection and search service for environmental data in Germany, of which more another time.

I was cheered by the tone of Michael Gould’s talk on “Implicit Geo-Metadata”. His group are supporting extensions to gvSIG to handle the MEF format for metadata in data package interchange pioneered by GeoNetwork. He talked of more work being done in the client, semi-automatic extraction and submission of metadata to registries, of “metadata growing as the data is used”; attaching and recording the many tacit statements about data made in the use and the exchange of it, with a common core of agreement and simple ways for machines to compare resources.

Two of the speakers in this session – Michael Gould and Thomas Vogele – were members of the Metadata Drafting Team putting together the Implementing Rules for that part of the INSPIRE Directive. Both spoke with optimism and assurance about the “minimal abstract model” approach in the Metadata IRs, and the coherence of the public feedback about it.

An ongoing revelation for me is that the ‘call for simplicity and collaboration’ in GIS is not just coming from Web 2.0 neogeographers. I think back to a great talk I heard at XTech 2007 about real-world metadata registries in scientific collaboration. Similar sets of keyphrases – the minimal abstract model for metadata; the exchange of small domain models or schemas.

So I am not discouraged by Ed Parsons’ disappointment on the way metadata and data search are being talked about:

I am however disappointed by the continued focus on metadata driven catalogue services as the primary mechanism to find geospatial data, I don’t believe this will work as nobody likes creating metadata, and catalogue services are unproved.

INSPIRE needs GeoSearch !!

Getting better metadata into public indexes is everyone’s concern. This is one reason the Open Knowledge Foundation just started the Comprehensive Knowledge Archive Network – to provide a metadata registry of sources known to be usable under an open license or in the public domain. Are we at OKFN behind the times? The future can look a lot like the distant past.

Having Google swoop in, index your data archive and gain the value from having the interface to it, is not a desirable or sustainable option for a lot of public authorities. With a massive, uninspectable index of data elsewhere, we don’t have access to the deep implicit context around data, needed to build systems that aren’t in the large data collectors’ philosophy.

April 2, 2007

Hacking on metadata at FOSS4G 2007

Filed under: foss4g, geonetwork — metadatamadness @ 5:00 pm

At Jody Garnett’s suggestion I added a listing for a code sprint on the day after FOSS4G 2007 centred around GeoNetwork and exploring different kinds of metadata madness.

The outline says, Building crawler/harvester/aggregator applications on top of the GeoNetwork metadata catalog network and similar interfaces. Plugging client stuff like gvSIG and uDig into it. My plan was to camp in the middle of all the other code sprint sessions and try to attract defectors from projects like GeoTools and OpenLayers.

Stefan asked about remote participation, and yes OSGeo has a very broad IRC culture, and it would be interesting to organise voice conferencing for such an event, so that project contributors who aren’t able to attend the conference could participate more in any sprint. And I expect a lot of wiki and trac documentation of the sessions will happen.

Will anything happen in preparation for such an event? Well, what happens organically anyway: hopefully lots of experiments with different standards, with RDF, and with spatial extensions to ebRIM, to OAI-PMH and whatever else comes along and looks useful. September’s a way off in internet time…

March 30, 2007

after all, why blog about metadata?

Filed under: inspire, iso19115 — metadatamadness @ 8:40 pm

Recently my work has become far too niche, acronym-ridden and full of curious and monotonous purpose to inflict on the Mapping Hacks blog. Recently I helped co-ordinate a free and open source software community response to the draft Implementing Rules for Metadata underlying the INSPIRE directive establishing a spatial data infrastructure in Europe, *deep breath*, and I learned a lot during that process and while trying to follow the corresponding US process of establishing a new metadata profile based on the ISO19115 standard. A couple of weeks ago I had a look at ISO 19115 in this rough essay written after reading the draft North American Profile for metadata, and I’m not alone in holding a dissenting view on the grounds of overcomplexity and lack of machine-reusability.

I’ve been researching metadata models, exchange interfaces and appropriate standards for a BitTorrent-based data distribution project with Terradue. It uses GeoNetwork with a minimal Dublin Core based profile, with GeoRSS and iCal to indicate more specific spatio-temporal events. The profile is based on DCLite4G, a minimal information model for metadata oriented towards GeoRSS and RDF. This is something I have worked on via wiki and email over the last year with Stefan Keller, based on a collective effort by the Geodata Committee at the Open Source Geospatial Foundation, using the FGDC Core standard model as a reference.

Recently I gave a talk to a cosy geoforum convened by Stefan in Zurich. The slides for my talk (huge 23MB pdf) are more visual illustration than narrative of what I was actually saying; I have a half-written essay about “open process” in geodata re-use and redistribution which I’ll post here when it’s done.

At the Open Knowledge Foundation Rufus has been doing some good work on a web interface for a testable generic metadata repository service for data packages, with transparent versioning in the backend. I hope at some point the work on geospatial data contribution and search services, with the advice of people in the “information retrieval” community, will connect up with this sort of thing.

So I would like to talk on a blog about all this sort of thing and consider that if even just three people really connect with it, the time spent writing it will have been totally worthwhile.
