Don't Back Down

Posted by Chad Everett on January 16, 2005

Cleaning up Metadata

You may notice that I have all sorts of metadata in my page headers. In theory this helps machines process the pages, so that they are indexed more accurately and found more readily when needed. I've always tried to provide clean metadata. The problem is that much of it isn't documented particularly well, or at least not in one place. So I did some digging and made some changes. Here are the results, in case you're interested in doing the same.

I removed the http-equiv meta tag for Content-Type, mostly because I was confused. I didn't understand why the value would be sent with the page in the HTTP response headers and then set again in the document head. Apparently I'm not alone. I found this page that talks about the subject, and it specifically recommends dropping the tag. Apparently the only really useful reason for the tag is compatibility with HTTP 0.9. I'm sorry if there are some out there using this, but I'm not, so it shouldn't be an issue here.
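
For reference, the tag in question looks something like the sketch below. The utf-8 charset is only a placeholder for illustration, not necessarily what this site sends. The comment shows the HTTP response header that already carries the same information.

```html
<!-- Already sent by the server as an HTTP response header:
     Content-Type: text/html; charset=utf-8   (charset is an assumed value) -->

<!-- The redundant tag that was removed from the page head -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
```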

Next I moved on to the DC.Format tag, which raised a similar question for me: why do we keep sending this value? So I searched and found this page, which includes a guideline recommending that you "Do not create <dc.format> metadata for resources if they form part of an HTML page". That's what I'm doing, so it's dropped too.
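
The dropped tag would have looked roughly like this. It's a sketch, with the content value assumed to be the page's MIME type:

```html
<!-- Removed: repeats the media type of the HTML page itself -->
<meta name="DC.format" content="text/html">
```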

I then went through the rest of the Dublin Core elements, and realized that I only had a partial understanding of what they did. So I did some more digging. I found this excellent guide to different tags, and it helped immeasurably. Here's a brief summary of the changes.

I removed the DC.subject lines, as they are really like keywords, and with all the talking I do, I don't know if they are really necessary. I removed the DC.publisher information, because it didn't really make sense to duplicate it all over again from DC.creator. I added the DC.date.modified tag to indicate the last date that the site was changed. I added DC.creator tags for fax, phone and postal, so if you want to contact me, you don't have any excuse.
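
Roughly, the Dublin Core block ends up shaped like the sketch below. The names and values are placeholders rather than the exact tags on this site: the date is simply the day of this post, the W3CDTF scheme is an assumption, and the qualifier names for the contact details are hypothetical, since guides differ on how to spell them.

```html
<!-- Removed: DC.subject (keyword-like) and DC.publisher (duplicated DC.creator) -->

<!-- Kept: who made the page -->
<meta name="DC.creator" content="Chad Everett">

<!-- Added: contact details. These qualifier names are hypothetical. -->
<meta name="DC.creator.phone" content="(phone number)">
<meta name="DC.creator.fax" content="(fax number)">
<meta name="DC.creator.postal" content="(postal address)">

<!-- Added: date of the last change (scheme and value assumed) -->
<meta name="DC.date.modified" scheme="W3CDTF" content="2005-01-16">
```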

The last change to the Dublin Core elements was to the DC.language tag, which now uses a scheme of Z3953. That indicates the NISO Z39.53 Language Codes, which are recommended for compatibility with MARC. I really don't know what that means. It has something to do with libraries and reference data, and it sounded like a good argument.
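
In tag form, the change amounts to something like this. The content value is an assumption: Z39.53 uses three-letter codes, and eng is the one for English.

```html
<!-- scheme names the code list; content is a code taken from that list -->
<meta name="DC.language" scheme="Z3953" content="eng">
```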

I still need to do some work on DC.identifier, because right now it always points to the site root instead of the page itself. And I'd really like to implement the DC.relation.ischildof tag on pages, to really build some structure. But that will take some doing, so it's not done yet. For now, check out the pages at Everitz Consulting for some examples of what this means.
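
As a sketch of where this is headed, a page one level below the root might carry tags like these. The URLs are hypothetical examples, not real pages.

```html
<!-- Identifier should name this specific page, not the site root -->
<meta name="DC.identifier" content="http://www.example.com/archives/metadata.html">

<!-- Points at the parent page in the site hierarchy -->
<meta name="DC.relation.ischildof" content="http://www.example.com/archives/">
```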

Finally, I added the no-email-collection tag to the header that is used on all pages, so we'll see if it works. I picked up this tip from Project Honeypot (though I can't find the exact page right now). I'll enable some more stuff from Project Honeypot along the way, but I'm not ready just yet. Have to save something to do tomorrow!
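
The tag itself is a single meta element. In this sketch the content URL is a placeholder, on the assumption that it should point at a terms-of-use page forbidding address harvesting. Check Project Honeypot's own instructions for the exact value.

```html
<!-- content URL is a hypothetical placeholder, not the real value -->
<meta name="no-email-collection" content="http://www.example.com/terms.html">
```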
