Wednesday, April 08, 2009

RSS and Archive.org

While I'm a huge fan of Archive.org for hosting and archiving podcasts and audio artifacts, one thing that has always been a bit of a letdown about the site is the general lack of RSS. While most other sites now create RSS feeds for every imaginable content view, Archive.org only has RSS for items newly added to the entire site. That's not much help if you are trying to use the site to host podcasts for an event or a project and want those podcasts embedded in the project blog.

This recently came up for me while developing the Waiheke Podcasting Project on the Ning platform. After looking at a variety of free podcast hosting solutions, I still considered Archive.org the best option. Not only are they a non-profit organization like we are, they also seem like the best place to store heritage-style artifacts that describe our community, because they have some solid sponsorships and a good reputation for the work they're doing. Another big plus for our project was that they don't limit the number of podcasts that can be hosted there or their file size. This worked well for us because, although Ning was going to be great as a community podcast site, it only allows 20MB uploads of audio and limits the amount of audio each user can have on the site.

All the podcasts that we hosted on Archive.org have a tag applied to them ... so this was halfway to where I needed it to be. As long as members of the project uploaded their podcasts with the "waiheke podcasting project" tag, we could get a page that displayed all the posts. But how to get this into Ning? The solution was to use a service called Feed43. Feed43 is a free online service that converts any web page to an RSS feed on the fly. It's not as simple as just pointing Feed43 at the URL you want turned into RSS, though. It requires a bit of tweaking, but the results are pretty good: I now have an RSS feed built from the Archive.org page that lists every item tagged "waiheke podcasting project", and it displays on our Ning page.

How It Works (the expurgated version)


Basically, Feed43 goes off and grabs the URL you provide and displays the HTML generated from that page... so if you're scared of the sight of web page blood and guts then this might not be the job for you. From there you need to identify some unique HTML classes, IDs or other markup used to display the title, the URL, and any other descriptive information that you want in your feed. This is called the search pattern, and Feed43 uses the results of these patterns to create a fully formed RSS feed. There's a bit of mucking around in this step, and I didn't have much luck with my trial-and-error start until I had a good look at the tutorial. Then it started to make sense, so I'd suggest reading it first if this is something you need to do. Once you've got a result that looks like a nice feed, Feed43 will generate a URL that you can use as an RSS feed to link to from your blog or Ning site!
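To make the search pattern idea a bit more concrete, here is a rough Python sketch of the same approach run locally. It is not how Feed43 itself is implemented, just an illustration. The `{%}` (capture a value) and `{*}` (skip whatever is in between) placeholders follow the pattern style I remember from the Feed43 tutorial, so treat them as an assumption. The sample HTML, the class names, the item URLs and the function names are all made up for the example; the real Archive.org markup will look different.

```python
import re
from xml.etree import ElementTree as ET

# Hypothetical HTML standing in for a tag results page; real markup will differ.
SAMPLE_HTML = """
<div class="hit"><a class="titleLink" href="/details/wpp-episode-1">Episode 1</a>
<span class="desc">Interview at the market</span></div>
<div class="hit"><a class="titleLink" href="/details/wpp-episode-2">Episode 2</a>
<span class="desc">Ferry stories</span></div>
"""

# Assumed Feed43-style pattern: {%} captures a value, {*} skips anything.
PATTERN = '<a class="titleLink" href="{%}">{%}</a>{*}<span class="desc">{%}</span>'


def pattern_to_regex(pattern):
    """Turn a {%}/{*} search pattern into a compiled regular expression."""
    pieces = []
    for part in re.split(r"(\{%\}|\{\*\})", pattern):
        if part == "{%}":
            pieces.append("(.*?)")   # capture as little as possible
        elif part == "{*}":
            pieces.append(".*?")     # skip without capturing
        else:
            pieces.append(re.escape(part))  # literal HTML, matched exactly
    return re.compile("".join(pieces), re.DOTALL)


def extract_items(html, pattern, base_url):
    """Apply the pattern to the page and return one dict per match."""
    regex = pattern_to_regex(pattern)
    return [
        {"link": base_url + href, "title": title.strip(), "description": desc.strip()}
        for href, title, desc in regex.findall(html)
    ]


def build_rss(items, feed_title, feed_link):
    """Build a minimal RSS 2.0 document (no enclosures) as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = feed_title
    for item in items:
        node = ET.SubElement(channel, "item")
        for key in ("title", "link", "description"):
            ET.SubElement(node, key).text = item[key]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    found = extract_items(SAMPLE_HTML, PATTERN, "https://archive.org")
    print(build_rss(found, "Waiheke Podcasting Project", "https://archive.org"))
```

The fiddly part is the same as on Feed43: the pattern only works if the literal bits of HTML around each placeholder are unique enough to match every item and nothing else.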

3 comments:

brent said...

One small thing I've noticed in looking into this further is that there's no way to make 'enclosures' using Feed43 ... so the RSS I've made, while creating links to the podcasts, won't actually let people play them from within their readers. I'm going to try to get a mate to run the feed through Feedburner, which apparently will try to make enclosures from a feed, and see if that works.

Wayne said...

Have you explored what the NZ National Library are doing around the Digital Content initiative? Perhaps there's a solution here, especially since the National Library have a commitment to archiving NZ cultural artifacts.

brent said...

hmmm... as of today Feed43 seems to have been down for quite a while. Might need another solution.