
April 23, 2006

Tag Cloud for Extreme Tales

Can be found here

It contains all the resources that I've tagged in del.icio.us. The tagspace is nice and tidy because when I tagged the resources I already had the category list to guide me - but stuff keeps creeping in when I forget I am logged in as drk, and I also keep meaning to tidy up all my for:extremetales entries ...




Keyword Cluster Units

Because of the limits of Google News - 20 keywords in a single account - I've bundled keywords into top-level categories that approximate projects I'm working on - then, for shorthand, named them "Keyword Cluster Units" (KCUs).

Each KCU is made up of 1-20 keywords which are inserted (by hand) into Google News - the resulting RSS news feed is then available for further processing anywhere.

Example: Current Obsessions - the Keyword Cluster Unit that tracks news which I am interested in blogging about.

cleanfeed, hackers, privacy-internet, virus, censorship-internet, microsoft-security, copyright, drm, riaa, mpaa, piracy-internet, piracy-sea, malware, blacklists, smartfilter, dmca, spyware, exploit, security-internet, censorware.
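A KCU is really just a named list of keywords with a size cap, so it can be written down as data. The sketch below is only an illustration - make_kcu and MAX_KEYWORDS are made-up names, nothing here talks to Google News, and the keywords still have to be entered by hand.

```python
# Hypothetical helper: nothing here is part of Google News itself.
MAX_KEYWORDS = 20  # the per-account limit described in this post


def make_kcu(name, keywords):
    """Build a Keyword Cluster Unit, enforcing the 1-20 keyword limit."""
    cleaned = []
    for keyword in keywords:
        keyword = keyword.strip().lower()
        # Skip blanks and repeats so they do not use up the quota.
        if keyword and keyword not in cleaned:
            cleaned.append(keyword)
    if not 1 <= len(cleaned) <= MAX_KEYWORDS:
        raise ValueError(
            f"{name}: need 1-{MAX_KEYWORDS} keywords, got {len(cleaned)}"
        )
    return {"name": name, "keywords": cleaned}


current_obsessions = make_kcu("Current Obsessions", [
    "cleanfeed", "hackers", "privacy-internet", "virus",
    "censorship-internet", "microsoft-security", "copyright", "drm",
    "riaa", "mpaa", "piracy-internet", "piracy-sea", "malware",
    "blacklists", "smartfilter", "dmca", "spyware", "exploit",
    "security-internet", "censorware",
])
print(current_obsessions["name"], len(current_obsessions["keywords"]))
```

Keeping the clusters in one place like this would at least make it obvious when a KCU is full before logging in to change it.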

The major problem is that each KCU has to be maintained by hand - and every time a KCU is updated the Google News RSS feed URL also changes - so even minor changes are a pain and a major re-arrangement is a major pain.

The other problem is that Google News ties every user to an email address - so for the 5 current KCUs I need 5 email addresses - and for each one I have to log in to Google News, make the changes and then cut and paste the changed URL.

I would also like to get rid of the "Top Stories" - they clutter the feeds with irrelevant duplicate stories ...
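One way the duplicates could be thinned out after the fact is to drop any item whose link has already been seen. This is only a sketch, assuming a plain RSS 2.0 feed held in a string - drop_duplicate_items is a made-up name, and it does not identify "Top Stories" as such, only repeats.

```python
import xml.etree.ElementTree as ET


def drop_duplicate_items(xml_text):
    """Return the RSS 2.0 feed with repeated items removed (first one wins)."""
    root = ET.fromstring(xml_text)
    seen = set()
    for channel in root.iter("channel"):
        # Copy the list first, because items are removed while looping.
        for item in list(channel.findall("item")):
            # Assumption: two items with the same link are the same story.
            key = item.findtext("link") or item.findtext("title")
            if key in seen:
                channel.remove(item)
            else:
                seen.add(key)
    return ET.tostring(root, encoding="unicode")


# Made-up sample feed with one repeated story.
sample = """<rss version="2.0"><channel><title>Sample</title>
<item><title>Story A</title><link>http://example.invalid/a</link></item>
<item><title>Story B</title><link>http://example.invalid/b</link></item>
<item><title>Story A again</title><link>http://example.invalid/a</link></item>
</channel></rss>"""
deduped = drop_duplicate_items(sample)
print(deduped)
```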

It would be great if Google News supported (a) more than 20 keywords for an account, and (b) a method of clustering keywords into categories and then feeding them to category-led RSS feeds.

Right now the setup works well for a few Google News feeds. They all come out of Google News marked "Google News" - but once I've republished them in FeedBurner and rewritten the feed name with the KCU name, each feed ends up marked with both the category name of the keyword cluster and the keyword itself as a category within the feed.

It works - but it's kludgy and maintenance is a pain - and I still haven't folded the stuff from BlogSpace into SyndicSpace yet ... and all my attempts at online aggregation have led to nothing but online aggravation.




April 15, 2006

In the Beginning ...

I'm still on the quest for the holy grail - building the mother of all news filtering engines out of online sources. The story so far:

Google News: searches 4500 news sources - can be adjusted to search by keyword(s) - limit of 20 keyword searches per userid - userid tied to email address - RSS available

Problem - not enough keywords available

Solution: Multiple Google News accounts with multiple keyword cluster units - each account clusters keywords according to a category - see the example feeds under keyword clusters.

Next up is tagging the keyword cluster units somehow - I originally wanted to aggregate using rojo into one feed - but no! rojo barfed when I tried to give it more than one raw Google News feed.

Solution: Feed each filtered Google News feed into feedburner, reburn the feed name and tag it with the keyword cluster name - now each feed is named with the keyword cluster unit name and each post is tagged with the keyword as category!!

This is neat!! - Fed into RSSOwl, Omea Feedreader or rojo I can use keyword cluster name as top-level category and still see the keyword as a subcategory.

e.g. keyword cluster name = Extreme Tales News and category = skateboarding, or keyword cluster name = Space Science and category = nasa - and so on.

I know at a glance which cluster (category) the story has come from and also which keyword triggered the story from within Google News.

Now the problems begin.

I have 5 keyword cluster units feeding into 5 feedburner feeds to provide the info I want - but I want to aggregate them into one feed - RSS or Atom - to pass to wherever I want.
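What I am after could be sketched roughly like this: read each feed, stamp every item with its cluster name as an extra category, and pour everything into one channel. It is a sketch under assumptions - merge_feeds is a made-up name, the feeds are assumed to be plain RSS 2.0 already fetched into strings, the sample feeds and the example.invalid links are placeholders, and nothing here is what rojo or feedburner actually do.

```python
import xml.etree.ElementTree as ET


def merge_feeds(feeds, title="All keyword cluster units"):
    """Merge RSS 2.0 feeds into one, keeping both levels of category.

    feeds maps a cluster name to that cluster's feed as an XML string.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = "http://example.invalid/"  # placeholder
    ET.SubElement(channel, "description").text = "Merged feed (sketch)"
    for cluster_name, xml_text in feeds.items():
        source = ET.fromstring(xml_text)
        for item in source.iter("item"):
            existing = [c.text for c in item.findall("category")]
            if cluster_name not in existing:
                # Top-level category: which cluster the story came from.
                cluster = ET.Element("category")
                cluster.text = cluster_name
                item.insert(0, cluster)
            # The keyword categories already on the item are left alone.
            channel.append(item)
    return ET.tostring(rss, encoding="unicode")


# Two tiny made-up feeds standing in for the real ones.
space = """<rss version="2.0"><channel><title>Space Science</title>
<item><title>Probe launch</title><link>http://example.invalid/1</link>
<category>nasa</category></item></channel></rss>"""
extreme = """<rss version="2.0"><channel><title>Extreme Tales News</title>
<item><title>Park opens</title><link>http://example.invalid/2</link>
<category>skateboarding</category></item></channel></rss>"""

merged = merge_feeds({"Space Science": space, "Extreme Tales News": extreme})
print(merged)
```

Each item in the merged feed carries two category elements - the cluster name first, then the keyword - so a reader that shows categories should still be able to show both levels.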

Things get screwed from here on in. When I subscribe to the rojo RSS for "all your stories" it strips out all the carefully placed keyword cluster unit names (blog names) information - and also hoses the category information.

That's two levels of categorisation gone!! I'll look at the raw xml sometime and try and figure out what rojo is doing with my category information - sure as hell Omea can't see it.
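Looking at the raw xml need not mean reading it by eye. A small script can list the category elements each item still carries. This is a sketch only - list_item_categories is a made-up name, it assumes plain RSS 2.0 in a string, and it would need changes for Atom, where the elements are namespaced.

```python
import xml.etree.ElementTree as ET


def list_item_categories(xml_text):
    """Return (title, [categories]) for every item in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    report = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        categories = [c.text for c in item.findall("category")]
        report.append((title, categories))
    return report


# Made-up sample: one item with both levels of category intact.
sample_feed = """<rss version="2.0"><channel><title>Sample</title>
<item><title>Probe launch</title>
<category>Space Science</category><category>nasa</category></item>
</channel></rss>"""
category_report = list_item_categories(sample_feed)
for title, categories in category_report:
    print(title, categories)
```

Running the same check on a feed before and after it goes through an aggregator would show whether the categories were stripped or just moved somewhere else.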

In Omea reader I can see the sub-categories of the rojo feeds - but ONLY if I define a feedreader sub-category for that feed. Otherwise nada. This is - as they say - not good.

The last thing I want to do is keep updating Omea categories every time I change a keyword, and the idea of putting 100 categories into Omea Reader just because rojo has stripped them out is horrible.

So while the feedreader feeds tag the stories with the keyword as category - allowing me to see at a glance which keyword it came from - rojo breaks this totally.

Worse yet - RSSOwl will have nothing to do with republished rojo feeds.

So the quest continues - I'll try and update this here and keep it separate from my main blog - it's all experimental anyway - and also blogroll the feeds I am playing with - maybe better minds than mine know how to solve the problems!!

