
In the Beginning ...

I'm still on the quest for the holy grail - building the mother of all news filtering engines out of online sources. Here's the story so far.

Google News: searches 4500 news sources - can be adjusted to search by keyword(s) - limit of 20 keyword searches per userid - userid tied to email address - RSS available

Problem - not enough keywords available

Solution: Multiple Google News groups with multiple keyword cluster units - each account clusters keywords according to a category - see the example feeds under keyword clusters.
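A minimal sketch of the bookkeeping, in Python. The cluster names, keywords and helper names (`check_clusters`, `split_cluster`) are made up for illustration; only the limit of 20 keyword searches per userid comes from the notes above.

```python
# Hypothetical keyword cluster units: one Google News account per cluster.
MAX_KEYWORDS_PER_ACCOUNT = 20  # the per-userid limit noted above

clusters = {
    "Extreme Tales News": ["skateboarding", "snowboarding", "base jumping"],
    "Space Science": ["nasa", "mars rover", "hubble"],
}

def check_clusters(clusters, limit=MAX_KEYWORDS_PER_ACCOUNT):
    """Return the names of clusters that exceed the per-account keyword limit."""
    return [name for name, keywords in clusters.items() if len(keywords) > limit]

def split_cluster(keywords, limit=MAX_KEYWORDS_PER_ACCOUNT):
    """Split an oversized keyword list into chunks that each fit one account."""
    return [keywords[i:i + limit] for i in range(0, len(keywords), limit)]
```

If a cluster grows past the limit, `split_cluster` gives the keyword lists for the extra accounts it would need.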

Next up is tagging the keyword cluster units somehow - I originally wanted to aggregate using rojo into one feed - but no! rojo barfed when I tried to give it more than one raw google news feed.

Solution: Feed each filtered google newsfeed into feedburner and reburn the feed name and tag it with the keyword cluster name - now each feed is named with the keyword cluster unit name and each post is tagged with the keyword as category!!

This is neat!! - Fed into RSSOwl, Omea Feedreader or rojo I can use keyword cluster name as top-level category and still see the keyword as a subcategory.

e.g. keyword cluster name = Extreme Tales News and category = skateboarding, or keyword cluster name = Space Science and category = nasa - and so on.
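To show what that two-level tagging looks like in the feed itself, here is a rough Python sketch. It is not what feedburner does internally (I don't know that) - it just assumes a plain RSS 2.0 feed and produces the same end result: the channel title becomes the cluster name and each item carries the keyword in a `<category>` element. The function name `tag_feed` and the sample feed are invented for the example.

```python
import xml.etree.ElementTree as ET

def tag_feed(rss_xml, cluster_name, keyword):
    """Rename the channel to the cluster name and add the keyword as a category on each item."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    title = channel.find("title")
    if title is None:
        title = ET.SubElement(channel, "title")
    title.text = cluster_name
    for item in channel.findall("item"):
        existing = [c.text for c in item.findall("category")]
        if keyword not in existing:  # avoid tagging the same item twice
            ET.SubElement(item, "category").text = keyword
    return ET.tostring(root, encoding="unicode")

# Made-up sample feed standing in for one raw Google News keyword feed.
sample = """<rss version="2.0"><channel><title>Google News</title>
<item><title>Shuttle launch delayed</title></item>
</channel></rss>"""

tagged = tag_feed(sample, "Space Science", "nasa")
```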

I know at a glance which cluster (category) the story has come from and also which keyword triggered the story from within Google News.

Now the problems begin.

I have 5 keyword cluster units feeding into 5 feedburner feeds to provide the info I want - but I want to aggregate them into one feed - RSS or Atom - to pass to wherever I want.
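What I'm after could be sketched like this in Python - a do-it-yourself merge that keeps both levels of categorisation. This is only a sketch under assumptions: the feeds are plain RSS 2.0, the function name `merge_feeds` is invented, and storing the cluster name as an extra `<category>` with a `domain` of "keyword-cluster" is my own convention, not something any of the tools above use.

```python
import xml.etree.ElementTree as ET

def merge_feeds(feed_xmls, merged_title="All keyword clusters"):
    """Combine several RSS 2.0 feeds into one, keeping cluster name and keyword categories."""
    rss = ET.Element("rss", version="2.0")
    out_channel = ET.SubElement(rss, "channel")
    ET.SubElement(out_channel, "title").text = merged_title
    for xml_text in feed_xmls:
        channel = ET.fromstring(xml_text).find("channel")
        cluster_name = channel.findtext("title", default="")
        for item in channel.findall("item"):
            # Copy the feed (cluster) name onto the item so it survives the merge.
            # The "keyword-cluster" domain value is an assumed convention.
            cluster_cat = ET.SubElement(item, "category", domain="keyword-cluster")
            cluster_cat.text = cluster_name
            out_channel.append(item)
    return ET.tostring(rss, encoding="unicode")
```

Existing keyword categories on each item are left untouched, so a reader that understands `<category>` should still see the keyword, with the cluster name alongside it.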

Things get screwed from here on in. When I subscribe to the rojo RSS for "all your stories" it strips out all the carefully placed keyword cluster unit names (blog names) - and also hoses the category information.

That's two levels of categorisation gone!! I'll look at the raw xml sometime and try and figure out what rojo is doing with my category information - sure as hell Omea can't see it.
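For when I do get round to looking at the raw xml, something like this Python sketch would list what category information each item actually carries. It assumes a plain RSS 2.0 feed with un-namespaced `<item>` and `<category>` elements (an Atom feed would need different handling), and `list_categories` is just a name I made up.

```python
import xml.etree.ElementTree as ET

def list_categories(rss_xml):
    """Return (item title, [category texts]) pairs for every item in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", default=""),
         [c.text for c in item.findall("category")])
        for item in root.iter("item")
    ]
```

Running it on the feedburner feed and on the republished rojo feed, then comparing the two lists, should show exactly which categories went missing.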

In Omea reader I can see the sub-categories of the rojo feeds - but ONLY if I define a feedreader sub-category for that feed. Otherwise nada. This is - as they say - not good.

The last thing I want to do is keep updating Omea categories every time I change a keyword, and the idea of putting 100 categories into Omea Reader just because rojo has stripped them out is horrible.

So while the feedburner feeds tag the stories with the keyword as category - allowing me to see at a glance which keyword it came from - rojo breaks this totally.

Worse yet - RSSOwl will have nothing to do with republished rojo feeds.

So the quest continues - I'll try and update this here and keep it separate from my main blog - it's all experimental anyway - and also blog roll the feeds I am playing with - maybe better minds than mine know how to solve the problems!!

