Pondering RSS Syncing
Wed Nov 23 13:10:28 EST 2011
I was listening to the latest episode of Build and Analyze on the way home from work yesterday and, as I am wont to do, I started yelling at my iPhone when they started talking about Google Reader and the difficulty of syncing. Admittedly, at the end, they got to the fact that, even if you could do it technically, it'd be tough to make money off of providing an RSS sync server. That part is fair enough, but I still can't let the technical difficulties stand, and I've been thinking more about how it would be done in Domino.
In its basic form, the problem is pretty much exactly what NSF and the Notes/Domino relationship are designed to handle: seamless replication, deletion stubs, unread marks, and so forth. In fact, RSS syncing is a better fit for the model than mail, since mail required adding all kinds of extra (but useful) functionality, while RSS syncing would just be data and an agent to fetch the feeds periodically.
The way I figure it, there would only be a couple technical hurdles, both related to scaling: storing large volumes of data and fetching new feed content periodically.
Storing large volumes of data might not be too bad. There are a couple ways you could do it. One would be to store the user's list of subscribed feeds and "read" stubs in one database per user, and then store the feeds and feed content in another database (or databases), and do all data access via agents or web services that would pull the data from each distinct location. Another way could be to store the feeds and entries in each user's database, keeping the feed content as attached HTML documents and letting DAOS handle efficient storage. The latter route would let you take advantage of the Domino Data Service and read marks (which DAS conveniently supports).
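To make the first layout a bit more concrete, here's a rough sketch in plain Python rather than anything Domino-specific. The names (`FeedStore`, `UserStore`, `unread_entries`) are made up for illustration; in-memory dictionaries stand in for the shared feed database and the per-user databases, and the function plays the role of the agent or web service that joins the two.

```python
from dataclasses import dataclass, field


@dataclass
class FeedStore:
    """Stand-in for the shared feed database: feed URL -> {entry id: HTML}."""
    entries: dict = field(default_factory=dict)

    def add_entry(self, feed_url: str, entry_id: str, html: str) -> None:
        self.entries.setdefault(feed_url, {})[entry_id] = html


@dataclass
class UserStore:
    """Stand-in for one user's database: subscriptions plus "read" stubs."""
    subscriptions: set = field(default_factory=set)
    read_stubs: set = field(default_factory=set)  # (feed_url, entry_id) pairs

    def mark_read(self, feed_url: str, entry_id: str) -> None:
        self.read_stubs.add((feed_url, entry_id))


def unread_entries(user: UserStore, feeds: FeedStore) -> list:
    """Join the two stores: entries in subscribed feeds with no read stub."""
    result = []
    for feed_url in sorted(user.subscriptions):
        for entry_id in feeds.entries.get(feed_url, {}):
            if (feed_url, entry_id) not in user.read_stubs:
                result.append((feed_url, entry_id))
    return result
```

The point of the split is that feed content is stored once no matter how many people subscribe, while each user's database only holds small stubs.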
Updating the feeds would be rough for a single server, but the job could be farmed out to many servers in a cluster. You could write agents that would determine the server they're on and, based on, say, its name, pick a slice of the feeds to check, so if you have 10 servers, each would update roughly 10% of the feeds.
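Here's a minimal sketch of that slicing idea, again in plain Python rather than an actual Domino agent. The function names are hypothetical, and it assumes every server knows the full list of server names and sees the same list of feed URLs. Each feed URL is hashed to a slot, and a server claims the feeds whose slot matches its position in the sorted server list.

```python
import hashlib


def shard_for(feed_url: str, shard_count: int) -> int:
    """Map a feed URL to a stable slot in the range 0..shard_count-1."""
    # hashlib is used instead of the built-in hash(), which is randomized
    # per process for strings and so would differ between servers.
    digest = hashlib.sha256(feed_url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def feeds_for_server(server_name: str, all_servers: list, feed_urls: list) -> list:
    """Return the slice of feeds that the named server should update."""
    ordered = sorted(all_servers)          # every server derives the same order
    my_index = ordered.index(server_name)  # this server's slot number
    return [url for url in feed_urls
            if shard_for(url, len(ordered)) == my_index]
```

Every feed lands on exactly one server with no coordination between them, though the split is only approximately even. The catch is that adding or removing a server reshuffles most of the assignments, which is probably fine for a periodic fetch job.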
I'm sure there'd be other roadblocks during actual implementation (it IS Domino, after all), but I think that'd be basically all you'd need on the server side. The client would be a little tougher, since you couldn't just use NSF and selective replication, but that wouldn't be terribly difficult to handle.
It's too bad it's likely not profitable, between licensing, hosting, and bandwidth costs - it'd be a fun project to try out.