Bayesian News by Email
I'm the author of a reasonably popular open source app called phplist. My app can handle RSS sources and send out regular emails. But there's no point setting it up to send BBC news emails, because your email service is already doing that, and probably doing it better. Then I thought, inspired by the Bayesian spam filters of Mozilla mail (which I use): why not have a go at Bayesian news filtering? I am rather impressed with the success rate of my spam filter, and all of that is based on a reasonably simple algorithm. So I rather quickly whacked up a prototype site of my system, with some advanced configuration in the emails that allows tagging news stories as "interesting" or "not interesting". Those tags can then be fed back to the system to update the Bayesian filters, and could be used to personalise the news stories from the BBC. After all, aren't we all receiving too much information anyway?
For now, the system only registers the tags and updates the database. There is no Bayesian algorithm involved yet, but proper filtering will take a while anyway. On the whole, though, the entire infrastructure for this to work is now set up, and filtering can be activated with just a little more time to implement it. After all, Bayesian filtering is not that complex (if you read https://www.paulgraham.com/better.html). The system runs fully automated, pulling in RSS and sending the emails without any intervention. I have had a system like this running for the indymedia website for a good 2 years now (just to test that it works; hardly anybody actually knows about it). But then again, we're talking about rather less volume in that case.
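To give an idea of how little code the filtering step might need, here is a minimal sketch in Python. It is not the prototype's code (phplist itself is PHP) and it is not the exact method from the article linked above; it is a plain naive Bayes classifier with add-one smoothing, and the names `NewsFilter`, `train` and `probability_interesting` are made up for illustration. The "interesting" / "not interesting" tags from the emails would be the training input.

```python
import math
import re
from collections import Counter

# The two tags a reader can attach to a story in the email.
LABELS = ("interesting", "not interesting")


class NewsFilter:
    """Hypothetical naive Bayes filter; an illustration, not phplist code."""

    def __init__(self):
        # Per-label word frequencies and per-label story counts.
        self.word_counts = {label: Counter() for label in LABELS}
        self.story_counts = Counter()

    @staticmethod
    def tokenize(text):
        # Very crude tokenizer: lowercase runs of letters and apostrophes.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        """Record one tagged story (the feedback coming back from an email)."""
        self.word_counts[label].update(self.tokenize(text))
        self.story_counts[label] += 1

    def probability_interesting(self, text):
        """Estimate P(interesting | text) from the stories tagged so far."""
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_stories = sum(self.story_counts.values())

        log_scores = {}
        for label in LABELS:
            # Smoothed prior, so an untrained filter starts at 50/50.
            score = math.log(
                (self.story_counts[label] + 1) / (total_stories + len(LABELS))
            )
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log(
                    (self.word_counts[label][word] + 1)
                    / (total_words + len(vocabulary) + 1)
                )
            log_scores[label] = score

        # Convert the log scores back into a normalised probability.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights["interesting"] / sum(weights.values())


# Example: two tagged stories, then score a new headline.
news_filter = NewsFilter()
news_filter.train("Election results: parliament vote tonight", "interesting")
news_filter.train("Football transfer: league goal record", "not interesting")
score = news_filter.probability_interesting("Parliament vote delayed")
```

A story would then be included in a reader's email only if its score is above some threshold. Where to put that threshold, and how many tagged stories are needed before the scores mean anything, are open questions this sketch does not answer.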
This is really a prototype, and certainly not capable of big loads.
So it's quite obvious what I'd do if I had more time and money:
- implement the filtering
- make it scalable
- add loads more security and privacy checks
- add some branding
- wrap up the code, either in a plugin for phplist or in the core code
- probably loads more; there is always something in a project you don't think about