Philip, I think it is a very good idea. I wonder whether statistical linguistic analysis would be a practical and productive adjunct to manual, iterative regexp coding.

Bayesian spam filters like DSPAM have been developed to learn very effectively for their intended purpose, but their capabilities could extend well beyond spam filtering. If the guts of a spam filter were deployed against the 'scraped' content, it should be able to identify the patterns relevant to the subscriber with a high and improving degree of reliability.

What's more, the software would be responsive to user feedback: it would learn what mattered to the subscriber. This might allow the software to develop and continually improve independently of the programming hours invested.

Regards
Richard
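P.S. To make the idea a little more concrete, here is a rough sketch of the kind of learning loop I have in mind. It is a toy naive Bayes classifier written from scratch, not DSPAM's actual code or API; the class name `RelevanceFilter`, the labels "relevant"/"irrelevant", and the sample phrases are all made up for illustration.

```python
import math
import re
from collections import Counter


class RelevanceFilter:
    """Toy naive Bayes text classifier (hypothetical, not DSPAM itself)."""

    LABELS = ("relevant", "irrelevant")

    def __init__(self):
        # Per-label word frequencies and number of training items seen.
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokenize(text):
        # Lowercase and split into simple word tokens.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record one piece of subscriber feedback."""
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def score(self, text):
        """Estimate P(relevant | text) using add-one smoothing."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        tokens = self.tokenize(text)

        log_probs = {}
        for label in self.LABELS:
            total_words = sum(self.word_counts[label].values())
            # Smoothed class prior.
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in tokens:
                count = self.word_counts[label][word]
                # The extra +1 in the denominator reserves mass for unseen words.
                log_p += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_probs[label] = log_p

        # Normalise in a numerically stable way.
        peak = max(log_probs.values())
        weights = {k: math.exp(v - peak) for k, v in log_probs.items()}
        return weights["relevant"] / sum(weights.values())


# Made-up feedback: the subscriber marks scraped items as they read them.
f = RelevanceFilter()
f.train("planning application for new housing", "relevant")
f.train("catering contract renewal notice", "irrelevant")
print(f.score("housing application"))
print(f.score("catering contract"))
```

Each call to `train` stands in for a subscriber clicking "useful" or "not useful" on a scraped item, so the scores shift with feedback rather than with new regexps. A real filter would need far more training data and a smarter tokenizer than this.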