I’ve been working on a startup project on the side for almost a year now, focusing on pattern recognition, natural language processing, and predictive statistical modeling… it’s been fun. At the core, we’ve built a language analysis engine that looks at a chunk of text and figures out whether the sentiment is positive or negative. In researching the problem, we found that if you take three people and have them each categorize the same random text (a blog post, article, website, tweet, etc.), they will agree with each other about 63% of the time. There’s some variance depending on what they’re shown, but plus or minus a couple of percentage points, that’s about how accurate a human is. We’ve gone through several different models for the predictions and tweaked the algorithm quite a bit across versions, but we recently hit a pretty major milestone: our engine now rates articles with better than 70% accuracy. In other words, if we rate something as positive (meaning the author felt positive about whatever they were writing), a human will agree with our rating 70% of the time. ^_^
We’re better at determining human opinion than the average human is.
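If you’re curious how numbers like those can be measured, here’s a rough sketch in Python. This isn’t our actual code, and the data is made up; it just shows the two comparisons described above, assuming each text gets labeled by three human raters and by the engine:

```python
# Sketch (not Entrenza's real code): measuring human-human agreement
# vs. engine-human agreement on sentiment labels.
from itertools import combinations

# Hypothetical labels: one row per text, three human ratings plus the engine's.
ratings = [
    {"humans": ["pos", "pos", "neg"], "engine": "pos"},
    {"humans": ["neg", "neg", "neg"], "engine": "neg"},
    {"humans": ["pos", "neg", "pos"], "engine": "pos"},
    # ...many more rows in practice
]

def human_agreement(rows):
    """Fraction of human rater pairs that assign the same label (~63%)."""
    agree = total = 0
    for row in rows:
        for a, b in combinations(row["humans"], 2):
            agree += (a == b)
            total += 1
    return agree / total

def engine_agreement(rows):
    """Fraction of (engine, human) pairs where the human agrees (~70%)."""
    agree = total = 0
    for row in rows:
        for h in row["humans"]:
            agree += (h == row["engine"])
            total += 1
    return agree / total

print(f"human-human agreement:  {human_agreement(ratings):.2f}")
print(f"engine-human agreement: {engine_agreement(ratings):.2f}")
```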
We’ll be going into beta soon on a service that lets you track how positive or negative sentiment around your brand is by monitoring mentions across the internet – effectively, sentiment analysis and tracking. If you’re interested, you can sign up here. You can read more about the project in general at www.entrenza.com.
Thanks to Steve & Jesse & Ben, my co-collaborators on the project for making this happen!