A few months ago, we implemented HitBox, a site-wide data-gathering package. We didn't think our implementation through well enough before we launched, and a lot of people called us on it! Until we could satisfy ourselves that we'd addressed all the issues you raised, we chose to turn it off. We think we've gotten it right this time, and we'd like to let you know about the changes we've made.
People have asked us why we chose to use a third-party service instead of gathering that data ourselves, and the answer's simple: we'd rather spend our development time making things better. Using a third-party service saves us from having to reinvent the wheel, and it gives us a chance to validate our internal statistics. We're confident that HitBox is the right way to get that data right now, and we want you to be confident, too.
How does HitBox work?
HitBox collects a small amount of anonymous information about visitors to web pages by sending a random, unique cookie to your browser. HitBox then reports that information back to us in bulk, again anonymously. They don't see anything that might connect that cookie to a specific LJ username, and we don't see any statistics at the individual level. On our end, we'll apply the HitBox cookie only to a random sample of about 5% of all visitors.
We know that privacy is important to you, and we know that you trust us with your most sensitive stuff. First of all, we don't send HitBox anything with identifying information. The only data we send on each pageload is very general: which browser is being used, the URL of the page, the user's account type (if the page is loaded by a logged-in user), and which site scheme and language are being used.
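The safest way to enforce a rule like this is an allowlist rather than a blocklist: only explicitly named fields go out, so a new piece of identifying data can't leak by accident. Here's a minimal Python sketch of that idea. The field names are hypothetical (the post lists the categories of data, not the actual keys), and this is an illustration, not the code LiveJournal actually runs.

```python
# Hypothetical field names; the post describes the categories, not the keys.
ALLOWED_FIELDS = ("browser", "url", "account_type", "scheme", "language")


def build_payload(request_info: dict) -> dict:
    """Return a copy of request_info containing only the allowlisted fields.

    Anything not on the list (a username, a session ID, an IP address) is
    dropped. account_type is simply absent for logged-out visitors.
    """
    return {key: request_info[key] for key in ALLOWED_FIELDS if key in request_info}


if __name__ == "__main__":
    request_info = {
        "browser": "Firefox",
        "url": "http://www.livejournal.com/tools/memories.bml",
        "account_type": "paid",
        "scheme": "xcolibur",
        "language": "en",
        "username": "news",  # identifying: must never be sent
    }
    print(build_payload(request_info))
```

Because the function copies named keys instead of deleting known-bad ones, adding a new field to the request data doesn't change what gets sent until someone deliberately adds it to the list.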
We know that some of our page URLs include identifying information (such as your username, memory category keywords, and so on). We're stripping all of that out. For example, http://www.livejournal.com/tools/memories.bml?user=news will be recorded as just http://www.livejournal.com/tools/memories.bml. (It's actually far easier for us to interpret the data when it's anonymized like this.)
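In the example above, the identifying part lives in the query string, so stripping it amounts to keeping only the scheme, host, and path. A minimal sketch using Python's standard `urllib.parse`, offered as an illustration rather than LiveJournal's actual code:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_identifying_parts(url: str) -> str:
    """Keep the scheme, host, and path; drop the query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


print(strip_identifying_parts("http://www.livejournal.com/tools/memories.bml?user=news"))
# prints http://www.livejournal.com/tools/memories.bml
```

This only handles identifiers carried in the query string. A URL that puts a username in its path would need additional, page-specific rules.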
We're including the HitBox code on most site-schemed pages. We're not including it in administrative areas (such as the Support area) or on any page where you have to enter sensitive identity data (such as the credit card pages in the payment and shop areas). We're also not including it anywhere within your journal, including all comment pages. Basically, nothing under your subdomain (exampleusername.livejournal.com, and the equivalent for communities) will have the HitBox code added.
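Rules like these can be expressed as a check on the page URL: main-site host only, minus a few excluded areas. A minimal sketch, assuming hypothetical host names and path prefixes (the post names the areas but not their exact URLs); it isn't LiveJournal's actual logic.

```python
from urllib.parse import urlsplit

# Assumed values for illustration; the post doesn't list exact hosts or paths.
SITE_HOSTS = {"www.livejournal.com", "livejournal.com"}
EXCLUDED_PREFIXES = ("/support/", "/admin/", "/pay/", "/shop/")


def should_include_hitbox(url: str) -> bool:
    """Return True only for main-site pages outside the excluded areas."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in SITE_HOSTS:
        # Journal and community subdomains (exampleusername.livejournal.com)
        # and any other host are never tagged.
        return False
    path = parts.path or "/"
    for prefix in EXCLUDED_PREFIXES:
        # Match both "/support" itself and anything under "/support/".
        if path == prefix.rstrip("/") or path.startswith(prefix):
            return False
    return True
```

Excluding journals by host rather than by page type means every page under a subdomain, comment pages included, is covered by a single rule.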
We're also not interested in tracking every pageload on the site -- it would overwhelm us, and we don't need that much data to draw our conclusions. As long as the sample is truly random, a small sample gives us the statistical accuracy we need.
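One way to sample visitors rather than pageloads is to give each visitor a random ID and derive membership from it, so the same person is consistently in or out. This is a hedged sketch using a SHA-256 bucket; the post doesn't say how the sample is actually chosen, and the function names are hypothetical.

```python
import hashlib
import secrets

SAMPLE_RATE = 0.05  # roughly 5% of visitors


def new_visitor_id() -> str:
    """A random identifier with no connection to any account."""
    return secrets.token_hex(16)


def in_sample(visitor_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministically place a visitor in or out of the sample.

    Hashing the ID gives an effectively uniform 64-bit number, so about
    `rate` of all IDs fall below the threshold, and a given visitor gets
    the same answer on every pageload.
    """
    digest = hashlib.sha256(visitor_id.encode("ascii")).digest()
    value = int.from_bytes(digest[:8], "big")
    return value < int(rate * 2**64)


if __name__ == "__main__":
    ids = [new_visitor_id() for _ in range(100_000)]
    share = sum(in_sample(v) for v in ids) / len(ids)
    print(f"{share:.3f} of visitors sampled")
```

A coin flip on each pageload would be fine for counting pages, but it would break per-visitor measures such as pages per visit; deciding once per visitor keeps those intact within the sample.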
If you're uncomfortable with contributing to the aggregate stat-gathering, you can opt out by going to the Admin Console and typing "set opt_exclude_stats 1". (You can also go to HitBox's site and opt out of their tracking Internet-wide.) We're providing this because we know some people are uneasy with the whole idea and we want to respect that, even though we know it runs the risk of statistically biasing our results.
We've done our best to balance privacy and information gathering, researching and testing this as thoroughly as we could, and we think the results will be positive for everyone. If you have any questions, please ask.