Treasure in the defects


Suppressing data can be argued well on any of these grounds:
Materiality – Observations can be dropped if their absence would be insignificant to aggregates and would not change the directional conclusion of the analysis.
Statistics – Formal methods can be applied for rejecting data. Look at Peirce’s criterion, Grubbs’ test, Chauvenet’s criterion, Dixon’s Q test or, frankly, propose a new one that sounds as serious.
Reasonableness – Some elements just don’t make sense. If one attribute is wrong, the whole observation may be considered suspicious and discarded.
Completeness – Most databases and statistical tools expect NAs, nulls or NaNs (not a number). Data can be optional, and processes can be incomplete. So, dropping empty data is tempting.
Error – The observation violates some stated business rule. Software captures data, and software can have bugs. So, we expect some data to be defective and ignore it.

Source: www.datasciencecentral.com
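
To make the "Statistics" argument concrete, here is a minimal sketch of one of the tests named above, Chauvenet's criterion, using only the Python standard library. It assumes the data is roughly normal, applies the criterion in a single pass, and uses made-up sample readings; the function name `chauvenet_flags` is just illustrative. Note that it flags observations rather than deleting them, so the suspects stay available for a second look.

```python
from statistics import NormalDist, mean, stdev

def chauvenet_flags(values):
    """Return one boolean per value: True if Chauvenet's criterion
    would reject it (single pass, assumes roughly normal data)."""
    n = len(values)
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return [False] * n  # all values identical, nothing to flag
    std_normal = NormalDist()
    flags = []
    for v in values:
        z = abs(v - mu) / sigma
        # Two-sided probability of a deviation at least this large
        tail_prob = 2 * (1 - std_normal.cdf(z))
        # Reject when fewer than half an observation this extreme
        # would be expected in a sample of size n
        flags.append(n * tail_prob < 0.5)
    return flags

# Hypothetical sensor readings with one suspicious value
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 25.0]
flags = chauvenet_flags(readings)
suspects = [v for v, f in zip(readings, flags) if f]
kept = [v for v, f in zip(readings, flags) if not f]
```

Here `suspects` holds the flagged reading and `kept` holds the rest. What happens to `suspects` next is the real question.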

All those dropped observations have value, though.

First, when we find a problem, we should tell someone.  We don’t have to, but we should. Like that "See Something, Say Something" announcement, communicating exceptions is an analyst’s responsibility.  Software gets fixed, other analysts save time, lessons get learned, customers get a better experience. 

Second, this data may deserve some digging. If there’s a process, people will find a workaround. Machine-generated data shows that computers do the same thing with controls. Data exceptions have stories that lead to new business rules and pattern discoveries. As with data errors, we don’t have to pursue these stories, but we should. Researching outliers has a poor "a priori" business case, because you don’t know what you’ll find. Tracking the value of what you have already learned is almost as good. That’s an anecdotal business case.
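
One way to keep those stories within reach is to quarantine exceptions with a reason attached instead of silently dropping them. The sketch below is only an illustration: the order records, the field names `customer_id` and `amount`, and the rules in `check_order` are all hypothetical stand-ins for whatever business rules apply to your data.

```python
from collections import Counter

def check_order(row):
    """Return the reasons a row looks suspect (empty list if it passes).
    These rules are made up for illustration."""
    reasons = []
    if row.get("amount") is None:
        reasons.append("missing amount")
    elif row["amount"] < 0:
        reasons.append("negative amount")
    if not row.get("customer_id"):
        reasons.append("missing customer_id")
    return reasons

def partition(rows, check):
    """Split rows into clean rows and exceptions; keep each exception
    together with the reasons it failed, rather than discarding it."""
    clean, exceptions = [], []
    for row in rows:
        reasons = check(row)
        if reasons:
            exceptions.append({"row": row, "reasons": reasons})
        else:
            clean.append(row)
    return clean, exceptions

def summarize(exceptions):
    """Count exceptions by reason, as a starting point for a report."""
    return Counter(r for e in exceptions for r in e["reasons"])

# Hypothetical input records
orders = [
    {"customer_id": "C1", "amount": 20.0},
    {"customer_id": "", "amount": 15.0},
    {"customer_id": "C3", "amount": -5.0},
    {"customer_id": "C4", "amount": None},
]
clean, exceptions = partition(orders, check_order)
summary = summarize(exceptions)
```

The analysis proceeds on `clean`, while `exceptions` and `summary` give you something to send to whoever owns the source system, and a list of leads to dig into when time allows.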

