Asking the RIGHT questions

Carla Gentry
Data Scientist

Each time I talk to someone about analytics, I ask the same question: “What is your ultimate goal with this project?” Often it is to increase sales or reduce turnover. Of course, this isn’t usually what’s said; initially all I get is a panicked look that says “we can’t get what we want out of our database; it isn’t working right… how do we fix it?”

Typically, there is nothing wrong with the database: the master and all its clones are just as they were designed to be, the variables are entered correctly and the reporting functions are pulling exactly what they were coded and designed to pull. So what’s the issue?

Example: HR data. Every HRIS or ATS database contains different information (employee addresses, phone numbers, salaries, benefits, and the like), but how much of that information is connected? What I mean is: is there a unique identifier that connects each table or database to the others? If I want to select all the people who might retire in the next 5 years, complete with demographic, sales and personal information, in order to create an organizational plan for this, can I accomplish it with my current database? By design, relational databases are just that: “relational.” Therefore, if everything is set up correctly at the very beginning, everything should flow.
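To make the retirement question concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is an assumption made up for illustration: the table and column names (employees, sales, employee_id, birth_date), the sample rows, and the retirement age of 65. The point is only that the question becomes a single query once both tables share a key.

```python
import sqlite3
from datetime import date


def build_demo_db():
    """Create a tiny in-memory database with two tables linked by employee_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employees (
            employee_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            birth_date  TEXT NOT NULL  -- ISO format, YYYY-MM-DD
        );
        CREATE TABLE sales (
            sale_id     INTEGER PRIMARY KEY,
            employee_id INTEGER NOT NULL REFERENCES employees(employee_id),
            amount      REAL NOT NULL
        );
    """)
    # Hypothetical sample data.
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
        (1, "Avery", "1962-03-15"),
        (2, "Blake", "1985-07-01"),
        (3, "Casey", "1963-11-30"),
    ])
    conn.executemany("INSERT INTO sales (employee_id, amount) VALUES (?, ?)", [
        (1, 1200.0), (1, 800.0), (2, 500.0),
    ])
    return conn


def retiring_within(conn, today, years=5, retirement_age=65):
    """Return (employee_id, name, total_sales) for people who reach
    retirement_age within the next `years` years."""
    # Anyone born on or before this date is at least retirement_age
    # once `years` years have passed. ISO date strings sort chronologically.
    cutoff = f"{today.year - (retirement_age - years):04d}-{today.month:02d}-{today.day:02d}"
    return conn.execute("""
        SELECT e.employee_id, e.name, COALESCE(SUM(s.amount), 0.0)
        FROM employees AS e
        LEFT JOIN sales AS s ON s.employee_id = e.employee_id
        WHERE e.birth_date <= ?
        GROUP BY e.employee_id, e.name
        ORDER BY e.employee_id
    """, (cutoff,)).fetchall()


if __name__ == "__main__":
    for row in retiring_within(build_demo_db(), date(2024, 1, 1)):
        print(row)
```

The LEFT JOIN keeps employees who have no sales records at all; without the shared employee_id there would be nothing to join on, and the sales history could not be attached to the people in question.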

Which brings us back to asking the right questions—which might look a bit like these:

  • What is it really that I want to be able to answer with my data collection?
  • Structured vs. unstructured data: am I asking the right questions and giving respondents fixed choices (for example: a) agree, b) disagree, c) unsure), or offering a space to add comments (be careful with this)? Can I work with unstructured data?
  • Are you offering incentives to employees or candidates to “complete additional info” so as to glean a more complete picture of your customer? (For example, a $5 iTunes or Starbucks card.)
  • Do I want to link social data to what I collect from employees? (Third-party sign-in via Facebook, Twitter, Google+, etc.)
  • Are you including IT in your business meetings?

A gap analysis usually reports on what is missing, but it doesn’t have to be this way (reactive and not proactive). If you ask the right questions initially, your design and results will reflect this. Know what you are trying to accomplish.
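One way to run that check before anything is built is to write down the questions, list the fields each one needs, and compare that against what is actually being collected. The sketch below is only an illustration: the question names and field names are hypothetical, not taken from any real HRIS or ATS.

```python
# Hypothetical mapping from business question to the fields needed to answer it.
REQUIRED_FIELDS = {
    "retirement planning": {"employee_id", "birth_date", "hire_date"},
    "sales seasonality": {"employee_id", "sale_date", "amount"},
}


def missing_fields(available, required_by_question):
    """Return, per question, the sorted list of fields not yet collected.

    Questions whose required fields are all available are left out.
    """
    gaps = {}
    for question, fields in required_by_question.items():
        missing = fields - available
        if missing:
            gaps[question] = sorted(missing)
    return gaps


if __name__ == "__main__":
    collected = {"employee_id", "birth_date", "amount"}
    for question, fields in missing_fields(collected, REQUIRED_FIELDS).items():
        print(f"{question}: missing {', '.join(fields)}")
```

Run at design time, a check like this turns the gap analysis into a to-do list for the data collection rather than a report on what went wrong.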

If you say “my database is broken,” but what you really mean is “I need to be able to sustain sales throughout the next five years; I’m concerned with increasing what we have in the pipeline,” well, ask for help. Make sure you have the correct data to indicate your sales reps’ selling habits, including seasonality. What data does your sales department have that can help you answer these and many other questions?

Do you have store performance or other line-of-business performance data to help form a more three-dimensional view of your candidates or employees? A data-centric view is so valuable, but unless you ask the right questions when collecting your data and setting up your database, you may end up trying to build a predictive model using only name, address and phone number! As Chief Engineer Montgomery “Scotty” Scott of the USS Enterprise might say, “I can’t do it, Captain!”

Go well armed on your journey towards predictive analytics and remember to always ask the right questions!

The Data Scientist Team

I’ve been intrigued by all of the attention that the world of Data Science has received. It seems that every popular business magazine has published several articles on it, and it’s become a mainstream topic at most industry conferences. One of the things that struck me as odd is that there’s a group of folks who actually believe that all of the activities necessary to deliver new business discoveries with data science can be reasonably addressed by finding individuals who have a cornucopia of technical and business skills. One popular belief is that a Data Scientist should be able to address all of the business and technical activities necessary to identify, qualify, prove, and explain a business idea with detailed data.

If you can find individuals that comprehend the peculiarities of source data extraction, have mastered data integration techniques, understand parallel algorithms to process tens of billions of records, have worked with specialized data preparation tools, and can debate your company’s business strategy and priorities – Cool!  Hire these folks and chain their leg to the desk as soon as possible.

If you can’t, you might consider building a team that can cover the various roles that are necessary to support a Data Science initiative. There’s a lot more to Data Science than simply processing a pile of data with the latest open source framework.  The roles that you should consider include:

Data Services

Manages the various data repositories that feed data to the analytics effort.  This includes understanding the schemas, tracking the data content, and making sure the platforms are maintained. Companies with existing data warehouses, data marts, or reporting systems typically have a group of folks focused on these activities (DBAs, administrators, etc.).

Data Engineer

Responsible for developing and implementing tools to gather, move, process, and manage data. In most analytics environments, these activities are handled by the data integration team.  In the world of Big Data or Data Science, this isn’t just ETL development for batch files; it also includes processing data streams and handling the cleansing and standardization of numerous structured and unstructured data sources.

Data Manager

Handles the traditional data management or source data stewardship role; the focus is supporting developers’ access to and manipulation of data content. This includes tracking the available data sources (internal and external), understanding the location and underlying details of specific attributes, and supporting developers’ code construction efforts.

Production Development

Responsible for packaging the Data Scientist’s discoveries into a production-ready deliverable. This may include one or many components: new data attributes, new algorithms, a new data processing method, or an entirely new end-user tool. The goal is to ensure that the discoveries deliver business value.

Data Scientist

The team leader and the individual who excels at analyzing data to help a business gain a competitive edge. They are adept at technical activities and equally qualified to lead a business discussion about the benefits of a new business strategy or approach. They can tackle all aspects of a problem and often lead the interdisciplinary team to construct an analytics solution.

There’s no shortage of success stories about the amazing data discoveries uncovered by Data Scientists.  In many of those companies, the Data Scientist didn’t have an incumbent data warehousing or analytics environment; they couldn’t pick up the phone to call a data architect, there wasn’t any metadata documentation, and their company didn’t have a standard set of data management tools.  They were on their own.  So, the Data Scientist became “chief cook and bottle washer” for everything that is big data and analytics.

Most companies today have institutionalized data analysis; there are multiple data warehouses, lots of dashboards, and even a query support desk.  And while there’s a big difference between desktop reporting and processing social media feedback, much of the “behind the scenes” data management and data integration work is the same.  If your company already has an incumbent data and analytics environment, it makes sense to leverage existing methods, practices, and staff skills.  Let the Data Scientists focus on identifying the next big idea and the heavy analytics; let the rest of the team deal with all of the other work.

via The Data Scientist Team.