Asking the RIGHT questions

Carla Gentry
Data Scientist

Each time I talk to someone about analytics, I ask the same question: “What is your ultimate goal with this project?” Often it is to increase sales or reduce turnover. Of course, this isn’t usually what’s said; initially all I get is a panicked look that says “we can’t get what we want out of our database; it isn’t working right… how do we fix it??”

Typically, there is nothing wrong with the database: the master and all its clones are just as they were designed to be, the variables are entered correctly and the reporting functions are pulling exactly what they were coded and designed to pull.

So what’s the issue?

Example: HR data. Every HRIS or ATS database contains different information (employee addresses, phone numbers, salaries, benefits, and the like), but how much of that information is connected? What I mean is: is there a unique identifier that connects each table or database to the others? If I want to select everyone who might retire in the next 5 years, complete with demographic, sales and personal information, in order to create an organizational plan, can I accomplish this with my current database? By design, relational databases are just that: “relational.” Therefore, everything should flow if it was set up correctly in the very beginning.
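To make the idea of a connecting key concrete, here is a minimal sketch in Python using an in-memory SQLite database. Everything in it is an assumption made for illustration: the table names, the `employee_id` key, the sample rows, and the retirement age of 65 are invented, and a real HRIS or ATS will look different.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        name TEXT,
        birth_year INTEGER
    );
    CREATE TABLE sales (
        employee_id INTEGER REFERENCES employees(employee_id),
        annual_sales REAL
    );
    INSERT INTO employees VALUES (1, 'Avery', 1962), (2, 'Blake', 1985), (3, 'Casey', 1964);
    INSERT INTO sales VALUES (1, 410000.0), (2, 290000.0), (3, 355000.0);
""")


def retiring_within(conn, current_year, years=5, retirement_age=65):
    """Return (name, birth_year, annual_sales) for people who reach
    retirement_age within `years` of current_year (both are assumptions)."""
    cutoff_birth_year = current_year + years - retirement_age
    query = """
        SELECT e.name, e.birth_year, s.annual_sales
        FROM employees AS e
        JOIN sales AS s ON s.employee_id = e.employee_id  -- the shared key
        WHERE e.birth_year <= ?
        ORDER BY e.employee_id
    """
    return conn.execute(query, (cutoff_birth_year,)).fetchall()


retirees = retiring_within(conn, current_year=2025)
```

The join only works because both tables carry the same `employee_id`. If the sales table identified people by a free-text name instead, the same question would need error-prone matching before any planning could start.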

Which brings us back to asking the right questions—which might look a bit like these:

  • What is it really that I want to be able to answer with my data collection?
  • Structured vs. unstructured data: am I asking the right questions and giving respondents choices, for example a) agree b) disagree c) unsure, or offering a space to add comments (be careful of this)? Can I work with unstructured data?
  • Are you offering incentives to employees or candidates to “complete additional info” so as to glean a more complete picture of your customer? (a 5 dollar iTunes or Starbucks card)
  • Do I want to link social data to what I collect from employees? (3rd party sign in via Facebook, Twitter, Google+, etc)
  • Are you including IT in your business meetings?

A gap analysis usually reports on what is missing, but it doesn’t have to be this way (reactive and not proactive). If you ask the right questions initially, your design and results will reflect this. Know what you are trying to accomplish.

If you say “my database is broken,” but what you really mean is “I need to be able to sustain sales throughout the next five years; I’m concerned with increasing what we have in the pipeline,” well, ask for help. Make sure you have the correct data to indicate your sales reps’ selling habits, including seasonality. What data does your sales department have that can help you answer these and many other questions?

Do you have store performance or other line-of-business performance data to help form a more three-dimensional view of your candidates or employees? A data-centric view is so valuable, but unless you ask the right questions when collecting your data and setting up your database, you may end up trying to build a predictive model using only name, address and phone number! As Chief Engineer Montgomery “Scotty” Scott of the USS Enterprise might say, “I can’t do it, Captain!”

Go well armed on your journey towards predictive analytics and remember to always ask the right questions!


The Data Scientist Team


I’ve been intrigued with all of the attention that the world of Data Science has received.  It seems that every popular business magazine has published several articles and it’s become a mainstream topic at most industry conferences. One of the things that struck me as odd is that there’s a group of folks that actually believe that all of the activities necessary to deliver new business discoveries with data science can be reasonably addressed by finding individuals that have a cornucopia of technical and business skills.  One popular belief is that a Data Scientist should be able to address all of the business and technical activities necessary to identify, qualify, prove, and explain a business idea with detailed data.

If you can find individuals that comprehend the peculiarities of source data extraction, have mastered data integration techniques, understand parallel algorithms to process tens of billions of records, have worked with specialized data preparation tools, and can debate your company’s business strategy and priorities – Cool!  Hire these folks and chain their leg to the desk as soon as possible.

If you can’t, you might consider building a team that can cover the various roles that are necessary to support a Data Science initiative. There’s a lot more to Data Science than simply processing a pile of data with the latest open source framework.  The roles that you should consider include:

Data Services

Manages the various data repositories that feed data to the analytics effort.  This includes understanding the schemas, tracking the data content, and making sure the platforms are maintained. Companies with existing data warehouses, data marts, or reporting systems typically have a group of folks focused on these activities (DBAs, administrators, etc.).

Data Engineer

Responsible for developing and implementing tools to gather, move, process, and manage data. In most analytics environments, these activities are handled by the data integration team.  In the world of Big Data or Data Science, this isn’t just ETL development for batch files; it also includes processing data streams and handling the cleansing and standardization of numerous structured and unstructured data sources.

Data Manager

Handles the traditional data management or source data stewardship role; the focus is supporting development access and manipulation of data content. This includes tracking the available data sources (internal and external), understanding the location and underlying details of specific attributes, and supporting developers’ code construction efforts.

Production Development

Responsible for packaging the Data Scientist’s discoveries into a production-ready deliverable. This may include one or many components: new data attributes, new algorithms, a new data processing method, or an entirely new end-user tool. The goal is to ensure that the discoveries deliver business value.

Data Scientist

The team leader and the individual that excels at analyzing data to help a business gain a competitive edge. They are adept at technical activities and equally qualified to lead a business discussion as to the benefits of a new business strategy or approach. They can tackle all aspects of a problem and often lead the interdisciplinary team to construct an analytics solution.

There’s no shortage of success stories about the amazing data discoveries uncovered by Data Scientists.  In many of those companies, the Data Scientist didn’t have an incumbent data warehousing or analytics environment; they couldn’t pick up the phone to call a data architect, there wasn’t any metadata documentation, and their company didn’t have a standard set of data management tools.  They were on their own.  So, the Data Scientist became “chief cook and bottle washer” for everything that is big data and analytics.

Most companies today have institutionalized data analysis; there are multiple data warehouses, lots of dashboards, and even a query support desk.  And while there’s a big difference between desktop reporting and processing social media feedback, much of the “behind the scenes” data management and data integration work is the same.  If your company already has an incumbent data and analytics environment, it makes sense to leverage existing methods, practices, and staff skills.  Let the Data Scientists focus on identifying the next big idea and the heavy analytics; let the rest of the team deal with all of the other work.

via The Data Scientist Team.

How do you become a data scientist?

I have read several articles on the subject, but none of the authors were really
“Data Scientists,” and they admit that, so I thought it was time that something
was written by an actual Data Scientist.

First off, let’s make sure you understand that there’s lots of college
involved; no way around that one. If you noticed that a lady in the 2nd row, 3rd
from the left, had a mole on her nose in the last commercial you watched, you
might have what it takes, even if you hadn’t thought of Mathematics, Engineering
or Econometrics as a field of study. What I am implying is that it takes someone
who is VERY observant to be successful in Data Science. Why? Because you deal
with such large data sets and large outputs/results, your ability to absorb lots
of information quickly and exactly is your best friend. I can scroll through a
million records or run a small SQL script, analyze the results and
tell you if that data is bad or corrupt in minutes. Cleansing data is always the
1st step; if this part is left out, I can guarantee you will have lots of N/A’s
or characters where numbers should be, etc… so make QA your friend not your
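As a rough illustration of that first cleansing pass, the sketch below flags values that cannot be read as numbers in a column that is supposed to be numeric. The function name, the column name and the sample records are hypothetical, and a real check would be written against your own data, most likely in SQL or whichever tool you already use.

```python
def find_bad_numeric_values(rows, column):
    """Return (row_index, raw_value) pairs where `column` is not a usable number."""
    bad = []
    for index, row in enumerate(rows):
        raw = row.get(column)
        text = "" if raw is None else str(raw).strip()
        try:
            # Note: float() also accepts "nan" and "inf"; screen those
            # separately if they should count as bad data too.
            float(text)
        except ValueError:
            bad.append((index, raw))
    return bad


# Made-up records, for illustration only.
records = [
    {"customer": "A", "amount": "19.99"},
    {"customer": "B", "amount": "N/A"},
    {"customer": "C", "amount": "12x"},
    {"customer": "D", "amount": ""},
    {"customer": "E", "amount": "7"},
]
problems = find_bad_numeric_values(records, "amount")
```

Running a check like this before any modeling surfaces the N/A’s and stray characters while they are still cheap to fix.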

What major or course work produces the best Data Scientist? Econometrics and
Mathematics, as long as they come with an additional major in Business. Why?
Because of the logic involved, as well as the classic theory of Left Brain people
and numbers. Creative is great for making PowerPoint presentations, but when you
have 10 terabytes of raw data, pretty is not the 1st thing on your mind. Minor
in, or actively engage in, courses that will teach you programming; you don’t need
hard-core Perl but you will need SQL skills at the very least. Microsoft Visual
Studio, the SSAS, SSIS and SSRS package, SAS, SPSS, SQL, Cognos, Macros and Visual
Basic are all not only good to know but vital when you have multiple clients who
use different CRM, BI and ETL tools.

Once the schooling ends, the real world begins. My 1st boss said, “Forget
everything you learned in College, there is no ‘bell curve’ here,” meaning that
statistics, programming, mathematics, logic and common sense are only the
start. Practice cleansing data, extracting data, normalizing data, segmenting
data, loading data, trending data, modeling… in other words data data data data
data. Never assume your results, never ignore anomalies, do keep an unbiased mind
and never scrimp on tools, software or classes. Yes, that’s right, I still attend
webinars and read like crazy to stay sharp on my tools and technique.

We need more people desperately in Science, Technology, Engineering and
Mathematics (STEM) so please consider Data Science as a career. According to the
latest study we’re in high demand and considered rock stars according to
What can your data do for you? @Data_Nerd

I am not a Social Media expert but I play one on TV

I love that one-liner from movies, so I thought it appropriate for what I wanted to write this morning. There are lots of people out there playing Social Media expert. They promise followers, thousands of them, but how are they attaining them? Are they mining for followers from your niche, or are they just searching for “follow-back” accounts or others on Twitter who automatically accept followers? What promises did they make you “on paper”? Did they guarantee your results? Did they guarantee to increase your ROI? NO? Well then, you are just wasting your money, and you also stand the chance of these “Experts” or “Gurus” damaging your brand or your name.

When I first branched off on my own, I envisioned I would be using analytics to increase market share, increase ROI and introduce new products to the market, just as I have been doing for the last 15 years for companies like Firestone, Kraft, Hershey… etc. I never dreamed that my best avenue for introducing and spreading the world of analytics would be Social Media. I had been using social media for years as a “tool” to help keep in touch, so why not branch out and use it now?

Some businesses “get analytics”; they see its importance not only in advertising and marketing but in social media as well. They use my service, and others like it, to find their “target audience” or their niche. They depend on the research we provide to get the “Big” picture of what is really going on with their business or product. They know that by using statistics and data mining, they can make a change in their social media efforts. Unfortunately, a majority of the social media marketing community has decided to stick their heads in the sand and pretend analytics isn’t important or “can’t” be done (oh yes, it can).

We have made entry into social media easy, letting inexperienced and unprofessional GURUs pass as valid Marketers. We have been invaded by back-seat-driving experts who, because they have a Twitter account, think they can be “your” social media guru. Thanks to people like them we now have more spam than ever; you see the same “unique” page posted from different URLs (hijacking ENTIRE websites), so the true author may see his or her post on multiple sites. The latest is promises of analysis as well: I’ve seen firms profess that they can handle your data and analysis too, but they don’t have a single trained expert on site to achieve this. Gurus or experts focus on quantity over quality, and those results don’t get businesses anywhere. Websites offering 1,000 Facebook or other social media fans at a low price are spam, trying to reinforce the already present notion that we need to have a significant social media following to get any kind of ROI.

In conclusion, be careful out there. Social media is no different than any other aspect of life: there are good and bad people out there. Research the firm you plan on hiring and ask for references; protect your brand, your company and your reputation by only using professionals (ask for a Business License). If you need analytics assistance, make sure they have a scientific or mathematical background. If you need a social media expert, get a “real” one and not just someone who plays one on TV. It’s your money, spend it wisely. If you need assistance in identifying best practices on how to use analysis in Social Media, give me a call at 423-552-2062.

Social Media for Beginners

Everyone’s on Twitter, Facebook, LinkedIn, and Foursquare these days, using them for pleasure as well as business. Marketing on Twitter is considered so NON-Tech that one of my potential clients called me back to say he had hired a High School student to do his Twitter Marketing. I said OK, good luck with that…. Of course, 3 weeks later he called back saying he was reconsidering using a professional service to do his marketing…. Hmmm, wonder what made him change his mind?

It only takes a few minutes on any Social Media site to see why a High School student would NOT be the best candidate for Social Media Marketing. There are 200 Million Tweets per day on Twitter (stats), and the United States has ~153 Million people on Facebook and a Global Audience of 706,590,220 (stats)…. That’s a tremendous amount of streaming data and posts; in order to stand out, you will need some knowledge of how to segment and target your audience (i.e., How to Engage).

There are quite a few agencies out there that are willing to help but first let’s go over the basics of Social Media:

Twitter – 140 characters to get your message across, with the URL link address included in the count. What can you say in 140 characters, how do I find people, what’s the relevance of re-tweeting and are links safe to click on?

* A lot can be conveyed in a small number of characters. For example, take a July 21, 2011 article about Atlantis and NASA: “Even Without Space Shuttles, US Spaceflight Lives On, Astronauts Say” is 68 characters, and you got the message, right? Adding the link adds another 85 characters, so we have gone over our 140 count by 13… CRAP… but wait, that is what tiny URLs are for. Twitter shortens URLs automatically, but it may not shorten them enough, so HootSuite, Social Oomph, TweetDeck, etc. are great options. HootSuite will turn an 85 character URL into 18 characters, so our entire message about NASA is 87 characters {Even Without Space Shuttles, US Spaceflight Lives On, Astronauts Say}
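The character arithmetic above can be checked with a few lines of Python. The 140 limit and the 85 and 18 character link lengths are the figures quoted in the post; the single separating space in the shortened case is my assumption, made because it is what brings the total to 87.

```python
TWEET_LIMIT = 140  # limit quoted in the post

headline = "Even Without Space Shuttles, US Spaceflight Lives On, Astronauts Say"
long_url_length = 85   # length of the full link, as stated in the post
short_url_length = 18  # length after shortening, as stated in the post


def tweet_length(text_length, url_length, separator=0):
    """Total characters used by the text, an optional separator, and the link."""
    return text_length + separator + url_length


with_long_url = tweet_length(len(headline), long_url_length)
over_by = with_long_url - TWEET_LIMIT
# Assumption: one space sits between the headline and the shortened link.
with_short_url = tweet_length(len(headline), short_url_length, separator=1)
fits = with_short_url <= TWEET_LIMIT
```

Counting before you post also shows how much room is left for a hashtag or a mention.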

* How do I find people on Twitter? There are several ways: you can find a friend by their email address or by their Twitter name (example @data_nerd)… or, if you are just browsing a certain interest, you can use a hashtag (#) in conjunction with a search term… for example, #SocialMedia brings up all the conversations about SocialMedia (notice I didn’t leave a space between the words, or else Twitter would have looked up Social and not Media).

* What’s the relevance of Re-Tweeting? That is twofold. 1st, Klout (a measurement of your influence) and PeerIndex scores increase when you engage with others (re-tweet), and the higher your score, the better the chance someone will be listening when you post/tweet. 2nd, it shows respect and admiration for other people and gives them confidence in your ability to give others credit.

We all know the type that takes credit for everyone’s posts, even Mashable’s posts, and after you get to know them you will see that they are probably not worth following anyway. Posting, followers and following (what does all this mean??)
For the beginners I am including a wonderful link by John Jantsch called “Twitter for Business”.

* Use your own judgement in finding people to follow and be careful about GURUS. I never like anyone calling themselves that because there is always more to learn, but you make your own call on that one. Another interesting article you should read is “Is This Pond Stocked? Big Fish, Little Fish, and Social Media” by Renee DeCoskey.

* Are links safe for me to click on? Usually, but “Be Careful”: some people on Twitter do NOT have “your” best interest at heart, so see who else is following them, check their Klout or PeerIndex scores or just watch their posts for a day… Warning: if someone has 5,200 followers and ZERO TWEETS, they are data mining for email addresses (avoid and/or block these people)… How to Tweet Safely

* Facebook Business Page… A fully completed profile page does wonders for building your credibility and boosting the chance of someone making a purchase from your business.

How to Grow Your Facebook Page Community


* What about Foursquare? It’s FREE and any Business can join. Wikipedia says it best, so click on the hyperlink (bold, underlined): Foursquare. Also read “How to Use Foursquare for Small and Local Business” by Jessica McLaughlin.

Well, I guess that’s enough for now… this should get anyone that’s new to Social Media started and it didn’t cost you a
dime, enjoy!

@Data_Nerd (Twitter)

Data mining – Is it for everyone?

When I graduated from College, I spent years trying to convince everyone of the
wonders of data mining and Econometrics. Now it seems data mining is nothing
more than a buzz word, the Social Media gravy train that everyone is hopping on
board. Unfortunately for them and fortunately for me, analysis and data mining
are not for the faint of heart. Terabytes and yes, even Petabytes of data are no
joke, and data CAN NOT be continuously updated without an ETL (extraction,
transformation and load) process, which no one is talking about. Wait till
everyone finds out how long it takes to load data; having it perfect is
an even bigger challenge (META DATA ROCKS). I feel confident that someone like
myself will always be in demand, but my concern at the moment is how much time will
be wasted while everyone figures out that data mining and analysis are not as
easy as you think, no matter what tools you have. Google Analytics is free but
not everyone is using it correctly, data mining will suffer the same pain….
Data will be lost forever because some novice user did something wrong in the
coding of the ETL or analysis process. But alas, all Analytical Solution and I
can do is wait for experience to win out over hype. Wish me luck!


Social Media

We have all seen the hundreds of posts and blogs dedicated to Social Media and I think it’s wonderful, but please keep in mind: it is time-consuming to engage your audience, and you must be sure you have the right potential “buyers mix” or you’ll never be more than an interesting tweet or post.

Social Media is a wonderful way to introduce potential consumers to your product and/or services. But doing a little research 1st will help you improve your future sales/engagement. There are many “Self Proclaimed Gurus” out there screaming that they can get you 100,000s of followers. That would be great if they were 100,000 followers who were interested in your product or service, but that isn’t the case. What I want to convey is that by being behavior-selective, and by geo-targeting and product-targeting your audience, you can increase your sales and ROI. So, what do I mean by that? If I am selling products for older customers then I need to find them, right? How? Data! If you have your own data, great: email your Analytical department and have one of your analysts start working with your Social Media department. Your analyst (if experienced) will be able to data mine your internal transactional data and find “your target audience” to engage with through Social Media. What if you don’t have an “Analytical department” or a “Social Media department”? I would recommend you use someone similar to myself, or learn how to do it yourself. I would love to assist anyone out there interested in hiring Analytical Solution as a Consulting Company and/or a Trainer. Whether you are a CMO, a C staff member or just a Small to Medium Business Owner, analytics can greatly improve your ability to find and engage with the “right” customers.

An example of Twitter’s potential for engagement: “On Twitter, we saw a 500 percent increase in Tweets from Japan as people reached out to friends, family and loved ones in the moments after the earthquake.” It is amazing that within hours, millions of tweets and posts were sent to a captivated audience begging for more information. Families were reunited, and others avoided harm altogether, thanks to a well-timed comment on Social Media.

T.V. and Radio are in AWE of the power of “engagement” and I think most of the world is too. Let’s use this wonderful tool, but let’s use it Smartly, Responsibly and with Ethics. Remember that it’s your “Name” or “Company” that is being put out there for millions to see. Have fun, make someone laugh, tell them something interesting and you’ll have a person ready to listen to more about what you sell or promote.

A couple of examples of how analytics are used:

‘Grocery Stores’
The stocking layout in a grocery store has been designed to increase the dollar amount you purchase. Cereal is not next to milk, so you have to walk past other tempting items to buy both. This was designed using the analysis of data from shopping receipts.
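The post does not say how the receipt analysis was actually done. One common starting point, sketched below, is simply counting which items show up together on the same receipt. The function name and the receipts are made up for illustration, and this is not presented as the method any particular store used.

```python
from collections import Counter
from itertools import combinations


def pair_counts(receipts):
    """Count how often each unordered pair of items appears on the same receipt."""
    counts = Counter()
    for items in receipts:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


# Made-up receipts, for illustration only.
receipts = [
    ["cereal", "milk", "bananas"],
    ["milk", "cereal"],
    ["bread", "milk"],
    ["cereal", "milk", "bread"],
]
counts = pair_counts(receipts)
top_pair, top_count = counts.most_common(1)[0]
```

Pairs that are bought together often, like cereal and milk in this toy data, are the ones a layout planner might deliberately place apart.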

‘The Love Canal’
In short, a neighborhood was worried about a high number of birth defects among its children. A statistical analysis found that there was no statistically significant difference in the number of birth defects in the neighborhood compared to similar neighborhoods.

A scientist took up the case and instead looked at the number of birth defects in houses built on top of a filled-in canal, compared to other houses in the neighborhood. The result jumped out at them and revealed that toxic waste had been dumped in the canal, waste which was now causing birth defects.

The moral: When you crunch the numbers and do a thorough data analysis that combines different data (health records and geographic info in this case), you can discover significant things right in your back yard! The data of your business can hide clues that you’d never discover in your day-to-day activities. At the same time, it shows that it’s important to listen to and work with the guys on the floor, who sense and see things that others higher up can’t.