Riding with the Stars: Passenger Privacy in the NYC Taxicab Dataset

Hmmm, interesting -> Applying Differential Privacy
So, we’re at a point now where we can agree this data should not have been released in its current form. But this data has been collected, and there is a lot of value in it – ask any urban planner. It would be a shame if it was withheld entirely.

In my previous post, Differential Privacy: The Basics, I provided an introduction to differential privacy by exploring its definition and discussing its relevance in the broader context of public data release. In this post, I shall demonstrate how easily privacy can be breached and then counter this by showing how differential privacy can protect against this attack. I will also present a few other examples of differentially private queries.
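To make the idea of a differentially private query concrete, here is a minimal sketch of the standard Laplace mechanism applied to a counting query. It is an illustration under stated assumptions, not the code from the original post: the function names `laplace_noise` and `private_count`, the `pickup_zone` field and the toy records are all hypothetical. A count changes by at most 1 when one ride is added or removed, so adding Laplace noise with scale 1/epsilon gives epsilon-differential privacy for that single query.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng=random):
    """Return a noisy count of the records matching `predicate`.

    Assumes adding or removing one record changes the count by at
    most 1 (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Toy, made-up records; the field name is hypothetical.
    trips = [
        {"pickup_zone": "A"},
        {"pickup_zone": "B"},
        {"pickup_zone": "A"},
    ]
    noisy = private_count(trips, lambda t: t["pickup_zone"] == "A", epsilon=0.5)
    print(f"Noisy count of pickups in zone A: {noisy:.2f}")
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon means answers closer to the true count. Each additional query spends more of the privacy budget, which this sketch does not track.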

The Data

There has been a lot of online comment recently about a dataset released by the New York City Taxi and Limousine Commission. It contains details about every yellow cab ride in New York in 2013, including the pickup and drop-off times, locations, fare and tip amounts, as well as anonymized (hashed) versions of the taxi’s license and medallion numbers. It was obtained via a FOIL (Freedom of Information Law) request earlier this year and has been making waves in the…



Hands on with Watson Analytics: Pretty useful when it’s working

If I have one big complaint about Watson Analytics, it’s that it’s still a bit buggy — the tool to download charts as images doesn’t seem to work, for example, and I had to reload multiple pages because of server errors. I’d be pretty upset if I were using the paid version, which allows for more storage and larger files, and experienced the same issues. Adding variables to a view without starting over could be easier, too.


Last month, IBM made available the beta version of its Watson Analytics data analysis service, an offering first announced in September. It’s one of IBM’s only recent forays into anything resembling consumer software, and it’s supposed to make it easy for anyone to analyze data, relying on natural language processing (thus the Watson branding) to drive the query experience.

When the servers running Watson Analytics are working, it actually delivers on that goal.

Analytic power to the people

Because I was impressed that IBM decided to launch a cloud service using the freemium business model (and carrying the Watson branding, no less), I wanted to see firsthand how well Watson Analytics works. So I uploaded a CSV file including data from Crunchbase on all companies categorized as “big data,” and I got to work.

Seems like a good starting point.

Choose one and get results. The little icon in…


The Four Horsemen Of The Cyber Apocalypse

These “Four Horsemen” point us to the components we can expect to see used by hackers in 2015: exploits in unpatchable systems; recycled malware hidden imperceptibly; and human error. Studying these harbingers could very well save us from a potential cyber catastrophe.

How big data got its mojo back

Big data never really went anywhere, but as a business, it did get a little boring over the past couple years.



Big data technologies (and not just Hadoop) proved harder to deploy, harder to use and a lot more limited in scope than all the hype suggested. Machine learning became the new black as startups infused it into everything, most often marketing and sales software. So much ink and breath were wasted trying to define (or disprove) the idea of data science, probably because the tools of the trade were still so foreign to most people.

But while the early days of the big data movement hinted at greatness, it’s probably fair to say they didn’t deliver — even if the resulting tools were very useful and very necessary to set the stage for things to come. And, realistically, many companies still haven’t adopted these technologies or these techniques.


Things are changing…
