Learning Data Science: Choose a Platform


DS-DecisionTree

One of the first questions I confronted when setting out to learn Data Science was what platform to use. As you begin to look at books and courses you realize that you’ll need a basic platform for working with data. Think of it as an IDE for data manipulation, statistics, and algorithms. For example, if you take Andrew Ng‘s popular Machine Learning course, you’ll be doing the exercises in Octave. If you take the machine learning course on Pluralsight, you be using ENCOG.

Data Scientists Love Them Some Python

Python is the most popular general purpose programming language in the machine learning world. I’m not a Python guy (yet), but you can start at SciPy and go from there.

Why I Chose R

I initially started working through Andrew Ng’s course, but I wasn’t sold on spending a lot of time learning Octave. I had a Data Mining book with all the exercises in Weka, but I wasn’t loving that idea either. I kept hearing about this statistics language called R. After some investigation, I found that the R language is nothing to write home about, but R Studio and the vast collection of available packages make R a great choice.

R Studio has been great to work in. The popular Coursera Data Science specialization is essentially an extended course in R. Azure ML Studio now supports the R language. The list goes on and is growing. The folks at Kaggle show the popularity of tools used by their competitors, with R as the clear winner…

Kaggle Tools

Bottom line… if you have an tool that makes sense for you, then use it. Otherwise, start with R.