This post is a reblog of a post by Chenh Han Lee, the original can be seen at Udacity.
January 12, 2015
If you’re interested in a career in data, and you’re familiar with the set of skills you’ll need to master, you know that Python and R are two of the most popular languages for data analysis. If you’re not exactly sure which to start learning first, you’re reading the right article.
When it comes to data analysis, both Python and R are simple (and free) to install and relatively easy to get started with. If you’re a newcomer to the world of data science and don’t have experience in either language, or with programming in general, it makes sense to be unsure whether to learn R or Python first.
Luckily, you can’t really go wrong with either.
The Case for R
R has a long and trusted history and a robust supporting community in the data industry. Together, those facts mean that you can rely on online support from others in the field if you need assistance or have questions about using the language. Plus, there are plenty of publicly released packages, more than 5,000 in fact, that you can download to use in tandem with R to extend its capabilities to new heights. That makes R great for conducting complex exploratory data analysis. R also integrates well with other computer languages like C++, Java, and C.
When you need to do heavy statistical analysis or graphing, R’s your go-to. Common mathematical operations like matrix multiplication work straight out of the box, and the language’s array-oriented syntax makes it easier to translate from math to code, especially for someone with no or minimal programming background.
The Case for Python
Python is a general-purpose programming language that can pretty much do anything you need it to: data munging, data engineering, data wrangling, website scraping, web app building, and more. It’s simpler to master than R if you have previously learned an object-oriented programming language like Java or C++.
In addition, because Python is an object-oriented programming language, it’s easier to write large-scale, maintainable, and robust code with it than with R. Using Python, the prototype code that you write on your own computer can be used as production code if needed.
Although Python doesn’t have as comprehensive a set of packages and libraries available to data professionals as R, the combination of Python with tools like Pandas, Numpy, Scipy, Scikit-Learn, and Seaborn will get you pretty darn close. The language is also slowly becoming more useful for tasks like machine learning, and basic to intermediate statistical work (formerly just R’s domain).
Choosing Between Python and R
Here are a few guidelines for determining whether to begin your data language studies with Python or with R.
Choose the language to begin with based on your personal preference, on which comes more naturally to you, which is easier to grasp from the get-go. To give you a sense of what to expect, mathematicians and statisticians tend to prefer R, whereas computer scientists and software engineers tend to favor Python. The best news is that once you learn to program well in one language, it’s pretty easy to pick up others.
You can also make the Python vs. R call based on a project you know you’ll be working on in your data studies. If you’re working with data that’s been gathered and cleaned for you, and your main focus is the analysis of that data, go with R. If you have to work with dirty or jumbled data, or to scrape data from websites, files, or other data sources, you should start learning, or advancing your studies in, Python.
Once you have the basics of data analysis under your belt, another criterion for evaluating which language to further your skills in is what language your teammates are using. If you’re all literally speaking the same language, it’ll make collaboration—as well as learning from each other—much easier.
Jobs calling for skill in Python compared to R have increased similarly over the last few years.
That said, as you can see, Python has started to overtake R in data jobs. Thanks to the expansion of the Python ecosystem, tools for nearly every aspect of computing are readily available in the language. In addition, since Python can be used to develop web applications, it enables companies to employ crossover between Python developers and data science teams. That’s a major boon given the shortage of data experts in the current marketplace.
The Bottom Line
In general, you can’t err whether you choose to learn Python first or R first for data analysis. Each language has its pros and cons for different scenarios and tasks. In addition, there are actually libraries to use Python with R, and vice versa—so learning one won’t preclude you from being able to learn and use the other. Perhaps the best solution is to use the above guidelines to decide which of the two languages to begin with, then fortify your skill set by learning the other one.
Is your brain warmed up enough yet? Get to it!