I recently came across this programming language called R. I read that it was a good tool for statistical analysis. I like learning new things. Also, I do not know much about data analysis or stats outside the CFA curriculum, but this piqued my interest.
I’d like to know:
How useful would this be for financial analysis (valuations, market/portfolio analysis, etc.)?
Are there other things like this that could be picked up?
I do not intend to use this primarily for my résumé, although I think it would be a nice small thing to have.
It is the default statistical analysis software for finance in academia, and possibly in practice, so in my opinion it is useful to be conversant in it. However, I think the right attitude is to learn statistical analysis techniques first, and then practice them through R. The language itself is of limited use without in-depth knowledge of the applications.
I think it depends where you’re at in terms of the default software: I came across many more academics (in finance and economics) who used Stata than those who used R.
For the original question, I would recommend getting some introductory statistics textbooks or a good applied statistics textbook to actually learn methods. Once you use these to understand the methods (when and why you’re doing things, and that real analysis is far from a cookbook approach), an applied stats programming book would likely be helpful for whatever software you want to use.
Learn Python if you want a language that is easier to learn, in demand, and has various data analysis applications. It is not as academia-friendly as R in statistics departments, but it is very popular nonetheless.
R has a more developed set of statistical libraries. Python has a lot of similar functionality but has an advantage that it can be used as a more general purpose language. I use both but am more comfortable in R because I’ve used it longer.
R caught on in the academic world because it was open source and free to use, whereas SAS and Matlab have expensive licenses, and Stata and SPSS licenses struggle to compete with free and can be a hassle to renew.
When I first found R, the biggest hurdle was that the documentation was poor, but it is much better now.
Python isn’t bad, but it got good data management and statistical libraries much later, so it is playing catch-up and may not be able to displace R for pure statistical work. But if your statistics needs are just regressions and things like that, Python isn’t a bad idea, because it can be repurposed for other jobs in the market if you like.
I like R a lot, and my last employer chose it over Matlab. Free is a difficult price to compete with as long as the quality is there.
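To give a feel for what "just regressions" looks like in Python, here is a minimal sketch that estimates a stock's market beta by ordinary least squares using only numpy. The return series are synthetic and every number (the seed, the "true" beta of 1.2, the noise levels) is an assumption made up for illustration, not real market data.

```python
import numpy as np

# Synthetic daily returns; all parameters below are illustrative assumptions.
rng = np.random.default_rng(42)
n_obs = 250
true_beta = 1.2
market = rng.normal(0.0005, 0.01, n_obs)
stock = 0.0001 + true_beta * market + rng.normal(0.0, 0.005, n_obs)

# Design matrix: a column of ones (intercept) and the market return.
X = np.column_stack([np.ones(n_obs), market])

# Ordinary least squares: minimise ||X @ coef - stock||^2.
coef, *_ = np.linalg.lstsq(X, stock, rcond=None)
alpha_hat, beta_hat = coef

# R-squared from residual and total sums of squares.
fitted = X @ coef
ss_res = np.sum((stock - fitted) ** 2)
ss_tot = np.sum((stock - stock.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"alpha={alpha_hat:.5f}, beta={beta_hat:.3f}, R^2={r_squared:.3f}")
```

For standard errors and the usual regression summary table, a dedicated statistics package would probably be more convenient than hand-rolled numpy.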
Yes, for general purpose programming, but by the time you are using it for statistical analysis, the amount you have to get up to speed on is pretty comparable.
Python is relatively easy to learn, but before you can do stats in it, you’re going to have to get a grip on how numpy, scipy, pandas, and matplotlib work, and then you’re pretty much at the level of complexity of R.
When I did a PhD in finance I used (1) SAS for data preparation and (2) Stata for analysis. I think that is what most people in academia do. R also seems to be popular, but I never used it myself.
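As a rough idea of how those libraries fit together, here is a small sketch that uses pandas for the data handling and numpy for the math, leaving scipy and matplotlib out. The price series is invented, and the 252-trading-day annualisation factor is a common convention that I am assuming here.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices on six business days (made-up numbers).
prices = pd.Series(
    [100.0, 101.0, 99.5, 102.0, 103.5, 102.8],
    index=pd.date_range("2024-01-01", periods=6, freq="B"),
    name="close",
)

# Simple daily returns; the first observation has no prior price, so drop it.
returns = prices.pct_change().dropna()

# Sample mean and standard deviation (pandas uses ddof=1 by default).
mean_daily = returns.mean()
vol_daily = returns.std()

# Annualise volatility with the assumed 252 trading days per year.
vol_annual = vol_daily * np.sqrt(252)

summary = pd.Series(
    {"mean_daily": mean_daily, "vol_daily": vol_daily, "vol_annual": vol_annual}
)
print(summary.round(4))
```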
Yes, but the visualizations you can do with pandas are really solid, and for data mining it is pretty great. edX has some courses that make it easy to get up to speed. I think Python has a better GitHub community, but I do know a lot of people who use R. In my opinion, Python is better to learn simply because you can do other things with it too. Download Anaconda and work in Spyder, and check out edX to get set up.
I did not know pandas had visualization functions. I was under the impression it was simply used for data preparation, and that seaborn and matplotlib had to be used for visuals.
I mistyped. You need to import matplotlib (among other libraries, depending on what you’re doing), as you say. All of the data handling and manipulation for analysis is done within pandas. Depending on what you are trying to do, you can import different methods. For example, for clustering you could import KMeans from sklearn.cluster.
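A minimal sketch of that clustering import, assuming scikit-learn is installed: pandas holds the data and KMeans from sklearn.cluster assigns the groups. The two groups of "assets" are synthetic, and the column names and parameters are hypothetical choices for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Two made-up groups of assets described by (mean return, volatility).
rng = np.random.default_rng(0)
low_risk = rng.normal(loc=[0.05, 0.08], scale=0.01, size=(20, 2))
high_risk = rng.normal(loc=[0.12, 0.25], scale=0.01, size=(20, 2))

assets = pd.DataFrame(
    np.vstack([low_risk, high_risk]),
    columns=["mean_return", "volatility"],
)

# Fit k-means with two clusters; random_state fixes the initialisation.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
assets["cluster"] = model.fit_predict(assets[["mean_return", "volatility"]])

# Average characteristics of each cluster.
print(assets.groupby("cluster")[["mean_return", "volatility"]].mean())
```

With real data you would probably want to standardise the columns first, since k-means is sensitive to the scale of each feature.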
Math/Stat/Econ departments like R/Matlab (Stata too). The workplace weapon of choice is EViews/SAS/Stata. Focus more on knowing the principles than on pigeonholing yourself into learning one piece of software. You’re a time-series regression expert no matter which tool you choose for forecasting purposes.
I have been using Golang for everything for the past 9 months. It is a great language and I highly recommend it. The only major drawback is the fairly small community, but I think Golang in finance will be a big thing in a year or two.
Where I work, from what I can tell, SAS dominates. But it is very expensive. R vs Python is an interesting quandary. They both seem to have animated advocates. I have just begun to play around with R via a Coursera course, but just as a curiosity to learn some statistics (and avoid studying L3!).
I’m more comfortable with R because I’ve used it a lot more, but I don’t think Python is a bad choice, especially if you are not doing anything unusually complex.
For complex things, it’s likely that R has a fuller set of libraries, but that may not matter as much a few years down the road.