Data Analysis: It is not as hard as it may seem

Muhammad Maruf Sazed
3 min read · Mar 3, 2022


If you are an aspiring data scientist/analyst and you are wondering where to start, let me give you some encouragement. There are thousands of blog posts, videos, and podcasts about which programming language data analysts should learn. Python seems to be the popular choice, but a significant number of people use R. More advanced users might suggest C++ to speed things up, as both Python and R can be quite slow. There is also SAS. Some might say that SAS is irrelevant, but that depends on the industry and the organization. And there are other frameworks and tools (SQL, PySpark, Excel, TensorFlow, PyTorch, Power BI, Tableau, etc.) out there that are necessary for different tasks. Obviously, this can be overwhelming for newcomers, but it does not have to be like that. I want to offer a slightly different perspective on how to become a good data analyst.

The most important skills of a data analyst are analyzing data (no wonder!), drawing insights, and communicating findings (through visualization or other means). To do these tasks well, one must

⦿ be curious (ask a lot of questions)

⦿ understand the problem/context

⦿ be able to make sense of the patterns that are observed in the data

⦿ communicate clearly (I consider visualization a vehicle for communication)

Interestingly, all of these can be practiced quickly and rather easily with a single tool. Even in this age of big data and sophisticated tools, we must recognize that Excel is still a dominant tool in the data analysis world. There are plenty of jobs out there that do not require programming skills but do require strong Excel skills. Excel is user friendly, easy to work with, and can do enough to cover what many jobs need. Now if we consider data analysis as a standalone skill (referring to the four points that I mentioned before), then one can acquire the skill using Excel and then bring that skill to R and Python (or any other data analysis tool that they want to focus on).

There are other advantages too. In R and Python, for example, data frames are commonly used for data analysis tasks. There are other objects in both of these languages, but understanding and manipulating data frames can really help data analysts master R and Python for data analysis. Interestingly, one can think of data frames as the counterpart of Excel tables inside those languages. So, understanding how to do data analysis in Excel can help newcomers to the field pick up the necessary skills.
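To make the comparison between Excel tables and data frames concrete, here is a minimal sketch in Python with pandas. The sales data and the column names (`region`, `units`, `unit_price`) are made up for illustration; the point is only that a formula column, a filter, and a pivot table each have a close equivalent on a data frame.

```python
import pandas as pd

# Hypothetical sales data; in Excel this would be a table with three columns.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "units": [10, 4, 6, 8],
    "unit_price": [2.5, 3.0, 2.5, 3.0],
})

# Like adding a formula column (=units * unit_price) in Excel.
sales["revenue"] = sales["units"] * sales["unit_price"]

# Like applying a filter on the "units" column.
large_orders = sales[sales["units"] >= 8]

# Like building a pivot table that sums revenue by region.
revenue_by_region = sales.pivot_table(
    index="region", values="revenue", aggfunc="sum"
)

print(large_orders)
print(revenue_by_region)
```

If you already know how to do these three things in a spreadsheet, you are mostly learning new syntax here, not a new way of thinking.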

Now, what if you are strong at analyzing data and scratching your head as to how to expand your skills beyond Excel? I just have one thing to say. Regardless of which programming language you learn first, you will usually find the second language much easier to master. So, if you learn Python (the pandas and NumPy libraries) first, then you will find R much easier. Yes, there are differences between Python and R, but as long as you develop good data analysis and programming skills in one language, you should be able to pick up the other relatively easily. Similarly, SQL is without doubt one of the most important tools, and if you are good with SQL, you should find SAS easier to master, and vice versa. A lot of work in SAS can even be done in SQL (PROC SQL). My point is, it can be overwhelming at the beginning, but if you are strategic about your learning, you should be able to master the necessary skills to become a good data analyst. Needless to say, discipline and patience are very important!
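As a small illustration of how skills carry over between tools, the sketch below answers the same question (total revenue per region) once with pandas and once with plain SQL. The `orders` table and its values are hypothetical, and SQLite (through Python's built-in `sqlite3` module) is used only because it needs no setup; it stands in for whatever SQL database you happen to use.

```python
import sqlite3

import pandas as pd

# Hypothetical order data, used for both versions of the analysis.
orders = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "revenue": [25.0, 12.0, 15.0, 24.0],
})

# pandas version: group the rows by region and sum the revenue.
pandas_totals = orders.groupby("region")["revenue"].sum().sort_index()

# SQL version: load the same rows into an in-memory SQLite database
# and express the same grouping with GROUP BY.
conn = sqlite3.connect(":memory:")
orders.to_sql("orders", conn, index=False)
sql_totals = pd.read_sql_query(
    "SELECT region, SUM(revenue) AS revenue "
    "FROM orders GROUP BY region ORDER BY region",
    conn,
)
conn.close()

print(pandas_totals)
print(sql_totals)
```

Both versions follow the same idea: pick a grouping column, pick a value column, and aggregate. Once that idea is familiar in one tool, the other is mostly a matter of syntax.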
