Data Analytics is Fun! (or why everybody needs to know data)


When I was struggling with math in school, I would often ask, ‘When will I ever use this?’

My frustrated math teacher would almost always respond, ‘Everything is math, and everybody needs to know it.’

Now, you can argue all day about the utility of learning number theory or other abstract mathematical concepts, but it’s hard to imagine surviving the modern age without at least basic arithmetic. In the same way, the information age practically requires basic knowledge of computers. As businesses become increasingly data driven, analytics is quickly becoming a skill set that everybody needs to thrive in the modern workplace.

Everything in the modern age can be boiled down to data, and you could make the case that everybody needs to know data. Does everybody need to be an expert? No. Not everybody needs to take calculus either (I can get by with basic algebra just fine, thank you). But if you know just the basics of data analytics, you will have a fantastic advantage in your workplace and future career.

But don’t take my word for it. Let’s take a look at the data supporting the explosive growth of data analytics.

  • In 2014, IDG found that 70% of enterprise companies had implemented or were planning to implement big data related projects, and that number continues to grow.
  • According to LinkedIn, the #2 skill recruiters will be looking for in 2016 is statistical analysis and data mining. Also included within the top 25 skills are data engineering, algorithm design, database management, and business intelligence.
  • For 2016, payscale.com found that 36% of managers reported that new grads lacked data analysis skills. Additionally, knowledge of Hadoop or big data analytics gave an average pay boost of ~11%.

These skills aren’t just for those with sexy titles like ‘data scientist’. They are increasingly becoming integral to the modern workplace. In the same spirit as learning math, it’s often best to master the basics of data analytics to solve problems, and to leave the theories and abstract concepts to the experts.

So where should you start?

My first exposure to data analytics started with a practical problem I had at a part-time job in college. My task was to create a monthly report from a mismanaged tangle of Excel files. The task was tedious and time-consuming, and after several months of data wrangling I knew that there had to be a better way. I taught myself SQL and went straight to the database, saving myself immeasurable time and sanity.

The weirdest part of all: I found out *gasp* that data analytics can be fun! Coding and programming aren’t some mystical rocket science known only to tech wizards. Code is a language that anybody can learn to make their life better.

Data1UP is a space to learn data analytics, with an emphasis on practical application rather than mastering multiple programming languages or becoming a full-blown developer. Data1Up is also a space to share how data is making your life better, or to share an awesome insight you’ve scraped from the massive amounts of data all around us. Who knows, you might even find that data analysis can be fun.

Getting Started with Data Analytics

So you want to step up your data crunching game? Maybe you’re looking to transition towards a career more focused on data analytics, maybe you just want to add data proficiency to your current role. Regardless, the hardest question is almost always: Where do I start?

The first step is to learn the tools and languages you need for advanced data analytics.

Over the past several years, with the spread of ‘big data’, several platforms have come forward as widespread standards. Python and R have emerged as the primary scripting languages in data analytics, and they do the bulk of the work of cleaning and analyzing data. SQL remains the primary database language. Simply put, SQL handles storing data in the database and retrieving it.
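To make ‘cleaning and analyzing’ a little more concrete, here is a minimal sketch in Python using only the standard library. The `raw_sales` values and the `clean` helper are made up for illustration. In practice you would probably reach for a dedicated data library, but the idea is the same: tidy the messy values first, then summarize them.

```python
from statistics import mean, median

# Hypothetical raw values, as they might arrive from a messy spreadsheet export.
raw_sales = [" 120.50", "98", "", "N/A", "143.25 ", "98"]


def clean(values):
    """Strip whitespace and keep only entries that parse as numbers."""
    cleaned = []
    for value in values:
        text = value.strip()
        try:
            cleaned.append(float(text))
        except ValueError:
            # Skip blanks and placeholders such as "N/A".
            continue
    return cleaned


sales = clean(raw_sales)
summary = {"count": len(sales), "mean": mean(sales), "median": median(sales)}
print(summary)
```

Dropping the blank and ‘N/A’ entries is just one possible choice. Depending on the report, you might prefer to count them or flag them instead.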

Now, there are thousands of arguments for and against these languages, and everyone has their own bias (they’re lying if they tell you they don’t!). I’m going to focus on Python, R, and SQL. Python and R are open source (aka free!), relatively easy to learn, have large active communities, and already support statistical and data analysis packages. Also, I’ll admit, they’re the languages that I’ve come to know and love. Let’s take a look at the capabilities of each.

SQL

If you’re working with data, you’re most likely using a relational database and SQL. There are many different implementations of SQL, including MySQL, SQLite, and PostgreSQL, just to name a few. Although each has its own flavor, the basic structure and concepts are essentially the same across all of them.

So what can SQL do for you? SQL gives you the keys to the data kingdom. With SQL knowledge you should only have to bother your database architect once, to obtain read-only access. No more relying on coworkers to send you spreadsheets, when you can access the data directly yourself. With lightweight databases like SQLite you can easily store data on your own computer as well.

For those just starting out, SQL is the best language to learn first. SQL will help you wrap your head around retrieving data and data structure.
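Here is a rough sketch of what storing and retrieving data with SQLite can look like, driven from Python’s built-in `sqlite3` module. The `sales` table, its columns, and the figures are invented for the example. A real database will have its own schema.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; passing a file
# path such as "sales.db" (hypothetical) would store it on disk instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (month, region, amount) VALUES (?, ?, ?)",
    [
        ("2016-01", "East", 1200.0),
        ("2016-01", "West", 800.0),
        ("2016-02", "East", 950.0),
        ("2016-02", "West", 1100.0),
    ],
)
conn.commit()

# Retrieve a total per month, the kind of report that is tedious in spreadsheets.
monthly_totals = conn.execute(
    "SELECT month, SUM(amount) FROM sales GROUP BY month ORDER BY month"
).fetchall()

for month, total in monthly_totals:
    print(month, total)

conn.close()
```

The `SELECT ... GROUP BY` statement is the part worth studying. The same query should carry over to MySQL or PostgreSQL with little or no change.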

R vs. Python: Data Analytics Showdown

R and Python are both great languages for data analytics. Both languages have more or less the same capabilities, although each has its own flavor. Let’s size up each language for its strengths and weaknesses.

Summary

  • R: The R programming language, or simply ‘R’, is a language for statistical computing. In the past it has primarily been used in academic circles.
  • Python: Python is a general purpose, production-ready language. This means that it can be easily integrated into workflows, environments, and web apps.

Usage

  • R: R was built specifically for statistical analysis, so it’s ready right out of the box to import and manipulate data.
  • Python: Python is fast and easy to debug. However, several libraries are required to get started with Python for analytics.

Learning Curve

  • R: R has a steep learning curve at first. However, once you establish the basics, advanced R is easier to grasp.
  • Python: Python’s readability and simplicity make it easy to pick up, but models for complex data analysis can be difficult with Python’s current libraries.

IDEs and Programs

  • R: RStudio is the undisputed leading IDE for coding with R. RStudio also works with the Shiny framework for building interactive web apps.
  • Python: There are several great IDEs for Python. The most popular are Spyder, PyCharm, and IPython notebook. Python makes it easy to publish web apps.


If you’re serious about shifting your career towards data analytics you will most likely need to know both of these languages.

Now for my two cents:

I first learned R and then I learned Python, and this is the route I would recommend for anybody with little to no programming experience who wants to focus on data analytics rather than development. Before I took a deep dive into R, I knew SQL, some HTML, and some CSS. R was nothing like any of these languages, and at first it was difficult. However, once I got a feel for some of the idiosyncrasies of the language, building on my R knowledge became exponentially easier.

I decided to learn Python next as a means to host some of my data analysis through a web portal. To me, Python was easy to read and beautiful in its simplicity, and it was no problem to pick up the basics. Where I ran into trouble was the way Python reads data. Data analytics is not the primary function of Python, and libraries like pandas and NumPy are required to read data tables in the way that R does naturally. Certain object types that I didn’t have to worry about with R were suddenly a pain with Python. Despite these hang-ups, I found that Python libraries like Flask, Django, and Bokeh make displaying interactive data visualizations simple and powerful.
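As a small illustration of the object type issue, here is a sketch that reads a tiny table with only Python’s standard `csv` module. The table contents are invented. The thing to notice is that every field comes back as a string, so numeric columns need explicit conversion, which is the kind of bookkeeping that libraries like pandas handle for you.

```python
import csv
import io

# Hypothetical table; io.StringIO stands in for a file on disk.
table = io.StringIO("name,score\nAda,90\nGrace,85\n")

rows = list(csv.DictReader(table))

# The standard library hands back every field as a string...
raw_types = {type(row["score"]) for row in rows}

# ...so numeric columns have to be converted explicitly before doing math.
scores = [int(row["score"]) for row in rows]
average = sum(scores) / len(scores)
print(raw_types, average)
```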

Although I learned R first, I’ve found that developers tend to favor Python. Ultimately, which language you should pursue first depends on where you’re starting from and what your goals are. If you’re a developer, you’ll probably like Python. If you’re purely interested in data analysis and statistics, I recommend R. If you’re serious about becoming a data scientist, it’s best to learn both! Regardless, both languages will help you with data analytics and your future career.