《5 Essential Skills Every Big Data Analyst Should Have》

Team Jigsaw

ELEMENTARY, MY DEAR WATSON.

Sherlock Holmes is the world’s greatest fictional detective. Though all that remains of 221B Baker Street is ensconced in a bank in London, Sir Arthur Conan Doyle’s legacy remains alive and well today.

The most fascinating trait that Holmes possessed was that he was a master of deduction. He saw what everyone else saw, and then some, which allowed him to arrive at vastly different conclusions. He attributed his success to the following skills: observation, understanding of human behavior, knowledge of crime, constant experimentation and an ability to connect seemingly random pieces of information so that they formed a cohesive whole.

These five skills made Sherlock Holmes a legendary detective. He could handle any case and come out on top. He was approached by kings and governments, princesses and paupers – and he was able to help them all.

To be a top-rung Big Data analyst, you need to have five very similar skills, which will be outlined in this article.

BIG DATA VS. DATA SCIENCE

Before we go further, it is important to differentiate between big data and data science. Big data is the application of specialized data science tools to huge data sets.

To learn more about this difference, make sure you read our post on Big Data vs Data Science vs Analytics.

THE IT ADVANTAGE

Having a strong information technology (IT) background is beneficial for a big data analyst. IT professionals are skilled at information handling and programming. This gives them a leg up on the competition.

Initially at least, IT professionals are better equipped to handle the programmatic and computational aspects of big data analytics than their non-IT peers, and can afford to skip the basics. They can also learn new tools such as BigSQL, Hive, Pig etc. with greater ease than non-IT professionals.

Non-IT professionals, however, shouldn’t worry too much. Becoming a good big data analyst is both an art and a science. Core skills of analysing, visualising and communicating data are not limited to IT professionals. It is challenging for everyone to derive the right insights and communicate them effectively. Your comfort with mathematics and statistics can level the playing field.

In short, anybody can become a Big Data analyst. All they need to do is master the five essential skills every data analyst should know.

Essential big data skill #1: Programming

Learning how to code is an essential skill in the Big Data analyst’s arsenal. You need to code to conduct numerical and statistical analysis with massive data sets. Some of the languages you should invest time and money in learning are Python, R, Java and C++, among others. The more you know, the better, but remember that you do not have to learn every single language out there.

As every IT professional can tell you, if you know one language well, you can easily pick up the rest. Hands-on experience with these languages and with programming in general will help in your learning effort. Finally, being able to think like a programmer will help you become a good big data analyst.

Tip: If you’re looking to start learning a programming language, start with Python.

Another important aspect of programming is interacting with databases through queries and statements. Databases, query languages and big data tools should be a part of your repertoire. You should be comfortable with tools such as R, SQL, Scala and Hive.
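To make the idea of querying a database from code concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table name orders, its columns, the helper top_customers and the sample rows are all made up for illustration; a real big data environment would more likely run a similar query through Hive or another SQL engine.

```python
import sqlite3


def top_customers(rows, limit=2):
    """Load (customer, amount) rows into an in-memory table and
    return the customers with the highest total spend."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk
    try:
        # Hypothetical schema, used only for this example.
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        query = """
            SELECT customer, SUM(amount) AS total
            FROM orders
            GROUP BY customer
            ORDER BY total DESC, customer ASC
            LIMIT ?
        """
        return conn.execute(query, (limit,)).fetchall()
    finally:
        conn.close()


# Made-up sample data.
sample_orders = [("asha", 120.0), ("ben", 75.5), ("asha", 30.0), ("chen", 200.0)]

for customer, total in top_customers(sample_orders):
    print(customer, total)
```

The same GROUP BY and ORDER BY pattern carries over to most SQL dialects, which is one reason SQL is worth learning early.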

Essential big data skill #2: Quantitative Skills

As a big data analyst, programming helps you do what you need to do. But, what are you supposed to do?

The quantitative skills you need to be a good big data analyst answer this question. For starters, you need to know multivariable calculus and linear and matrix algebra. You will also need to know probability and statistics.

By learning these skills, you will have a strong foundation in numerical analysis.

Numerical and statistical analysis are core quantitative skills that every good big data analyst needs. This knowledge enables the use of concepts such as neural networks and machine learning.
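As a small illustration of how statistics and algebra show up in everyday analysis, the sketch below fits a straight line to a handful of points using ordinary least squares. The function name fit_line and the ad_spend and sales figures are invented for this example; they are not taken from any real data set.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up figures that lie exactly on the line y = 2x + 1.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(ad_spend, sales)
print("slope:", slope, "intercept:", intercept)
print("spread of ad spend (sample standard deviation):", stdev(ad_spend))
```

Libraries such as NumPy or R's built-in functions do this in one call, but working through the formula once shows where the calculus and algebra come in.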

Essential big data skill #3: Multiple Technologies

Programming is an essential big data analysis skill. What makes it extra special, though, is the versatility. You can, and must, learn multiple technologies that will help you grow as a Big Data analyst.

But technologies are not limited to programming alone. The range of technologies that a good big data analyst must be familiar with is huge. It spans myriad tools, platforms, hardware and software. For example, Microsoft Excel, SQL and R are basic tools. At the enterprise level, SPSS, Cognos, SAS and MATLAB are important to learn, as are Python, Scala, Linux, Hadoop and Hive.

The actual technologies that you use will depend upon the environment you are working in. They will also vary based on the requirements of your company and project.

The more technologies you are familiar with, the more versatile you will be.

Essential big data skill #4: Understanding of Business & Outcomes

Data analysis and the insights it produces are useless if they cannot be applied in a business setting. All big data analysts need to have a strong understanding of the business and domain they operate in.

Domain expertise can magnify the impact of the big data analyst’s insights.

Big data analysts can identify relevant opportunities and threats based on their business expertise. Consider the introduction of iPads. When they were introduced, the digital publishing industry was all set for disruption. But, outsiders could not realize the transformation that was possible. It took industry expertise and connections to usher in the era of digital publishing.

Domain expertise enables big data analysts to communicate effectively with different stakeholders. Consider recommending that new employees be added to a factory floor. When pitching it to the CFO it could be positioned as a net increase in top line margins. It may need to be repositioned as a reduction in quality test failures to the operations head. Domain expertise makes these conversations easier and more effective.

Essential big data skill #5: Interpretation of Data

Of all the skills we have outlined, interpretation of data is the outlier. It is the one skill that combines both art and science. It requires the precision and sterility of hard science and mathematics but also calls for creativity, ingenuity, and curiosity.

In most companies, a large majority of employees don’t understand their own company’s data. In fact, most employees do not even have a clear idea of where all the data is. These employees often rely on preconfigured reports and dashboards to derive their insights. Unfortunately, this approach is dangerous. It does not provide a holistic view of the data procurement and analysis process. This problem is often compounded by the fragmentation of data systems. As companies grow inorganically, different data silos merge, resulting in a confusing mess.

However, by asking the right questions, a Big Data analyst can embark on a proper exploration of the raw data. The right questions and discoveries can change the course of business for an organization.
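A short sketch can show why going back to the raw records matters. Below, a single headline average (the kind of number a preconfigured dashboard might show) hides a large difference between two groups. The field names region and delivery_days, the helper average_by and the records themselves are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_by(records, key, value):
    """Group records by `key` and average the `value` field in each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {name: mean(values) for name, values in groups.items()}


# Made-up raw records.
raw = [
    {"region": "north", "delivery_days": 2},
    {"region": "north", "delivery_days": 3},
    {"region": "south", "delivery_days": 9},
    {"region": "south", "delivery_days": 10},
]

# The headline figure a summary report might show.
overall = mean(record["delivery_days"] for record in raw)
# The question the raw data lets you ask: does it differ by region?
by_region = average_by(raw, "region", "delivery_days")

print("overall average:", overall)
print("average by region:", by_region)
```

The overall figure looks unremarkable, while the per-region breakdown points to where the problem actually is. Asking "does this differ by group?" is often the first useful question to put to raw data.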

In Conclusion

Becoming a big data analyst requires mastery of these five essential skills. IT professionals have an advantage in learning new programming languages and technologies. Others will need to put in more effort to learn computing skills and technologies. But softer skills such as business experience and domain expertise level the playing field.

Original article: https://www.cnblogs.com/3OOO/p/10005214.html

