Commonly used terms in Data and Analytics

General terms

Analytics as a Service (AaaS) The provision of analytics through Web-delivered technologies. These solutions offer businesses an alternative to developing internal hardware setups to perform business analytics.

Artificial Intelligence (AI) The theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception and speech recognition.

Big data Large data sets which cannot be analysed using standard analytical techniques. We normally evaluate big data across four dimensions: volume, variety, velocity and veracity.

Business intelligence (BI) The set of techniques and tools for the transformation of raw data into meaningful and useful information for business analysis purposes.

Cloud computing A model for delivering information technology services in which resources are retrieved from the internet through Web-based tools and applications, rather than a direct connection to a server.

Data analytics The collecting, organising and examining of large volumes of data with the aim of discovering useful insights, suggesting conclusions, and supporting decision-making.

Data as a Service (DaaS) Data that can be provided on demand to the user, regardless of geographic or organisational separation of provider and consumer.

Database A collection of information that is organised so that it can be easily accessed, managed, and updated.

Internet of Things The concept of connecting any device, or component of a device, to the internet (and/or to each other).

Visualisation A visual abstraction of data designed for the purpose of deriving meaning or communicating information more effectively.

Mergers & Acquisitions (M&A) A general term that refers to the consolidation of companies or assets.

Data types

Byte A unit of digital information that most commonly consists of eight bits.

  • 1 gigabyte = 1024 megabytes
  • 1 terabyte = 1024 gigabytes
  • 1 petabyte = 1024 terabytes
  • 1 exabyte = 1024 petabytes
  • 1 zettabyte = 1024 exabytes
  • 1 yottabyte = 1024 zettabytes
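
The conversions above follow the binary convention, in which each unit is 1024 times the previous one. A minimal Python sketch of that arithmetic follows. The names UNITS and to_bytes are illustrative, and the kilobyte and megabyte steps are assumed to follow the same factor as the units listed above.

```python
# Binary-convention units: each step up is a factor of 1024.
# Assumption: kilobyte and megabyte follow the same rule as the listed units.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def to_bytes(value, unit):
    """Convert a quantity expressed in `unit` to bytes (hypothetical helper)."""
    return value * 1024 ** UNITS.index(unit)


# Ratio between adjacent units in the list.
gigabyte_in_megabytes = to_bytes(1, "gigabyte") // to_bytes(1, "megabyte")
```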

Open-source data Data that is freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.

Proprietary data Data that is owned by an individual or organisation, which is deemed important enough so that it gives competitive advantage to that individual or organisation. This data can be protected under copyright laws or patents.

Semi-structured data Data that is not structured by a formal data model, but provides other means of describing the data and hierarchies.

Structured data Refers to data that is readily identifiable because it is organised in a defined structure, such as rows and columns. The data resides in fixed fields within a record or file, or the data is tagged correctly and can be accurately identified.

Unstructured data Refers to information that does not have a predefined data model or is not organised in a predefined manner. Examples include emails, SMS, video, audio, PDFs and social media.

Data management concepts

Data governance The practice of organising and implementing policies, procedures and standards for the effective use of an organisation’s data.

Data integration Data integration involves combining data residing in different sources and providing users with a unified view of this data.

Data quality Represents the reliability and effectiveness of data to serve its purpose in a given context.

Data warehouse A central repository of integrated data, from one or more disparate sources, which stores current and historical data.

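Extract, transform, load (ETL) A process used in data warehousing to prepare data for use in reporting or analytics: data is extracted from source systems, transformed into a consistent format, and loaded into a target store.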
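
The three ETL stages can be sketched in a few lines of Python using only the standard library. This is an illustrative toy, not a production pipeline: the sample data in RAW, the sales table and the function names are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: inconsistent spacing and one unusable amount.
RAW = "region,amount\nnorth, 120.50\nsouth,80\nnorth,n/a\n"


def extract(text):
    """Extract: read the raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise region names and drop rows with invalid amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows such as "n/a"
        clean.append((row["region"].strip().upper(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
totals = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
```

After loading, the data can be queried for reporting, here as a total per region.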

In-memory Data that is loaded into memory (Random Access Memory (RAM) or flash memory) instead of being read from hard disks, so that IT resources spend less development time on data modelling, query analysis, cube building and table design.

Online analytical processing (OLAP) OLAP tools enable users to analyse multidimensional data interactively from multiple perspectives. OLAP consists of three basic analytical operations: consolidation, drill down, and slicing and dicing.
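
The three OLAP operations can be illustrated on a small in-memory fact table. The sketch below is plain Python rather than a real OLAP engine, and the data in FACTS and the helper names aggregate and select are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales)
FACTS = [
    (2023, "Q1", "North", "Widget", 100),
    (2023, "Q1", "South", "Widget", 80),
    (2023, "Q2", "North", "Gadget", 50),
    (2024, "Q1", "North", "Widget", 120),
    (2024, "Q1", "South", "Gadget", 70),
]
DIMS = {"year": 0, "quarter": 1, "region": 2, "product": 3}


def aggregate(rows, *dims):
    """Sum sales grouped by the named dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMS[d]] for d in dims)
        totals[key] += row[4]
    return dict(totals)


def select(rows, **fixed):
    """Keep only rows matching the fixed dimension values."""
    return [r for r in rows if all(r[DIMS[d]] == v for d, v in fixed.items())]


# Consolidation (roll-up): total sales per year.
by_year = aggregate(FACTS, "year")
# Drill down: break each year into quarters.
by_year_quarter = aggregate(FACTS, "year", "quarter")
# Slice: fix one dimension (region) and view the rest by product.
north_by_product = aggregate(select(FACTS, region="North"), "product")
# Dice: fix two dimensions at once.
north_2023 = aggregate(select(FACTS, region="North", year=2023), "product")
```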

Online transaction processing (OLTP) Refers to a class of information systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing.

Structured query language (SQL) A programming language for managing data held in a relational database management system.

Data management technology

Hadoop An open-source framework which is built to enable the processing and storage of big data across a distributed file system. Essentially, it accomplishes two tasks: massive data storage and faster processing.

Microsoft SQL Server Microsoft SQL Server is a relational database management system whose primary function is to store and retrieve data as requested by other software applications.

MongoDB Built on an architecture of collections and documents, instead of using tables and rows as in relational databases. Documents comprise sets of key value pairs (KVPs) and are the basic unit of data in MongoDB.

Neo4j A graph database that uses graph structures for semantic queries, with nodes, edges and properties to represent and store data.

SAP HANA SAP HANA is an in-memory, column oriented, relational database management system, developed and marketed by SAP.

Data visualisation tools

Microsoft Power BI A collection of online services and features that enables users to find and visualise data, share discoveries and collaborate in intuitive ways.

QlikView A tool that supports the creation of visualisations that effectively organise, communicate and share analysis with clients. It offers more advanced functionality when compared with Tableau.

Tableau A visualisation tool that supports the creation of dashboards and interactive visualisations that we use to effectively organise, communicate and share analysis with clients.

TIBCO Spotfire Analytics software designed for data exploration. It enables users to discover and depict critical insights in data.

Analytical approaches

A/B testing An experiment whereby two versions (A and B) are compared. They are identical except for one variation that might affect a user’s behaviour. Version A might be the currently used version (control), while Version B is modified in some respect (treatment).
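
One common way to judge whether Version B really differs from Version A is a two-proportion z-test on the conversion rates. The sketch below uses only the Python standard library. The function name and the visitor counts are made up for illustration, and other tests may suit a given experiment better.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: control converts 200 of 1000, treatment 250 of 1000.
z, p_value = two_proportion_z_test(200, 1000, 250, 1000)
```

A small p-value suggests the observed difference between control and treatment is unlikely to be due to chance alone.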

Data discovery A business intelligence architecture which allows users to explore data for hidden patterns and trends. It focuses on dynamic, easy-to-use reports, whereas traditional business intelligence reports are static.

Descriptive analytics Summarises what happened in a given situation or scenario. Examples include number of posts, mentions, followers, page views, comments and likes.

Optimisation Finding an alternative with the most cost effective or highest achievable performance under the given constraints, by maximising desired factors and minimising undesired ones.

Predictive analytics Uses statistical functions on one or more data sets to predict trends or future events.

Prescriptive analytics Recommends one or more courses of action and shows the likely outcome of each decision.

Analytical techniques

Cluster analysis The task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar, in some sense or another, to each other than to those in other groups (clusters).
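
k-means is one widely used clustering method, although the definition above does not prescribe any particular algorithm. A minimal sketch, assuming two-dimensional points and hand-picked starting centroids (both invented for this example):

```python
from math import dist


def k_means(points, centroids, iterations=10):
    """Assign points to the nearest centroid, then move each centroid to its cluster mean."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical data: two visibly separate groups of points.
POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(POINTS, centroids=[(0, 0), (10, 10)])
```

Points that end up in the same cluster are closer to their own centroid than to the other one, which is the "more similar to each other" idea in the definition.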

Comparative analysis A step-by-step procedure of comparisons and calculations to detect patterns within very large data sets.

Decision tree analysis A decision support tool that uses a tree-like graph of decisions and their possible consequences including chance event outcomes, resource costs and utility.
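
A decision tree of this kind can be evaluated by working back from the outcomes: chance nodes take the probability-weighted average of their branches, and decision nodes take the best option net of its cost. The following sketch assumes a simple dictionary layout for the tree, and the product-launch figures are invented.

```python
def expected_value(node):
    """Evaluate a tree made of 'outcome', 'chance' and 'decision' nodes."""
    kind = node["type"]
    if kind == "outcome":
        return node["value"]
    if kind == "chance":
        # Probability-weighted average over the chance event outcomes.
        return sum(p * expected_value(child) for p, child in node["branches"])
    # Decision node: choose the option with the best value after its resource cost.
    return max(expected_value(child) - cost for cost, child in node["options"])


# Hypothetical choice: launch a product (cost 50) or do nothing (cost 0).
TREE = {
    "type": "decision",
    "options": [
        (50, {"type": "chance", "branches": [
            (0.6, {"type": "outcome", "value": 200}),
            (0.4, {"type": "outcome", "value": 20}),
        ]}),
        (0, {"type": "outcome", "value": 0}),
    ],
}
best = expected_value(TREE)  # launching: 0.6 * 200 + 0.4 * 20 - 50
```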

Factor analysis Used to analyse large numbers of dependent variables to detect certain aspects of the independent variables (factors) affecting those dependent variables.

Machine learning A type of artificial intelligence which provides computers with the ability to learn without being explicitly programmed.

Multivariate analysis The observation and analysis of more than one statistical outcome variable at a time.

Regression analysis A statistical process for estimating relationships between a dependent variable and one or more independent variables.
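
For a single independent variable, the relationship can be estimated with ordinary least squares. A minimal sketch using the Python standard library, with made-up sample data and an illustrative function name:

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
```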

Segmentation analysis Divides a broad category into subsets that have, or are perceived to have, common features, needs, interests or priorities.

Sentiment analysis The process of identifying and categorising opinions expressed in a piece of text to determine whether the writer’s attitude towards a topic or issue is positive, negative or neutral.
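
A very simple approach counts words from positive and negative word lists. Real sentiment analysis tools are considerably more sophisticated. The word lists and the classify function below are assumptions made purely for illustration.

```python
# Tiny hypothetical lexicons; real ones contain thousands of scored terms.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def classify(text):
    """Label text as positive, negative or neutral by counting lexicon words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = classify("I love this great product!")
```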

Simulation The imitation of the operation of a real world process or system over time. It requires a model that represents the key characteristics or behaviours of the selected physical or abstract system or process.
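
As an illustration, the sketch below imitates a single service desk over time: customers arrive at random, wait if the server is busy, and are then served. The model (exponential arrival and service times) and all parameter values are assumptions chosen for the example.

```python
import random


def simulate_queue(n_customers, mean_interarrival, mean_service, seed=0):
    """Return the average waiting time in a single-server queue."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    clock = 0.0                # arrival time of the current customer
    server_free_at = 0.0       # time at which the server next becomes idle
    total_wait = 0.0
    for _ in range(n_customers):
        clock += rng.expovariate(1 / mean_interarrival)
        start = max(clock, server_free_at)
        total_wait += start - clock
        server_free_at = start + rng.expovariate(1 / mean_service)
    return total_wait / n_customers


# Same arrival rate, but the second desk takes four times longer per customer.
quiet = simulate_queue(10_000, mean_interarrival=5.0, mean_service=1.0)
busy = simulate_queue(10_000, mean_interarrival=5.0, mean_service=4.0)
```

Comparing the two runs shows how the model lets us ask what-if questions, such as how waiting time grows when service slows down.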

Time series analysis Comprises methods for analysing time series data to extract meaningful statistics and other characteristics of the data.
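
One of the simplest such methods is a moving average, which smooths short-term fluctuations to reveal the underlying trend. A minimal sketch with invented monthly figures:

```python
def moving_average(series, window):
    """Average each run of `window` consecutive observations."""
    return [
        sum(series[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# Hypothetical monthly values.
smoothed = moving_average([10, 12, 14, 13, 15, 17], window=3)
# smoothed == [12.0, 13.0, 14.0, 15.0]
```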

Analytics technology

MATLAB An abbreviation of the words matrix and laboratory. It is a computing environment which allows matrix manipulations, plotting of functions and data and implementation of algorithms.

Python An open-source general purpose programming language that can be used for everything from building web applications and enterprise programs to performing analysis on large amounts of data.

R A software environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques, and is highly extensible.

River Logic A modelling and analytics platform that leverages diagnostic, predictive and prescriptive analytics to conduct what-if and optimisation analysis.

SAS A leader in advanced analytics and business intelligence software. It offers a range of data management and analytics solutions.

SIMUL8 A tool that allows users to create a computer simulation, which takes into account existing constraints, capacities and other factors affecting the total performance of production.

SPSS Statistical Package for Social Sciences (SPSS) is a software package used for statistical analysis.

STATA An abbreviation of the words statistics and data. It is a statistical software package and its capabilities include data management, statistical analysis and regression analysis.

Original article: https://www.cnblogs.com/adelaide/p/11386092.html

Time: 2024-10-08 10:18:26
