An outline of Cloudera's brutally demanding CCP:DS certification

Required Exams

· DS700 – Descriptive and Inferential Statistics on Big Data

· DS701 – Advanced Analytical Techniques on Big Data

· DS702 – Machine Learning at Scale

Each exam may be taken in any order. All three exams must be passed within 365 days of each other. Candidates who fail an exam must wait a period of thirty calendar days, beginning the day after the failed attempt, before they may retake the same exam. Candidates must pay for each exam attempt.

Each passed exam is verifiable in your exam transcript and history.

Each exam is a single challenge scenario. You are provided access to the scenario, the data sets, and the cluster. You are given eight (8) hours to complete the challenge.

Required Skills

Common Skills (all exams)

· Extract relevant features from a large dataset that may contain bad records, partial records, errors, or other forms of “noise”

· Extract features from data stored in a wide range of possible formats, including JSON, XML, raw text logs, industry-specific encodings, and graph link data
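
As a rough illustration of the two common skills above, here is a minimal Python sketch that pulls features out of JSON-lines input while skipping noisy records. The field names `user`, `duration`, and `bytes` and the sample records are hypothetical, not taken from any exam scenario, and on the exam cluster the same idea would more likely be expressed in Spark, Pig, or Hive.

```python
import json


def extract_features(lines):
    """Parse JSON-lines records, skipping malformed or partial ones.

    The field names `user`, `duration`, and `bytes` are hypothetical.
    Returns (list of feature tuples, number of skipped records).
    """
    features, skipped = [], 0
    for line in lines:
        try:
            record = json.loads(line)  # invalid JSON raises ValueError
            # Missing fields raise KeyError; non-numeric values raise
            # ValueError or TypeError.
            row = (
                str(record["user"]),
                float(record["duration"]),
                int(record["bytes"]),
            )
        except (ValueError, KeyError, TypeError):
            skipped += 1
            continue
        features.append(row)
    return features, skipped


raw = [
    '{"user": "a", "duration": "1.5", "bytes": 200}',
    '{"user": "b", "duration": 2.0}',                  # partial record
    'not json at all',                                 # noise
    '{"user": "c", "duration": "n/a", "bytes": 5}',    # bad value
    '{"user": "d", "duration": 3, "bytes": "40"}',
]
features, skipped = extract_features(raw)
print(features, skipped)
```

Counting the skipped records instead of silently dropping them makes it easier to judge how noisy the input really is.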

DS700 - Descriptive and Inferential Statistics on Big Data

· Use statistical tests to determine confidence for a hypothesis

· Calculate common summary statistics, such as mean, variance, and counts

· Fit a distribution to a dataset and use that distribution to predict event likelihoods

· Perform complex statistical calculations on a large dataset
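
The DS700 skills can be sketched on a toy sample using only the Python standard library. The sample values, the choice of a normal model, and the null hypothesis (true mean of 10.0) are all assumptions made for illustration; with only ten points a t-test would be the more careful choice, and at exam scale the aggregation would run on the cluster rather than in memory.

```python
import math
from statistics import NormalDist, mean, variance

# Hypothetical measurements, invented for this sketch.
samples = [9.8, 10.1, 10.4, 9.9, 10.3, 10.0, 10.2, 9.7, 10.5, 10.1]

# Common summary statistics.
n = len(samples)
m = mean(samples)
var = variance(samples)  # sample variance (n - 1 denominator)

# Fit a normal distribution (an assumed model) and use it to
# estimate the likelihood of an event: a value above 10.5.
fitted = NormalDist.from_samples(samples)
p_above = 1.0 - fitted.cdf(10.5)

# Two-sided one-sample z-test of H0: true mean == 10.0
# (normal approximation to the sampling distribution of the mean).
z = (m - 10.0) / math.sqrt(var / n)
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(f"n={n} mean={m:.3f} var={var:.4f}")
print(f"P(X > 10.5) ~ {p_above:.3f}, z={z:.3f}, p-value={p_value:.3f}")
```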

DS701 - Advanced Analytical Techniques on Big Data

· Build a model that contains relevant features from a large dataset

· Define relevant data groupings, including number, size, and characteristics

· Assign data records from a large dataset into a defined set of data groupings

· Evaluate goodness of fit for a given set of data groupings and a dataset

· Apply advanced analytical techniques, such as network graph analysis or outlier detection
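
One possible reading of the grouping skills above is clustering. The sketch below uses plain k-means (Lloyd's algorithm) to define groupings, assign records to them, and score the fit with the within-cluster sum of squares. The choice of k-means, the toy points, and the starting centroids are assumptions for illustration; the outline itself does not name an algorithm.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    moved = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:  # leave an empty cluster's centroid where it is
            moved.append(old)
            continue
        moved.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return moved


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until centroids stop moving."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = update(points, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def wcss(points, labels, centroids):
    """Within-cluster sum of squared distances (lower means a tighter fit)."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))


# Hypothetical 2-D records forming two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (8.0, 8.0), (8.2, 7.8), (7.8, 8.2)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
fit = wcss(points, labels, centroids)
print(labels, round(fit, 3))
```

Comparing the score across several values of k is one common way to choose the number of groupings.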

DS702 - Machine Learning at Scale

· Build a model that contains relevant features from a large dataset

· Predict labels for an unlabeled dataset using a labeled dataset for reference

· Select a classification algorithm that is appropriate for the given dataset

· Tune algorithm metaparameters to maximize algorithm performance

· Use validation techniques to determine how successful a given algorithm is for the given dataset
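
The last two DS702 skills, tuning and validation, usually go together. The sketch below tunes the neighbour count of a small k-nearest-neighbours classifier with 5-fold cross-validation. The classifier, the candidate values, and the toy dataset are assumptions chosen to keep the example self-contained; they are not prescribed by the outline.

```python
import math
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(data, k, folds=5):
    """Mean k-NN accuracy over `folds` interleaved validation folds."""
    scores = []
    for f in range(folds):
        valid = data[f::folds]  # every folds-th record, starting at f
        train = [item for i, item in enumerate(data) if i % folds != f]
        hits = sum(knn_predict(train, x, k) == y for x, y in valid)
        scores.append(hits / len(valid))
    return sum(scores) / folds


# Hypothetical labeled data: two well-separated classes.
data = [((i * 0.1, i * 0.1), "low") for i in range(10)]
data += [((5 + i * 0.1, 5 + i * 0.1), "high") for i in range(10)]

# Try a few odd values of k so a two-class vote cannot tie.
results = {k: cross_val_accuracy(data, k) for k in (1, 3, 5)}
best_k = max(results, key=results.get)
print(results, best_k)
```

On data this clean every candidate scores the same, so the pick is arbitrary; on a real dataset the scores would differ and the held-out accuracy would guide the choice.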

Exam Delivery and Cluster Information

All CCP: Data Scientist exams are remote-proctored and available anywhere, anytime.

Exams are hands-on, practical exams using data science tools on Cloudera technologies. Each user is given their own 7-node, high-performance CDH5 (currently 5.3.2) cluster pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many others. In addition, the cluster also comes with Python (2.6 and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, NetBeans, scikit-learn, octave, NumPy, SciPy, Anaconda, R, plyr, dplyrimpaladb, SparkML, vowpal wabbit, clouderML, oryx, impyla, CoreNLP, The Stanford Parser: A statistical parser, Stanford Log-linear Part-Of-Speech Tagger, Stanford Named Entity Recognizer (NER), Stanford Word Segmenter, opennlp, H2O, java-ml, RapidMiner, caffe, Weka, NLTK, matplotlib, ggplot, d3py, SparkingPandas, randomforest, R: ggplot2, Sparkling water.

Currently, the cluster is open to the internet and there are no restrictions on tools you can install or websites or resources you may use.

Time: 2024-10-06 00:10:04
