Operations Series: 07. spark-submit

bin/spark-submit --help

Usage: spark-submit [options] <app jar | python file> [app options]

Options:

--master MASTER_URL spark://host:port, mesos://host:port, yarn, or local.

--deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or on one of the worker machines inside the cluster ("cluster") (Default: client).

--class CLASS_NAME Your application's main class (for Java / Scala apps).

--name NAME A name of your application.

--jars JARS Comma-separated list of local jars to include on the driver and executor classpaths.

--py-files PY_FILES Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.

--files FILES Comma-separated list of files to be placed in the working directory of each executor.

--properties-file FILE Path to a file from which to load extra properties. If not specified, this will look for conf/spark-defaults.conf.

--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 512M).

--driver-java-options Extra Java options to pass to the driver.

--driver-library-path Extra library path entries to pass to the driver.

--driver-class-path Extra class path entries to pass to the driver. Note that jars added with --jars are automatically included in the classpath.

--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G).

--help, -h Show this help message and exit

--verbose, -v Print additional debug output

Spark standalone with cluster deploy mode only:

--driver-cores NUM Cores for driver (Default: 1).

--supervise If given, restarts the driver on failure.

Spark standalone and Mesos only:

--total-executor-cores NUM Total cores for all executors.

YARN-only:

--executor-cores NUM Number of cores per executor (Default: 1).

--queue QUEUE_NAME The YARN queue to submit to (Default: "default").

--num-executors NUM Number of executors to launch (Default: 2).

--archives ARCHIVES Comma-separated list of archives to be extracted into the working directory of each executor.
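The --properties-file option listed above can be illustrated with a small dry run. This is only a sketch under assumptions: the file name my-spark.conf, the master URL spark://master-host:7077, the application name, and the memory size are hypothetical placeholders. The file uses the same layout as conf/spark-defaults.conf (one property per line, key and value separated by whitespace), and the command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: keep shared settings in a properties file and pass it explicitly.
# The file name and every value below are hypothetical placeholders.
PROPS_FILE="${TMPDIR:-/tmp}/my-spark.conf"

# One property per line: key, whitespace, value.
cat > "$PROPS_FILE" <<'EOF'
spark.master            spark://master-host:7077
spark.app.name          my-app
spark.executor.memory   2g
EOF

# Dry run: print the command instead of submitting the job.
echo ./bin/spark-submit --properties-file "$PROPS_FILE" \
  --class org.apache.spark.examples.SparkPi "lib/spark-examples*.jar" 10
```

Options given directly on the command line, such as --executor-memory, take precedence over values loaded from the properties file, so the file works best for defaults shared across jobs.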

Example:

YARN:

./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors 3 \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
lib/spark-examples*.jar \
10
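For comparison, a submission to a standalone cluster could combine the standalone-only options from the help text. This is a minimal sketch, not a tested deployment: the master URL spark://master-host:7077, the memory sizes, and the core count are placeholder assumptions, and the variable names are introduced here for illustration. The command is assembled step by step and printed rather than executed, so it can be reviewed first.

```shell
#!/bin/sh
# Dry-run sketch: assemble a spark-submit command for a standalone cluster.
MASTER_URL="spark://master-host:7077"   # hypothetical master URL
APP_JAR="lib/spark-examples*.jar"       # jar pattern from the YARN example

set -- ./bin/spark-submit
set -- "$@" --class org.apache.spark.examples.SparkPi
set -- "$@" --master "$MASTER_URL"
set -- "$@" --deploy-mode cluster       # driver runs on a worker machine
set -- "$@" --supervise                 # restart the driver on failure
set -- "$@" --driver-memory 1g          # placeholder size
set -- "$@" --executor-memory 2g        # placeholder size
set -- "$@" --total-executor-cores 4    # total cores across all executors
set -- "$@" "$APP_JAR" 10

# Print the assembled command for review instead of running it.
SUBMIT_CMD="$*"
echo "$SUBMIT_CMD"
```

Because APP_JAR is quoted, the wildcard is not expanded here. To submit for real, set APP_JAR to the concrete jar path and run "$@" in place of the echo.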

Notes:

Time: 2024-10-22 08:22:06
