storm文档(8)----配置文件说明

转载请注明出处:http://blog.csdn.net/beitiandijun/article/details/41547569

源地址:http://storm.apache.org/documentation/Configuration.html

storm由丰富的configure选项, 用来调整nibus、supervisor、以及运行时topologies的行为。某些配置选项是系统配置,例如topology基础配置,修改某个topology的这些选项有可能影响到所有topologies,而某些选项只是涉及到每个topology自身的配置,这就可以根据需要修改了。

每个配置选项在Storm代码库中的defaults.yaml文件中都有它的默认值。你可以通过定义Nimbus和supervisor的classpath下storm.yaml文件进行覆盖默认配置。最后, 可以定义topology-specific配置,这样你就可以使用StormSubmitter类提交topology时一块提交这个配置文件。然而,
topology-specific配置仅能覆盖前缀为“TOPOLOGY”的配置选项。

从Storm 0.7.0开始,你可以覆盖每一个bolt或者每个spout自己特有的配置选项。这些配置如下所示:

1、"topology.debug"

2、"topology.max.spout.pending"

3、"topology.max.task.parallelism"

4、"topology.kryo.register":这个选项和其他选项的作用方式有点不同, 因为序列对topology中的所有组件都是可用的。 更多信息可以查看序列化

Java API 允许你使用两种方式制定组件的特定配置选项:

1、 内部方式: 在任何spout或者bolt中覆盖getComponentConfiguration,然后返回component-specific配置映射。

2、 外部方式: TopologyBuilder类中setSpout方法和setBolt方法会返回带有addConfiguration以及addConfigurations方法的对象,这个对象可以用来覆盖组件的配置选项。

配置选项值的推荐顺序是: defaults.yaml < storm.yaml < topology specific configuration <internal component specific configuration < external component specificconfiguration

资料:

Config:所有配置选项的列表, 也是创建特定topology配置的帮助类

defaults.yaml:所有配置的默认值

配置storm集群:说明了如何创建和配置storm集群

在生产集群上运行topologies:列出对运行集群上topologies有用的配置

本地模式:列出对使用本地模式有用的配置

下面为defaults.yaml内容:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

########### These all have default values as shown
########### Additional configuration goes into storm.yaml

java.library.path: "/usr/local/lib:/opt/local/lib:/usr/lib"

### storm.* configs are general configurations
# the local dir is where jars are kept
storm.local.dir: "storm-local"
storm.zookeeper.servers:
    - "localhost"
storm.zookeeper.port: 2181
storm.zookeeper.root: "/storm"
storm.zookeeper.session.timeout: 20000
storm.zookeeper.connection.timeout: 15000
storm.zookeeper.retry.times: 5
storm.zookeeper.retry.interval: 1000
storm.zookeeper.retry.intervalceiling.millis: 30000
storm.zookeeper.auth.user: null
storm.zookeeper.auth.password: null
storm.cluster.mode: "distributed" # can be distributed or local
storm.local.mode.zmq: false
storm.thrift.transport: "backtype.storm.security.auth.SimpleTransportPlugin"
storm.principal.tolocal: "backtype.storm.security.auth.DefaultPrincipalToLocal"
storm.group.mapping.service: "backtype.storm.security.auth.ShellBasedGroupsMapping"
storm.messaging.transport: "backtype.storm.messaging.netty.Context"
storm.nimbus.retry.times: 5
storm.nimbus.retry.interval.millis: 2000
storm.nimbus.retry.intervalceiling.millis: 60000
storm.auth.simple-white-list.users: []
storm.auth.simple-acl.users: []
storm.auth.simple-acl.users.commands: []
storm.auth.simple-acl.admins: []
storm.meta.serialization.delegate: "backtype.storm.serialization.DefaultSerializationDelegate"

### nimbus.* configs are for the master
nimbus.host: "localhost"
nimbus.thrift.port: 6627
nimbus.thrift.threads: 64
nimbus.thrift.max_buffer_size: 1048576
nimbus.childopts: "-Xmx1024m"
nimbus.task.timeout.secs: 30
nimbus.supervisor.timeout.secs: 60
nimbus.monitor.freq.secs: 10
nimbus.cleanup.inbox.freq.secs: 600
nimbus.inbox.jar.expiration.secs: 3600
nimbus.task.launch.secs: 120
nimbus.reassign: true
nimbus.file.copy.expiration.secs: 600
nimbus.topology.validator: "backtype.storm.nimbus.DefaultTopologyValidator"
nimbus.credential.renewers.freq.secs: 600

### ui.* configs are for the master
ui.port: 8080
ui.childopts: "-Xmx768m"
ui.actions.enabled: true
ui.filter: null
ui.filter.params: null
ui.users: null
ui.header.buffer.bytes: 4096
ui.http.creds.plugin: backtype.storm.security.auth.DefaultHttpCredentialsPlugin

logviewer.port: 8000
logviewer.childopts: "-Xmx128m"
logviewer.cleanup.age.mins: 10080
logviewer.appender.name: "A1"

logs.users: null

drpc.port: 3772
drpc.worker.threads: 64
drpc.max_buffer_size: 1048576
drpc.queue.size: 128
drpc.invocations.port: 3773
drpc.invocations.threads: 64
drpc.request.timeout.secs: 600
drpc.childopts: "-Xmx768m"
drpc.http.port: 3774
drpc.https.port: -1
drpc.https.keystore.password: ""
drpc.https.keystore.type: "JKS"
drpc.http.creds.plugin: backtype.storm.security.auth.DefaultHttpCredentialsPlugin
drpc.authorizer.acl.filename: "drpc-auth-acl.yaml"
drpc.authorizer.acl.strict: false

transactional.zookeeper.root: "/transactional"
transactional.zookeeper.servers: null
transactional.zookeeper.port: null

### supervisor.* configs are for node supervisors
# Define the amount of workers that can be run on this machine. Each worker is assigned a port to use for communication
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
supervisor.childopts: "-Xmx256m"
supervisor.run.worker.as.user: false
#how long supervisor will wait to ensure that a worker process is started
supervisor.worker.start.timeout.secs: 120
#how long between heartbeats until supervisor considers that worker dead and tries to restart it
supervisor.worker.timeout.secs: 30
#how frequently the supervisor checks on the status of the processes it's monitoring and restarts if necessary
supervisor.monitor.frequency.secs: 3
#how frequently the supervisor heartbeats to the cluster state (for nimbus)
supervisor.heartbeat.frequency.secs: 5
supervisor.enable: true
supervisor.supervisors: []
supervisor.supervisors.commands: []

### worker.* configs are for task workers
worker.childopts: "-Xmx768m"
worker.gc.childopts: ""
worker.heartbeat.frequency.secs: 1

# control how many worker receiver threads we need per worker
topology.worker.receiver.thread.count: 1

task.heartbeat.frequency.secs: 3
task.refresh.poll.secs: 10
task.credentials.poll.secs: 30

zmq.threads: 1
zmq.linger.millis: 5000
zmq.hwm: 0

storm.messaging.netty.server_worker_threads: 1
storm.messaging.netty.client_worker_threads: 1
storm.messaging.netty.buffer_size: 5242880 #5MB buffer
# Since nimbus.task.launch.secs and supervisor.worker.start.timeout.secs are 120, other workers should also wait at least that long before giving up on connecting to the other worker.
storm.messaging.netty.max_retries: 300
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100

# If the Netty messaging layer is busy(netty internal buffer not writable), the Netty client will try to batch message as more as possible up to the size of storm.messaging.netty.transfer.batch.size bytes, otherwise it will try to flush message as soon as possible to reduce latency.
storm.messaging.netty.transfer.batch.size: 262144

# We check with this interval that whether the Netty channel is writable and try to write pending messages if it is.
storm.messaging.netty.flush.check.interval.ms: 10

# By default, the Netty SASL authentication is set to false.  Users can override and set it true for a specific topology.
storm.messaging.netty.authentication: false

# default number of seconds group mapping service will cache user group
storm.group.mapping.service.cache.duration.secs: 120

### topology.* configs are for specific executing storms
topology.enable.message.timeouts: true
topology.debug: false
topology.workers: 1
topology.acker.executors: null
topology.tasks: null
# maximum amount of time a message has to complete before it's considered failed
topology.message.timeout.secs: 30
topology.multilang.serializer: "backtype.storm.multilang.JsonSerializer"
topology.skip.missing.kryo.registrations: false
topology.max.task.parallelism: null
topology.max.spout.pending: null
topology.state.synchronization.timeout.secs: 60
topology.stats.sample.rate: 0.05
topology.builtin.metrics.bucket.size.secs: 60
topology.fall.back.on.java.serialization: true
topology.worker.childopts: null
topology.executor.receive.buffer.size: 1024 #batched
topology.executor.send.buffer.size: 1024 #individual messages
topology.receiver.buffer.size: 8 # setting it too high causes a lot of problems (heartbeat thread gets starved, throughput plummets)
topology.transfer.buffer.size: 1024 # batched
topology.tick.tuple.freq.secs: null
topology.worker.shared.thread.pool.size: 4
topology.disruptor.wait.strategy: "com.lmax.disruptor.BlockingWaitStrategy"
topology.spout.wait.strategy: "backtype.storm.spout.SleepSpoutWaitStrategy"
topology.sleep.spout.wait.strategy.time.ms: 1
topology.error.throttle.interval.secs: 10
topology.max.error.report.per.interval: 5
topology.kryo.factory: "backtype.storm.serialization.DefaultKryoFactory"
topology.tuple.serializer: "backtype.storm.serialization.types.ListDelegateSerializer"
topology.trident.batch.emit.interval.millis: 500
topology.testing.always.try.serialize: false
topology.classpath: null
topology.environment: null

dev.zookeeper.path: "/tmp/dev-storm-zookeeper"
时间: 2024-10-06 23:36:16

storm文档(8)----配置文件说明的相关文章

storm 文档(3)----入门指导

转载请注明出处:http://blog.csdn.net/beitiandijun/article/details/41517897 源地址:http://storm.apache.org/documentation/Tutorial.html 本文主要讲述了如何创建Storm topologies以及如何将它们部署在Storm集群中.Java是主要使用的语言,但是依然使用少量Python例子证明了Storm的多语言特性. 初步配置: 本文使用的例子源自storm-start项目.建议你复制这个

storm文档(6)----storm手册目录

源地址:http://storm.apache.org/documentation/Documentation.html storm基础知识 l  Javadoc l  概念 l  配置 l  保证消息处理机制 l  容错性能 l  命令行客户端 l  理解storm topology并行机制 l  FAQ trident 对storm来说,trident是可选接口.它提供了准确的一次性处理.事务性数据存储保持以及一系列通用数据流分析操作. l  Trident指导-----基本概念及浏览 l 

storm文档(12)----自己搭建storm集群

转载请注明出处:http://blog.csdn.net/beitiandijun/article/details/41802543 ubuntu下  storm  安装步骤 安装storm之前首先需要安装一些依赖库: zookeeper.JDK 6.python2.6.6.jzmq.zeromq 这些库所需要的依赖库不再一一笔述. 以下为具体安装过程: 一.安装JDK zookeeper要求安装JDK 6或更高版本( 目前最新稳定版本为JDK8), 但是由于storm要求安装JDK 6, 因此

storm文档(11)----搭建storm集群

转载请注明出处:http://blog.csdn.net/beitiandijun/article/details/41684717 源地址:http://storm.apache.org/documentation/Setting-up-a-Storm-cluster.html 本文叙述了storm集群搭建和运行步骤.如果你打算在AWS上进行的话,可以使用storm-deploy项目.storm-deploy在EC2上完全自动进行下载.配置.以及storm集群的安装等步骤.它也为你配置了Gan

Storm文档详解

1.Storm基础概念 1.1.什么是storm? Apache Storm is a free and open source distributed realtime computation system. Storm是免费开源的分布式实时计算系统 实时和离线的区别: 1 离线计算:批量获取数据.批量传输数据.周期性批量计算数据.数据展示 代表技术:Sqoop批量导入数据.HDFS批量存储数据.MapReduce批量计算数据.Hive批量计算数据.***任务调度 2 流式计算:数据实时产生.

storm文档(10)----容错

源地址:http://storm.apache.org/documentation/Fault-tolerance.html 本文主要介绍Storm作为容错系统的设计细节. 当worker死掉时会发生什么? 当worker死掉时, supervisor将重启它. 如果worker启动总是失败,则worker就不能发送心跳消息给Nimbus, 那Nimbus就会重新在另一台machine上启动它. 当node死掉时会发生什么? 分配到这个节点的所有tasks都会超时,那Nimbus会将这些task

storm文档(5)----创建storm新项目

源地址:http://storm.apache.org/documentation/Creating-a-new-Storm-project.html 本文主要介绍如何配置开发的storm项目.步骤如下: 1.将storm jar包加到classpath中 2.如果使用多语言特性,将多语言实现的目录加到classpath中 下面跟着一块看一下在Eclipse环境中如何配置storm-starter项目. 将Storm jars包加到classpath中 你需要将storm jars包加到你的cl

storm文档(4)----开发环境环境搭建

转载请注明出处:http://blog.csdn.net/beitiandijun/article/details/41519053 源地址:http://storm.apache.org/documentation/Setting-up-development-environment.html 本文大体介绍了如何搭建Storm开发环境.总的来说,步骤如下: 1.下载storm release版本.解压缩,然后将解压缩版本的/bin目录放在你的环境变量PATH中. 2.远程集群上topologi

storm文档(9)----消息处理保证机制

转载请注明出处:http://blog.csdn.net/beitiandijun/article/details/41577125 源地址:http://storm.apache.org/documentation/Guaranteeing-message-processing.html Storm保证:每条离开spout的消息都可以得到"fullyprocessed".本文描述了storm如何实现这种保证以及你如何能够从Storm这种可靠性能力中受益. "fully pr