[Druid] A simple configuration walkthrough for reading Kafka data with Druid

For installing Druid on a single machine, see: https://blog.51cto.com/10120275/2429912

Ingesting Kafka data into Druid in real time

Download, install, and start Kafka:

wget http://mirrors.tuna.tsinghua.edu.cn/apache/kafka/2.2.1/kafka_2.11-2.2.1.tgz
tar -zxvf kafka_2.11-2.2.1.tgz
ln -s kafka_2.11-2.2.1 kafka
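# Point KAFKA_HOME at the symlink created above (assumes the archive was unpacked in the home directory)
export KAFKA_HOME=~/kafka
# Kafka 2.2.1 needs a running ZooKeeper; config/server.properties points at localhost:2181 by default
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties 1>/dev/null 2>&1 &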

Create the topic wikipedia (run from the Kafka installation directory):
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic wikipedia
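
Before moving on, it can help to confirm that the topic was actually created. The sketch below wraps the `--list` mode of the same `kafka-topics.sh` script; the helper name `topic_exists` and the `KAFKA_TOPICS` variable are made up for this example and are not part of Kafka.

```shell
# Assumption: run from the Kafka installation directory, like the create command above.
KAFKA_TOPICS="${KAFKA_TOPICS:-./bin/kafka-topics.sh}"

# Succeeds (exit code 0) only if the given name appears as a full line in the topic list.
topic_exists() {
  "$KAFKA_TOPICS" --list --zookeeper localhost:2181 | grep -qx "$1"
}

# Usage:
#   topic_exists wikipedia && echo "topic wikipedia is ready"
```

If the check fails, the usual suspects are a broker that did not start or a ZooKeeper that is not listening on localhost:2181.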

Decompress wikiticker-2015-09-12-sampled.json.gz. This step prepares the input file for the Kafka topic.

cd $DRUID_HOME/quickstart/tutorial
gunzip -k wikiticker-2015-09-12-sampled.json.gz

After this step, wikiticker-2015-09-12-sampled.json is generated in the $DRUID_HOME/quickstart/tutorial directory.
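
The ingestion spec in this post reads the event timestamp from a column named time, so a quick look at the decompressed file can save debugging later. This is a minimal sketch that assumes the file holds one JSON object per line; `check_sample` is a hypothetical helper name.

```shell
# Report how many records the file holds and whether the first one has a "time" field.
check_sample() {
  file="$1"
  # wc -l counts lines, i.e. records in a newline-delimited JSON file
  echo "records: $(wc -l < "$file" | tr -d ' ')"
  if head -n 1 "$file" | grep -q '"time"'; then
    echo "first record has a \"time\" field"
  else
    echo "first record is missing \"time\"" >&2
    return 1
  fi
}

# Usage:
#   check_sample wikiticker-2015-09-12-sampled.json
```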

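The ingestion configuration (a Kafka supervisor spec) is shown below; bootstrap.servers under consumerProperties is set to the Kafka broker address.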

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "wikipedia",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "time",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": [
            "channel",
            "cityName",
            "comment",
            "countryIsoCode",
            "countryName",
            "isAnonymous",
            "isMinor",
            "isNew",
            "isRobot",
            "isUnpatrolled",
            "metroCode",
            "namespace",
            "page",
            "regionIsoCode",
            "regionName",
            "user",
            { "name": "added", "type": "long" },
            { "name": "deleted", "type": "long" },
            { "name": "delta", "type": "long" }
          ]
        }
      }
    },
    "metricsSpec" : [],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "DAY",
      "queryGranularity": "NONE",
      "rollup": false
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "reportParseExceptions": false
  },
  "ioConfig": {
    "topic": "wikipedia",
    "replicas": 2,
    "taskDuration": "PT10M",
    "completionTimeout": "PT20M",
    "consumerProperties": {
      "bootstrap.servers": "localhost:9092"
    }
  }
}
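
The spec only takes effect once it has been submitted to Druid. One way to do that is to POST it to the Overlord's supervisor endpoint, sketched below. The file name supervisor-spec.json, the helper name `submit_supervisor`, and the `OVERLORD_URL` variable are assumptions for illustration; the port depends on how Druid was started (8081 is assumed here for a single-machine setup with a combined Coordinator-Overlord, while some older layouts use 8090).

```shell
# Assumption: the Overlord API is reachable on localhost:8081; override OVERLORD_URL if not.
OVERLORD_URL="${OVERLORD_URL:-http://localhost:8081}"

# POST a supervisor spec file to the Overlord's supervisor endpoint.
submit_supervisor() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d @"$1" "$OVERLORD_URL/druid/indexer/v1/supervisor"
}

# Usage (supervisor-spec.json is a hypothetical file holding the spec above):
#   submit_supervisor supervisor-spec.json
```

On success the Overlord should answer with a small JSON document naming the supervisor, which for this spec is expected to be the dataSource name wikipedia.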

Next, use Kafka's console producer script to write the contents of wikiticker-2015-09-12-sampled.json to the wikipedia topic.

export KAFKA_OPTS="-Dfile.encoding=UTF-8"
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wikipedia < $DRUID_HOME/quickstart/tutorial/wikiticker-2015-09-12-sampled.json
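
Once the producer has finished and the ingestion tasks are running, a row count through Druid SQL is a simple way to see whether the data arrived. The sketch below assumes that Druid SQL is enabled and that a Router is listening on localhost:8888 (the Broker, commonly on port 8082, serves the same endpoint); `count_rows` and `ROUTER_URL` are names made up for this example.

```shell
# Assumption: Druid SQL is enabled and reachable through the Router on localhost:8888.
ROUTER_URL="${ROUTER_URL:-http://localhost:8888}"

# Ask Druid SQL how many rows the given datasource currently holds.
count_rows() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "{\"query\": \"SELECT COUNT(*) AS cnt FROM $1\"}" \
    "$ROUTER_URL/druid/v2/sql"
}

# Usage:
#   count_rows wikipedia
```

A count that stays at zero, or an error saying the table does not exist, usually means the supervisor has not been submitted or its tasks have not started reading from the topic yet.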

Original article: https://blog.51cto.com/10120275/2430043

Time: 2024-10-14 15:38:10
