flume - Using Kafka to import data to Hadoop


At first I was wondering whether Kafka or Flume should be used to get events into Hadoop, where they will be stored and analyzed periodically (possibly with Oozie scheduling the analysis). I decided that Kafka is probably the better fit, because we also have a component that processes events as they arrive, so this way both the batch and the event-processing components get their data the same way.

But now I'm looking for suggestions on how to get the data from the broker into Hadoop.

I found that Flume can be used in conjunction with Kafka:

  • Flume - includes a Kafka source (consumer) and a Kafka sink (producer)

And on the same page I also found this one:

  • Camus - LinkedIn's Kafka => HDFS pipeline. This one is used for all the data at LinkedIn, and apparently works great.

I'm interested in which of these would be the better (and easier, better documented) solution for this. Also, are there any examples or tutorials on how to do it?

Or, if I want to keep this kind of use case simple, should I just use the high-level consumer directly? (Something roughly like the sketch below is what I have in mind.)
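Just to illustrate the do-it-yourself route: a small consumer that reads from the topic and appends each event to a file in HDFS, roughly like this. This is only a sketch; the broker address, group id, topic name and output path are placeholders, and it uses the newer org.apache.kafka.clients.consumer.KafkaConsumer API rather than the old high-level consumer, but the idea is the same.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class KafkaToHdfs {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
            props.put("group.id", "hadoop-loader");              // placeholder consumer group
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            FileSystem fs = FileSystem.get(new Configuration()); // picks up the cluster config from the classpath
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                 FSDataOutputStream out = fs.create(new Path("/tmp/kafka/testkafka/events.txt"))) {
                consumer.subscribe(Collections.singletonList("testkafka"));
                while (true) {                                    // runs forever; stopping and file rolling left out
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        out.writeBytes(record.value() + "\n");    // one event per line
                    }
                    out.hflush();                                 // make written data visible to HDFS readers
                }
            }
        }
    }

Even in this toy form I would still have to handle file rolling, partitioning by time, and failure recovery myself, which is exactly the boilerplate I'd rather not rewrite.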

I'm open to suggestions if there is another / better solution than these two.

Thank you

You can use Flume to dump data from Kafka to HDFS. Below is an example that uses the Kafka source and the HDFS sink; change the property file to suit your environment.

Steps:

  1. Create a Kafka topic, e.g. testkafka, using kafka-topics --create --zookeeper localhost:2181, and then write some test messages to that topic with kafka-console-producer.

  2. Configure a Flume agent with a Kafka source and an HDFS sink using the following property file:

    flume1.sources = kafka-source-1
    flume1.channels = hdfs-channel-1
    flume1.sinks = hdfs-sink-1

    flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
    flume1.sources.kafka-source-1.zookeeperConnect = localhost:2181
    flume1.sources.kafka-source-1.topic = testkafka
    flume1.sources.kafka-source-1.batchSize = 100
    flume1.sources.kafka-source-1.channels = hdfs-channel-1

    flume1.channels.hdfs-channel-1.type = memory
    flume1.channels.hdfs-channel-1.capacity = 10000
    flume1.channels.hdfs-channel-1.transactionCapacity = 1000

    flume1.sinks.hdfs-sink-1.channel = hdfs-channel-1
    flume1.sinks.hdfs-sink-1.type = hdfs
    flume1.sinks.hdfs-sink-1.hdfs.writeFormat = Text
    flume1.sinks.hdfs-sink-1.hdfs.fileType = DataStream
    flume1.sinks.hdfs-sink-1.hdfs.filePrefix = test-events
    flume1.sinks.hdfs-sink-1.hdfs.useLocalTimeStamp = true
    flume1.sinks.hdfs-sink-1.hdfs.path = /tmp/kafka/%{topic}/%y-%m-%d
    flume1.sinks.hdfs-sink-1.hdfs.rollCount = 100
    flume1.sinks.hdfs-sink-1.hdfs.rollSize = 0

    Save the above config file as example.conf.

  3. Run the Flume agent:

     flume-ng agent -n flume1 -c conf -f example.conf -Dflume.root.logger=INFO,console

  4. The Kafka events will now be dumped to HDFS under the following path:

     /tmp/kafka/%{topic}/%y-%m-%d
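If you want to check programmatically that events are landing, something along these lines should work. It assumes the testkafka topic and the hdfs.path pattern from example.conf above; the dated sub-directory names depend on when the agent ran.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class VerifyKafkaDump {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // One dated directory per day under the topic directory, e.g. /tmp/kafka/testkafka/15-09-30
            for (FileStatus day : fs.listStatus(new Path("/tmp/kafka/testkafka"))) {
                for (FileStatus file : fs.listStatus(day.getPath())) {
                    System.out.println(file.getPath() + " (" + file.getLen() + " bytes)");
                    try (BufferedReader in = new BufferedReader(
                            new InputStreamReader(fs.open(file.getPath())))) {
                        String line;
                        while ((line = in.readLine()) != null) {
                            System.out.println(line);   // the events written by the console producer
                        }
                    }
                }
            }
        }
    }

Listing the same path with hdfs dfs -ls works just as well if you only want to see the rolled files.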

