scala - Self-contained and updated example of using Spark over ElasticSearch -


All the examples I can find showing how to combine Elasticsearch and Spark date from when the whole ES ecosystem was at version 0.9. Nowadays they do not work anymore (and googling for an up-to-date one does not seem to be an easy feat). Can you give a small, self-contained Scala example that:

  1. Opens a file in Spark (in the example above, it was /var/log/syslog );
  2. Does something with it;
  3. Sends the result to ES;
  4. Opens that result back in Spark.

... and which works with Elasticsearch 1.3.4 and Spark 1.1.0.

I gave a talk a while back on Spark and Elasticsearch (back in the 0.9 days), and I have recently updated some of the examples for 1.1 and posted them.

I have also copied the relevant parts below (from my own GitHub repo):

  import org.elasticsearch.spark.sql._
  val tweetsAsCS = createSchemaRDD(tweetRDD.map(SharedIndex.prepareTweetsCaseClass))
  tweetsAsCS.saveToEs(esResource)
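The SharedIndex.prepareTweetsCaseClass helper lives in that repo; as a rough stand-in (the Tweet field names and the tuple input here are my own assumptions, not the repo's), it just maps each raw element into a case class so that createSchemaRDD can infer a schema:

```scala
// Illustrative stand-in for SharedIndex.prepareTweetsCaseClass: the real
// helper is in the repo. The point is only that each RDD element becomes
// a case class, from which createSchemaRDD infers the ES document schema.
case class Tweet(user: String, text: String, lat: Double, lon: Double)

object SharedIndexSketch {
  // raw: a (user, text, lat, lon) tuple, e.g. parsed from the input file
  def prepareTweetsCaseClass(raw: (String, String, Double, Double)): Tweet =
    Tweet(raw._1, raw._2, raw._3, raw._4)
}
```

With something like this in place, `tweetRDD.map(...)` yields an RDD of case classes, and saveToEs writes one ES document per element.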

Note that we have not specified an ES node. This defaults to trying to save to a cluster on localhost. If we want to use a different cluster, we can add:

  // If we want a different ES cluster
  import org.elasticsearch.hadoop.cfg.ConfigurationOptions
  val config = new SparkConf()
  config.set(ConfigurationOptions.ES_NODES, node) // set the node for discovery
  // ... other configuration settings
  val sc = new SparkContext(config)
This would go before the first part of the example (indexing some data).

Spark has also made it very easy to query ES, although only the data types supported by the connector's mapping work out of the box (the primary one I ran into was geo location, but the mapper is easy enough to extend if you run into that). The query:

  val query = "{\"query\": {\"filtered\": {\"query\": {\"match_all\": {}}, \"filter\": {\"geo_distance\": {\"distance\": \"" + dist + "km\", \"location\": {\"lat\": " + lat + ", \"lon\": " + lon + "}}}}}}"
  val tweets = sqlCtx.esRDD(esResource, query)

The esRDD function is not normally on the SQLContext, but the implicit conversions we imported above make it available to us. tweets is now a SchemaRDD, and we can update it as desired and save the result back, as we did in the first part of this example.
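If the string concatenation for the query gets unwieldy, it can be wrapped in a small helper. This is just a sketch of building the same geo_distance filter (GeoQuery and its parameter names are my own, not part of the connector):

```scala
// Hypothetical helper that builds the geo_distance query string shown
// above, keeping the string concatenation in one place.
object GeoQuery {
  def geoDistanceQuery(dist: Int, lat: Double, lon: Double): String =
    s"""{"query": {"filtered": {"query": {"match_all": {}},""" +
    s""" "filter": {"geo_distance": {"distance": "${dist}km",""" +
    s""" "location": {"lat": $lat, "lon": $lon}}}}}}"""
}
```

The result can then be passed straight to esRDD, e.g. `sqlCtx.esRDD(esResource, GeoQuery.geoDistanceQuery(10, 37.77, -122.42))`.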

Hope it helps!

