apache spark - Scala import TSV


I have some problems parsing a TSV file in Scala. The following is the code that causes the problem.

  var lines = file.map(_.split('\n'))
  var node = lines.foreach(_.split('\t') ...

Input

  1 a 24
  2 a 3
  3 b 6

Desired output

  a 27
  b 6

I get an error after performing the split('\t'); it states that the value is of type [String], which is quite strange, since the _ inside a foreach takes each element one at a time.

First of all, foreach returns Unit, so it is not what you want. Use immutable vals instead of vars. It is also unclear what format file is in, so I will just assume it is a String.
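To make the foreach point concrete, here is a minimal sketch contrasting it with map. The object name ForeachVsMap and the sample rows are assumptions made for illustration; they are not taken from the question's actual file.

```scala
object ForeachVsMap {
  // Sample rows, assumed to mirror the question's tab-separated input.
  val lines: Array[String] = Array("1\ta\t24", "2\ta\t3", "3\tb\t6")

  // foreach runs the function for its side effects and discards each result,
  // so the whole expression has type Unit and nothing can be chained onto it.
  val viaForeach: Unit = lines.foreach(_.split("\t"))

  // map keeps each result, giving one Array[String] of fields per line.
  val viaMap: Array[Array[String]] = lines.map(_.split("\t"))

  def main(args: Array[String]): Unit =
    viaMap.foreach(fields => println(fields.mkString(", ")))
}
```

Because viaForeach is Unit, any further call such as groupBy on it would not compile, whereas viaMap can be transformed further.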

  val file = "1\ta\t24\n2\ta\t3\n3\tb\t6"
  val lines = file.split("\n")
  val nodes = lines.map(_.split("\t")).map(a => (a(1), a(2))).groupBy(_._1).map(a => (a._1, a._2.map(b => b._2.toInt).sum))
  // Map(b -> 6, a -> 27)

This is kind of a big hot mess, so I will try to break it down:

  val lines = file.split("\n") // split the string into an array of lines
  val nodes = lines
    .map(_.split("\t")) // split each line into an array of strings
    .map(a => (a(1), a(2))) // keep only the last two fields of each array, as a tuple
    .groupBy(_._1) // group by the first item of the tuple
    .map(a => (a._1, a._2.map(b => b._2.toInt).sum)) // for each group, convert the second item of every tuple to an Int and sum them

If you do not like a Map as the final output, you can easily replace it with toList or map it to whatever you want.
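As a rough sketch of that last step, the Map can be turned into a sorted List of pairs. The object name SumByKey and the choice to sort by key are assumptions added here for a stable result; the sample string is the same one assumed above.

```scala
object SumByKey {
  // Same sample data as above, assumed to be a single tab-separated String.
  val file: String = "1\ta\t24\n2\ta\t3\n3\tb\t6"

  // Sum the third column per value of the second column.
  val totals: Map[String, Int] = file
    .split("\n")
    .map(_.split("\t"))
    .map(a => (a(1), a(2)))
    .groupBy(_._1)
    .map { case (key, pairs) => (key, pairs.map(_._2.toInt).sum) }

  // A Map has no guaranteed iteration order, so sort after converting to a List.
  val asList: List[(String, Int)] = totals.toList.sortBy(_._1)

  def main(args: Array[String]): Unit =
    asList.foreach { case (key, sum) => println(s"$key\t$sum") }
}
```

Note that toInt throws a NumberFormatException on a non-numeric field, so real input may need validation first.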

