apache spark - Parsing a TSV file in Scala

I am having some trouble parsing a TSV file in Scala. The following is the code that operates on the file:
var lines = file.map(_.split('\n'))
var node = lines.foreach(_.split('\t'))
Input

1	a	24
2	a	3
3	b	6
Desired output

a 27
b 6
I get an error on the split('\t') call; it complains about Array[String], which is strange, since inside the foreach each element should be taken one at a time.
First of all, foreach returns Unit, so it is not what you want; use map with immutable vals instead. It is also unclear what format file has, so I'll just start from a plain string here.
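A minimal sketch of the foreach/map difference (the array literal here is just stand-in data, not your real file):

```scala
val xs = Array("1\ta\t24", "2\ta\t3", "3\tb\t6") // stand-in rows

val bad  = xs.foreach(_.split("\t")) // type Unit: the split results are thrown away
val good = xs.map(_.split("\t"))     // type Array[Array[String]]: the results are kept
```

Because foreach discards its results, chaining anything after it cannot work; map is the transformation you want.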
val file = "1\ta\t24\n2\ta\t3\n3\tb\t6"
val lines = file.split("\n")
val nodes = lines.map(_.split("\t"))
  .map(a => (a(1), a(2)))
  .groupBy(_._1)
  .map(a => (a._1, a._2.map(b => b._2.toInt).sum))
// Map(b -> 6, a -> 27)
That one-liner is a bit dense, so let me break it down:
val lines = file.split("\n")         // split the string into an array of rows
val nodes = lines.map(_.split("\t")) // split each row into an array of strings
  .map(a => (a(1), a(2)))            // from each array keep only the two items we need, as a tuple
  .groupBy(_._1)                     // group the tuples by their first item
  .map(a => (a._1, a._2.map(b => b._2.toInt).sum)) // for each group, convert the second values to Ints and sum them
If you do not want a Map as the final output, you can easily append toList, or map the result into whatever shape you want.
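If file is actually a file on disk rather than an in-memory string, the same pipeline works on the lines read with scala.io.Source. A sketch, where the aggregation is factored into a helper and the path passed to fromFile is up to you:

```scala
import scala.io.Source

// Sum the third column per second-column key, for rows of the form "id<TAB>key<TAB>value".
def sumByKey(rows: Iterator[String]): Map[String, Int] =
  rows.map(_.split("\t"))
    .map(a => (a(1), a(2).toInt))
    .toSeq
    .groupBy(_._1)
    .map { case (k, vs) => (k, vs.map(_._2).sum) }

// Read a TSV file from disk; the caller supplies the path.
def fromFile(path: String): Map[String, Int] = {
  val src = Source.fromFile(path)
  try sumByKey(src.getLines()) finally src.close()
}
```

Factoring out sumByKey keeps the parsing logic testable without touching the filesystem.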