functional programming - Consolidating a data table in Scala

I am working on a small data analysis tool, and practicing/learning Scala in the process. However, I got stuck at a small problem.

Assume data of this type:

    x   gr1     x_11    ... x_1n
    x   gr2     x_21    ... x_2n
    ..
    x   grk     x_k1    ... x_kn
    y   gr1     y_11    ... y_1n
    y   gr3     y_31    ... y_3n
    ..
    y   gr(k-1)     ...

Here I have entries (x, y, ...) that may or may not exist in up to k groups, with a series of values for each group. What I want to do is pretty simple (in theory): I want to consolidate the rows that belong to the same "entity" in different groups. So instead of multiple lines that start with x, I want to have one row with the values from x_11 to x_kn in columns.

What makes things complicated, however, is that not all entities exist in all groups. Wherever there is "missing data" I would like to pad with, for instance, zeroes, or some string that denotes a missing value. So if I have (x, y, z) in up to 3 groups, the type of table I want to have is as follows:

    x   x_11    x_12    x_21    x_22    x_31    x_32
    y   y_11    y_12    n/a     n/a     y_31    y_32
    z   n/a     n/a     z_21    z_22    n/a     n/a

I have been stuck trying to figure this out. Is there a smart way to use some List functions to solve this?

I wrote this simple loop:

    for {
      (id, hitlist) <- hits.groupBy(_.acc)
      h <- hitlist
    } println(id + "\t" + h.sampleid + "\t" + h.ratios.mkString("\t"))

to be able to generate tables like the example above. Note that my original data is of a different format and layout, but that has little to do with the problem at hand, so I have skipped all steps regarding parsing. I should be able to use groupBy in a better way that actually solves this for me, but I can't seem to get there.

Then I modified the loop, mapping the hits to their ratios and appending them to one another:

    for ((id, hitlist) <- hits.groupBy(_.acc)) {
      // Concatenate the ratio lists of all hits that share this id
      val l = hitlist.map(_.ratios).foldRight(List[Double]()) {
        (l1: List[Double], l2: List[Double]) => l1 ::: l2
      }
      println(id + "\t" + l.mkString("\t"))
      //println(id + "\t" + h.sampleid + "\t" + h.ratios.mkString("\t"))
    }

That gets me one step closer, but still no cigar! Instead of a fully padded "matrix" I get a jagged table. Taking the example above:

    x   x_11    x_12    x_21    x_22    x_31    x_32
    y   y_11    y_12    y_31    y_32
    z   z_21    z_22

Any ideas on how I can pad the table so that the values from the respective groups are aligned with one another? I should be able to use _.sampleid, which holds the "group membership" for each "hit", but I am not sure exactly how. The hits value is a List of type Hit, which is practically a wrapper for each row, giving convenience methods for getting individual values, so basically a tuple that has "named indices" (such as .acc, .sampleid, ...).

(I would like to solve this problem without hardcoding the number of groups, since it might change from case to case.)

Thanks!

This is a bit of a contrived example, but I think you can see where this is going:

    case class Hit(acc: String, subacc: String, value: Int)

    val hits = List(Hit("x", "x_11", 1), Hit("x", "x_21", 2), Hit("x", "x_31", 3))
    val kmax = 4
    val nmax = 2

    for {
      (id, hitlist) <- hits.groupBy(_.acc)
      k <- 1 to kmax
      n <- 1 to nmax
    } yield {
      // Look up the cell for (k, n); fall back to a zero-valued placeholder
      val subid = "x_%s%s".format(k, n)
      val row = hitlist.find(h => h.subacc == subid).getOrElse(Hit(id, subid, 0))

      println(row)
    }

    // prints:
    // Hit(x,x_11,1)
    // Hit(x,x_12,0)
    // Hit(x,x_21,2)
    // Hit(x,x_22,0)
    // Hit(x,x_31,3)
    // Hit(x,x_32,0)
    // Hit(x,x_41,0)
    // Hit(x,x_42,0)

If you provide more information on where your hits lists come from, I might be able to be a little more accurate.
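
The same idea can be applied to the shape described in the question, without fixing the number of groups up front. The following is only a sketch under assumptions: the Hit class here is a hypothetical stand-in with the fields named in the question (acc, sampleid, ratios), PadTable and consolidate are illustrative names, and each entity is assumed to appear at most once per group. The group columns are collected from the data itself and every missing group is filled with a placeholder block:

```scala
// Hypothetical stand-in for the asker's Hit wrapper; field names follow the question.
case class Hit(acc: String, sampleid: String, ratios: List[Double])

object PadTable {
  // Builds one padded row per entity: the id followed by a block of cells per group.
  def consolidate(hits: List[Hit], missing: String = "n/a"): List[List[String]] = {
    // The column layout comes from the data, so the number of groups is not hardcoded.
    // Note: sorting is lexicographic, so "gr10" would come before "gr2".
    val groups = hits.map(_.sampleid).distinct.sorted
    // Assumption: groups normally have equally many ratios; use the widest to be safe.
    val width = hits.map(_.ratios.length).foldLeft(0)(_ max _)
    val padding = List.fill(width)(missing)

    hits.groupBy(_.acc).toList.sortBy(_._1).map { case (id, hitlist) =>
      // Assumption: at most one hit per (acc, sampleid) pair; a later duplicate would win.
      val byGroup = hitlist.map(h => h.sampleid -> h.ratios.map(_.toString)).toMap
      // For each group, take the entity's values or the placeholder block.
      id :: groups.flatMap(g => byGroup.getOrElse(g, padding).padTo(width, missing))
    }
  }
}
```

Each resulting row can then be printed with row.mkString("\t"), as in the loops from the question. Passing "0" as the missing argument would pad with zeroes instead of "n/a".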
