R - Collapse a data frame
I have a data frame with a sampleID, chromosome, start and stop position, and a meancol score. I want to collapse the data frame so that, for each possible combination of start.pos and end.pos on each chromosome, the meancol scores of all samples are added up.
Input:
             sampleID chrom start.pos   end.pos meancol
    1.1   0012102_A01     1         0  11194349       1
    1.4   0012102_A01     1  11194349  11492125       0
    1.5   0012102_A01     1  11492125  71442329       1
    1.9   0012102_A01     1  71442329 249250621       1
    1.13  0012102_A02     1         0  65493011       1
    1.92  0012102_A02     1  65493011  66164733       1
    1.102 0012102_A02     1  66164733 121347754       1
    1.52  0012102_A02     1 121347754 249250621       0
    1.14  0012102_A03     1         0  56384956       1
    1.83  0012102_A03     1  56384956 106266297       1
    1.73  0012102_A03     1 106266297 249250621       0
    1.15  0012102_A04     1         0  51484139       1
    1.27  0012102_A04     1  51484139 249250621       0
    2.1   0012102_A01     2         0  50000001       1
    2.2   0012102_A01     2  50000001 250000001       1
    2.3   0012102_A02     2         0  50000001       0
    2.7   0012102_A02     2  50000020 270000001       0
    2.18  0012102_A03     2         0  50000004       0
    2.19  0012102_A03     2  50000004 250000001       0
    1.15  0012102_A04     2         0  51484139       0
    1.27  0012102_A04     2  51484139 249250621       0
Output: here the meancol values of all sampleIDs have been added up for each combination of start.pos and end.pos on each chromosome.
    chrom start.pos   end.pos meancol
        1         0  11194349       4
        1  11194349  11492125       3
        1  11492125  51484139       4
        1  51484139  56384956       3
        1  56384956  65493011       3
        1  65493011  66164733       1
        1  66164733  71442329       3
        1  71442329 106266297       2
        1 106266297 121347754       1
        1 121347754 249250621       1
        2         0  50000001       1
        2  50000001  50000004       0
        2  50000004  50000020       0
        2  50000004  51484139       0
        2  51484139 249250621       0
        2 249250621 250000001       0
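Per chromosome, "each possible combination of start.pos and end.pos" comes down to cutting the chromosome at every distinct breakpoint and taking consecutive pairs. Below is a minimal base-R sketch of that step, run on the chromosome 1 positions from the input. `make_segments` is a hypothetical helper name and not part of any package.

```r
# Hypothetical helper: cut one chromosome into elementary segments, i.e.
# every pair of consecutive distinct breakpoints from start.pos and end.pos.
make_segments <- function(start.pos, end.pos) {
  b <- sort(unique(c(start.pos, end.pos)))
  data.frame(start = head(b, -1), end = tail(b, -1))
}

# start.pos / end.pos of the chromosome 1 rows of the input
starts <- c(0, 11194349, 11492125, 71442329, 0, 65493011, 66164733,
            121347754, 0, 56384956, 106266297, 0, 51484139)
ends   <- c(11194349, 11492125, 71442329, 249250621, 65493011, 66164733,
            121347754, 249250621, 56384956, 106266297, 249250621,
            51484139, 249250621)

segs <- make_segments(starts, ends)
print(segs)  # 10 segments, from (0, 11194349) to (121347754, 249250621)
```

The ten segments are the same ten chromosome 1 (start.pos, end.pos) pairs as in the expected output. Splitting by chromosome and summing meancol still have to be layered on top.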
For example, this line of the input:

             sampleID chrom start.pos   end.pos meancol
    1.1   0012102_A01     1         0  11194349       1

becomes this output line, with meancol summed over all sampleIDs for this region:

    chrom start.pos   end.pos meancol
        1         0  11194349       4
I'm not completely clear on what your criterion for "overlap" is. You say that, for chromosome 1, the range (0, 11194349) appears in four rows: 1.1, 1.13, 1.14 and 1.15. Fair enough. But you count the range (65493011, 66164733) only once, whereas it also falls within rows 1.5, 1.92, 1.83 and 1.27 (sum(meancol) = 3). So either I don't understand your criterion, or there are errors in your example.
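Both counts can be checked with a simple containment test: a row contributes to a segment when its [start.pos, end.pos] interval fully contains that segment. Below is a base-R sketch on the chromosome 1 rows with the question's row names. `containing_rows` is a hypothetical helper. Row 1.4's meancol is garbled in the question, so it is assumed here to be 0. Row 1.14 is assumed to start at 0.

```r
# Chromosome 1 rows of the input, keeping the question's row names.
# Assumptions: meancol of row 1.4 is 0, start.pos of row 1.14 is 0.
chr1 <- data.frame(
  start.pos = c(0, 11194349, 11492125, 71442329, 0, 65493011, 66164733,
                121347754, 0, 56384956, 106266297, 0, 51484139),
  end.pos   = c(11194349, 11492125, 71442329, 249250621, 65493011, 66164733,
                121347754, 249250621, 56384956, 106266297, 249250621,
                51484139, 249250621),
  meancol   = c(1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0),
  row.names = c("1.1", "1.4", "1.5", "1.9", "1.13", "1.92", "1.102",
                "1.52", "1.14", "1.83", "1.73", "1.15", "1.27")
)

# Hypothetical helper: rows whose [start.pos, end.pos] fully contains [s, e]
containing_rows <- function(df, s, e) {
  df[df$start.pos <= s & df$end.pos >= e, , drop = FALSE]
}

r1 <- containing_rows(chr1, 0, 11194349)
r2 <- containing_rows(chr1, 65493011, 66164733)
print(rownames(r1))    # "1.1" "1.13" "1.14" "1.15"
print(sum(r1$meancol)) # 4
print(rownames(r2))    # "1.5" "1.92" "1.83" "1.27"
print(sum(r2$meancol)) # 3
```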
Assuming the latter, here's a way using foverlaps() from the data.table package (version 1.9.4 or later):

    library(data.table) # version 1.9.4+
    dt <- as.data.table(df) # assuming your data is in df
    setkey(dt, chrom, start.pos, end.pos)
    # all consecutive pairs of distinct breakpoints, per chromosome
    lims <- dt[, list(start = head(sort(unique(c(start.pos, end.pos))), -1),
                      end   = tail(sort(unique(c(start.pos, end.pos))), -1)), by = chrom]
    setkey(lims, chrom, start, end)
    # match each segment to every row whose interval contains it
    idx <- foverlaps(lims, dt, type = "within")
    idx[, list(meancol = sum(meancol)), by = list(chrom, start, end)]
    #     chrom     start       end meancol
    #  1:     1         0  11194349       4
    #  2:     1  11194349  11492125       3
    #  3:     1  11492125  51484139       4
    #  4:     1  51484139  56384956       3
    #  5:     1  56384956  65493011       3
    #  6:     1  65493011  66164733       3
    #  7:     1  66164733  71442329       3
    #  8:     1  71442329 106266297       3
    #  9:     1 106266297 121347754       2
    # 10:     1 121347754 249250621       1
    # 11:     2         0  50000001       1
    # 12:     2  50000001  50000004       1
    # 13:     2  50000004  50000020       1
    # 14:     2  50000020  51484139       1
    # 15:     2  51484139 249250621       1
    # 16:     2 249250621 250000001       1
    # 17:     2 250000001 270000001       0
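For comparison, the same idea can be sketched in base R without data.table, assuming the column names from the question. `collapse_segments` is a hypothetical helper. By default, foverlaps() returns an NA row for a segment that no row covers; this sketch gives such a segment a sum of 0 instead. It also checks every row against every segment, so it only suits small inputs. For larger data, the keyed foverlaps() join is the better choice.

```r
# Hypothetical base-R helper: per chromosome, cut at every distinct
# start.pos/end.pos and sum meancol over the rows that contain each segment.
collapse_segments <- function(df) {
  per_chrom <- lapply(split(df, df$chrom), function(d) {
    b <- sort(unique(c(d$start.pos, d$end.pos)))
    segs <- data.frame(chrom = d$chrom[1], start = head(b, -1), end = tail(b, -1))
    # A row counts for a segment only if its interval fully contains it;
    # a segment covered by no row gets sum(numeric(0)) = 0.
    segs$meancol <- mapply(function(s, e) {
      sum(d$meancol[d$start.pos <= s & d$end.pos >= e])
    }, segs$start, segs$end)
    segs
  })
  out <- do.call(rbind, per_chrom)
  rownames(out) <- NULL
  out
}

# Chromosome 1 rows of the input.
# Assumptions: meancol of row 1.4 is 0, start.pos of row 1.14 is 0.
df <- data.frame(
  sampleID  = rep(c("0012102_A01", "0012102_A02", "0012102_A03", "0012102_A04"),
                  times = c(4, 4, 3, 2)),
  chrom     = 1L,
  start.pos = c(0, 11194349, 11492125, 71442329, 0, 65493011, 66164733,
                121347754, 0, 56384956, 106266297, 0, 51484139),
  end.pos   = c(11194349, 11492125, 71442329, 249250621, 65493011, 66164733,
                121347754, 249250621, 56384956, 106266297, 249250621,
                51484139, 249250621),
  meancol   = c(1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0)
)

res <- collapse_segments(df)
print(res)  # meancol column: 4 3 4 3 3 3 3 3 2 1
```

On these rows, the sums match the chromosome 1 part of the foverlaps() result above.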