I am working on a similar issue to the one stated in this other posting, and tried adapting the code to select the columns I am interested in and make it fit my data file.
My issue, however, is that the resulting file has become larger than the original one, and I'm not sure the code is working the way I intended.
When I open it in SPSS, the dataset seems to have taken in the header line and then made millions of copies, without end, of the second line (I had to force stop the process).
I noticed there's no counter in the while loop specifying the line; might that be the cause? My background in programming in R is limited. The file is a .csv of 4.8 GB with 329 variables and millions of rows. I need to keep only around 30 of the variables.
This is the code I used:
    ## Open separate connections to hold the cursor position
    file.in <- file('npidata_20050523-20130707.csv', 'rt')
    file.out <- file('mainoutnpidata.txt', 'wt')
    line <- readLines(file.in, n = 1)
    line.split <- strsplit(line, ',')
    ## Column picking, column 1
    cat(line.split[[1]][1:11], line.split[[1]][23:25], line.split[[1]][31:33], line.split[[1]][308:311], sep = ",", file = file.out, fill = TRUE)
    ## Use a loop to read in the rest of the lines
    line <- readLines(file.in, n = 1)
    while (length(line)) {
      line.split <- strsplit(line, ',')
      if (length(line.split[[1]]) > 1) {
        cat(line.split[[1]][1:11], line.split[[1]][23:25], line.split[[1]][31:33], line.split[[1]][308:311], sep = ",", file = file.out, fill = TRUE)
      }
    }
    close(file.in)
    close(file.out)
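One thing that jumps out as wrong is that you are missing a line <- readLines(file.in, n = 1) inside your while loop. Because line never changes, length(line) stays at 1, so you are stuck in an infinite loop that writes the same row over and over. Also, reading one line at a time is going to be terribly slow.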
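A minimal sketch of what a corrected loop could look like in base R, reading several lines per call instead of one. The tiny temporary CSV, the `keep` vector, the `chunk.size` value and the `select.columns` helper are all assumptions made for illustration; with the real file, `keep` would be `c(1:11, 23:25, 31:33, 308:311)`. Splitting on every comma also assumes that no quoted field contains a comma, which may not hold for real data.

```r
# Build a small stand-in CSV so the sketch runs on its own (hypothetical data).
in.path  <- tempfile(fileext = ".csv")
out.path <- tempfile(fileext = ".txt")
writeLines(c("a,b,c,d", "1,2,3,4", "5,6,7,8", "9,10,11,12"), in.path)

keep       <- c(1, 3)  # columns to keep (assumed for this toy file)
chunk.size <- 2        # lines per read; far larger in practice, e.g. 10000

file.in  <- file(in.path, "rt")
file.out <- file(out.path, "wt")

# Hypothetical helper: keep only the wanted fields of each line.
# Assumes no quoted field contains a comma.
select.columns <- function(lines, keep) {
  vapply(strsplit(lines, ",", fixed = TRUE),
         function(fields) paste(fields[keep], collapse = ","),
         character(1))
}

# The header is just the first line, so it goes through the same path.
lines <- readLines(file.in, n = chunk.size)
while (length(lines) > 0) {
  writeLines(select.columns(lines, keep), file.out)
  # Read the next chunk inside the loop; without this the loop never ends.
  lines <- readLines(file.in, n = chunk.size)
}

close(file.in)
close(file.out)

result <- readLines(out.path)
```

Reading in chunks keeps memory use bounded while avoiding one `readLines` call per row. The sketch does not guard against blank lines or rows with fewer fields than expected; those would come out as `NA` entries.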
If in your file (unlike the one in the example you linked to) every row contains the same number of columns, you could use the LaF package. This should result in something along the lines of:
    library(LaF)
    m <- detect_dm_csv("npidata_20050523-20130707.csv", header = TRUE)
    laf <- laf_open(m)
    begin(laf)
    con <- file("mainoutnpidata.txt", 'wt')
    while (TRUE) {
      d <- next_block(laf, columns = c(1:11, 23:25, 31:33, 308:311))
      if (nrow(d) == 0) break
      # write.table instead of write.csv: write.csv has no header argument
      # and does not allow col.names to be switched off
      write.table(d, file = con, sep = ",", row.names = FALSE, col.names = FALSE)
    }
    close(con)
    close(laf)

If the 30 columns fit into memory, you could also do:
    library(LaF)
    m <- detect_dm_csv("npidata_20050523-20130707.csv", header = TRUE)
    laf <- laf_open(m)
    d <- laf[, c(1:11, 23:25, 31:33, 308:311)]
    close(laf)

I couldn't test the code above on your file, so I can't guarantee there are no errors (let me know if there are).