I'm new to pandas, and this is my first question on Stack Overflow. I'm trying to do some analytics with pandas.
I have some text files with data records that I want to process. Each line of the file corresponds to one record, whose fields sit at fixed positions and have a fixed number of characters. There are different kinds of records in the same file; all records share the first field, which is 2 characters long and identifies the type of record. As an example:
Some file:

    01jhon      smith     555-1234
    03cow       bos primigenius taurus 00401
    01jannette  jhonson           00100000000
    ...

    field     start  length
    type      1      2       * common to all records, example: 01 = person, 03 = animal
    name      3      10
    surname   13     10
    phone     23     8
    credit    31     11
    (filled with spaces)

I'm writing some code to convert one record to a dictionary:
    person1 = {'type': 1, 'name': 'jhon', 'surname': 'smith', 'phone': '555-1234'}
    person2 = {'type': 1, 'name': 'jannette', 'surname': 'jhonson', 'credit': 1000000.00}
    animal1 = {'type': 3, 'cname': 'cow', 'sciname': 'bos....', 'legs': 4, 'tails': 1}

If a field is empty (filled with spaces), it will not be in the dictionary.
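A minimal sketch of that conversion, assuming the person layout from the field table (1-based start positions). The names `PERSON_FIELDS` and `parse_record` are hypothetical, and values are kept as stripped strings rather than converted to numbers:

```python
# Hypothetical layout for type-01 (person) records: (field, 1-based start, length).
PERSON_FIELDS = [
    ("type", 1, 2),
    ("name", 3, 10),
    ("surname", 13, 10),
    ("phone", 23, 8),
    ("credit", 31, 11),
]


def parse_record(line, fields):
    """Slice one fixed-width line into a dict, skipping blank fields."""
    record = {}
    for name, start, length in fields:
        # Convert the 1-based start position to a 0-based slice.
        value = line[start - 1:start - 1 + length].strip()
        if value:  # fields filled with spaces are left out of the dict
            record[name] = value
    return record


# Made-up sample line built to match the layout above.
line = "01" + "jhon".ljust(10) + "smith".ljust(10) + "555-1234" + " " * 11
print(parse_record(line, PERSON_FIELDS))
```

Dispatching on the first two characters of each line would then pick the right layout for each record type.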
With the records of one kind, I want to create a pandas DataFrame with the dict keys as column names. I've tried pandas.DataFrame.from_dict() without success.
And here comes my question: is there any way to do this with pandas so that the dict keys become column names? Is there any other standard method to deal with this kind of file?
To make a DataFrame from dictionaries, you can pass a list of dictionaries:
    >>> import pandas as pd
    >>> person1 = {'type': 1, 'name': 'jhon', 'surname': 'smith', 'phone': '555-1234'}
    >>> person2 = {'type': 1, 'name': 'jannette', 'surname': 'jhonson', 'credit': 1000000.00}
    >>> animal1 = {'type': 3, 'cname': 'cow', 'sciname': 'bos....', 'legs': 4, 'tails': 1}
    >>> pd.DataFrame([person1])
       name     phone surname  type
    0  jhon  555-1234   smith     1
    >>> pd.DataFrame([person1, person2])
        credit      name     phone  surname  type
    0      NaN      jhon  555-1234    smith     1
    1  1000000  jannette       NaN  jhonson     1
    >>> pd.DataFrame.from_dict([person1, person2])
        credit      name     phone  surname  type
    0      NaN      jhon  555-1234    smith     1
    1  1000000  jannette       NaN  jhonson     1

For the more fundamental issue of two differently-formatted kinds of records intermixed in one file, and assuming the files aren't so big that we can't read them and store them in memory, I'd use StringIO to make an object which is sort of like a file but which only has the lines we want, and then use read_fwf (fixed-width file). For example:
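    from io import StringIO

    def get_filelike_object(filename, line_prefix):
        # Collect only the lines that start with the given record-type prefix.
        s = StringIO()
        with open(filename, "r") as fp:
            for line in fp:
                if line.startswith(line_prefix):
                    s.write(line)
        # Rewind so the reader starts from the first line.
        s.seek(0)
        return s

and then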
    >>> type01 = get_filelike_object("animal.dat", "01")
    >>> df = pd.read_fwf(type01, names="type name surname phone credit".split(),
    ...                  widths=[2, 10, 10, 8, 11], header=None)
    >>> df
       type      name  surname     phone     credit
    0     1      jhon    smith  555-1234        NaN
    1     1  jannette  jhonson       NaN  100000000

should work. Of course you could also separate the files into different types before pandas ever sees them, which might be easiest of all.
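Separating the record types up front could look roughly like this. It is only a sketch: `split_by_type` is a hypothetical helper, the sample lines are made up, and it assumes the type code is always the first two characters of each line. Each returned buffer is file-like, so it could be handed to `pd.read_fwf` with the widths that belong to that record type.

```python
from collections import defaultdict
from io import StringIO


def split_by_type(lines, prefix_len=2):
    """Group fixed-width lines by their leading type code.

    Returns a dict mapping each type code (e.g. "01") to a StringIO
    that holds only the lines of that type.
    """
    grouped = defaultdict(list)
    for line in lines:
        if line.strip():  # ignore blank lines
            grouped[line[:prefix_len]].append(line)
    return {code: StringIO("".join(rows)) for code, rows in grouped.items()}


# Made-up sample data; a real file would be opened with open(filename).
sample = StringIO(
    "01jhon      smith     555-1234\n"
    "03cow       bos primigenius taurus\n"
    "01jannette  jhonson           00100000000\n"
)
buffers = split_by_type(sample)
print(sorted(buffers))
print(buffers["01"].getvalue(), end="")
```

This reads the input once, regardless of how many record types it contains, whereas calling `get_filelike_object` once per type rereads the file each time.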