Create a pandas DataFrame from multiple dicts

I'm new to pandas, and this is my first question on Stack Overflow. I'm trying to do some analytics with pandas.

I have text files with data records that I want to process. Each line of a file is one record, and each field sits at a fixed place and has a fixed length in characters. There are different kinds of records in the same file; all records share the first field, 2 characters that identify the type of record. For example:

Some file:

    01jhon      smith     555-1234
    03cow            bos primigenius taurus        00401
    01jannette  jhonson           00100000000
    ...

Field layout:

    field     start  length
    type      1      2       *common to all records, e.g. 01 = person, 03 = animal
    name      3      10
    surname   13     10
    phone     23     8
    credit    31     11
    (unused positions are filled with spaces)

I'm writing code to convert each record into a dictionary:

    # type codes written as plain ints: a literal like 01 is a syntax error in Python 3
    person1 = {'type': 1, 'name': 'jhon', 'surname': 'smith', 'phone': '555-1234'}
    person2 = {'type': 1, 'name': 'jannette', 'surname': 'jhonson', 'credit': 1000000.00}
    animal1 = {'type': 3, 'cname': 'cow', 'sciname': 'bos....', 'legs': 4, 'tails': 1}

If a field is empty (filled with spaces), it is not included in the dictionary.
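A minimal sketch of that line-to-dict conversion for person (01) records, assuming the field layout in the table above and assuming the credit field carries two implied decimal places (which is how 00100000000 would become 1000000.00). The names `PERSON_LAYOUT` and `parse_person` are hypothetical, not from the original post.

```python
# Hypothetical layout for type-01 (person) records: (field, start, length),
# with 1-based start positions taken from the layout table.
PERSON_LAYOUT = [
    ("type", 1, 2),
    ("name", 3, 10),
    ("surname", 13, 10),
    ("phone", 23, 8),
    ("credit", 31, 11),
]


def parse_person(line):
    """Turn one fixed-width person line into a dict, skipping blank fields."""
    record = {}
    for field, start, length in PERSON_LAYOUT:
        raw = line[start - 1:start - 1 + length]  # 1-based start -> 0-based slice
        value = raw.strip()  # also drops a trailing newline, if any
        if not value:  # field filled with spaces (or missing): leave it out
            continue
        if field == "type":
            record[field] = int(value)
        elif field == "credit":
            # Assumption: two implied decimal places in the credit field.
            record[field] = int(value) / 100
        else:
            record[field] = value
    return record


line = "01" + "jannette".ljust(10) + "jhonson".ljust(10) + " " * 8 + "00100000000"
print(parse_person(line))
# {'type': 1, 'name': 'jannette', 'surname': 'jhonson', 'credit': 1000000.0}
```

Because blank fields are simply skipped, each dict only holds the keys that were actually filled in, which matches the dictionaries above.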

With the records of one kind, I want to create a pandas DataFrame that uses the dict keys as column names. I've tried pandas.DataFrame.from_dict() without success.

And here comes my question: is there a way in pandas to make the dict keys become column names? Is there any other standard method to deal with this kind of file?

To make a DataFrame from dictionaries, you can pass a list of dictionaries:

    >>> import pandas as pd
    >>> person1 = {'type': 1, 'name': 'jhon', 'surname': 'smith', 'phone': '555-1234'}
    >>> person2 = {'type': 1, 'name': 'jannette', 'surname': 'jhonson', 'credit': 1000000.00}
    >>> animal1 = {'type': 3, 'cname': 'cow', 'sciname': 'bos....', 'legs': 4, 'tails': 1}
    >>> pd.DataFrame([person1])
       name     phone surname  type
    0  jhon  555-1234   smith     1
    >>> pd.DataFrame([person1, person2])
        credit      name     phone  surname  type
    0      NaN      jhon  555-1234    smith     1
    1  1000000  jannette       NaN  jhonson     1
    >>> pd.DataFrame.from_dict([person1, person2])
        credit      name     phone  surname  type
    0      NaN      jhon  555-1234    smith     1
    1  1000000  jannette       NaN  jhonson     1

(This output comes from an older pandas that sorted the columns alphabetically; newer versions keep the order of the dict keys.)
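When a file's records are already parsed into dicts of mixed types, one way to get a separate DataFrame per record type, as the question asks, is to group the dicts by their `type` key before calling `pd.DataFrame`. This is a sketch, not part of the original answer; `frames_by_type` is a hypothetical helper name.

```python
from collections import defaultdict

import pandas as pd

# The three records from the question (type codes as plain ints 1 and 3).
records = [
    {"type": 1, "name": "jhon", "surname": "smith", "phone": "555-1234"},
    {"type": 1, "name": "jannette", "surname": "jhonson", "credit": 1000000.00},
    {"type": 3, "cname": "cow", "sciname": "bos....", "legs": 4, "tails": 1},
]


def frames_by_type(records):
    """Group dicts by their 'type' key and build one DataFrame per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["type"]].append(rec)
    # Keys missing from a dict show up as NaN in that row.
    return {code: pd.DataFrame(recs) for code, recs in groups.items()}


frames = frames_by_type(records)
people = frames[1]   # the two person records
animals = frames[3]  # the single animal record
```

Grouping first keeps each DataFrame limited to the fields its record type actually has, instead of one wide frame where person rows carry empty animal columns.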

For the more fundamental issue of two differently formatted kinds of record intermixed in one file, and assuming the files aren't so big that you can't read them and store them in memory, I'd use StringIO to make a file-like object that holds only the lines you want, and then use read_fwf (fixed-width file). For example:

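    from io import StringIO  # on Python 2: from StringIO import StringIO

    def get_filelike_object(filename, line_prefix):
        """Return an in-memory file holding only the lines that start with line_prefix."""
        s = StringIO()
        with open(filename, "r") as fp:
            for line in fp:
                if line.startswith(line_prefix):
                    s.write(line)
        s.seek(0)  # rewind so the reader starts at the beginning
        return s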

And then:

    >>> import pandas as pd
    >>> type01 = get_filelike_object("animal.dat", "01")
    >>> df = pd.read_fwf(type01, names="type name surname phone credit".split(),
    ...                  widths=[2, 10, 10, 8, 11], header=None)
    >>> df
       type      name  surname     phone     credit
    0     1      jhon    smith  555-1234        NaN
    1     1  jannette  jhonson       NaN  100000000
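A self-contained variant of the same StringIO plus read_fwf approach, for trying it without an animal.dat file: it filters an in-memory string instead of a file. `SAMPLE` and `read_record_type` are hypothetical names; the `converters` argument (a standard read_csv/read_fwf parameter) is used here so the type code stays as the string "01", and dividing credit by 100 assumes two implied decimals, which the original post does not confirm.

```python
from io import StringIO

import pandas as pd

# In-memory stand-in for the data file: the sample lines from the question,
# with the person records padded via ljust so the widths are exact.
SAMPLE = "\n".join([
    "01" + "jhon".ljust(10) + "smith".ljust(10) + "555-1234",
    "03cow            bos primigenius taurus        00401",
    "01" + "jannette".ljust(10) + "jhonson".ljust(10) + " " * 8 + "00100000000",
]) + "\n"


def read_record_type(text, prefix, names, widths):
    """Parse only the lines of `text` that start with `prefix` using read_fwf."""
    kept = "".join(line for line in text.splitlines(keepends=True)
                   if line.startswith(prefix))
    # converters={"type": str} keeps the code as the string "01" rather than
    # letting pandas turn it into the integer 1.
    return pd.read_fwf(StringIO(kept), names=names, widths=widths,
                       header=None, converters={"type": str})


people = read_record_type(SAMPLE, "01",
                          ["type", "name", "surname", "phone", "credit"],
                          [2, 10, 10, 8, 11])
# Assumption: credit carries two implied decimals (00100000000 -> 1000000.0).
people["credit"] = people["credit"] / 100
```

read_fwf strips the space padding from each field and turns all-blank fields into NaN, so no extra cleanup is needed for the empty phone and credit slots.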

That should work. Of course, separating the different record types into separate files before pandas ever sees them might be easiest of all.
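A minimal sketch of that pre-splitting step, assuming the record type is always the first two characters of each line. `split_by_type` and the `type_XX.dat` output file names are hypothetical choices for illustration; each resulting file holds a single layout and could then be passed straight to pd.read_fwf.

```python
import os
from collections import defaultdict


def split_by_type(path, out_dir, prefix_len=2):
    """Write each record type to its own file, e.g. <out_dir>/type_01.dat.

    The output naming scheme is a hypothetical choice for illustration.
    """
    lines_by_type = defaultdict(list)
    with open(path, "r") as fp:
        for line in fp:
            if line.strip():  # skip blank lines
                lines_by_type[line[:prefix_len]].append(line)
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for code, lines in lines_by_type.items():
        out_path = os.path.join(out_dir, "type_%s.dat" % code)
        with open(out_path, "w") as out:
            out.writelines(lines)
        paths[code] = out_path
    return paths  # maps type code ("01", "03", ...) to its output file
```

This reads the whole input once and keeps the lines in memory, under the same "files aren't too big" assumption as the StringIO approach.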

