python - pandas drop duplicates of one column with criteria -

- January 15, 2013

I have a data file like this:

          A      B
  239616412   None
  239616414  name2
  239616417   None
  239616417   None
  239616417   None
  239616418  name1
  239616418   None
  239616428  name1
  239616429   None
  239616429   None
  239616429  name1

I want to drop the duplicates in column A. For each duplicated value, I want to keep the row that has a name in column B (that is, B != None). If none of the duplicates has a value in B, I still want to keep one of them (as with 239616417).

The result should be reduced to:

          A      B
  239616412   None
  239616414  name2
  239616417   None
  239616418  name1
  239616428  name1
  239616429  name1

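First, sort the data frame by column 'B'. Missing values are placed last, so every row with a name comes before the NaN rows:

  # written as df.sort('B', inplace=True) in older pandas releases
  df.sort_values('B', inplace=True)
  df
  Out[24]:
              A      B
  5   239616418  name1
  7   239616428  name1
  10  239616429  name1
  1   239616414  name2
  0   239616412    NaN
  2   239616417    NaN
  3   239616417    NaN
  4   239616417    NaN
  6   239616418    NaN
  8   239616429    NaN
  9   239616429    NaN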

Then drop the duplicates with respect to column 'A':

  df.drop_duplicates('A', inplace=True)
  df
  Out[26]:
              A      B
  5   239616418  name1
  7   239616428  name1
  10  239616429  name1
  1   239616414  name2
  0   239616412    NaN
  2   239616417    NaN
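The sort matters because `drop_duplicates` keeps the first row it encounters for each key by default (`keep='first'`). A small sketch with made-up values (not the data from the question) illustrates the difference; `kind="mergesort"` is an added assumption here, used only to keep the order of tied rows stable:

```python
import pandas as pd

# Hypothetical three-row frame; the values are illustrative only.
demo = pd.DataFrame({"A": [1, 1, 2], "B": [None, "x", None]})

# Without sorting, the first row for A == 1 is the empty one, so the name is lost.
unsorted_result = demo.drop_duplicates("A")

# Sorting by B first moves missing values to the end, so the named row is kept.
sorted_result = demo.sort_values("B", kind="mergesort").drop_duplicates("A")
```

Keys whose rows are all empty, such as `A == 2` here, still keep one row either way.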

You can then re-sort the data frame by its index to get exactly the output you asked for:

  # written as df.sort(inplace=True) in older pandas releases
  df.sort_index(inplace=True)
  df
  Out[30]:
              A      B
  0   239616412    NaN
  1   239616414  name2
  2   239616417    NaN
  5   239616418  name1
  7   239616428  name1
  10  239616429  name1
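For reference, the three steps can be chained into one expression on current pandas. The following is a minimal sketch, not the original author's code: the helper name `drop_duplicates_prefer_named` is hypothetical, the data is retyped from the question, and `kind="mergesort"` is an added assumption so that tied rows keep their original relative order.

```python
import pandas as pd

# Data retyped from the question; None marks a missing name.
df = pd.DataFrame({
    "A": [239616412, 239616414, 239616417, 239616417, 239616417,
          239616418, 239616418, 239616428, 239616429, 239616429, 239616429],
    "B": [None, "name2", None, None, None,
          "name1", None, "name1", None, None, "name1"],
})


def drop_duplicates_prefer_named(frame, key="A", value="B"):
    """Hypothetical helper: keep one row per `key`, preferring a non-missing `value`."""
    return (
        frame.sort_values(value, kind="mergesort")  # stable sort, missing values last
             .drop_duplicates(key)                  # keeps the first row per key
             .sort_index()                          # restore the original row order
    )


result = drop_duplicates_prefer_named(df)
print(result)
```

Because nothing is modified in place, `df` itself stays untouched and the result can be assigned to a new name.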
