Grouping and Sorting
import pandas as pd
Grouping (groupby())
groupby() splits a dataframe into groups (sub-dataframes) of rows that share the same value in the given column (if axis='index').
- Structure of the returned object (pandas.core.groupby): each group is a slice of the DataFrame containing only the rows whose values match. In other words, the dataframe is partitioned according to the values of the key column (or row).
- How methods work (roughly): the original dataframe is split into several dataframe chunks according to the key, and method() is applied to each chunk in turn.
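The split-then-apply idea described above can be sketched by hand. The data and the column names `team` and `score` are made up for illustration; this is only a rough picture of what pandas does, not its actual implementation.

```python
import pandas as pd

# Hypothetical toy data, not from the original notes.
df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "score": [1, 2, 3, 4, 5],
})

grouped = df.groupby("team")

# Iterating over a GroupBy yields (key, sub-dataframe) pairs.
# Each sub-dataframe holds only the rows whose "team" matches the key.
pieces = {key: piece for key, piece in grouped}

# Applying a method to each piece by hand ...
manual = pd.Series({key: piece["score"].sum() for key, piece in pieces.items()})

# ... gives the same numbers as calling the method on the GroupBy object.
built_in = grouped["score"].sum()

print(pieces["a"])
print(manual.to_dict() == built_in.to_dict())
```

Looping over the groups like this is mostly useful for inspection; for real work the built-in GroupBy methods are usually shorter and faster.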
The GroupBy object returned by groupby() matters less than the many methods it provides.
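(ex) dataframe.groupby('column_name').column_name.count()
== dataframe.column_name.value_counts()
: value_counts() is just a shortcut. The counts are the same, but value_counts() orders the result by count (descending) while groupby() orders it by group key.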
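A small check of that equivalence, assuming a made-up `color` column. The two results hold the same counts but come back in a different order, so the comparison is done on dictionaries.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"color": ["red", "blue", "red", "green", "red", "blue"]})

by_group = df.groupby("color").color.count()  # ordered by group key
shortcut = df.color.value_counts()            # ordered by count, descending

print(by_group.index.tolist())
print(shortcut.index.tolist())

# Same counts per value, regardless of ordering.
same_counts = by_group.to_dict() == shortcut.to_dict()
print(same_counts)
```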
DataFrame.groupby(by, axis) # df.groupby('column_name')
- by : mapping, function, label, or list of labels
- list of labels : multiple classification criteria -> multi-index dataframe
- axis: {0 or ‘index’, 1 or ‘columns’}, default 0 (deprecated since pandas 2.1)
return : pandas.core.groupby.generic.DataFrameGroupBy
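To illustrate the `by` parameter, here is a sketch that groups once by a single label and once by a list of labels. The columns `country`, `city` and `sales` are invented for the example.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "country": ["KR", "KR", "US", "US", "US"],
    "city": ["Seoul", "Busan", "NYC", "NYC", "LA"],
    "sales": [10, 20, 30, 40, 50],
})

# One label: the result is indexed by that column's values.
single = df.groupby("country")["sales"].sum()

# List of labels: the result is indexed by a MultiIndex (country, city).
multi = df.groupby(["country", "city"])["sales"].sum()

# reset_index() turns the MultiIndex levels back into ordinary columns.
flat = multi.reset_index()

print(type(df.groupby("country")).__name__)
print(multi)
print(flat)
```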
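Useful GroupBy methods
GroupBy.agg([callable, callable, ...]) # apply several functions at once
DataFrameGroupBy.column_name.count() == DataFrameGroupBy.size() # only when the column has no missing values: count() skips NaN, size() counts every row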
DataFrameGroupBy.min()
DataFrameGroupBy.describe()
DataFrameGroupBy.first()
DataFrameGroupBy.last()
DataFrameGroupBy.apply()
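A short sketch of a few of these methods on made-up data. One `score` value is deliberately missing to show where count() and size() differ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing score.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [1.0, np.nan, 3.0, 4.0, 5.0],
})
grouped = df.groupby("team")

# agg() with a list applies every function and returns one column per function.
summary = grouped["score"].agg(["min", "max", "mean"])

counts = grouped["score"].count()  # non-missing values per group
sizes = grouped.size()             # all rows per group, NaN included

print(summary)
print(counts.to_dict())
print(sizes.to_dict())
```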
Sorting
sort by values of rows/columns
DataFrame.sort_values(by, axis, ascending, inplace)
- by : str or list of str
- list of str -> multiple criteria
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
- ascending : bool or list of bool, default True
- list of bool -> must match the length of the by
- inplace : bool, default False
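The parameters above can be tried on a small made-up frame. The example sorts by two columns with a different direction for each, and shows that inplace=True modifies the frame and returns None.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "name": ["x", "y", "z", "w"],
    "group": [2, 1, 2, 1],
    "score": [5, 7, 9, 3],
})

# group ascending, then score descending within each group.
ordered = df.sort_values(by=["group", "score"], ascending=[True, False])
print(ordered)

# By default a new frame is returned and df keeps its original order.
print(df["name"].tolist())

# With inplace=True the frame itself is reordered and the call returns None.
copy = df.copy()
result = copy.sort_values(by="score", inplace=True)
print(result, copy["name"].tolist())
```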
sort by index of row/column
DataFrame.sort_index(axis, level, ascending, inplace)
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
- level : int or level name or list of ints or list of level names
- ascending : bool or list-like of bools, default True
- list-like of bools -> for a MultiIndex, one sort direction per level
- inplace : bool, default False
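A minimal sketch of sort_index() on made-up data: sorting row labels, sorting column labels with axis=1, and giving each MultiIndex level its own direction. The level names `outer` and `inner` are assumptions for the example.

```python
import pandas as pd

# Hypothetical toy data with unsorted row and column labels.
df = pd.DataFrame({"b": [1, 2, 3], "a": [4, 5, 6]}, index=[2, 0, 1])

by_rows = df.sort_index()        # sort by row labels
by_cols = df.sort_index(axis=1)  # sort by column labels

# MultiIndex: "outer" ascending, "inner" descending.
mi = pd.MultiIndex.from_tuples(
    [("x", 2), ("y", 1), ("x", 1), ("y", 2)], names=["outer", "inner"]
)
df2 = pd.DataFrame({"v": [1, 2, 3, 4]}, index=mi)
mixed = df2.sort_index(ascending=[True, False])

print(by_rows)
print(by_cols)
print(mixed)
```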