The difference between pandas count() and value_counts()
Date: 2023-05-27 22:37:01
Using pandas.DataFrame.count
count() counts the non-NA cells in each column or each row.
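The values None, NaN, NaT, and optionally numpy.inf (depending on pandas.options.mode.use_inf_as_na, in pandas versions that still support that option) are treated as NA.
Demo: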
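To make the NA rule concrete, here is a minimal sketch. The frame na_demo and its column names are made up for illustration and do not come from the original demo. It counts non-NA cells per column and then per row:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: each column holds a different kind of missing value.
na_demo = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],                          # NaN
    "b": ["x", None, "z"],                            # None
    "c": pd.to_datetime(["2023-01-01", None, None]),  # NaT
})

per_column = na_demo.count()      # non-NA cells in each column
per_row = na_demo.count(axis=1)   # non-NA cells in each row

print(per_column)
print(per_row)
```

The second row is missing in every column, so its per-row count is 0.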
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24., np.nan, 21., 33, 26],
    "Single": [False, True, True, True, False]
})
print(df)
print('______count begin_________')
print(df.groupby('Person').count())
Out:
  Person   Age  Single
0   John  24.0   False
1   Myla   NaN    True
2  Lewis  21.0    True
3   John  33.0    True
4   Myla  26.0   False
______count begin_________
        Age  Single
Person
John      2       2
Lewis     1       1
Myla      1       2
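As the output shows, False is counted as well: count() only excludes NA values, and False is an ordinary value.
Let's look a little further.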
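A small sketch of why False is included. It uses the same Single values as the demo as a standalone Series; the variable names are illustrative. count() reports how many entries are non-NA, while sum() reports how many are True:

```python
import pandas as pd

# Same boolean values as the "Single" column in the demo.
single = pd.Series([False, True, True, True, False], name="Single")

n_values = single.count()  # every entry is non-NA, so False is counted too
n_true = single.sum()      # True is treated as 1 and False as 0

print(n_values, n_true)
```

If the goal is to count only the True entries, sum() is the call to use rather than count().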
df = pd.DataFrame({
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24., np.nan, 21., 33, 26],
    "Single": [False, True, True, True, False]
})
print(df)
print('______count begin_________')
print(df.groupby('Person')['Age'].count())
print('______valuecount begin_________')
print(df.groupby('Person')['Age'].value_counts())
Out:
  Person   Age  Single
0   John  24.0   False
1   Myla   NaN    True
2  Lewis  21.0    True
3   John  33.0    True
4   Myla  26.0   False
______count begin_________
Person
John     2
Lewis    1
Myla     1
Name: Age, dtype: int64
______valuecount begin_________
Person  Age
John    24.0    1
        33.0    1
Lewis   21.0    1
Myla    26.0    1
Name: Age, dtype: int64
Summary:
groupby('Person')['Age'].count()
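For each distinct value of Person, this counts the non-NA values in Age. John appears twice and Lewis once. Myla also appears twice, but one of those rows has a missing Age, so Myla's count is 1.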
groupby('Person')['Age'].value_counts()
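With Person as the outer index level, this counts how many times each distinct Age value occurs within each group. Missing Age values are left out by default.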
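groupby('A')['B'].count() still produces one number per value of A: the number of non-NA entries of B in that group.
groupby('A')['B'].value_counts() produces one number per distinct value of B within each group of A.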
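The summary above can be sketched side by side. As an addition for comparison, the sketch also calls size(), which is not used in the original demo; it counts all rows in a group, including those where Age is missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24.0, np.nan, 21.0, 33.0, 26.0],
})
grouped = df.groupby("Person")["Age"]

non_na = grouped.count()          # one entry per Person, missing ages skipped
rows = grouped.size()             # one entry per Person, missing ages included
per_age = grouped.value_counts()  # one entry per (Person, Age) pair

print(non_na.to_dict())
print(rows.to_dict())
print(list(per_age.index.names), len(per_age))
```

Myla is where count() and size() differ, because one of her two rows has no Age.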
print(df.groupby(['Person'])['Age'].count())
Passing the key as a list gives the same result: the number of non-NA (not non-zero) Age values for each Person.
Person
John     2
Lewis    1
Myla     1
Name: Age, dtype: int64
print(df.groupby(['Person'])['Age'].sum())
This sums the Age values for each Person; missing values are skipped.
Person
John     57.0
Lewis    21.0
Myla     26.0
Name: Age, dtype: float64
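Since count() and sum() are both per-Person aggregations, they can be computed in one pass with agg(). This is a sketch on the same data; the name summary is illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24.0, np.nan, 21.0, 33.0, 26.0],
})

# One row per Person, one column per aggregation.
summary = df.groupby("Person")["Age"].agg(["count", "sum"])
print(summary)
```

Both columns ignore Myla's missing Age, so her row shows a count of 1 and a sum of 26.0.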