pandas Tutorial [4]: Filtering Data in a DataFrame

2022-01-23 16:45:31
>>> import pandas as pd
>>> import numpy as np
>>> # Filtering rows is one of the most useful things a DataFrame can do and can save a lot of manual work. The examples below walk through several filters, including multi-condition ones.
>>> # First, create a DataFrame of random numbers with six rows and four columns:
>>> df=pd.DataFrame(np.random.randn(6,4),columns=list('ABCD'))
>>> df
          A         B         C         D
0 -1.108935  1.187163  1.546778  0.246329
1 -0.015045  1.367264 -0.617322 -1.068358
2  0.502788  0.305497 -0.819171 -0.331027
3  2.585354 -0.043285  1.056259 -0.079882
4  0.316549 -1.464567  1.504431  0.803362
5 -1.097251 -0.706594 -1.393058 -0.251690
>>> # To keep only the rows where the value in column D is greater than 0:
>>> df[df.D>0]
          A         B         C         D
0 -1.108935  1.187163  1.546778  0.246329
4  0.316549 -1.464567  1.504431  0.803362
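
The expression df.D>0 is itself a boolean Series with one True/False value per row, and indexing with it keeps the rows marked True. A minimal sketch with fixed values, so the result is reproducible; the small frame named demo is made up for illustration and is not the random data above:

```python
import pandas as pd

# Hypothetical fixed data so the result does not depend on random numbers
demo = pd.DataFrame({"C": [1.5, -0.6, 1.0], "D": [0.2, -1.0, -0.1]})

mask = demo["D"] > 0      # boolean Series, one True/False per row
positive_d = demo[mask]   # keeps only the rows where the mask is True

# demo.loc[mask] is an equivalent, more explicit spelling
same_rows = demo.loc[mask]
```

demo["D"] and demo.D refer to the same column; the bracket form also works for column names that are not valid Python identifiers.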
>>> # The & operator combines several conditions (AND). The | operator also combines conditions, but with OR semantics.
>>> df[(df.D>0)&(df.C<0)]
Empty DataFrame
Columns: [A, B, C, D]
Index: []
>>> df[(df.D<0)&(df.C>0)]
          A         B         C         D
3  2.585354 -0.043285  1.056259 -0.079882
>>> df[(df.D<0.5)&(df.C>1.5)]
          A         B         C         D
0 -1.108935  1.187163  1.546778  0.246329
>>> df[(df.D<0.5)|(df.C>1.5)]
          A         B         C         D
0 -1.108935  1.187163  1.546778  0.246329
1 -0.015045  1.367264 -0.617322 -1.068358
2  0.502788  0.305497 -0.819171 -0.331027
3  2.585354 -0.043285  1.056259 -0.079882
4  0.316549 -1.464567  1.504431  0.803362
5 -1.097251 -0.706594 -1.393058 -0.251690
>>> df[(df.D<0.5)|(df.C>1.52)]
          A         B         C         D
0 -1.108935  1.187163  1.546778  0.246329
1 -0.015045  1.367264 -0.617322 -1.068358
2  0.502788  0.305497 -0.819171 -0.331027
3  2.585354 -0.043285  1.056259 -0.079882
5 -1.097251 -0.706594 -1.393058 -0.251690
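
Two details of the multi-condition filters above are easy to trip over. Each comparison needs its own parentheses, because & and | bind more tightly than < and > in Python. The keywords and/or cannot be used in their place, since a Series has no single truth value. A small sketch, again on a made-up frame called demo:

```python
import pandas as pd

# Hypothetical fixed data for a reproducible result
demo = pd.DataFrame({"C": [1.6, -0.6, 1.0, 1.5],
                     "D": [0.2, -1.0, -0.1, 0.8]})

# AND: both conditions must hold for a row to be kept
both = demo[(demo["D"] < 0.5) & (demo["C"] > 1.5)]

# OR: at least one condition must hold
either = demo[(demo["D"] < 0.5) | (demo["C"] > 1.5)]

# The Python keyword "and" asks for the truth value of a whole Series,
# which pandas refuses to guess, so it raises ValueError
try:
    demo[(demo["D"] < 0.5) and (demo["C"] > 1.5)]
    and_worked = True
except ValueError:
    and_worked = False
```

Here both contains only row 0 and either contains rows 0 to 2. Row 3 fails both tests because 1.5 is not strictly greater than 1.5.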
>>> # To return only columns A and B while filtering on columns C and D, select the two columns first and then apply the condition:
>>> df[['A','B']][(df.D>0)&(df.C<0)]
Empty DataFrame
Columns: [A, B]
Index: []
>>> df[['A','B']][(df.D<0)&(df.C>0)]
          A         B
3  2.585354 -0.043285
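
The column selection and the row filter can also be written as one .loc call, which takes the row mask and the column list together. This is an alternative spelling, not the form used in the session above; a sketch on made-up data:

```python
import pandas as pd

# Hypothetical fixed data for a reproducible result
demo = pd.DataFrame({"A": [1, 2, 3],
                     "B": [4, 5, 6],
                     "C": [1.0, -1.0, 2.0],
                     "D": [-0.5, -0.2, 0.3]})

mask = (demo["D"] < 0) & (demo["C"] > 0)

# Two-step form used in the session: pick the columns, then filter the rows
chained = demo[["A", "B"]][mask]

# One-step form: rows and columns in a single .loc call
single = demo.loc[mask, ["A", "B"]]
```

Both forms return the same rows here. The single .loc call is usually the safer choice when the result will later be assigned to, because it avoids chained indexing.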
>>> index = (df.D<0)&(df.C>0)
>>> index
0    False
1    False
2    False
3     True
4    False
5    False
dtype: bool
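>>> # Note: a boolean Series is applied with square brackets; calling the DataFrame with parentheses raises a TypeError
>>> df(index)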
Traceback (most recent call last):
  File "<pyshell#19>", line 1, in <module>
    df(index)
TypeError: 'DataFrame' object is not callable
>>> df[index]
          A         B         C         D
3  2.585354 -0.043285  1.056259 -0.079882
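
Once the condition is stored in a variable, it can be inspected and reused. The sketch below counts the matching rows and selects the complement with the ~ operator. The frame demo and the variable names are made up for illustration:

```python
import pandas as pd

# Hypothetical fixed data for a reproducible result
demo = pd.DataFrame({"C": [1.0, -1.0, 2.0, 0.5],
                     "D": [-0.5, -0.2, 0.3, -0.1]})

mask = (demo["D"] < 0) & (demo["C"] > 0)

n_matches = int(mask.sum())   # True counts as 1, so the sum is the match count
matched = demo[mask]          # rows that satisfy the condition
rest = demo[~mask]            # ~ inverts the mask: rows that do not
```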
>>> # The isin method filters for specific values: put the values to match in a list, such as alist
>>> alist=[-0.079882,0.687050,0.3685412]
>>> df['D'].isin(alist)
0    False
1    False
2    False
3    False
4    False
5    False
Name: D, dtype: bool
>>> alist=[0.246329]
>>> df['D'].isin(alist)
0    False
1    False
2    False
3    False
4    False
5    False
Name: D, dtype: bool
>>> df[df['D'].isin(alist)]
Empty DataFrame
Columns: [A, B, C, D]
Index: []
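
Every row comes back False above even though 0.246329 and -0.079882 appear in column D. The likely reason is that pandas displays floats rounded to six decimal places while isin compares the full stored values exactly, so a number copied from the printout almost never matches. One possible workaround, sketched on made-up values, is to compare within a tolerance using numpy.isclose:

```python
import numpy as np
import pandas as pd

# Hypothetical values carrying more digits than the six decimals pandas prints
demo = pd.DataFrame({"D": [0.2463291234, -1.0683584321, 0.8033617777]})

# Exact comparison: 0.246329 is not equal to 0.2463291234, so nothing matches
exact = demo["D"].isin([0.246329])

# Tolerant comparison: True where the values agree to within the tolerance
close = np.isclose(demo["D"].to_numpy(), 0.246329, atol=1e-6)
matched = demo[close]
```

The tolerance atol=1e-6 is an assumption chosen to match the six displayed decimals; pick whatever precision the data actually warrants.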
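>>> # Note: unlike np.random.randn(6,4), np.random.normal(6,4) returns a single number (mean 6, standard deviation 4), so the constructor fails
>>> df=pd.DataFrame(np.random.normal(6,4),columns=list('ABCD'))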
Traceback (most recent call last):
  File "<pyshell#27>", line 1, in <module>
    df=pd.DataFrame(np.random.normal(6,4),columns=list('ABCD'))
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 422, in __init__
    raise ValueError('DataFrame constructor not properly called!')
ValueError: DataFrame constructor not properly called!
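
The first two positional arguments of np.random.normal are the mean (loc) and the standard deviation (scale); the shape goes in the size argument. A sketch of how a 6x4 frame of normally distributed values could be built, with the mean and standard deviation chosen arbitrarily:

```python
import numpy as np
import pandas as pd

# Without size, the call returns a single float rather than an array
single_value = np.random.normal(6, 4)

# loc and scale are the mean and standard deviation; size sets the shape
values = np.random.normal(loc=0.0, scale=1.0, size=(6, 4))
demo = pd.DataFrame(values, columns=list("ABCD"))
```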
>>> df=pd.DataFrame(np.arange(16).reshape(4,4),columns=list('ABCD'))
>>> df
    A   B   C   D
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
>>> alist=[11]
>>> df['D'].isin(alist)
0    False
1    False
2     True
3    False
Name: D, dtype: bool
>>> df[df['D'].isin(alist)]
   A  B   C   D
2  8  9  10  11
>>>
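
With integer data, isin matches exactly as expected. The list may hold several values, values that do not occur in the column are simply ignored, and the ~ operator turns the test into "not in the list". A short sketch that rebuilds the same 4x4 integer frame under the made-up name demo:

```python
import numpy as np
import pandas as pd

# Same integers 0..15 as in the session, arranged in 4 rows and 4 columns
demo = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("ABCD"))

wanted = [3, 11, 100]   # 100 does not occur in column D and is ignored

in_list = demo[demo["D"].isin(wanted)]        # rows whose D value is listed
not_in_list = demo[~demo["D"].isin(wanted)]   # rows whose D value is not listed
```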