
Selecting specific columns in Python: using pandas iloc, loc, and icol (column slicing and row slicing)

2022-02-28 13:45:06
Suppose df is a DataFrame with columns A, B, C, and D, as follows.


A B C D

0 ss 小红 8
1 aa 小明 d
4 f f
6 ak 小紫 7
Cells for which no value is specified default to NA.
I. Select the columns labeled A and C; the result is still a DataFrame


df.loc[:, ['A', 'C']]   # select by column label
df.iloc[:, [0, 2]]      # select by 0-based column position
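A minimal sketch of the two selections above, run against a small hypothetical frame. Only the column names A, B, C, D come from the text; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the column names come from the text.
df = pd.DataFrame({
    "A": [0, 1, 4, 6],
    "B": ["ss", "aa", "f", "ak"],
    "C": ["x", "y", "z", "w"],
    "D": [8, 5, 3, 7],
})

by_label = df.loc[:, ["A", "C"]]     # columns chosen by name
by_position = df.iloc[:, [0, 2]]     # columns chosen by 0-based position

print(type(by_label).__name__)       # DataFrame
print(by_label.equals(by_position))  # True
```

Neither call modifies df, so both can be run against the same frame.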
II. Select the columns labeled A and C and take only the first two rows; the result is still a DataFrame


df.loc[0:1, ['A', 'C']]   # label slice: both endpoints are included
df.iloc[0:2, [0, 2]]      # position slice: the stop position is excluded
The difference between iloc and loc should now be visible: loc selects by the labels of the DataFrame, while iloc selects by integer position, counting from 0. A ":" in front of the "," selects every row, that is, the entire column. In the second example, 0:2 passed to iloc selects the rows at positions 0 and 1: it is the half-open interval [0, 2), so position 2 is not included. A slice passed to loc behaves differently: it is a slice of labels and includes both endpoints, so with the default index 0, 1, 2, ... df.loc[0:2] would return three rows, and 0:1 is the label slice that returns the first two.
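The slicing behavior described here can be checked with a short sketch. The frame is hypothetical and uses the default integer index, so labels and positions happen to coincide.

```python
import pandas as pd

# Hypothetical data with the default index 0..3.
df = pd.DataFrame({"A": [10, 20, 30, 40], "C": [1, 2, 3, 4]})

pos_rows = df.iloc[0:2]   # positions 0 and 1; the stop is excluded
lab_rows = df.loc[0:2]    # labels 0, 1 and 2; the stop is included

print(len(pos_rows))  # 2
print(len(lab_rows))  # 3
```

With a non-default index (strings, dates, or non-consecutive integers) the two slices can select entirely different rows, which is why it helps to decide up front whether a selection is meant by label or by position.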

Note that in the case of


df = df.loc[0:2, ['A', 'C']]
or


df = df.iloc[0:2, [0, 2]]
the result of the slice is still a DataFrame, not a Series, so arithmetic between two such selections aligns on column labels instead of simply combining the values row by row.
For example, if one column of the DataFrame holds the math grade (shuxue) and another the language grade (yuwen), and you need the sum of the two courses, you can use


df['shuxue'] + df['yuwen'] # After selecting, the type is series
to get the total score, instead of using


df.iloc[:,[2]]+df.iloc[:,[1]]
or


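df.loc[:,['shuxue']]+df.loc[:,['yuwen']]
Both of these give a wrong result: each operand is a one-column DataFrame, the two column labels differ, and pandas aligns on labels, so every cell of the sum is NaN.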
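To see the difference between the Series form and the DataFrame form, here is a small sketch. The scores are hypothetical; only the column names shuxue and yuwen come from the text.

```python
import pandas as pd

# Hypothetical scores; only the column names come from the text.
df = pd.DataFrame({"shuxue": [90, 80], "yuwen": [70, 60]})

total = df["shuxue"] + df["yuwen"]            # Series + Series, aligned on the row index
misaligned = df[["shuxue"]] + df[["yuwen"]]   # DataFrame + DataFrame, aligned on column labels

print(total.tolist())                  # [160, 140]
print(misaligned.isna().all().all())   # True

# One possible workaround if one-column frames are all you have:
# drop the labels and add the underlying arrays.
by_values = df[["shuxue"]].to_numpy() + df[["yuwen"]].to_numpy()
```

The DataFrame sum has both columns, shuxue and yuwen, and contains only NaN, because neither column exists in both operands.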

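Older versions of pandas also offered df.icol(i) for selecting a column: the result is a Series rather than a DataFrame, and i is the position of the column, counting from 0. icol was deprecated and later removed from pandas; df.iloc[:, i] does the same job.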
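As a sketch of selecting one column by position in current pandas, using iloc on a hypothetical frame: a scalar position returns a Series, while a one-element list returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})  # hypothetical data

col = df.iloc[:, 1]       # scalar position: the result is a Series
frame = df.iloc[:, [1]]   # list of positions: the result is a DataFrame

print(type(col).__name__)    # Series
print(type(frame).__name__)  # DataFrame
```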

To select a single row, use df.loc[[i]], where i is the row's label, or df.iloc[[i]], where i is the row's position. With the inner list, the result is a one-row DataFrame.
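A short sketch of row selection, using hypothetical string labels so that label and position are easy to tell apart:

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame({"A": [1, 2, 3]}, index=["x", "y", "z"])

row_frame = df.loc[["y"]]   # by label; stays a DataFrame
same_row = df.iloc[[1]]     # by position; stays a DataFrame
row_series = df.loc["y"]    # without the inner list: a Series

print(row_frame.equals(same_row))  # True
print(type(row_series).__name__)   # Series
```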

