pd.read_csv() parameters
2022-03-01 22:34:20
filepath_or_buffer : str, path object, or file-like object. The file path, URL, or buffer to read from.
sep : str, default ','. The delimiter used to split fields.
delimiter : str, default None. Alternative argument name for sep.
delim_whitespace : boolean, default False. Equivalent to setting sep='\s+'.
header : int or list of ints, default 'infer'. header=0 means the first line of the file holds the column names.
Passing a list such as header=[1, 2, 3] uses those rows as a multi-level column header. If the file has no header row, pass header=None, usually together with names.
names : array-like, default None. The list of column names to use. It should not contain duplicate values. Columns can also be renamed afterwards, so don't forget the rename method.
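As a minimal sketch of how header and names interact, the snippet below reads the same rows with and without a header line. The sample data and the column names a, b, c and x, y, z are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data without a header row.
raw = "1,2,3\n4,5,6\n"

# header=None: the first line is data, and names supplies the column labels.
df_named = pd.read_csv(StringIO(raw), header=None, names=["a", "b", "c"])
print(df_named)

# header=0: the first line of the file is used as the column names.
df_header = pd.read_csv(StringIO("x,y,z\n" + raw), header=0)
print(list(df_header.columns))
```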
index_col : int or sequence or False, default None. The column(s) to use as the row labels, given as a number or a list. If each line ends with a trailing separator, pass index_col=False so that the first column is not used as the index.
usecols : list-like or callable, default None. Returns only a subset of the columns. Positions are zero-based, so usecols=[1, 5] reads only the second and sixth columns. You can also pass a list of column names, or a callable that is evaluated against each column name.
import pandas as pd
from io import StringIO, BytesIO
data = ('col1,col2,col3\n' 'a,b,1\n' 'a,b,2\n' 'c,d,3')
pd.read_csv(StringIO(data))
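The three forms of usecols can be compared on the same sample data. This is only a sketch; the variable names by_position, by_name and by_callable are illustrative.

```python
from io import StringIO

import pandas as pd

data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# Zero-based positions: keep the first and third columns.
by_position = pd.read_csv(StringIO(data), usecols=[0, 2])

# Column names: the same selection, written by label.
by_name = pd.read_csv(StringIO(data), usecols=["col1", "col3"])

# Callable: keep every column whose name ends with "3".
by_callable = pd.read_csv(StringIO(data), usecols=lambda name: name.endswith("3"))

print(by_position.columns.tolist())
print(by_callable.columns.tolist())
```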
squeeze : boolean, default False. Returns a Series if the parsed data contains only one column (deprecated since pandas 1.4).
prefix : str, default None. Prefix to add to the column numbers when there is no header, e.g. header=None, prefix='x' gives x0, x1, ... (deprecated since pandas 1.4).
mangle_dupe_cols : boolean, default True. Renames duplicate columns 'X', 'X', ... to 'X', 'X.1', ..., 'X.N'.
dtype : Type name or dict of column -> type, default None. Sets the data type for the whole table or for individual columns when reading data.
For example, dtype={'a': str} applies to column a only, while dtype=str applies to every column.
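A small sketch of why dtype matters: identifiers with leading zeros lose them under the default type inference. The data and variable names below are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: column a holds zero-padded identifiers.
data = "a,b\n001,1.5\n002,2.5\n"

# Default inference reads column a as integers, dropping the leading zeros.
inferred = pd.read_csv(StringIO(data))

# A dict sets the type per column; here only column a is kept as text.
typed = pd.read_csv(StringIO(data), dtype={"a": str})

# A single type applies to every column.
all_text = pd.read_csv(StringIO(data), dtype=str)

print(inferred["a"].tolist(), typed["a"].tolist(), all_text["b"].tolist())
```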
engine : {'c', 'python'}. The parser engine to use. The C engine is faster, while the Python engine is more feature-complete.
converters : dict, default None. Functions that convert the values in certain columns, keyed by column name or position. My usage is converters={'col1': lambda x: x.upper()}, which feels similar to apply.
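The converters usage mentioned above can be sketched as follows. The sample data is made up, and the converter is applied to each raw string value of col1 while the file is parsed.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data.
data = "col1,col2\nabc,1\ndef,2\n"

# Each value of col1 is passed through the function as a string.
df = pd.read_csv(StringIO(data), converters={"col1": lambda x: x.upper()})
print(df)
```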
true_values : list, default None. Values in this list are read as True.
false_values : list, default None. Values in this list are read as False, e.g. false_values=['me'] turns every value 'me' into False.
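A minimal sketch of true_values and false_values, using made-up data in which column b contains only the two marker strings, so the whole column can be read as booleans.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: column b holds only "Yes" and "No".
data = "a,b,c\n1,Yes,2\n3,No,4"

df = pd.read_csv(StringIO(data), true_values=["Yes"], false_values=["No"])
print(df)
print(df["b"].dtype)
```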
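skipinitialspace : boolean, default False. Skip spaces after the separator.
skiprows : list-like, integer, or callable, default None. Line numbers to skip (zero-based), the number of lines to skip at the start of the file, or a function that returns True for the line indices to skip.
pd.read_csv(StringIO(data), skiprows=lambda x: x % 2 != 0)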
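The three accepted forms of skiprows can be compared on the same four-line sample. This is a sketch; note that line 0 is the header line, so skipping it requires header=None to treat the rest as data.

```python
from io import StringIO

import pandas as pd

data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# Integer: skip the first line (the header), then read the rest as plain data.
skip_int = pd.read_csv(StringIO(data), skiprows=1, header=None)

# List: skip the lines with zero-based indices 1 and 3.
skip_list = pd.read_csv(StringIO(data), skiprows=[1, 3])

# Callable: skip every line whose index is odd (again lines 1 and 3).
skip_odd = pd.read_csv(StringIO(data), skiprows=lambda x: x % 2 != 0)

print(skip_odd)
```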
skipfooter : int, default 0. Number of lines to skip at the bottom of the file (not supported by the C engine).
nrows : int, default None. Number of rows of the file to read, usually used when reading large files.
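A short sketch of nrows and skipfooter on made-up data with a summary line at the end. skipfooter needs engine='python', which is passed explicitly here.

```python
from io import StringIO

import pandas as pd

# Hypothetical data whose last line is a summary, not a record.
data = "a,b\n1,2\n3,4\n5,6\ntotal,12\n"

# Read only the first two data rows.
head = pd.read_csv(StringIO(data), nrows=2)

# Drop the trailing summary line; this requires the Python engine.
body = pd.read_csv(StringIO(data), skipfooter=1, engine="python")

print(head)
print(body)
```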
memory_map : boolean, default False. If a file path is provided for filepath_or_buffer, map the file object directly onto memory and access the data from there. This option can improve performance because there is no longer any I/O overhead.
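na_values : scalar, str, list-like, or dict, default None. Additional strings to recognize as NaN. For example, na_values=['bat'] reads every 'bat' as NaN.
keep_default_na : boolean, default True. Whether to keep the default NaN strings (such as 'NA' and 'NaN') in addition to na_values.
na_filter : boolean, default True. Detect missing-value markers. If False, the previous two settings (na_values and keep_default_na) are ignored.
Setting na_filter=False can improve the reading speed of large files when the file contains no NA values.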
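How the three NA settings combine is easier to see side by side. The sketch below uses made-up data in which 'NA' is one of the default missing-value strings and 'bat' is a custom one.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: "NA" is a default NaN marker, "bat" is not.
data = "name,score\nbat,1\nNA,2\ncat,3\n"

default = pd.read_csv(StringIO(data))
extra = pd.read_csv(StringIO(data), na_values=["bat"])
only_custom = pd.read_csv(StringIO(data), na_values=["bat"], keep_default_na=False)
no_filter = pd.read_csv(StringIO(data), na_filter=False)

for label, frame in [("default", default), ("extra", extra),
                     ("only_custom", only_custom), ("no_filter", no_filter)]:
    print(label, frame["name"].isna().tolist())
```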
verbose : boolean, default False. Indicate the number of NA values placed in non-numeric columns.
skip_blank_lines : boolean, default True. If True, skip blank lines instead of interpreting them as NaN values.
parse_dates : boolean or list of ints or names or list of lists or dict, default False.
If True -> try parsing the index as dates.
If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
If {'foo': [1, 3]} -> parse columns 1, 3 as date and call result 'foo'. A fast-path exists for iso8601-formatted dates.
infer_datetime_format : boolean, default False
If True and parse_dates is enabled for a column, attempt to infer the datetime format to speed up the processing.
keep_date_col : boolean, default False
If True and parse_dates specifies combining multiple columns then keep the original columns.
date_parser : function, default None. Function to use for converting a sequence of string columns to an array of datetime instances.
dayfirst : boolean, default False
DD/MM format dates, international and European format.
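A small sketch of how dayfirst changes the reading of an ambiguous date. The column names and values are made up; both dates are valid in either order, so only the interpretation differs.

```python
from io import StringIO

import pandas as pd

# Hypothetical data with ambiguous day/month order.
data = "date,value\n01/02/2022,1\n03/04/2022,2\n"

# Default: month first, so 01/02/2022 is January 2nd.
month_first = pd.read_csv(StringIO(data), parse_dates=["date"])

# dayfirst=True: day first, so 01/02/2022 is February 1st.
day_first = pd.read_csv(StringIO(data), parse_dates=["date"], dayfirst=True)

print(month_first["date"].tolist())
print(day_first["date"].tolist())
```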
iterator : boolean, default False. Returns a TextFileReader object for iterating over the file or fetching chunks with get_chunk().
chunksize : int, default None. Returns a TextFileReader object that yields the file in chunks of this many rows.
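Both options return a TextFileReader instead of a DataFrame. The sketch below, on made-up data with ten rows, sums a column chunk by chunk and then fetches a fixed number of rows with get_chunk().

```python
from io import StringIO

import pandas as pd

# Hypothetical data: ten rows, b is twice a.
data = "a,b\n" + "".join(f"{i},{i * 2}\n" for i in range(10))

# chunksize: iterate over DataFrames of at most four rows each.
chunk_lengths = []
total_b = 0
for chunk in pd.read_csv(StringIO(data), chunksize=4):
    chunk_lengths.append(len(chunk))
    total_b += int(chunk["b"].sum())

# iterator=True: fetch an explicit number of rows on demand.
reader = pd.read_csv(StringIO(data), iterator=True)
first_three = reader.get_chunk(3)
reader.close()

print(chunk_lengths, total_b)
print(first_three)
```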
compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
Used for on-the-fly decompression of on-disk data. With 'infer', if filepath_or_buffer is a string ending with ".gz", ".bz2", ".zip" or ".xz", then gzip, bz2, zip or xz is used respectively; otherwise no decompression is done. If "zip" is used, the zip file must contain only one data file to be read. Set to None for no decompression.
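To illustrate the inference from the file extension, the sketch below writes a small gzip file into a temporary directory and reads it back twice. The file name sample.csv.gz and the data are made up.

```python
import gzip
import os
import tempfile

import pandas as pd

csv_text = "a,b\n1,2\n3,4\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv.gz")
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.write(csv_text)

    # Default compression='infer' picks gzip from the ".gz" suffix.
    inferred = pd.read_csv(path)
    # The same result with the compression named explicitly.
    explicit = pd.read_csv(path, compression="gzip")

print(inferred)
```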
thousands : str, default None. The thousands separator.
decimal : str, default '.'
The character to be recognized as a decimal point. For example, use "," for European data.
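A minimal sketch of reading European-style numbers, assuming made-up data that uses ';' between fields, '.' as the thousands separator and ',' as the decimal point.

```python
from io import StringIO

import pandas as pd

# Hypothetical European-style data.
data = "item;price;qty\nwidget;1.234,5;1.000\n"

df = pd.read_csv(StringIO(data), sep=";", thousands=".", decimal=",")
print(df)
print(df.dtypes)
```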
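float_precision : string, default None. Specifies which converter the C engine should use for floating-point values.
comment : str, default None. A single character marking the rest of the line as a comment that should not be parsed. A line that starts with it is ignored altogether.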
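A short sketch of the comment option on made-up data containing one full-line comment and one trailing comment.

```python
from io import StringIO

import pandas as pd

# Hypothetical data with comments marked by "#".
data = "a,b\n# full-line comment\n1,2#trailing note\n3,4\n"

# The full line is dropped; on the next line only "1,2" is parsed.
df = pd.read_csv(StringIO(data), comment="#")
print(df)
```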
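encoding : str, default None. Encoding to use when reading the file, e.g. 'utf-8'.
error_bad_lines : boolean, default True. By default, a line with too many fields raises an exception and no DataFrame is returned. If False, such bad lines are skipped instead.
warn_bad_lines : boolean, default True. If error_bad_lines is False and warn_bad_lines is True, a warning is output for each bad line. Since pandas 1.3 both options are deprecated in favor of on_bad_lines.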
df['col_1'].apply(type).value_counts()
df2['col_1'] = pd.to_numeric(df2['col_1'], errors='coerce')
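The two lines above can be tried on a small made-up column that mixes numbers with one non-numeric entry. In a file this small the whole column is simply read as strings, so this is only a sketch of the inspect-then-coerce pattern, not a reproduction of a real mixed-type warning.

```python
from io import StringIO

import pandas as pd

# Hypothetical column: mostly numeric, with one stray "A".
data = "col_1\n1\n2\nA\n4.5\n"
df = pd.read_csv(StringIO(data))

# Inspect which Python types the column actually holds.
print(df["col_1"].apply(type).value_counts())

# Coerce to numbers; anything that cannot be parsed becomes NaN.
df["col_1"] = pd.to_numeric(df["col_1"], errors="coerce")
print(df)
```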
Finally, how to handle mixed data types containing
.... : 'A,B,C\n'
.... : '1,2.,4.\n'
.... : '5.,NaN,10.0\n')
pd.read_csv(StringIO(data), comment='#', skiprows=4, header=1)
Some files have a separator at the end of each data line, which can confuse the parser. To explicitly disable index column inference and drop the last column, pass index_col=False:
pd.read_csv(StringIO(data), index_col=False)
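The effect is easiest to see on data whose rows end with a trailing comma. The sample below is made up; without index_col=False the first field of each row is taken as the index, and with it the columns line up with the header again.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: every data line ends with a trailing separator.
data = "a,b,c\n4,apple,bat,\n8,orange,cow,"

# Default: one more field than header names, so the first field becomes the index.
inferred = pd.read_csv(StringIO(data))

# index_col=False: no index inference, and the empty last field is dropped.
fixed = pd.read_csv(StringIO(data), index_col=False)

print(inferred)
print(fixed)
```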
df = pd.read_csv('foo.csv', index_col=0, parse_dates=True)
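Since foo.csv is not shown, here is a minimal stand-in with made-up contents that shows what index_col=0 together with parse_dates=True produces: a DatetimeIndex built from the first column.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for foo.csv.
data = "date,A\n2022-03-01,1\n2022-03-02,2\n"

df = pd.read_csv(StringIO(data), index_col=0, parse_dates=True)
print(df.index)
```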
KORD,19990127, 19:00:00, 18:56:00, 0.8100
KORD,19990127, 20:00:00, 19:56:00, 0.0100
KORD,19990127, 21:00:00, 20:56:00, -0.5900
KORD,19990127, 21:00:00, 21:18:00, -0.9900
KORD,19990127, 22:00:00, 21:56:00, -0.5900
KORD,19990127, 23:00:00, 22:56:00, -0.5900
df = pd.read_csv('tmp.csv', header=None, parse_dates=[[1, 2], [1, 3]])
                  1_2                 1_3     0      4
0 1999-01-27 19:00:00 1999-01-27 18:56:00  KORD   0.81
1 1999-01-27 20:00:00 1999-01-27 19:56:00  KORD   0.01
2 1999-01-27 21:00:00 1999-01-27 20:56:00  KORD  -0.59
3 1999-01-27 21:00:00 1999-01-27 21:18:00  KORD  -0.99
4 1999-01-27 22:00:00 1999-01-27 21:56:00  KORD  -0.59
5 1999-01-27 23:00:00 1999-01-27 22:56:00  KORD  -0.59
By default, the parser removes the component date columns, but you can choose to keep them via the keep_date_col keyword:
df = pd.read_csv('tmp.csv', header=None, parse_dates=[[1, 2], [1, 3]], keep_date_col=True)
                  1_2                 1_3     0         1         2         3      4
0 1999-01-27 19:00:00 1999-01-27 18:56:00  KORD  19990127  19:00:00  18:56:00   0.81
1 1999-01-27 20:00:00 1999-01-27 19:56:00  KORD  19990127  20:00:00  19:56:00   0.01
2 1999-01-27 21:00:00 1999-01-27 20:56:00  KORD  19990127  21:00:00  20:56:00  -0.59
3 1999-01-27 21:00:00 1999-01-27 21:18:00  KORD  19990127  21:00:00  21:18:00  -0.99
4 1999-01-27 22:00:00 1999-01-27 21:56:00  KORD  19990127  22:00:00  21:56:00  -0.59
5 1999-01-27 23:00:00 1999-01-27 22:56:00  KORD  19990127  23:00:00  22:56:00  -0.59
Note that if you want to combine multiple columns into a single date column, you must use a nested list. In other words, parse_dates=[1, 2] means that the second and third columns should be parsed as separate date columns, and parse_dates=[[1, 2]] means that the two columns should be parsed as a single column.
date_spec = {'nominal': [1, 2], 'actual': [1, 3]}
df = pd.read_csv('tmp.csv', header=None, parse_dates=date_spec)
              nominal              actual     0      4
0 1999-01-27 19:00:00 1999-01-27 18:56:00  KORD   0.81
1 1999-01-27 20:00:00 1999-01-27 19:56:00  KORD   0.01
2 1999-01-27 21:00:00 1999-01-27 20:56:00  KORD  -0.59
3 1999-01-27 21:00:00 1999-01-27 21:18:00  KORD  -0.99
4 1999-01-27 22:00:00 1999-01-27 21:56:00  KORD  -0.59
5 1999-01-27 23:00:00 1999-01-27 22:56:00  KORD  -0.59
It is important to remember that if multiple text columns are parsed into a single date column, the new column is prepended to the data. The index column specification is based on this new set of columns, not the original data columns.
df = pd.read_csv('tmp.csv', header=None, parse_dates=date_spec, index_col=0)
                                  actual     0      4
nominal
1999-01-27 19:00:00  1999-01-27 18:56:00  KORD   0.81
1999-01-27 20:00:00  1999-01-27 19:56:00  KORD   0.01
1999-01-27 21:00:00  1999-01-27 20:56:00  KORD  -0.59
1999-01-27 21:00:00  1999-01-27 21:18:00  KORD  -0.99
1999-01-27 22:00:00  1999-01-27 21:56:00  KORD  -0.59
1999-01-27 23:00:00  1999-01-27 22:56:00  KORD  -0.59
You can also pass your own date and time parsing function via date_parser for extra flexibility:
df = pd.read_csv('tmp.csv', header=None, parse_dates=date_spec, date_parser=pd.io.date_converters.parse_date_time)
              nominal              actual     0      4
0 1999-01-27 19:00:00 1999-01-27 18:56:00  KORD   0.81
1 1999-01-27 20:00:00 1999-01-27 19:56:00  KORD   0.01
2 1999-01-27 21:00:00 1999-01-27 20:56:00  KORD  -0.59
3 1999-01-27 21:00:00 1999-01-27 21:18:00  KORD  -0.99
4 1999-01-27 22:00:00 1999-01-27 21:56:00  KORD  -0.59
5 1999-01-27 23:00:00 1999-01-27 22:56:00  KORD  -0.59
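A different route to the same kind of result, sketched here as an alternative and not as the date_parser approach shown above: read the columns as they are and combine them afterwards with pd.to_datetime and an explicit format. The two sample rows mirror the first lines of tmp.csv, and the column name nominal is reused for illustration.

```python
from io import StringIO

import pandas as pd

# First two rows of the sample file, inlined so the sketch is self-contained.
data = ("KORD,19990127, 19:00:00, 18:56:00, 0.8100\n"
        "KORD,19990127, 20:00:00, 19:56:00, 0.0100\n")

# skipinitialspace strips the blank after each comma.
df = pd.read_csv(StringIO(data), header=None, skipinitialspace=True)

# Combine the date column (1) and the time column (2) after reading.
combined = df[1].astype(str) + " " + df[2]
df["nominal"] = pd.to_datetime(combined, format="%Y%m%d %H:%M:%S")

print(df[["nominal", 0, 4]])
```

Giving the format explicitly avoids any guessing about the layout of the date strings.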
If you know t