A batch script for filtering duplicate lines

2022-01-25 22:22:05

Suppose a.txt contains the following lines:
123
456
789
123
123
789

The task is to find the lines that occur more than once and write each of them to b.txt only once, so that b.txt contains:
123
789

Method 1.

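@echo off
REM Disadvantage 1: Not suitable for very large files (each distinct line becomes an environment variable, and environment space is limited)
REM Disadvantage 2: The variables need a prefix character that never appears in the file (an underscore here)
REM Also: set /a only accepts lines that form valid variable names (no spaces or operator characters),
REM and "set _" also lists any pre-existing environment variables whose names start with "_"
setlocal
REM Count each line: the counter for the line 123 is the variable _123
for /f "delims=" %%i in (a.txt) do (
  set /a _%%i+=1
)
REM List the counters (set _ prints e.g. _123=3) and keep the lines seen more than once
(for /f "tokens=1-2 delims=_=" %%i in ('set _') do (
  if %%j gtr 1 (
    echo,%%i
  )
))>b.txt
endlocal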
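Method 1 uses environment variables as a makeshift dictionary: each line becomes a variable name prefixed with `_`, `set /a` increments it, and `set _` lists every counter. The Python sketch below mirrors that idea with a plain dict. The function name `duplicated_lines_by_count` is hypothetical, and sorting the keys is an assumption meant to approximate the sorted listing that `set _` produces.

```python
def duplicated_lines_by_count(lines):
    """Return each line that occurs more than once, listed once, in sorted order."""
    counts = {}
    for line in lines:
        # Equivalent of `set /a _%%i+=1`: bump the counter stored under this line.
        counts[line] = counts.get(line, 0) + 1
    # Equivalent of listing the counters with `set _` and keeping those greater than 1.
    return [line for line in sorted(counts) if counts[line] > 1]


if __name__ == "__main__":
    sample = ["123", "456", "789", "123", "123", "789"]  # the contents of a.txt
    for line in duplicated_lines_by_count(sample):
        print(line)
```

A dict has none of the batch version's restrictions on which characters a line may contain, but it shares the same memory trade-off: every distinct line is held until the end.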


Method 2.

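@echo off
REM Sort the file so identical lines are adjacent, then count each run of equal lines.
REM Note: with delayed expansion enabled, "!" characters in the input may be lost or mangled.
setlocal enabledelayedexpansion
set "PriLine="
set "DupNum=1"
(for /f "delims=" %%i in ('sort a.txt') do (
  if "!PriLine!" equ "%%i" (
    set /a DupNum+=1
  ) else (
    if !DupNum! gtr 1 (
      echo,!PriLine!
    )
    set "DupNum=1"
  )
  set "PriLine=%%i"
))>b.txt
REM A run is only reported when the next, different line arrives, so check the last run here
if !DupNum! gtr 1 (
  >>b.txt echo,!PriLine!
)
endlocal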
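Method 2 relies on sorting: once equal lines sit next to each other, counting the length of each run is enough. A minimal Python sketch of the same sort-then-group idea uses `itertools.groupby`; the function name `duplicated_lines_sorted` is hypothetical.

```python
from itertools import groupby


def duplicated_lines_sorted(lines):
    """Sort the lines, then keep one copy of every run of identical lines longer than 1."""
    result = []
    # After sorting, equal lines are adjacent, as with `sort a.txt`.
    for line, run in groupby(sorted(lines)):
        # The run length plays the role of DupNum at the end of a run.
        if sum(1 for _ in run) > 1:
            result.append(line)
    return result


if __name__ == "__main__":
    sample = ["123", "456", "789", "123", "123", "789"]
    print("\n".join(duplicated_lines_sorted(sample)))
```

Unlike the batch loop, `groupby` also closes the final run by itself, so no separate check after the loop is needed.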


Method 3.

gawk "{a[$0]++}END{for(i in a)if(a[i]>1)print i}" a.txt > b.txt


Method 4. (More concise)

gawk "++a[$0]==2" a.txt>b.txt
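A single-pass filter can print a line at the moment its count reaches 2, so each duplicated line appears once, in the order of its second occurrence, without waiting for the end of the file. With the pattern `a[$0]++` alone, by contrast, a line is printed on every repeat, so `123` (three occurrences in a.txt) would be written twice. The Python generator below sketches the count-equals-2 rule; the name `duplicated_lines_streaming` is hypothetical.

```python
def duplicated_lines_streaming(lines):
    """Yield a line the moment it is seen for the second time."""
    seen = {}
    for line in lines:
        seen[line] = seen.get(line, 0) + 1
        # Emit only on the second occurrence, so a line seen 3+ times is still output once.
        if seen[line] == 2:
            yield line


if __name__ == "__main__":
    sample = ["123", "456", "789", "123", "123", "789"]
    for line in duplicated_lines_streaming(sample):
        print(line)
```

Because output is produced while reading, the input can be arbitrarily long as long as the set of distinct lines fits in memory.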


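Method 5.

Note that this variant does not match the goal above: it removes duplicates rather than extracting them, so b.txt receives every distinct line once (123, 456 and 789 for the sample a.txt).

@echo off
REM Each line is stored in a variable named "#" + the line, so repeated lines overwrite the same variable
for /f "tokens=*" %%i in (a.txt) do set #%%i=%%i
REM List the variables (set # prints e.g. #123=123) and write out the values
(for /f "tokens=2 delims==" %%i in ('set #') do echo %%i)>b.txt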
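Method 5 works by overwriting: assigning the same variable name twice leaves a single entry. A dict keyed on the variable name shows the same collapsing effect in Python. The function name `distinct_lines` is hypothetical, sorting the keys is an assumption approximating the order of `set #`, and the sketch ignores the fact that cmd variable names are case-insensitive while dict keys are not.

```python
def distinct_lines(lines):
    """Keep one copy of every line, as repeated `set #%%i=%%i` assignments do."""
    variables = {}
    for line in lines:
        # Re-assigning the same key overwrites it, so repeated lines collapse into one entry.
        variables["#" + line] = line
    # `set #` lists the variables by name; sorting the keys approximates that order.
    return [variables[name] for name in sorted(variables)]


if __name__ == "__main__":
    sample = ["123", "456", "789", "123", "123", "789"]
    print("\n".join(distinct_lines(sample)))
```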