How to find duplicates in a column

You can search for duplicate values in a column with csvcols, sort and csvfind, either at the command line or in a Bash script. Here's the basic algorithm:

- extract the values in the column from every row of your CSV file (csvcols)
- sort them, keeping one copy of each distinct value (sort -u)
- for each distinct value, use csvfind to output the matching rows
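The sort step decides which values the loop visits. The sketch below uses made-up values standing in for csvcols output (the sample data is hypothetical). It shows that `sort -u` yields every distinct value, including ones that appear only once. It also shows an alternative that the script does not use: `sort | uniq -d` keeps only the values that occur more than once.

```shell
#!/usr/bin/env bash
# Hypothetical sample: column values as csvcols might print them, one per line
values='alice
bob
alice
carol
bob'

# sort -u: every distinct value once (alice, bob, carol)
distinct=$(printf '%s\n' "$values" | sort -u)

# sort | uniq -d: only values that occur more than once (alice, bob).
# uniq -d needs sorted input because it only compares adjacent lines.
repeated=$(printf '%s\n' "$values" | sort | uniq -d)

printf 'distinct:\n%s\n' "$distinct"
printf 'repeated:\n%s\n' "$repeated"
```

You could swap `sort -u` for `sort | uniq -d` in a loop like the one in the script. The loop would then only visit values that are actually repeated.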

Here's an example Bash script that looks for duplicates in column 2 (the second column; columns are counted from 1 rather than zero) of dups.csv.

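    CSV_FILE="dups.csv"
    CSV_COL_NO="2"

    # List the column's values, keep one copy of each, then print
    # the rows whose value in that column matches each one.
    csvcols -i "$CSV_FILE" -col "$CSV_COL_NO" | sort -u | while IFS= read -r CELL; do
        # Skip empty cells
        if [ -n "$CELL" ]; then
            csvfind -i "$CSV_FILE" -trim-spaces -col "$CSV_COL_NO" "$CELL"
        fi
    done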

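The script prints the matching rows, grouped by their value in the chosen column. Redirect the output (for example, append `> dups-grouped.csv` after `done`) to save it as a new CSV file. Because `sort -u` keeps every distinct value, rows whose value appears only once are included too, each in a group of its own.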
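If csvcols and csvfind aren't available, the same idea can be sketched with standard awk and sort. This is a swapped-in technique, not part of datatools. The function name `find_dup_rows` is hypothetical. The sketch assumes simple CSV, with no quoted fields that contain commas or line breaks. Unlike the script above, it prints only the rows whose value occurs more than once.

```shell
#!/usr/bin/env bash
# find_dup_rows FILE COL  (hypothetical helper, not a datatools command)
# Print the rows of FILE whose value in column COL (counted from 1)
# occurs more than once, grouped by that value.
# Assumption: simple CSV, i.e. no quoted fields containing commas or newlines.
find_dup_rows() {
    local csv_file="$1" col="$2"
    # awk reads the file twice. The first pass (NR == FNR) counts each
    # non-empty value in the column. The second pass prints the rows whose
    # value was counted more than once. sort then groups equal values;
    # -s (stable, in GNU and BSD sort) keeps file order within each group.
    awk -F',' -v col="$col" '
        NR == FNR { if ($col != "") count[$col]++; next }
        count[$col] > 1
    ' "$csv_file" "$csv_file" |
        LC_ALL=C sort -s -t',' -k"${col},${col}"
}
```

For the example above, `find_dup_rows dups.csv 2 > dups-grouped.csv` would write the repeated rows to a new file. A header row is dropped unless its column title also appears as a data value.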
