Sean's Data Analysis Note: September 2012

Thursday, September 20, 2012

Remove duplicate rows in KNIME

Goal
Delete the duplicate rows in a table.

Strategy
1. Use the GroupBy node, grouping on the columns that identify a duplicate; refer to "Detect and delete duplicate files (rows) based on two variable identifiers (date and lot number) and not based on row ID".
2. Use a database query with SELECT DISTINCT.
3. Use a code snippet in a scripting node (R, JPython, or Java) to process the table.
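
The SELECT DISTINCT idea in strategy 2 can be sketched outside KNIME as follows. This is only a minimal illustration, assuming an in-memory SQLite database; the table name "measurements", its columns (date, lot, value), and the helper distinct_rows are hypothetical and not part of any KNIME workflow.

```python
import sqlite3


def distinct_rows(rows):
    """Return the rows with exact duplicates removed via SELECT DISTINCT."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table layout, used only for this illustration.
        conn.execute(
            "CREATE TABLE measurements (date TEXT, lot TEXT, value REAL)"
        )
        conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)
        # DISTINCT compares every selected column, so only rows that
        # match on date, lot, and value are collapsed into one.
        cur = conn.execute(
            "SELECT DISTINCT date, lot, value FROM measurements "
            "ORDER BY date, lot, value"
        )
        return cur.fetchall()
    finally:
        conn.close()


rows = [
    ("2012-09-01", "A1", 1.5),
    ("2012-09-01", "A1", 1.5),  # exact duplicate of the first row
    ("2012-09-01", "A1", 2.0),  # same date and lot, different value
    ("2012-09-02", "B7", 3.0),
]
unique_rows = distinct_rows(rows)
```

Note that SELECT DISTINCT only removes rows that match on every selected column; rows that share a date and lot number but differ in value are kept.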

Comment
1. Deleting empty rows is a special case of this topic.
2. Two rows may be duplicates across all columns, or only on several columns (a composite key).
3. Strategies 2 and 3 are not tested.
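
The distinction in the comments above, between duplicates on all columns and duplicates on a key, plus the empty-row special case, can be sketched in plain Python. This is an assumed illustration rather than a KNIME snippet; the function names drop_duplicates and is_empty and the sample columns are hypothetical.

```python
def drop_duplicates(rows, key_columns=None):
    """Keep the first row seen for each key.

    rows is a list of dicts; key_columns=None compares all columns.
    """
    seen = set()
    kept = []
    for row in rows:
        columns = key_columns if key_columns is not None else sorted(row)
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def is_empty(row):
    """A row counts as empty when every cell is None or an empty string."""
    return all(v is None or v == "" for v in row.values())


rows = [
    {"date": "2012-09-01", "lot": "A1", "value": 1.5},
    {"date": "2012-09-01", "lot": "A1", "value": 1.5},  # full duplicate
    {"date": "2012-09-01", "lot": "A1", "value": 2.0},  # duplicate on key only
    {"date": None, "lot": None, "value": None},          # empty row
    {"date": None, "lot": None, "value": None},          # empty row again
]

by_all_columns = drop_duplicates(rows)
by_key = drop_duplicates(rows, key_columns=["date", "lot"])
# Empty rows are duplicates of each other, so at most one survives;
# dropping it afterwards removes empty rows entirely.
non_empty = [r for r in by_all_columns if not is_empty(r)]
```

Here the key-based variant keeps whichever row appears first for each (date, lot) pair, which is one possible policy; an aggregation such as the one a GroupBy node performs could choose differently.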