Sean's Data Analysis Note: February 2018

Tuesday, February 13, 2018

Two ways to dedup in R

To remove duplicate rows from raw data with dplyr:

# simple, but drops the other columns
dfRaw %>%
  distinct(`PK1`, `PK2`, `PK3`) ->
  dfWork

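# two more lines, but keeps the other columns, e.g. RID
# (keeps the first row of each key group; the result is still grouped and carries the helper column gid)
dfRaw %>%
  group_by(`PK1`, `PK2`, `PK3`) %>%
  mutate(gid = 1:n()) %>%
  filter(gid < 2) ->
  dfWork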
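
To see the difference between the two approaches, here is a small sketch on made-up data. The toy values and the names dfKeys, dfFirst and dfKeep are assumptions for illustration only. The last call uses distinct() with .keep_all = TRUE, which is not in the note above but should give the same rows as the group_by version without the helper column.

```r
library(dplyr)

# Hypothetical toy data: PK1..PK3 form the key, RID is an extra column.
dfRaw <- tibble(
  PK1 = c("a", "a", "b"),
  PK2 = c(1, 1, 2),
  PK3 = c("x", "x", "y"),
  RID = c(101, 102, 103)
)

# Way 1: only the key columns survive, RID is gone.
dfKeys <- dfRaw %>%
  distinct(PK1, PK2, PK3)

# Way 2: first row of each key group, all columns kept (plus gid).
dfFirst <- dfRaw %>%
  group_by(PK1, PK2, PK3) %>%
  mutate(gid = 1:n()) %>%
  filter(gid < 2) %>%
  ungroup()

# Possible shortcut: distinct() keeps the first row per key and all columns.
dfKeep <- dfRaw %>%
  distinct(PK1, PK2, PK3, .keep_all = TRUE)
```

In this toy case dfKeys has two rows and three columns, while dfFirst and dfKeep both keep RID for the first row of each key.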