Friday, September 28, 2018

Reformat time string in R with 12M 12N

Sometimes time strings come in a variant format rather than well-defined military time (HH:MM). Examples include 12M (12 o'clock midnight), 12N (12 noon), 9A, and 1P, where a better format would be 12:00A, 9:00A, 12:00P, and 01:00P.
Here is a function to reformat such time strings; it returns 24-hour HH:MM (for example 00:00, 09:00, 12:00, 13:00):
library(lubridate)
library(stringr)  # str_sub()
library(dplyr)    # if_else()

formatTimeString<- function(aTimeStr){
    timeFlag <- tolower(str_sub(aTimeStr, start = -1))
    timeNum <- tolower(str_sub(aTimeStr, end = -2))
    stopifnot(timeFlag %in% c('m', 'a', 'n', 'p'))
   
    daypartOffset <- c('m' = -12, # '12M' become 00:00
                       'n' = 0,   # '12N' become 12:00
                       'a' = 0, 'p' = 12)
   
    timeNum <- if_else(nchar(timeNum)<=2, paste0(timeNum,'00'), timeNum)
    timeNum <- str_sub(paste0('0', timeNum), -4, -1)
    ret <- lubridate::as_datetime(timeNum, format = '%H%M')
    ret <- ret + hours(daypartOffset[timeFlag])
    ret <- format(ret, '%H:%M')
   
    return(ret)
}

testIn <- c('12M', paste0(1:11, 'A'),
             '12N', paste0(1:11, 'P'))
# expected values are 5 characters wide ('00:00' ... '23:00'), so keep the last 5
testthat::expect_equal(formatTimeString(testIn),
                       str_sub(paste0('0', 0:23, ':00'), -5, -1))
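
For whole-hour inputs, the same mapping can also be sketched with plain arithmetic instead of datetime parsing. This is only an illustration under assumptions: `to24h` is a hypothetical helper name, not part of the function above, it uses base R only, and it does not handle inputs with minutes. Taking the hour modulo 12 also covers '12A' and '12P'.

```r
# Hypothetical helper (not from the post above): base R only, whole hours only.
to24h <- function(x) {
  flag <- tolower(substring(x, nchar(x)))                  # last character: m, a, n or p
  stopifnot(all(flag %in% c("m", "a", "n", "p")))
  hour <- as.integer(substring(x, 1, nchar(x) - 1)) %% 12  # 12 wraps around to 0
  offset <- ifelse(flag %in% c("n", "p"), 12L, 0L)         # noon and PM add 12 hours
  sprintf("%02d:00", hour + offset)
}

to24h(c("12M", "9A", "12N", "1P"))
# "00:00" "09:00" "12:00" "13:00"
```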

Tuesday, February 13, 2018

Two ways to dedup in R

To remove duplicate rows from raw data with dplyr:

# simple, but loses the other columns
dfRaw %>%
  distinct(`PK1`, `PK2`, `PK3`) ->
  dfWork


# two more lines, but keeps the other columns, e.g. RID
dfRaw %>%
  group_by(`PK1`, `PK2`, `PK3`) %>%
  mutate(gid = 1:n()) %>%
  filter(gid < 2) ->
  dfWork
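
A possible third variant, sketched here on made-up data: `distinct()` accepts a `.keep_all = TRUE` argument that keeps the first row of each key combination along with all other columns. The toy `dfRaw` below is hypothetical and only stands in for the real raw data.

```r
library(dplyr)

# Toy data (hypothetical): RID is a row id, PK1..PK3 form the key.
dfRaw <- data.frame(
  RID = 1:4,
  PK1 = c("a", "a", "b", "b"),
  PK2 = c(1, 1, 2, 2),
  PK3 = c("x", "x", "y", "z")
)

# One call: dedup on the key, keep the first row per key with all its columns.
dfRaw %>%
  distinct(PK1, PK2, PK3, .keep_all = TRUE) ->
  dfWork

dfWork$RID
# 1 3 4
```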

Sunday, January 21, 2018

Hotel California

Mobile data
And she said: "We are all just prisoners here of our own device"

PaaS/DMP
"You can check out any time you like, but you can never leave!"