Wednesday, June 19, 2013
Methods to save data frame in file in R
When you try to "save" your data set in a data frame object in R, you have several options:

| Method | Pro | Con | Functions |
| Save an image of the object in binary format | Fast; keeps the object name and other environment information | R specific | save(df, file = "filename"); rm(df); load("filename", .GlobalEnv) |
| Save as coded text (R source) | Full information is kept, e.g. the data mode | Large file size; cannot be exchanged with other software | dump("df", "filename"); newDf = source("filename")$value |
| Export to plain text | Human readable and exchangeable with other software | R types may need to be recast when the file is read back in | write.table(); read.table() |
| Export to another format | Software specific | Software specific | write.X(); read.X(), where X can be spss, sas, csv, or excel (e.g. write.csv()/read.csv(); non-CSV formats usually need add-on packages such as foreign) |
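A minimal round-trip sketch of the first three methods may make the Pro/Con columns concrete. It assumes a hypothetical data frame with integer, Date, and factor columns and writes to temporary files; the fourth method depends on the target software and is not shown.

```r
# Hypothetical example data frame with integer, Date, and factor columns
df <- data.frame(
  id    = 1:3,
  day   = as.Date(c("2013-06-17", "2013-06-18", "2013-06-19")),
  group = factor(c("a", "b", "a"))
)
original <- df

# 1. Binary image: save()/load() restore the object under its original name
rda_file <- tempfile(fileext = ".RData")
save(df, file = rda_file)
rm(df)
load(rda_file, envir = .GlobalEnv)  # 'df' exists again, unchanged
from_image <- df

# 2. Coded text: dump() writes R code that recreates the object;
#    source()$value is the value of the last expression evaluated (the assignment)
r_file <- tempfile(fileext = ".R")
dump("df", file = r_file)
newDf <- source(r_file)$value

# 3. Plain text: write.table()/read.table() drop the R-specific classes
txt_file <- tempfile(fileext = ".txt")
write.table(df, file = txt_file, sep = "\t", row.names = FALSE)
back <- read.table(txt_file, header = TRUE, sep = "\t")
class(back$day)  # no longer "Date": character in R >= 4.0, factor in older R
# Recast the columns to their intended R types
back$day   <- as.Date(back$day)
back$group <- factor(back$group)
```

The binary image comes back identical, the dumped code recreates the Date and factor classes, and only the plain-text copy needs the explicit recasting step.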
Friday, January 18, 2013
Data Mining: Best Buy mobile web log
Ongoing project
Outline: finished part
===================================
Objective of the project
Data features: Query/Product Data
Data exploring
Descriptive statistics
Survival analysis
Models
Collaborative Filtering
Summary
Download slide: https://docs.google.com/presentation/d/1i37NWAxcqnsETR4a9jNJaqvhLn90FkpbFKJDjxE70zU/edit
Wednesday, November 7, 2012
Generate random permutation in R
start = 1
end = 10
s = start:end  # the full range; c(start, end) would hold only the two endpoints
sample(s, length(s), replace = FALSE)
Equivalent MATLAB: X = randperm(End-Begin+1) + Begin - 1
Example output: [1] 5 1 2 8 6 4 3 7 9 10
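One pitfall with sample(): when its first argument is a single number n >= 1, it is treated as 1:n, so sampling a range of length 1 (e.g. 5:5) does not return 5. A minimal sketch of a helper that avoids this by shuffling indices with sample.int(); the name rand_perm is hypothetical.

```r
# Hypothetical helper: random permutation of the integers start..end (inclusive)
rand_perm <- function(start, end) {
  s <- start:end
  # sample.int(n) returns a random permutation of 1..n; indexing s with it
  # stays correct even when s has a single element
  s[sample.int(length(s))]
}

set.seed(1)
p <- rand_perm(3, 12)
print(p)
rand_perm(5, 5)  # always 5
```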
Thursday, September 20, 2012
Remove duplicated rows in KNIME
Goal
Delete the duplicated rows in a table.
Strategy
1. Use the GroupBy node; refer to "Detect and delete duplicate files (rows) based on two variable identifiers (date and lot number) and not based on row ID".
2. Use Database Query with SELECT DISTINCT.
3. Use a code snippet in a scripting node (R, JPython, or Java) to process the table.
Comment
1. Deleting empty rows is a special case of this topic.
2. Rows can be duplicated across all columns, or only on a subset of columns (a combination key).
3. Strategies 2 and 3 have not been tested.
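For strategy 3 with R, the core logic could look roughly like this sketch. It uses a hypothetical table whose date and lot columns form the combination key from comment 2; how the table is passed into and out of the KNIME scripting node is left out, since that depends on the node and KNIME version.

```r
# Hypothetical table: 'date' and 'lot' together form the combination key
tbl <- data.frame(
  date  = c("2012-09-01", "2012-09-01", "2012-09-02", "2012-09-01"),
  lot   = c("A1", "A1", "B2", "A1"),
  value = c(10, 10, 7, 12)
)

# Duplicated on all columns: unique() keeps the first occurrence of each row
dedup_all <- unique(tbl)

# Duplicated on the key only: keep the first row for each (date, lot) pair
dedup_key <- tbl[!duplicated(tbl[, c("date", "lot")]), ]

nrow(dedup_all)  # 3
nrow(dedup_key)  # 2
```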
Saturday, July 21, 2012
Regression Analysis of Airline’s Incidents
Term Research Project in SAS and Statistical Consulting
Title: Regression Analysis of Airline’s Incidents
Outline
============================================
Introduction
Data
Analysis
Simple linear regression
Variance stabilizing transformation
Non-linear Regression (Poisson)
Conclusion
Download slide: https://docs.google.com/open?id=0Bw64rMSoJR_ZUjl6Q2NkODNmRFE
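The outline contrasts a variance stabilizing transformation with Poisson regression for count data. As a hedged illustration only (simulated data, not the airline incident data, and R rather than the SAS used in the project), the two approaches can be fitted side by side:

```r
# Simulated count data (hypothetical; not the airline incident data)
set.seed(42)
n <- 200
x <- runif(n, 0, 10)                        # hypothetical predictor
y <- rpois(n, lambda = exp(0.5 + 0.2 * x))  # counts whose variance grows with the mean

# Variance stabilizing transformation: for Poisson counts, sqrt(y) has
# roughly constant variance, so ordinary least squares is more reasonable
fit_sqrt <- lm(sqrt(y) ~ x)

# Poisson regression: log link, models the counts directly
fit_pois <- glm(y ~ x, family = poisson(link = "log"))

summary(fit_sqrt)$coefficients
summary(fit_pois)$coefficients
```

The transformed linear model is simpler to fit and diagnose, while the Poisson model keeps coefficients on the log-rate scale of the original counts.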
Wednesday, July 18, 2012
SAS PROC SQL enhancements to ANSI SQL
The differences between PROC SQL and ANSI SQL are documented in "PROC SQL and the ANSI Standard" (SAS 9.2 Help), which covers:
- Compliance
- SQL Procedure Enhancements
- SQL Procedure Omissions
Useful enhancements include:
- FORMAT=, INFORMAT=, and LABEL= column modifiers
- The CONTAINS condition
- The CALCULATED keyword
Monday, July 9, 2012
Pricing scheme of Group-buy Websites in China
Term Project for Applied Statistics
Outline
============================================
Background: Group buying / C2C
Data set: Public production data
Method: Descriptive / Inference / Diagnostics
Analysis: Preparation / Visualization / 2 models
Conclusion: Competition / Market
Here