Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
407 views
in Technique[技术] by (71.8m points)

r - Remove rows if string matches X but not Y

I have the following data frame:

data <- data.frame(id = c(1,2,3,4,5,6),
                   exposure = c("BMI", "BMI etc.", "BMI neuronal", "WHRadjBMI", "WHR", "BF"))
    
    id  exposure
1   1   BMI
2   2   BMI etc.
3   3   BMI neuronal
4   4   WHRadjBMI
5   5   WHR
6   6   BF

I want to remove all rows from this data frame that contain "BMI" but not "adj" in the exposure column, so that I can group all of the BMI-related rows into a single factor level called "BMI". The real data frame is ~2500 rows by 50 columns.

Subsetting would therefore result in the following data frame. Rows 1, 2, and 3 have been removed because they contain "BMI" but do not contain "adj":

    id  exposure
4   4   WHRadjBMI
5   5   WHR
6   6   BF

I can then combine the rows that contain "BMI" but not "adj" into a single factor level, so that rows 1, 2, and 3 would become:

    id  exposure
1   1   BMI
2   2   BMI
3   3   BMI

I can do this final part as follows:

data$exposure <- "BMI"

1 Answer

0 votes
by (71.8m points)

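We can use grepl to build a logical mask of the rows that contain "BMI" but not "adj", and then use that mask either to collapse those exposure values to a single "BMI" value or to drop the rows. Note that the elementwise operators & and | are needed here; the scalar || only looks at a single value and raises an error for longer vectors in recent versions of R.

# TRUE for rows whose exposure contains "BMI" but not "adj"
is_bmi <- grepl("BMI", data$exposure, fixed = TRUE) &
          !grepl("adj", data$exposure, fixed = TRUE)
# Collapse the matching rows to "BMI"; as.character() avoids factor integer codes
data$exposure <- ifelse(is_bmi, "BMI", as.character(data$exposure))
# Or drop them instead (same as keeping rows where !BMI | adj)
data_without_bmi <- data[!is_bmi, ]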
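
For reference, here is a minimal, self-contained sketch of the "BMI but not adj" logic run on the sample data from the question. The names is_bmi, others, and grouped are hypothetical and chosen only for illustration, and stringsAsFactors = FALSE is an assumption made so that exposure stays a character column regardless of R version.

```r
# Sample data from the question, kept as character rather than factor
data <- data.frame(id = c(1, 2, 3, 4, 5, 6),
                   exposure = c("BMI", "BMI etc.", "BMI neuronal",
                                "WHRadjBMI", "WHR", "BF"),
                   stringsAsFactors = FALSE)

# Elementwise mask: contains "BMI" and does not contain "adj"
is_bmi <- grepl("BMI", data$exposure, fixed = TRUE) &
  !grepl("adj", data$exposure, fixed = TRUE)

# Option 1: drop the matching rows (hypothetical name `others`)
others <- data[!is_bmi, ]

# Option 2: keep every row and collapse only the matching ones to "BMI"
grouped <- data
grouped$exposure[is_bmi] <- "BMI"

print(others$id)                 # 4 5 6
print(unique(grouped$exposure))  # "BMI" "WHRadjBMI" "WHR" "BF"
```

Assigning through the mask (grouped$exposure[is_bmi] <- "BMI") changes only the selected rows, whereas data$exposure <- "BMI" would overwrite the whole column. If exposure were a factor, it would likely need converting with as.character() first, or "BMI" would have to exist as a level already.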
