Imagine the following structure of a dataset.
DF <- expand.grid(car=c("BMW","TESLA", "Mercedes"),
id=c("id1","id2","id3"),
id2=c("idA","idB"),
color_blue=c(0,1),
color_red=c(0,1),
color_black=c(0,1,2),
color_white=c(0,1),
tech_radio=c(0,1),
comf_heat=c(0,1),
stringsAsFactors=TRUE)
expand.grid gives a dataset with every combination, which serves my purpose here. Rows where several colour variables are set at once, such as color_blue=1 and color_red=1, are possible, and I want to split those up when they occur.
I want to go from here:
car id id2 color_blue color_red color_black color_white tech_radio comf_heat
BMW id1 idA 1 1 1 0 1 2
to here:
car id id2 color_blue color_red color_black color_white tech_radio comf_heat
BMW_blue id1 idA 1 0 0 0 1 2
BMW_red id1 idA 0 1 0 0 1 2
BMW_black id1 idA 0 0 1 0 1 2
In effect, two things should happen:
- add rows as near-duplicates, one per variable, IF certain similarly named variables (selected by name, not by a column range, as the positions might change) are > 0
- rename the value of the "car" variable by appending the relevant part of the name of the one variable that is kept.
I know there may be a lot of pipe-based solutions using dplyr or the tidyverse around. As I do not use those, I am very unfamiliar with them and will have a harder time applying them to my data. But in the end, any solution will be progress.
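To make the two steps concrete, here is a minimal base R sketch of what I have in mind (no dplyr). It is only an illustration under some assumptions: the helper name `split_colors` and the data frame `one_row` are made up for this example, the relevant variables are taken to be exactly those whose names start with "color_", rows with no colour > 0 are dropped, and the output is grouped by colour rather than kept in the original row order.

```r
# Hypothetical one-row example, copied from the "from here" row above
one_row <- data.frame(car = "BMW", id = "id1", id2 = "idA",
                      color_blue = 1, color_red = 1, color_black = 1,
                      color_white = 0, tech_radio = 1, comf_heat = 2,
                      stringsAsFactors = TRUE)

# Hypothetical helper: one output row per colour column that is > 0
split_colors <- function(df, prefix = "color_") {
  # Work with character labels so that new car names can be created freely
  df$car <- as.character(df$car)
  # Select the colour columns by name, not by position
  color_cols <- grep(paste0("^", prefix), names(df), value = TRUE)

  pieces <- lapply(color_cols, function(col) {
    # Keep only the rows where this colour is set
    part <- df[df[[col]] > 0, , drop = FALSE]
    if (nrow(part) == 0) return(part)
    kept <- part[[col]]
    part[color_cols] <- 0   # zero out every colour column ...
    part[[col]] <- kept     # ... then restore the one that is kept
    # Append the colour part of the column name to the car name
    part$car <- paste(part$car, sub(prefix, "", col, fixed = TRUE), sep = "_")
    part
  })

  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

result <- split_colors(one_row)
```

The same call, `split_colors(DF)`, should also work on the expand.grid data above; note that a value such as color_black=2 is kept as 2 in this sketch rather than being reset to 1.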