Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Missing cases while using summarise(across())

I have a data.frame with several rows per ID, where each row carries only part of that ID's values and the remaining columns are NA.

I want to quickly reshape it so that there is only one record for each ID, with the non-missing values collapsed into a single row.

df can be built with the following code:

df<-structure(list(ID = structure(c("05-102", "05-102", "05-102", 
"01-103", "01-103", "01-103", "08-104", "08-104", "08-104", "05-105", 
"05-105", "05-105", "02-106", "02-106", "02-106", "05-107", "05-107", 
"05-107", "08-108", "08-108", "08-108", "02-109", "02-109", "02-109", 
"05-111", "05-111", "05-111", "07-115", "07-115", "07-115"), label = "Unique Subject Identifier", format.sas = "$"), 
    EXSTDTC1 = structure(c(NA, NA, NA, 17022, NA, NA, 17024, 
    NA, NA, 17032, NA, NA, 17038, NA, NA, 17092, NA, NA, 17108, 
    NA, NA, 17155, NA, NA, 17247, NA, NA, 17333, NA, NA), class = "Date"), 
    EXSTDTC6 = structure(c(NA, 16885, NA, NA, NA, 17031, NA, 
    NA, 17032, NA, NA, 17041, NA, NA, 17047, NA, NA, 17100, NA, 
    NA, 17116, NA, 17164, NA, NA, NA, 17256, NA, 17342, NA), class = "Date"), 
    EXSTDTC3 = structure(c(NA, NA, 16881, NA, 17027, NA, NA, 
    17029, NA, NA, 17037, NA, NA, 17043, NA, NA, 17097, NA, NA, 
    17113, NA, NA, NA, 17160, NA, 17252, NA, NA, NA, 17338), class = "Date"), 
    EXDOSEA1 = c("73.8+147.6", NA, NA, "64.5+129", NA, NA, "62.7+125.4", 
    NA, NA, "114+57", NA, NA, "60+117.5", NA, NA, "48.6+97.2", 
    NA, NA, "61.2+122.4", NA, NA, "47.7+95.4", NA, NA, "51.6+103.2", 
    NA, NA, "68+136", NA, NA), EXDOSEA6 = c(NA, "100", NA, NA, 
    NA, "86", NA, NA, "83.5", NA, NA, "76", NA, NA, "39.2", NA, 
    NA, "32", NA, NA, "81.5", NA, "69.6", NA, NA, NA, "68", NA, 
    "91", NA), EXDOSEA3 = c(NA, NA, "1600", NA, "4302", NA, NA, 
    "4185", NA, NA, "3900", NA, NA, "3921", NA, NA, "3300", NA, 
    NA, "4080", NA, NA, NA, "3183", NA, "3300", NA, NA, NA, "1514"
    )), row.names = c(NA, -30L), class = c("tbl_df", "tbl", "data.frame"
))

Right now my code is:

df %>%
  group_by(ID) %>%
  summarise(across(EXSTDTC1:EXDOSEA3, na.omit))

But this seems to remove 05-102, because it has no value for EXSTDTC1. How can I address this? Is it possible to keep using across()?

Many thanks.



1 Answer


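na.omit() returns a zero-length vector when every value of a column within a group is NA, which is most likely why that group disappears from the summary. We could use an if/else condition to handle the cases where a group has only NA values: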

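library(dplyr)

df %>%
  group_by(ID) %>%
  summarise(
    # If a column is entirely NA within the group, return a single NA;
    # otherwise keep only the non-missing values.
    across(EXSTDTC1:EXDOSEA3,
           ~ if (all(is.na(.))) NA else .[complete.cases(.)]),
    .groups = "drop")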

Output:

# A tibble: 10 x 7
#   ID     EXSTDTC1   EXSTDTC6   EXSTDTC3   EXDOSEA1   EXDOSEA6 EXDOSEA3
#   <chr>  <date>     <date>     <date>     <chr>      <chr>    <chr>   
# 1 01-103 2016-08-09 2016-08-18 2016-08-14 64.5+129   86       4302    
# 2 02-106 2016-08-25 2016-09-03 2016-08-30 60+117.5   39.2     3921    
# 3 02-109 2016-12-20 2016-12-29 2016-12-25 47.7+95.4  69.6     3183    
# 4 05-102 NA         2016-03-25 2016-03-21 73.8+147.6 100      1600    
# 5 05-105 2016-08-19 2016-08-28 2016-08-24 114+57     76       3900    
# 6 05-107 2016-10-18 2016-10-26 2016-10-23 48.6+97.2  32       3300    
# 7 05-111 2017-03-22 2017-03-31 2017-03-27 51.6+103.2 68       3300    
# 8 07-115 2017-06-16 2017-06-25 2017-06-21 68+136     91       1514    
# 9 08-104 2016-08-11 2016-08-19 2016-08-16 62.7+125.4 83.5     4185    
#10 08-108 2016-11-03 2016-11-11 2016-11-08 61.2+122.4 81.5     4080    
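
As an alternative sketch, the same idea can be written with a small helper instead of an inline if/else. The helper name first_non_na and the toy data below are hypothetical and only for illustration; they are not part of the original question or answer. It relies on the fact that indexing past the end of a vector in R returns an NA of the same type, so an all-NA Date column stays a Date. Unlike the if/else version, it keeps only the first non-missing value when a group has more than one.

```r
library(dplyr)

# Hypothetical helper: the first non-missing value of x, or a typed NA
# when x has no non-missing values (x[...][1] on an empty vector is NA).
first_non_na <- function(x) x[!is.na(x)][1]

# Toy data (an assumption for illustration): group "a" has no date at all.
toy <- tibble(
  ID   = c("a", "a", "b", "b"),
  date = as.Date(c(NA, NA, "2016-08-09", NA)),
  dose = c("100", NA, NA, "86")
)

result <- toy %>%
  group_by(ID) %>%
  summarise(across(date:dose, first_non_na), .groups = "drop")

print(result)
```

Because the helper always returns exactly one value per group, every ID keeps its row, and this form should also avoid the warning that newer dplyr releases (1.1.0 and later, as far as I know) raise when summarise() returns something other than one row per group.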
