The acronym mde stands for Missing Data Explorer, a package intended to make missing data exploration as smooth and easy as possible.
The goal of mde is to make exploring missingness simple, without leaving the user overwhelmed by syntax.
We can install mde as follows:
install.packages("mde")
Loading the package
library(mde)
get_na_counts
This provides a convenient way to show the number of missing values in each column. It is relatively fast (in tests on about 400,000 rows, it took a few microseconds).
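For intuition, the column-wise count corresponds to the following base R computation on the built-in airquality dataset. This is a sketch for comparison only, not how mde implements get_na_counts.

```r
# is.na() gives a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(airquality))
print(na_counts)
# Ozone has 37 missing values and Solar.R has 7; the other columns have none
```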
Column-wise totals are less useful when results are needed per group. In that case, set grouped to TRUE and provide a vector of column names in grouping_cols to group by.
test <- data.frame(Subject = factor(c("A", "A", "B", "B")),
                   res = c(NA, 1, 2, 3),
                   ID = factor(c("1", "1", "2", "2")))
get_na_counts(test, grouped = TRUE, grouping_cols = "ID")
#> # A tibble: 2 x 3
#> ID Subject res
#> <fct> <int> <int>
#> 1 1 0 1
#> 2 2 0 0
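The grouped count can be reproduced in base R by splitting the data on the grouping column and counting NAs within each piece. This is an illustrative sketch, not mde's implementation; it rebuilds the same toy data so it runs on its own.

```r
test <- data.frame(Subject = factor(c("A", "A", "B", "B")),
                   res = c(NA, 1, 2, 3),
                   ID = factor(c("1", "1", "2", "2")))
# Split the non-grouping columns by ID, then count NAs per column in each group
by_id <- split(test[c("Subject", "res")], test$ID)
grouped_counts <- t(sapply(by_id, function(d) colSums(is.na(d))))
grouped_counts
# ID 1 has one missing res value; ID 2 has none
```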
percent_missing
This is a simple, quick way to look at the percentage of data missing in each column.
percent_missing(airquality)
#> Ozone Solar.R Wind Temp Month Day
#> 24.183007 4.575163 0.000000 0.000000 0.000000 0.000000
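The percentages above follow from taking the mean of a logical NA indicator per column. A minimal base R sketch for comparison (not mde's code):

```r
# The mean of a logical vector is the share of TRUE values; scale it to percent
pct_missing <- colMeans(is.na(airquality)) * 100
round(pct_missing, 6)
# Ozone: 37 of 153 rows missing; Solar.R: 7 of 153 rows missing
```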
We can get the results by group by providing an optional grouping_cols character vector.
percent_missing(test, grouping_cols = "Subject")
#> # A tibble: 2 x 3
#> Subject res ID
#> <fct> <dbl> <dbl>
#> 1 A 50 0
#> 2 B 0 0
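For a single column, the same grouped percentage can be sketched with base R's tapply(). This is only an illustration of the arithmetic, not how percent_missing works internally.

```r
test <- data.frame(Subject = factor(c("A", "A", "B", "B")),
                   res = c(NA, 1, 2, 3),
                   ID = factor(c("1", "1", "2", "2")))
# Percentage of missing res values within each Subject
pct_by_subject <- tapply(is.na(test$res), test$Subject, mean) * 100
pct_by_subject
# A has one of its two res values missing (50%); B has none (0%)
```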
To exclude some columns from this exploration, one can provide an optional character vector in exclude_cols.
percent_missing(airquality, exclude_cols = c("Day", "Temp"))
#> Ozone Solar.R Wind Month
#> 24.183007 4.575163 0.000000 0.000000
recode_as_na
As the name implies, this converts any value, or vector of values, to NA; that is, we take a value such as “missing” and convert it to R’s representation of missing values (NA).
To use the function out of the box (with default arguments), one simply does something like:
dummy_test <- data.frame(ID = c("A","B","B","A"),
values = c("n/a",NA,"Yes","No"))
# Convert n/a to NA
recode_as_na(dummy_test, value = "n/a")
#> Warning in recode_as_na.data.frame(dummy_test, value = "n/a"): Factor columns
#> have been converted to character
#> ID values
#> 1 A <NA>
#> 2 B <NA>
#> 3 B Yes
#> 4 A No
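For comparison, the core idea can be written in a few lines of base R. The helper recode_values_as_na below is hypothetical and not part of mde. stringsAsFactors = FALSE is set explicitly because data.frame() created factors by default before R 4.0.0, which is what triggers the warning shown above.

```r
dummy_test <- data.frame(ID = c("A", "B", "B", "A"),
                         values = c("n/a", NA, "Yes", "No"),
                         stringsAsFactors = FALSE)
# Hypothetical helper (not part of mde): set matching values to NA in chosen columns
recode_values_as_na <- function(df, values, cols = names(df)) {
  df[cols] <- lapply(df[cols], function(col) {
    col[col %in% values] <- NA  # %in% never returns NA, so existing NAs are safe
    col
  })
  df
}
recoded <- recode_values_as_na(dummy_test, values = "n/a")
recoded
```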
Great, but what if I want to do so for specific columns rather than the entire dataset? You can do this by setting subset_df to TRUE and providing column names to subset_cols.
another_dummy <- data.frame(ID = 1:5, Subject = 7:11,
Change = c("missing","n/a",2:4 ))
# Only change values in the column Change
recode_as_na(another_dummy, subset_df = TRUE,
subset_cols = "Change", value = c("n/a",
"missing"))
#> Warning in recode_as_na.data.frame(another_dummy, subset_df = TRUE, subset_cols
#> = "Change", : Factor columns have been converted to character
#> ID Subject Change
#> 1 1 7 <NA>
#> 2 2 8 <NA>
#> 3 3 9 2
#> 4 4 10 3
#> 5 5 11 4
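Restricting the recode to one column amounts to indexing only that column. A base R sketch for comparison (not mde's implementation):

```r
another_dummy <- data.frame(ID = 1:5, Subject = 7:11,
                            Change = c("missing", "n/a", 2:4),
                            stringsAsFactors = FALSE)
# Only the Change column is touched: values listed in to_na become NA
to_na <- c("n/a", "missing")
another_dummy$Change[another_dummy$Change %in% to_na] <- NA
another_dummy
```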
To use tidy selection, one can do the following:
head(mde::recode_as_na(airquality, subset_df = TRUE,
tidy = TRUE, pattern_type = "starts_with",
pattern = "Solar"))
#> Ozone Solar.R Wind Temp Month Day
#> 1 41 190 7.4 67 5 1
#> 2 36 118 8.0 72 5 2
#> 3 12 149 12.6 74 5 3
#> 4 18 313 11.5 62 5 4
#> 5 NA NA 14.3 56 5 5
#> 6 28 NA 14.9 66 5 6
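The column selection behind pattern_type = "starts_with" can be approximated in base R by a prefix match on the column names. This sketch only illustrates the idea; it is not how mde resolves patterns.

```r
nm <- names(airquality)
# Case-sensitive prefix match
solar_cols <- nm[startsWith(nm, "Solar")]
# Case-insensitive variant using a regular expression anchored at the start
solar_cols_ci <- nm[grepl("^solar", nm, ignore.case = TRUE)]
solar_cols
solar_cols_ci
```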
sort_by_missingness
This provides a simple, relatively fast way to sort variables by missingness. It does not currently support sorting grouped percentages.
Usage:
sort_by_missingness(airquality, sort_by = "counts")
#> variable count
#> 1 Wind 0
#> 2 Temp 0
#> 3 Month 0
#> 4 Day 0
#> 5 Solar.R 7
#> 6 Ozone 37
# sort in descending order
sort_by_missingness(airquality, sort_by = "counts",
descend = TRUE)
#> variable count
#> 1 Ozone 37
#> 2 Solar.R 7
#> 3 Wind 0
#> 4 Temp 0
#> 5 Month 0
#> 6 Day 0
# Use percents
sort_by_missingness(airquality, sort_by = "percents")
#> variable percent
#> 1 Wind 0.000000
#> 2 Temp 0.000000
#> 3 Month 0.000000
#> 4 Day 0.000000
#> 5 Solar.R 4.575163
#> 6 Ozone 24.183007
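A base R sketch of the same ordering, for comparison only. Because order() is stable, columns with equal counts keep their original order, which matches the tie order in the outputs above.

```r
counts <- colSums(is.na(airquality))
# order() is stable: ties (the zero-count columns) stay in their original order
ascending <- order(counts)
descending <- order(-counts)
data.frame(variable = names(counts)[ascending],
           count = unname(counts[ascending]))
```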
recode_na_as
Sometimes one would like to replace NAs with some other value. recode_na_as provides a simple way to do just that.
# defaults
head(recode_na_as(airquality))
#> Ozone Solar.R Wind Temp Month Day
#> 1 41 190 7.4 67 5 1
#> 2 36 118 8.0 72 5 2
#> 3 12 149 12.6 74 5 3
#> 4 18 313 11.5 62 5 4
#> 5 0 0 14.3 56 5 5
#> 6 28 0 14.9 66 5 6
To use a different value, pass it to value:
head(recode_na_as(airquality, value=NaN))
#> Ozone Solar.R Wind Temp Month Day
#> 1 41 190 7.4 67 5 1
#> 2 36 118 8.0 72 5 2
#> 3 12 149 12.6 74 5 3
#> 4 18 313 11.5 62 5 4
#> 5 NaN NaN 14.3 56 5 5
#> 6 28 NaN 14.9 66 5 6
As a bonus, you can restrict the replacement to specific columns:
head(recode_na_as(airquality, value=0, subset_df=TRUE, subset_cols="Ozone"))
#> Ozone Solar.R Wind Temp Month Day
#> 1 41 190 7.4 67 5 1
#> 2 36 118 8.0 72 5 2
#> 3 12 149 12.6 74 5 3
#> 4 18 313 11.5 62 5 4
#> 5 0 NA 14.3 56 5 5
#> 6 28 NA 14.9 66 5 6
This also supports tidy selection:
head(mde::recode_na_as(airquality, subset_df=TRUE, tidy=TRUE,
value=0, pattern_type="starts_with",
pattern="solar",ignore.case=TRUE))
#> Ozone Solar.R Wind Temp Month Day
#> 1 41 190 7.4 67 5 1
#> 2 36 118 8.0 72 5 2
#> 3 12 149 12.6 74 5 3
#> 4 18 313 11.5 62 5 4
#> 5 NA 0 14.3 56 5 5
#> 6 28 0 14.9 66 5 6
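For comparison, replacing NAs is a one-line assignment per column in base R. The helper replace_na_with below is hypothetical (not part of mde) and simply mirrors the value and column-subset options shown above.

```r
# Hypothetical helper (not part of mde): replace NA with `value` in chosen columns
replace_na_with <- function(df, value = 0, cols = names(df)) {
  df[cols] <- lapply(df[cols], function(col) {
    col[is.na(col)] <- value
    col
  })
  df
}
all_filled <- replace_na_with(airquality)
ozone_only <- replace_na_with(airquality, value = 0, cols = "Ozone")
head(ozone_only)
```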
recode_na_if
Given a data.frame object, one can recode NAs as another value based on a grouping variable. In the example below, we replace all NAs in all columns with 0s if the ID is A2 or A3:
some_data <- data.frame(ID=c("A1","A2","A3", "A4"),
A=c(5,NA,0,8), B=c(10,0,0,1),
C=c(1,NA,NA,25))
recode_na_if(some_data,grouping_col="ID", target_groups=c("A2","A3"),
replacement= 0)
#> # A tibble: 4 x 4
#> ID A B C
#> <fct> <dbl> <dbl> <dbl>
#> 1 A1 5 10 1
#> 2 A2 0 0 0
#> 3 A3 0 0 0
#> 4 A4 8 1 25
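The same conditional replacement can be sketched in base R by selecting the target rows first. This is an illustration for comparison, not mde's implementation.

```r
some_data <- data.frame(ID = c("A1", "A2", "A3", "A4"),
                        A = c(5, NA, 0, 8), B = c(10, 0, 0, 1),
                        C = c(1, NA, NA, 25))
target_rows <- some_data$ID %in% c("A2", "A3")
value_cols <- setdiff(names(some_data), "ID")
# Work on the target rows only, replace their NAs, then write them back
block <- some_data[target_rows, value_cols]
block[is.na(block)] <- 0
some_data[target_rows, value_cols] <- block
some_data
```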
drop_na_if
Suppose you want to drop any column whose percentage of NAs is greater than or equal to a certain value. drop_na_if does just that.
For example, we can drop any columns of airquality that have 24% or more of their values missing:
head(drop_na_if(airquality, sign = "gteq", percent_na = 24))
#> Solar.R Wind Temp Month Day
#> 1 190 7.4 67 5 1
#> 2 118 8.0 72 5 2
#> 3 149 12.6 74 5 3
#> 4 313 11.5 62 5 4
#> 5 NA 14.3 56 5 5
#> 6 NA 14.9 66 5 6
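With sign = "gteq", the rule amounts to keeping only the columns whose NA percentage is below the threshold. A base R sketch for comparison (not mde's code):

```r
pct_missing <- colMeans(is.na(airquality)) * 100
# Keep only the columns with less than 24 percent of their values missing
kept <- airquality[, pct_missing < 24, drop = FALSE]
names(kept)
```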
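Note that percent_na appears to be treated as a percentage even when it is a decimal: judging by the output, 0.24 means 0.24%, not 24%, so Solar.R (about 4.6% missing) is dropped along with Ozone: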
head(drop_na_if(airquality, sign = "gteq", percent_na = 0.24))
#> Wind Temp Month Day
#> 1 7.4 67 5 1
#> 2 8.0 72 5 2
#> 3 12.6 74 5 3
#> 4 11.5 62 5 4
#> 5 14.3 56 5 5
#> 6 14.9 66 5 6
Besides gteq, the sign argument also supports less than or equal to (lteq), equal to (eq), greater than (gt), and less than (lt).
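One way to picture the sign codes is as a lookup table of R comparison functions. The helper drop_cols_by_na_pct below is hypothetical (not part of mde) and only illustrates how each code could map to an operator.

```r
# Hypothetical helper (not part of mde): drop columns whose NA percentage
# satisfies the comparison named by `sign`
drop_cols_by_na_pct <- function(df, sign, percent_na) {
  # Map each sign code to the base R comparison it names
  ops <- list(gt = `>`, gteq = `>=`, lt = `<`, lteq = `<=`, eq = `==`)
  sign <- match.arg(sign, names(ops))
  pct <- colMeans(is.na(df)) * 100
  to_drop <- ops[[sign]](pct, percent_na)
  df[, !to_drop, drop = FALSE]
}
names(drop_cols_by_na_pct(airquality, "gteq", 24))
# "lt" with 1 drops every column with under 1% missing, keeping Ozone and Solar.R
names(drop_cols_by_na_pct(airquality, "lt", 1))
```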
To keep certain columns even though they meet the percent_na criterion, one can provide an optional keep_columns character vector.
head(drop_na_if(airquality, percent_na = 24, keep_columns = "Ozone"))
#> Solar.R Wind Temp Month Day Ozone
#> 1 190 7.4 67 5 1 41
#> 2 118 8.0 72 5 2 36
#> 3 149 12.6 74 5 3 12
#> 4 313 11.5 62 5 4 18
#> 5 NA 14.3 56 5 5 NA
#> 6 NA 14.9 66 5 6 28
Compare the above result to the following:
head(drop_na_if(airquality, percent_na = 24))
#> Solar.R Wind Temp Month Day
#> 1 190 7.4 67 5 1
#> 2 118 8.0 72 5 2
#> 3 149 12.6 74 5 3
#> 4 313 11.5 62 5 4
#> 5 NA 14.3 56 5 5
#> 6 NA 14.9 66 5 6
For more information, especially on grouping support, please see the documentation for drop_na_if.
drop_na_at
This provides a simple way to drop missing values only in specific columns. It currently returns only those columns, with their missing values removed. Column matching is currently case sensitive; further details are given in the documentation.
# The full result has 116 rows: 153 minus the 37 missing Ozone values
head(drop_na_at(airquality, pattern_type = "starts_with", "O"))
#> Ozone
#> 1 41
#> 2 36
#> 3 12
#> 4 18
#> 5 28
#> 6 23
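A base R sketch of the same result, for comparison only: select the columns by a case-sensitive prefix, then drop incomplete rows and renumber them.

```r
nm <- names(airquality)
# Case-sensitive prefix match, as with pattern_type = "starts_with"
cols <- nm[startsWith(nm, "O")]
ozone_complete <- na.omit(airquality[cols])
rownames(ozone_complete) <- NULL  # renumber the remaining rows from 1
nrow(ozone_complete)
```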
recode_as_na_for
Can I convert all values greater than, less than, greater than or equal to, or less than or equal to some value to NA?
Yes, you can! All we have to do is use recode_as_na_for:
head(recode_as_na_for(airquality, criteria = "gt", value = 25))
#> Ozone Solar.R Wind Temp Month Day
#> 1 NA NA 7.4 NA 5 1
#> 2 NA NA 8.0 NA 5 2
#> 3 12 NA 12.6 NA 5 3
#> 4 18 NA 11.5 NA 5 4
#> 5 NA NA 14.3 NA 5 5
#> 6 NA NA 14.9 NA 5 6
To do so only in specific columns, pass an optional subset_cols character vector:
head(recode_as_na_for(airquality, value = 25, subset_cols = "Solar.R",
criteria = "gt"))
#> Ozone Solar.R Wind Temp Month Day
#> 1 41 NA 7.4 67 5 1
#> 2 36 NA 8.0 72 5 2
#> 3 12 NA 12.6 74 5 3
#> 4 18 NA 11.5 62 5 4
#> 5 NA NA 14.3 56 5 5
#> 6 28 NA 14.9 66 5 6
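The single-column case is a plain vector assignment in base R. Using which() discards the NA comparisons that come from values already missing. A sketch for comparison only:

```r
aq <- airquality
# Only Solar.R is checked; which() keeps the positions where the test is TRUE
aq$Solar.R[which(aq$Solar.R > 25)] <- NA
head(aq)
```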
To raise an issue, please do so here
Thank you, feedback is always welcome :)