Dev changes for 1.2.0 (#25)

* IRV now handles missing values, and psychsyn handles NA results better (#17) * typos and renamed carelessDataset columns - typo in IRV / psacing in psychsyn - carelessDataset now has cogent column names * changed vignettes removed introduction vignette from package (now hosted online) and fixed typo with linking vignette * changed psychsyn to do a resampling psychsyn comes up with different correlation coefficients dependent on whether the item of an item pair is located at the x or y column. * psychsyn now resamples psychsyn correlation values was dependent on the placement of items of item pairs in the x and y columns. psychsyn now does multiple random (re)placements of the items. * irv now can handle missing values * Updated psychsyn to do up to 10 resamples while encountering NAs In response to Issue #16 * updated IRV to make na.rm optional * updating docs to match francisco's changes * remove vignette * adding suggestion for texlive-fonts-extra * change to travis-ci config to attempt to account for latex issues * trying a different before install command * adding one more * changing repo for texlive * trying http mirror instead * try again * idfk * still not working * another try * one more thing before bed * macos why?!? * again * trying to work around. * no cache * last try * travis is not allowed to macos anymore * feature updates for psychsyn, mahad, irv (#18) * typos and renamed carelessDataset columns - typo in IRV / psacing in psychsyn - carelessDataset now has cogent column names * changed vignettes removed introduction vignette from package (now hosted online) and fixed typo with linking vignette * changed psychsyn to do a resampling psychsyn comes up with different correlation coefficients dependent on whether the item of an item pair is located at the x or y column. * psychsyn now resamples psychsyn correlation values was dependent on the placement of items of item pairs in the x and y columns. psychsyn now does multiple random (re)placements of the items. * irv now can handle missing values * Updated psychsyn to do up to 10 resamples while encountering NAs In response to Issue #16 * updated IRV to make na.rm optional * updated mahad to handle NA properly * Update mahad.R - fixed typo fixed typo * Update .gitignore * Updates to mahaD, psychsyn (#21) * typos and renamed carelessDataset columns - typo in IRV / psacing in psychsyn - carelessDataset now has cogent column names * changed vignettes removed introduction vignette from package (now hosted online) and fixed typo with linking vignette * changed psychsyn to do a resampling psychsyn comes up with different correlation coefficients dependent on whether the item of an item pair is located at the x or y column. * psychsyn now resamples psychsyn correlation values was dependent on the placement of items of item pairs in the x and y columns. psychsyn now does multiple random (re)placements of the items. * irv now can handle missing values * Updated psychsyn to do up to 10 resamples while encountering NAs In response to Issue #16 * updated IRV to make na.rm optional * updated mahad to handle NA properly * Update mahad.R - fixed typo fixed typo * Update .gitignore * updated psychsyn handling of NA improved. Added corresponding test. Co-authored-by: fwilhelm <[email protected]> Co-authored-by: Richard Yentes <[email protected]> * changed psychsyn to default resample * Changed evenodd so it's interpreted similarly to other metrics addresses #19 * fixes to psychsyn doc for resampling * update travis.yml to skip oldrel * Cleaning up dev * Update .travis.yml * removing travis ci * Update DESCRIPTION * updating docs * changes to even-odd (#23) * changes to even-odd 1. Introduced error when calling even odd with just one factor. 2. Introduced warning when variables indicated by factor argument do not match the input data (x). 3. Use own warning for cases when NA arises because of 0 variance in even or odd vectors. 4. refactored coding using apply. * evenodd minor change added an explicit error when factors > x. Adjusted the call. parameter to be FALSE. * added warning to even-odd that computation has changed (#24) Co-authored-by: Francisco Wilhelm <[email protected]> Co-authored-by: fwilhelm <[email protected]>
ryentes · Aug 16, 2020 · a11585c · a11585c
1 parent 4956674
commit a11585c
Show file tree

Hide file tree

Showing 19 changed files with 158 additions and 57 deletions.
diff --git a/.RData b/.RData
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,8 @@
 *.Rproj
 .Rproj.user
 .Rhistory
+temp.RData
+R/.DS_Store
+.RData
+vignettes/*.html
+
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,8 +1,8 @@
 Package: careless
 Type: Package
 Title: Procedures for Computing Indices of Careless Responding
-Version: 1.1.3
-Date: 2018-06-19
+Version: 1.2.0
+Date: 2020-07-25
 Authors@R: c(
             person("Richard", "Yentes" , email = "[email protected]", role = c("cre", "aut")),
             person("Francisco", "Wilhelm", email = "[email protected]", role = c("aut")))
@@ -20,4 +20,4 @@ Suggests:
 Encoding: UTF-8
 LazyData: true
 VignetteBuilder: knitr
-RoxygenNote: 6.0.1
+RoxygenNote: 7.1.1
diff --git a/R/evenodd.R b/R/evenodd.R
@@ -21,8 +21,24 @@
 
 evenodd <- function(x, factors, diag = FALSE) {
   #initialize a result dataset
-  eo <- vector(length = nrow(x), mode = "numeric")
-  eo_missing <- vector(length = nrow(x), mode = "numeric")
+  warning("Computation of even-odd has changed for consistency of interpretation
+          with other indices. This change occurred in version 1.2.0. A higher 
+          score now indicates a greater likelihood of careless responding. If 
+          you have previously written code to cut score based on the output of 
+          this function, you should revise that code accordingly.")
+
+  if(length(factors) == 1) {
+    stop("You have called even-odd with only a single factor. \n The even-odd method requires multiple factors to work correctly.",
+            call. = FALSE) }
+  if(sum(factors) > ncol(x)) {
+    stop("The number of items specified by 'factors' exceeds the number of columns in 'x'.",
+            call. = FALSE) }
+  if(sum(factors) != ncol(x)) {
+    warning("The number of items specified by 'factors' does not match the number of columns in 'x'. \n Please check if this is what you want.",
+            call. = FALSE) }
+
+  # initalize empty list for persons holding the persons even scores and odd scores
+  eo_vals <- vector("list", nrow(x))
 
   # Loop through each Person
   for(i in 1:nrow(x)) {
@@ -43,17 +59,35 @@ evenodd <- function(x, factors, diag = FALSE) {
       f[j,1] <- mean(t(s[e_ind]), na.rm = TRUE)
       f[j,2] <- mean(t(s[o_ind]), na.rm = TRUE)
     }
-
-    # Calculate within-person correlation between even and odd sub-scales
-    # then apply the Spearman-Brown correction for split-half reliability
-    # and store the result in the output vector.
-    eo_missing[i] <- sum(!is.na(apply(f, 1, sum))) #number of even/odd pairs for which no comparison can be computed because of NAs
-    tmp <- stats::cor(f[,1], f[,2], use ="pairwise.complete.obs")
-    tmp <- (2*tmp)/(1+tmp)
-    if(!is.na(tmp) && tmp < -1) tmp <- -1
-    eo[i] <- tmp
-    rm(f)
+    # assign the even and odd values to eo_vals
+    eo_vals[[i]] <- f
   }
-  if(diag == FALSE) {return(eo)}
-  else {return(data.frame(eo, eo_missing))}
+
+
+  #calculate number of even/odd pairs for which no comparison can be computed because of NAs
+  eo_missing <- lapply(eo_vals, function(i) sum(!is.na(apply(i, 1, sum))))
+
+  # scan for persons for which no even-odd can be calculated when all values are same, leading to
+  # a correlation of NA because there is no variance/standard deviation.
+  eo_sdzero <-  lapply(eo_vals, function(i) apply(i, 2, stats::sd))
+  eo_sdzero <- sapply(eo_sdzero, function(i) any(i == 0))
+  if(any(eo_sdzero)) warning("One or more observations have zero variance in even and/or odd values. \nThis results in NA values for these observations.\nIncluding more factors may alleviate this issue.",
+                             call. = FALSE)
+
+  # Calculate within-person correlation between even and odd sub-scales
+  # then apply the Spearman-Brown correction for split-half reliability
+  # and store the result in the output vector.
+  eo_cor <- sapply(eo_vals, function(f) {
+    # suppressWarnings for standard deviation 0 which happens when each value-pairs is same
+    val <- suppressWarnings(stats::cor(f[, 1], f[, 2], use = "pairwise.complete.obs"))
+    val <- (2 * val) / (1 + val) #split-half
+    if (!is.na(val) && val < -1) val <- -1
+    return(val)
+  })
+
+  # transform eo  such that higher scores are indicating carelessness
+  eo_cor = 0 - eo_cor
+
+  if(diag == FALSE) {return(eo_cor)}
+  else {return(data.frame(eo_cor, eo_missing))}
 }
diff --git a/R/irv.R b/R/irv.R
@@ -8,17 +8,18 @@
 #' Marjanovic et al. (2015) propose to mark persons with \emph{high} IRV scores - reflecting highly random responses (see References).
 #'
 #' @param x a matrix of data (e.g. survey responses)
-#' @param split boolean indicating whether to additionally calculate the IRV on subsets of columns (of equal length).
+#' @param na.rm logical indicating whether to calculate the IRV for a person with missing values.
+#' @param split logical indicating whether to additionally calculate the IRV on subsets of columns (of equal length).
 #' @param num.split the number of subsets the data is to be split in.
 #' @author Francisco Wilhelm \email{[email protected]}
 #' @references
 #' Dunn, A. M., Heggestad, E. D., Shanock, L. R., & Theilgard, N. (2018).
 #' Intra-individual Response Variability as an Indicator of Insufficient Effort Responding:
 #' Comparison to Other Indicators and Relationships with Individual Differences.
 #' \emph{Journal of Business and Psychology, 33(1)}, 105-121. \doi{10.1007/s10869-016-9479-0}
-#' 
-#' Marjanovic, Z., Holden, R., Struthers, W., Cribbie, R., & Greenglass, E. (2015). 
-#' The inter-item standard deviation (ISD): An index that discriminates between conscientious and random responders. 
+#'
+#' Marjanovic, Z., Holden, R., Struthers, W., Cribbie, R., & Greenglass, E. (2015).
+#' The inter-item standard deviation (ISD): An index that discriminates between conscientious and random responders.
 #' \emph{Personality and Individual Differences}, 84, 79-83. \doi{10.1016/j.paid.2014.08.021}
 #' @export
 #' @examples
@@ -29,20 +30,21 @@
 #' irv_split <- irv(careless_dataset, split = TRUE, num.split = 4)
 #' boxplot(irv_split$irv4) #produce a boxplot of the IRV for the fourth quarter
 
-irv <- function(x, split = FALSE, num.split = 3) {
-  out <- apply(x, 1, stats::sd)
+irv <- function(x, na.rm = TRUE, split = FALSE, num.split = 3) {
+  out <- apply(x, 1, stats::sd, na.rm = na.rm)
+
   if(split == TRUE) {
     chunk <- function(x,n) split(x, cut(seq_along(x), n, labels = FALSE))
     split_x <- apply(x, 1, chunk, num.split)
     out_split <- t(replicate(nrow(x), rep(NA, num.split)))
     colnames(out_split) <- paste0("irv",1:num.split)
     for(k in 1:nrow(out_split)) {
       split_x_single <- split_x[[k]]
-      out_split[k,] <- unlist(lapply(split_x_single, stats::sd), use.names = FALSE)
+      out_split[k,] <- unlist(lapply(split_x_single, stats::sd, na.rm = na.rm), use.names = FALSE)
     }
       out_split <- data.frame(out, out_split)
       colnames(out_split)[1] <- "irvTotal"
-      return(out_split)} else {
+      return(out_split)} else { #split subsection end
       return(out)
     }
 }
diff --git a/R/longstring.R b/R/longstring.R
@@ -6,7 +6,7 @@
 #' identical responses is returned. Additionally, can return the average length of uninterrupted string of identical responses.
 #'
 #' @param x a matrix of data (e.g. item responses)
-#' @param avg a boolean indicating whether to additionally return the average length of identical consecutive responses
+#' @param avg logical indicating whether to additionally return the average length of identical consecutive responses
 #' @author Richard Yentes \email{[email protected]}, Francisco Wilhelm \email{[email protected]}
 #' @references
 #' Johnson, J. A. (2005). Ascertaining the validity of individual protocols

diff --git a/R/mahad.R b/R/mahad.R
@@ -22,11 +22,25 @@
 #' mahad_flags <- mahad(careless_dataset, flag = TRUE, confidence = 0.999) #Apply a strict criterion
 
 mahad <- function(x, plot = TRUE, flag = FALSE, confidence = 0.99, na.rm = TRUE) {
-  raw <- as.numeric(psych::outlier(x, plot, bad = 0, na.rm = na.rm))
+  if(na.rm == FALSE) {
+    if(any(is.na(x)) == TRUE) {stop("Some values are NA. Mahalanobis distance was not computed.
+                                      Use na.rm = TRUE to use available cases.", call. = FALSE)}
+    }
+  #remove rows with all NA and issue warning
+  complete.na <- apply(x, 1, function(y) { all(is.na(y)) } )
+  if(any(complete.na)) {
+    warning("Some cases contain only NA values. The Mahalanobis distance will be calculated using available cases.",
+            call. = FALSE) }
+  x_filtered <- x[!complete.na,]
+
+  maha_data <- as.numeric(psych::outlier(x_filtered, plot, bad = 0, na.rm = na.rm))
+  maha_data_merge <- rep_len(NA, nrow(x_filtered))
+  maha_data_merge[!complete.na] <- maha_data
+
   if(flag == TRUE) {
-    cut <- stats::qchisq(confidence, ncol(x))
-    flagged <- (raw > cut)
-    return(data.frame(raw, flagged))
+    cut <- stats::qchisq(confidence, length(maha_data))
+    flagged <- (maha_data_merge > cut)
+    return(data.frame(maha_data_merge, flagged))
   }
-  else{ return(raw) }
+  else{ return(maha_data_merge) }
 }
diff --git a/R/psychsyn.R b/R/psychsyn.R
@@ -15,6 +15,7 @@
 #' @param anto determines whether psychometric anonyms are returned instead of
 #' psychometric synonyms. Defaults to \code{FALSE}
 #' @param diag additionally return the number of item pairs available for each observation. Useful if dataset contains many missing values.
+#' @param resample_na if psychsyn returns NA for a respondent resample to attempt getting a non-NA result.
 #' @author Richard Yentes \email{[email protected]}, Francisco Wilhelm \email{[email protected]}
 #' @references
 #' Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data.
@@ -32,21 +33,22 @@
 #' synonyms <- psychsyn(careless_dataset, .60, diag = TRUE)
 #' antonyms <- psychant(careless_dataset2, .50, diag = TRUE)
 
-psychsyn <- function(x, critval=.60, anto=FALSE, diag=FALSE) {
+psychsyn <- function(x, critval=.60, anto=FALSE, diag=FALSE, resample_na=TRUE) {
   x <- as.matrix(x)
   item_pairs <- get_item_pairs(x, critval, anto)
-  synonyms <- apply(x,1,syn_for_one, item_pairs)
+
+  synonyms <- apply(x,1,syn_for_one, item_pairs, resample_na)
   synonyms_df <- as.data.frame(aperm(synonyms))
   colnames(synonyms_df) <- c("numPairs", "cor")
+
   if(diag==TRUE) { return(synonyms_df) }
   else { return(synonyms_df$cor) }
 }
 
 # Helper function that identifies psychometric synonyms in a given dataset
 get_item_pairs <- function(x, critval=.60, anto=FALSE) {
-  x <- as.matrix(x)
   critval <- abs(critval) #Dummy Proofing
-
+  
   correlations <- stats::cor(x, use = "pairwise.complete.obs")
   correlations[upper.tri(correlations, diag=TRUE)] <- NA
   correlations <- as.data.frame(as.table(correlations))
@@ -71,15 +73,32 @@ get_item_pairs <- function(x, critval=.60, anto=FALSE) {
 }
 
 # Helper function to calculate the within person correlation for a single individual
-syn_for_one <- function(x, item_pairs) {
+syn_for_one <- function(x, item_pairs, resample_na) {
   item_pairs_omit_na <- which(!(is.na(x[item_pairs[,1]]) | is.na(x[item_pairs[,2]])))
   sum_item_pairs <- length(item_pairs_omit_na)
-
+  #only execute if more than two item pairs
   if(sum_item_pairs > 2) {
-      itemvalues <- cbind(as.numeric(x[as.numeric(item_pairs[,1])]), as.numeric(x[as.numeric(item_pairs[,2])]))
-      synvalue <- suppressWarnings(stats::cor(itemvalues, use = "pairwise.complete.obs", method = "pearson")[1,2])
+    itemvalues <- cbind(as.numeric(x[as.numeric(item_pairs[,1])]), as.numeric(x[as.numeric(item_pairs[,2])]))
+
+    # helper that calculates within-person correlation
+    psychsyn_cor <- function(x) {
+      suppressWarnings(stats::cor(x, use = "pairwise.complete.obs", method = "pearson")[1,2])
+    }
+
+    # if resample_na == TRUE, re-calculate psychsyn should a result return NA
+    if(resample_na == TRUE) {
+      counter <- 1
+      synvalue <- psychsyn_cor(itemvalues)
+      while(counter <= 10 & is.na(synvalue)) {
+        itemvalues <- t(apply(itemvalues, 1, sample, 2, replace = F))
+        synvalue <- psychsyn_cor(itemvalues)
+        counter = counter+1
+      }
+    } else {
+      synvalue <- psychsyn_cor(itemvalues) # executes if resample_na == FALSE
+      }
 
-  } else {synvalue = NA}
+  } else {synvalue <- NA} # executes if insufficient item pairs
 
   return(c(sum_item_pairs, synvalue))
-  }
+}
diff --git a/README.md b/README.md
@@ -11,7 +11,7 @@ Careless or insufficient effort responding in surveys, i.e. responding to items
 
 ## Current Version
 
-Current version is 1.1.1
+Current version is 1.2.0
 
 ## Installing from source
 
@@ -32,7 +32,7 @@ library('careless')
 ```
 
 ## APA Citation
-Yentes, R.D., & Wilhelm, F. (2018) careless: Procedures for computing indices of careless responding. R packages version 1.1.0 url: https://github.com/ryentes/careless
+Yentes, R.D., & Wilhelm, F. (2018) careless: Procedures for computing indices of careless responding. R packages version 1.2.0 url: https://github.com/ryentes/careless
 
 ## License
 

diff --git a/man/careless.Rd b/man/careless.Rd
diff --git a/man/careless_dataset.Rd b/man/careless_dataset.Rd
diff --git a/man/careless_dataset2.Rd b/man/careless_dataset2.Rd
diff --git a/man/irv.Rd b/man/irv.Rd
diff --git a/man/longstring.Rd b/man/longstring.Rd
diff --git a/man/psychsyn.Rd b/man/psychsyn.Rd
diff --git a/tests/testthat/Rplots.pdf b/tests/testthat/Rplots.pdf
diff --git a/tests/testthat/test-mahad.R b/tests/testthat/test-mahad.R
@@ -0,0 +1,3 @@
+careless_dataset_na <- careless_dataset
+careless_dataset_na[c(5:8),] <- NA
+data_careless_maha <- mahad(careless_dataset_na)
diff --git a/tests/testthat/test-psychsyn.R b/tests/testthat/test-psychsyn.R
@@ -0,0 +1,13 @@
+# test 1: calculate psych syn on a dataset with missings
+
+# first, create a dataset with missings
+dataset_na <- careless_dataset
+replacements <- 500
+random_row <- sample(1:nrow(dataset_na), replacements, replace = TRUE)
+random_col <- sample(1:ncol(dataset_na), replacements, replace = TRUE)
+
+for(i in 1:replacements) {
+  dataset_na[random_row[i], random_col[i]] <- NA
+}
+
+synonyms <- psychsyn(dataset_na, .60)