Skip to content

Commit

Permalink
Dev changes for 1.2.0 (#25)
Browse files Browse the repository at this point in the history
* IRV now handles missing values, and psychsyn handles NA results better (#17)

* typos and renamed carelessDataset columns

- typo in IRV / psacing in psychsyn
- carelessDataset now has cogent column names

* changed vignettes

removed introduction vignette from package (now hosted online) and fixed typo with linking vignette

* changed psychsyn to do a resampling

psychsyn comes up with different correlation coefficients dependent on whether the item of an item pair is located at the x or y column.

* psychsyn now resamples

psychsyn correlation values was dependent on the placement of items of item pairs in the x and y columns. psychsyn now does multiple random (re)placements of the items.

* irv now can handle missing values

* Updated psychsyn to do up to 10 resamples while encountering NAs

In response to Issue #16

* updated IRV to make na.rm optional

* updating docs to match francisco's changes

* remove vignette

* adding suggestion for texlive-fonts-extra

* change to travis-ci config to attempt to account for latex issues

* trying a different before install command

* adding one more

* changing repo for texlive

* trying http mirror instead

* try again

* idfk

* still not working

* another try

* one more thing before bed

* macos why?!?

* again

* trying to work around.

* no cache

* last try

* travis is not allowed to macos anymore

* feature updates for psychsyn, mahad, irv (#18)

* typos and renamed carelessDataset columns

- typo in IRV / psacing in psychsyn
- carelessDataset now has cogent column names

* changed vignettes

removed introduction vignette from package (now hosted online) and fixed typo with linking vignette

* changed psychsyn to do a resampling

psychsyn comes up with different correlation coefficients dependent on whether the item of an item pair is located at the x or y column.

* psychsyn now resamples

psychsyn correlation values was dependent on the placement of items of item pairs in the x and y columns. psychsyn now does multiple random (re)placements of the items.

* irv now can handle missing values

* Updated psychsyn to do up to 10 resamples while encountering NAs

In response to Issue #16

* updated IRV to make na.rm optional

* updated mahad to handle NA properly

* Update mahad.R - fixed typo

fixed typo

* Update .gitignore

* Updates to mahaD, psychsyn (#21)

* typos and renamed carelessDataset columns

- typo in IRV / psacing in psychsyn
- carelessDataset now has cogent column names

* changed vignettes

removed introduction vignette from package (now hosted online) and fixed typo with linking vignette

* changed psychsyn to do a resampling

psychsyn comes up with different correlation coefficients dependent on whether the item of an item pair is located at the x or y column.

* psychsyn now resamples

psychsyn correlation values was dependent on the placement of items of item pairs in the x and y columns. psychsyn now does multiple random (re)placements of the items.

* irv now can handle missing values

* Updated psychsyn to do up to 10 resamples while encountering NAs

In response to Issue #16

* updated IRV to make na.rm optional

* updated mahad to handle NA properly

* Update mahad.R - fixed typo

fixed typo

* Update .gitignore

* updated psychsyn

handling of NA improved. Added corresponding test.

Co-authored-by: fwilhelm <[email protected]>
Co-authored-by: Richard Yentes <[email protected]>

* changed psychsyn to default resample

* Changed evenodd so it's interpreted similarly to other metrics addresses #19

* fixes to psychsyn doc for resampling

* update travis.yml to skip oldrel

* Cleaning up dev

* Update .travis.yml

* removing travis ci

* Update DESCRIPTION

* updating docs

* changes to even-odd (#23)

* changes to even-odd

1. Introduced error when calling even odd with just one factor.
2. Introduced warning when variables indicated by factor argument do not match the input data (x).
3. Use own warning for cases when NA arises because of 0 variance in even or odd vectors.
4. refactored coding using apply.

* evenodd minor change

added an explicit error when factors > x. Adjusted the call. parameter to be FALSE.

* added warning to even-odd that computation has changed (#24)

Co-authored-by: Francisco Wilhelm <[email protected]>
Co-authored-by: fwilhelm <[email protected]>
  • Loading branch information
3 people authored Aug 16, 2020
1 parent 4956674 commit a11585c
Show file tree
Hide file tree
Showing 19 changed files with 158 additions and 57 deletions.
Binary file added .RData
Binary file not shown.
5 changes: 5 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,8 @@
*.Rproj
.Rproj.user
.Rhistory
temp.RData
R/.DS_Store
.RData
vignettes/*.html

6 changes: 3 additions & 3 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
Package: careless
Type: Package
Title: Procedures for Computing Indices of Careless Responding
Version: 1.1.3
Date: 2018-06-19
Version: 1.2.0
Date: 2020-07-25
Authors@R: c(
person("Richard", "Yentes" , email = "[email protected]", role = c("cre", "aut")),
person("Francisco", "Wilhelm", email = "[email protected]", role = c("aut")))
Expand All @@ -20,4 +20,4 @@ Suggests:
Encoding: UTF-8
LazyData: true
VignetteBuilder: knitr
RoxygenNote: 6.0.1
RoxygenNote: 7.1.1
62 changes: 48 additions & 14 deletions R/evenodd.R
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,24 @@

evenodd <- function(x, factors, diag = FALSE) {
#initialize a result dataset
eo <- vector(length = nrow(x), mode = "numeric")
eo_missing <- vector(length = nrow(x), mode = "numeric")
warning("Computation of even-odd has changed for consistency of interpretation
with other indices. This change occurred in version 1.2.0. A higher
score now indicates a greater likelihood of careless responding. If
you have previously written code to cut score based on the output of
this function, you should revise that code accordingly.")

if(length(factors) == 1) {
stop("You have called even-odd with only a single factor. \n The even-odd method requires multiple factors to work correctly.",
call. = FALSE) }
if(sum(factors) > ncol(x)) {
stop("The number of items specified by 'factors' exceeds the number of columns in 'x'.",
call. = FALSE) }
if(sum(factors) != ncol(x)) {
warning("The number of items specified by 'factors' does not match the number of columns in 'x'. \n Please check if this is what you want.",
call. = FALSE) }

# initalize empty list for persons holding the persons even scores and odd scores
eo_vals <- vector("list", nrow(x))

# Loop through each Person
for(i in 1:nrow(x)) {
Expand All @@ -43,17 +59,35 @@ evenodd <- function(x, factors, diag = FALSE) {
f[j,1] <- mean(t(s[e_ind]), na.rm = TRUE)
f[j,2] <- mean(t(s[o_ind]), na.rm = TRUE)
}

# Calculate within-person correlation between even and odd sub-scales
# then apply the Spearman-Brown correction for split-half reliability
# and store the result in the output vector.
eo_missing[i] <- sum(!is.na(apply(f, 1, sum))) #number of even/odd pairs for which no comparison can be computed because of NAs
tmp <- stats::cor(f[,1], f[,2], use ="pairwise.complete.obs")
tmp <- (2*tmp)/(1+tmp)
if(!is.na(tmp) && tmp < -1) tmp <- -1
eo[i] <- tmp
rm(f)
# assign the even and odd values to eo_vals
eo_vals[[i]] <- f
}
if(diag == FALSE) {return(eo)}
else {return(data.frame(eo, eo_missing))}


#calculate number of even/odd pairs for which no comparison can be computed because of NAs
eo_missing <- lapply(eo_vals, function(i) sum(!is.na(apply(i, 1, sum))))

# scan for persons for which no even-odd can be calculated when all values are same, leading to
# a correlation of NA because there is no variance/standard deviation.
eo_sdzero <- lapply(eo_vals, function(i) apply(i, 2, stats::sd))
eo_sdzero <- sapply(eo_sdzero, function(i) any(i == 0))
if(any(eo_sdzero)) warning("One or more observations have zero variance in even and/or odd values. \nThis results in NA values for these observations.\nIncluding more factors may alleviate this issue.",
call. = FALSE)

# Calculate within-person correlation between even and odd sub-scales
# then apply the Spearman-Brown correction for split-half reliability
# and store the result in the output vector.
eo_cor <- sapply(eo_vals, function(f) {
# suppressWarnings for standard deviation 0 which happens when each value-pairs is same
val <- suppressWarnings(stats::cor(f[, 1], f[, 2], use = "pairwise.complete.obs"))
val <- (2 * val) / (1 + val) #split-half
if (!is.na(val) && val < -1) val <- -1
return(val)
})

# transform eo such that higher scores are indicating carelessness
eo_cor = 0 - eo_cor

if(diag == FALSE) {return(eo_cor)}
else {return(data.frame(eo_cor, eo_missing))}
}
18 changes: 10 additions & 8 deletions R/irv.R
Original file line number Diff line number Diff line change
Expand Up @@ -8,17 +8,18 @@
#' Marjanovic et al. (2015) propose to mark persons with \emph{high} IRV scores - reflecting highly random responses (see References).
#'
#' @param x a matrix of data (e.g. survey responses)
#' @param split boolean indicating whether to additionally calculate the IRV on subsets of columns (of equal length).
#' @param na.rm logical indicating whether to calculate the IRV for a person with missing values.
#' @param split logical indicating whether to additionally calculate the IRV on subsets of columns (of equal length).
#' @param num.split the number of subsets the data is to be split in.
#' @author Francisco Wilhelm \email{[email protected]}
#' @references
#' Dunn, A. M., Heggestad, E. D., Shanock, L. R., & Theilgard, N. (2018).
#' Intra-individual Response Variability as an Indicator of Insufficient Effort Responding:
#' Comparison to Other Indicators and Relationships with Individual Differences.
#' \emph{Journal of Business and Psychology, 33(1)}, 105-121. \doi{10.1007/s10869-016-9479-0}
#'
#' Marjanovic, Z., Holden, R., Struthers, W., Cribbie, R., & Greenglass, E. (2015).
#' The inter-item standard deviation (ISD): An index that discriminates between conscientious and random responders.
#'
#' Marjanovic, Z., Holden, R., Struthers, W., Cribbie, R., & Greenglass, E. (2015).
#' The inter-item standard deviation (ISD): An index that discriminates between conscientious and random responders.
#' \emph{Personality and Individual Differences}, 84, 79-83. \doi{10.1016/j.paid.2014.08.021}
#' @export
#' @examples
Expand All @@ -29,20 +30,21 @@
#' irv_split <- irv(careless_dataset, split = TRUE, num.split = 4)
#' boxplot(irv_split$irv4) #produce a boxplot of the IRV for the fourth quarter

irv <- function(x, split = FALSE, num.split = 3) {
out <- apply(x, 1, stats::sd)
irv <- function(x, na.rm = TRUE, split = FALSE, num.split = 3) {
out <- apply(x, 1, stats::sd, na.rm = na.rm)

if(split == TRUE) {
chunk <- function(x,n) split(x, cut(seq_along(x), n, labels = FALSE))
split_x <- apply(x, 1, chunk, num.split)
out_split <- t(replicate(nrow(x), rep(NA, num.split)))
colnames(out_split) <- paste0("irv",1:num.split)
for(k in 1:nrow(out_split)) {
split_x_single <- split_x[[k]]
out_split[k,] <- unlist(lapply(split_x_single, stats::sd), use.names = FALSE)
out_split[k,] <- unlist(lapply(split_x_single, stats::sd, na.rm = na.rm), use.names = FALSE)
}
out_split <- data.frame(out, out_split)
colnames(out_split)[1] <- "irvTotal"
return(out_split)} else {
return(out_split)} else { #split subsection end
return(out)
}
}
2 changes: 1 addition & 1 deletion R/longstring.R
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
#' identical responses is returned. Additionally, can return the average length of uninterrupted string of identical responses.
#'
#' @param x a matrix of data (e.g. item responses)
#' @param avg a boolean indicating whether to additionally return the average length of identical consecutive responses
#' @param avg logical indicating whether to additionally return the average length of identical consecutive responses
#' @author Richard Yentes \email{[email protected]}, Francisco Wilhelm \email{[email protected]}
#' @references
#' Johnson, J. A. (2005). Ascertaining the validity of individual protocols
Expand Down
24 changes: 19 additions & 5 deletions R/mahad.R
Original file line number Diff line number Diff line change
Expand Up @@ -22,11 +22,25 @@
#' mahad_flags <- mahad(careless_dataset, flag = TRUE, confidence = 0.999) #Apply a strict criterion

mahad <- function(x, plot = TRUE, flag = FALSE, confidence = 0.99, na.rm = TRUE) {
raw <- as.numeric(psych::outlier(x, plot, bad = 0, na.rm = na.rm))
if(na.rm == FALSE) {
if(any(is.na(x)) == TRUE) {stop("Some values are NA. Mahalanobis distance was not computed.
Use na.rm = TRUE to use available cases.", call. = FALSE)}
}
#remove rows with all NA and issue warning
complete.na <- apply(x, 1, function(y) { all(is.na(y)) } )
if(any(complete.na)) {
warning("Some cases contain only NA values. The Mahalanobis distance will be calculated using available cases.",
call. = FALSE) }
x_filtered <- x[!complete.na,]

maha_data <- as.numeric(psych::outlier(x_filtered, plot, bad = 0, na.rm = na.rm))
maha_data_merge <- rep_len(NA, nrow(x_filtered))
maha_data_merge[!complete.na] <- maha_data

if(flag == TRUE) {
cut <- stats::qchisq(confidence, ncol(x))
flagged <- (raw > cut)
return(data.frame(raw, flagged))
cut <- stats::qchisq(confidence, length(maha_data))
flagged <- (maha_data_merge > cut)
return(data.frame(maha_data_merge, flagged))
}
else{ return(raw) }
else{ return(maha_data_merge) }
}
39 changes: 29 additions & 10 deletions R/psychsyn.R
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
#' @param anto determines whether psychometric anonyms are returned instead of
#' psychometric synonyms. Defaults to \code{FALSE}
#' @param diag additionally return the number of item pairs available for each observation. Useful if dataset contains many missing values.
#' @param resample_na if psychsyn returns NA for a respondent resample to attempt getting a non-NA result.
#' @author Richard Yentes \email{[email protected]}, Francisco Wilhelm \email{[email protected]}
#' @references
#' Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data.
Expand All @@ -32,21 +33,22 @@
#' synonyms <- psychsyn(careless_dataset, .60, diag = TRUE)
#' antonyms <- psychant(careless_dataset2, .50, diag = TRUE)

psychsyn <- function(x, critval=.60, anto=FALSE, diag=FALSE) {
psychsyn <- function(x, critval=.60, anto=FALSE, diag=FALSE, resample_na=TRUE) {
x <- as.matrix(x)
item_pairs <- get_item_pairs(x, critval, anto)
synonyms <- apply(x,1,syn_for_one, item_pairs)

synonyms <- apply(x,1,syn_for_one, item_pairs, resample_na)
synonyms_df <- as.data.frame(aperm(synonyms))
colnames(synonyms_df) <- c("numPairs", "cor")

if(diag==TRUE) { return(synonyms_df) }
else { return(synonyms_df$cor) }
}

# Helper function that identifies psychometric synonyms in a given dataset
get_item_pairs <- function(x, critval=.60, anto=FALSE) {
x <- as.matrix(x)
critval <- abs(critval) #Dummy Proofing

correlations <- stats::cor(x, use = "pairwise.complete.obs")
correlations[upper.tri(correlations, diag=TRUE)] <- NA
correlations <- as.data.frame(as.table(correlations))
Expand All @@ -71,15 +73,32 @@ get_item_pairs <- function(x, critval=.60, anto=FALSE) {
}

# Helper function to calculate the within person correlation for a single individual
syn_for_one <- function(x, item_pairs) {
syn_for_one <- function(x, item_pairs, resample_na) {
item_pairs_omit_na <- which(!(is.na(x[item_pairs[,1]]) | is.na(x[item_pairs[,2]])))
sum_item_pairs <- length(item_pairs_omit_na)

#only execute if more than two item pairs
if(sum_item_pairs > 2) {
itemvalues <- cbind(as.numeric(x[as.numeric(item_pairs[,1])]), as.numeric(x[as.numeric(item_pairs[,2])]))
synvalue <- suppressWarnings(stats::cor(itemvalues, use = "pairwise.complete.obs", method = "pearson")[1,2])
itemvalues <- cbind(as.numeric(x[as.numeric(item_pairs[,1])]), as.numeric(x[as.numeric(item_pairs[,2])]))

# helper that calculates within-person correlation
psychsyn_cor <- function(x) {
suppressWarnings(stats::cor(x, use = "pairwise.complete.obs", method = "pearson")[1,2])
}

# if resample_na == TRUE, re-calculate psychsyn should a result return NA
if(resample_na == TRUE) {
counter <- 1
synvalue <- psychsyn_cor(itemvalues)
while(counter <= 10 & is.na(synvalue)) {
itemvalues <- t(apply(itemvalues, 1, sample, 2, replace = F))
synvalue <- psychsyn_cor(itemvalues)
counter = counter+1
}
} else {
synvalue <- psychsyn_cor(itemvalues) # executes if resample_na == FALSE
}

} else {synvalue = NA}
} else {synvalue <- NA} # executes if insufficient item pairs

return(c(sum_item_pairs, synvalue))
}
}
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ Careless or insufficient effort responding in surveys, i.e. responding to items

## Current Version

Current version is 1.1.1
Current version is 1.2.0

## Installing from source

Expand All @@ -32,7 +32,7 @@ library('careless')
```

## APA Citation
Yentes, R.D., & Wilhelm, F. (2018) careless: Procedures for computing indices of careless responding. R packages version 1.1.0 url: https://github.com/ryentes/careless
Yentes, R.D., & Wilhelm, F. (2018) careless: Procedures for computing indices of careless responding. R packages version 1.2.0 url: https://github.com/ryentes/careless

## License

Expand Down
1 change: 0 additions & 1 deletion man/careless.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 3 additions & 1 deletion man/careless_dataset.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 3 additions & 1 deletion man/careless_dataset2.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 6 additions & 4 deletions man/irv.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/longstring.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 3 additions & 1 deletion man/psychsyn.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Binary file added tests/testthat/Rplots.pdf
Binary file not shown.
3 changes: 3 additions & 0 deletions tests/testthat/test-mahad.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
careless_dataset_na <- careless_dataset
careless_dataset_na[c(5:8),] <- NA
data_careless_maha <- mahad(careless_dataset_na)
13 changes: 13 additions & 0 deletions tests/testthat/test-psychsyn.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
# test 1: calculate psych syn on a dataset with missings

# first, create a dataset with missings
dataset_na <- careless_dataset
replacements <- 500
random_row <- sample(1:nrow(dataset_na), replacements, replace = TRUE)
random_col <- sample(1:ncol(dataset_na), replacements, replace = TRUE)

for(i in 1:replacements) {
dataset_na[random_row[i], random_col[i]] <- NA
}

synonyms <- psychsyn(dataset_na, .60)
Loading

0 comments on commit a11585c

Please sign in to comment.