---
title: "Getting Started with contentValidity"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with contentValidity}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(contentValidity)
```

## Background

When developing a new questionnaire, scale, or test, researchers typically ask a panel of subject-matter experts to rate each candidate item for relevance to the construct being measured. The expert ratings are then summarized into **content validity indices** that quantify how well the items represent the intended construct.

The `contentValidity` package implements the standard set of content validity indices used in nursing, education, psychology, and health sciences research:

- **I-CVI**: Item-level Content Validity Index (Lynn, 1986)
- **S-CVI/Ave**: Scale-level CVI, average method (Polit & Beck, 2006)
- **S-CVI/UA**: Scale-level CVI, universal agreement (Polit & Beck, 2006)
- **Modified κ\***: I-CVI adjusted for chance agreement (Polit, Beck, & Owen, 2007)
- **Aiken's V**: uses the full rating scale (Aiken, 1985)
- **Lawshe's CVR**: Content Validity Ratio for "essential" judgments (Lawshe, 1975), with corrected critical values from Wilson, Pan, and Schumsky (2012)

## The example dataset

The package ships with `cvi_example`, a simulated set of expert ratings for a 10-item depression screening instrument, with 6 expert raters using a 4-point relevance scale (1 = not relevant, 4 = highly relevant).

```{r}
data(cvi_example)
head(cvi_example)
```

## Item-level analysis

The simplest place to start is `icvi()`, which gives the proportion of experts rating each item as 3 or 4:

```{r}
icvi(cvi_example)
```

By Polit and Beck (2006), I-CVI ≥ 0.78 is considered excellent with six or more experts. Items 5 and 9 in our example (0.67 and 0.50) would be flagged for revision.

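
To make the definition concrete, the proportion can be reproduced by hand in base R. This is only a sketch under stated assumptions: `toy` is a made-up matrix (experts in rows, items in columns), not the package's data, and `icvi_manual` and `flagged` are illustrative names rather than part of the `contentValidity` API. It is not the package's implementation.

```{r}
# Hypothetical ratings: 6 experts (rows) x 3 items (columns), 4-point scale.
# matrix() fills column by column, so each row of numbers below is one item.
toy <- matrix(
  c(4, 4, 3, 4, 3, 4,   # item A: six of six experts rate it 3 or 4
    4, 3, 2, 4, 1, 3,   # item B: four of six experts rate it 3 or 4
    2, 3, 1, 4, 2, 3),  # item C: three of six experts rate it 3 or 4
  nrow = 6,
  dimnames = list(NULL, c("A", "B", "C"))
)

# I-CVI: proportion of experts rating each item as relevant (3 or 4)
icvi_manual <- colMeans(toy >= 3)
round(icvi_manual, 2)

# Items below the 0.78 cutoff quoted above
flagged <- icvi_manual[icvi_manual < 0.78]
names(flagged)
```

With six raters, four endorsements give 0.67 and three give 0.50, the same values reported for items 5 and 9 in the example data.
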
Plain I-CVI doesn't correct for chance agreement. With small panels, a high I-CVI can be partly luck. **Modified kappa** addresses this:

```{r}
mod_kappa(cvi_example)
```

Notice that item 9 drops sharply (0.50 → 0.27): its I-CVI was inflated by chance agreement among only six raters.

**Aiken's V** uses the full rating scale rather than dichotomizing relevant/not-relevant. A "4" contributes more than a "3":

```{r}
aiken_v(cvi_example, lo = 1, hi = 4)
```

## Scale-level analysis

Two scale-level indices summarize content validity across all items:

```{r}
scvi_ave(cvi_example)   # average of I-CVIs
scvi_ua(cvi_example)    # proportion of items with universal agreement
```

Polit and Beck (2006) recommend reporting both. S-CVI/Ave ≥ 0.90 indicates excellent overall content validity; S-CVI/UA gives a stricter view of how many items achieved unanimous endorsement.

## All indices at once

`content_validity()` is the workhorse function for routine analysis. It returns the complete set of item-level and scale-level indices in one tidy structure:

```{r}
result <- content_validity(cvi_example)
result
```

The result is an object you can subset, just like a list:

```{r}
result$items
result$scale
```

## Publication-ready tables

`apa_table()` formats the result for journal manuscripts:

```{r}
apa_table(result)
```

For R Markdown output (HTML, PDF, Word), use the appropriate format argument. The function returns a `knitr::kable()` object that renders correctly in your document:

```{r, results = "asis"}
apa_table(result, format = "markdown")
```

## Lawshe's CVR

CVR uses a different rating convention: each expert classifies items as **essential**, **useful but not essential**, or **not necessary**.
Use Lawshe-style coding (1 = essential, 2 = useful, 3 = not necessary) and call `cvr()` directly:

```{r}
# 10 experts rating 3 items on Lawshe's scale
lawshe_ratings <- matrix(
  c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2,   # 8 of 10 essential
    1, 1, 1, 2, 2, 2, 2, 3, 3, 3,   # 3 of 10 essential
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1),  # 10 of 10 essential
  nrow = 10,
  dimnames = list(NULL, paste0("item", 1:3))
)
cvr(lawshe_ratings)
```

Compare each item's CVR to the critical value for the panel size, using the corrected Wilson, Pan, and Schumsky (2012) thresholds:

```{r}
cvr_critical(n_experts = 10)                # one-tailed alpha = 0.05
cvr_critical(n_experts = 10, alpha = 0.01)
```

In this example, items 1 and 3 (CVR = 0.6 and 1.0) exceed the one-tailed α = 0.05 critical value, which is about 0.52 for 10 experts in the Wilson et al. table. Item 2 (CVR = -0.4) falls short and would be revised or dropped.

## What's new in v0.2.0

### Bootstrap confidence intervals

All six relevance-scale indices and Lawshe's CVR now accept an optional `ci = TRUE` argument that returns bootstrap confidence intervals alongside the point estimate. The CI is the percentile bootstrap by default (Efron & Tibshirani, 1993); `ci_method = "bca"` requests the bias-corrected accelerated interval (DiCiccio & Efron, 1996), which is preferable when the bootstrap distribution is skewed (common for I-CVI near 1.0). The default is 2000 replicates, configurable via `n_boot`.

The resampling unit is the expert (row), not the item (column), matching the standard inferential frame for inter-rater reliability analyses (Gwet, 2014).

```{r}
icvi(cvi_example, ci = TRUE, n_boot = 1000, seed = 1)
```

### Gwet's AC1 and AC2

Two new chance-corrected agreement coefficients are available: `gwet_ac1()` for binary classification (dichotomized at the relevance threshold) and `gwet_ac2()` for the full ordinal scale with a weight matrix.
Both use Gwet's marginal-adjusted chance-correction, which differs from Polit's modified kappa (fixed p = 0.5 null) and gives substantively different answers when the prevalence of "relevant" ratings is far from 0.5, which is the common case in content-validity work.

```{r}
gwet_ac1(cvi_example)
gwet_ac2(cvi_example, categories = 1:4)
```

For AC2, **always pass the full theoretical rating scale** via `categories` (e.g., `1:4` for a standard 4-point relevance scale). If omitted, the function infers categories from the observed ratings, which can silently collapse the weight matrix and give incorrect results when extreme categories are unused.

The implementation matches `irrCAC::gwet.ac1.raw()` (by Kilem Gwet, the original author of AC1/AC2) bit-for-bit on the same inputs.

### Sample-size planning

`cv_sample_size_icvi()` answers "how many expert raters do I need to estimate I-CVI within a given confidence-interval half-width?", a question that has been answered only by rule-of-thumb in the content-validity literature (Lynn, 1986; Polit & Beck, 2006).

```{r}
# Anticipating I-CVI ≈ 0.85 with target half-width ≤ 0.10
cv_sample_size_icvi(expected = 0.85, half_width = 0.10)

# Sensitivity table across plausible expected I-CVI values
sapply(seq(0.70, 0.95, by = 0.05), function(p) {
  cv_sample_size_icvi(expected = p, half_width = 0.10)
})
```

A useful caveat: the function typically recommends 20+ experts for realistic targets, well above Lynn's rule-of-thumb minimum of 6. This gap is worth flagging in study protocols and grant applications.

### Multi-dimensional / subscale analysis

For instruments structured into subscales (e.g., a depression scale with cognitive, somatic, and behavioral domains), `content_validity()` now accepts a `subscale` argument that maps items to subscales and computes scale-level indices per subscale in addition to the overall scale.

```{r}
# Treat items 1-5 as subscale "Cognitive" and 6-10 as "Somatic"
result_multi <- content_validity(
  cvi_example,
  subscale = c(rep("Cognitive", 5), rep("Somatic", 5))
)
result_multi$subscales
```

The items data frame also carries the subscale assignment, which makes it easy to filter or facet downstream analyses.

### Visualization

`plot.content_validity()` produces a scatter of I-CVI against an agreement index (modified kappa by default; choose `gwet_ac1`, `gwet_ac2`, or `aiken_v` via `y_index`). Reference lines mark the adequacy region and items outside it are highlighted in red and labeled.

```{r, fig.width = 6, fig.height = 4}
plot(result_multi, y_index = "gwet_ac2")
```

By default, items are flagged ("Below I-CVI or AC2 threshold") if they fail *either* criterion. This is the conservative "needs any review" default. When the plot is presenting one index specifically, you may prefer to flag only items that fail on that axis:

```{r, fig.width = 6, fig.height = 4}
# Flag only items below the AC2 threshold (ignores I-CVI verdict)
plot(result_multi, y_index = "gwet_ac2", flag_logic = "y_index")

# Flag only items below the I-CVI threshold (ignores AC2 verdict)
plot(result_multi, y_index = "gwet_ac2", flag_logic = "icvi")
```

The legend always names the criterion that drives the flag, so the plot stays unambiguous about why an item is highlighted.

### Per-index interpretation in APA tables

`apa_table()` accepts `interpretation_index` to choose which agreement index drives the verdict column ("Excellent" / "Good" / etc.). The interpretation column is positioned immediately adjacent to its source column to avoid confusion when the table contains multiple indices.

```{r}
apa_table(result_multi, interpretation_index = "gwet_ac2")
```

## Citing the package

If you use `contentValidity` in published research, please run:

```{r, eval = FALSE}
citation("contentValidity")
```

to get a current citation block in BibTeX or plain-text form.

## References

Aiken, L. R. (1985).
Three coefficients for analyzing the reliability and validity of ratings. *Educational and Psychological Measurement*, 45(1), 131–142.

Lawshe, C. H. (1975). A quantitative approach to content validity. *Personnel Psychology*, 28(4), 563–575.

Lynn, M. R. (1986). Determination and quantification of content validity. *Nursing Research*, 35(6), 382–385.

Polit, D. F., & Beck, C. T. (2006). The content validity index: Are you sure you know what's being reported? Critique and recommendations. *Research in Nursing & Health*, 29(5), 489–497.

Polit, D. F., Beck, C. T., & Owen, S. V. (2007). Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. *Research in Nursing & Health*, 30(4), 459–467.

Wilson, F. R., Pan, W., & Schumsky, D. A. (2012). Recalculation of the critical values for Lawshe's content validity ratio. *Measurement and Evaluation in Counseling and Development*, 45(3), 197–210.

Gwet, K. L. (2008). Computing inter-rater reliability and its variance in the presence of high agreement. *British Journal of Mathematical and Statistical Psychology*, 61(1), 29–48.

Gwet, K. L. (2014). *Handbook of inter-rater reliability* (4th ed.). Advanced Analytics, LLC.

Wongpakaran, N., Wongpakaran, T., Wedding, D., & Gwet, K. L. (2013). A comparison of Cohen's Kappa and Gwet's AC1 when calculating inter-rater reliability coefficients. *BMC Medical Research Methodology*, 13(1), 61.

Efron, B., & Tibshirani, R. J. (1993). *An introduction to the bootstrap*. Chapman and Hall.

DiCiccio, T. J., & Efron, B. (1996). Bootstrap confidence intervals. *Statistical Science*, 11(3), 189–228.

Newcombe, R. G. (1998). Two-sided confidence intervals for the single proportion. *Statistics in Medicine*, 17(8), 857–872.

Altman, D. G. (1991). *Practical statistics for medical research*. Chapman and Hall.