This article provides a comprehensive guide for researchers and drug development professionals on identifying, addressing, and preventing Questionable Research Practices (QRPs). Covering the latest research, including a newly defined inventory of 40 QRPs, the article explores foundational concepts, advanced detection methodologies, practical prevention strategies, and validation techniques. It addresses pressing issues like p-hacking, HARKing, and publication bias, while offering actionable solutions such as preregistration, open data, and AI-assisted screening tools. Designed for the biomedical and clinical research community, this guide synthesizes current evidence to foster research integrity, improve replicability, and build a more trustworthy scientific foundation for drug development and clinical applications.
Questionable Research Practices (QRPs) are procedures or decisions in the research process that are not transparent, ethical, or fair, and are likely to produce misleading conclusions, typically in the interest of the researcher [1] [2] [3].
It is crucial to distinguish QRPs from other forms of poor scientific practice. The table below clarifies these differences.
Table 1: Defining QRPs in the Context of Research Misconduct
| Category | Definition | Key Differentiator |
|---|---|---|
| QRPs | Methodologically unsound practices that threaten the validity and reliability of science, often motivated by a desire for positive results [1] [4] [5]. | Ethical ambiguity; often not officially prohibited but pose a major threat to cumulative science [4] [5]. |
| Research Misconduct | Clearly prohibited and proscribed practices, such as fabrication, falsification, and plagiarism [1] [3] [5]. | Universally recognized as unacceptable and unethical. |
| Researcher Error | Non-motivated, accidental mistakes (e.g., accidental data loss) [1] [3]. | Lack of intent to mislead. |
QRPs can occur at nearly every stage of the research lifecycle. The "Bestiary of Questionable Research Practices," a community-consensus project, has identified and categorized 40 different QRPs [1] [3] [6]. The most common and impactful ones are listed below.
Table 2: Common Questionable Research Practices (QRPs) and Their Impact
| QRP Category | Specific QRP | Description | Impact on Research |
|---|---|---|---|
| Reporting & Analysis | HARKing (Hypothesizing After the Results are Known) | Formulating hypotheses after results are known to fit the data [7] [4] [2]. | Undermines the hypothesis-testing nature of science; creates false positive findings [7] [2]. |
| | P-hacking | Conducting multiple analyses or selectively reporting outcomes to produce a statistically significant result (p < 0.05) [7] [2] [8]. | Inflates false-positive rates, skewing the scientific literature [2] [8]. |
| | Selective Reporting / "Cherry-picking" | Reporting only results or studies that are significant or consistent with predictions, while omitting negative or insignificant results [7] [4] [2]. | Presents a biased picture of an intervention's true effectiveness; known as the "file drawer problem" [7]. |
| Data Collection & Management | Inadequate Record Keeping | Failing to keep careful, detailed records of the research process, including decisions and protocols [2] [5]. | Makes replication impossible and obscures the research trail [2]. |
| | Optional Stopping | Monitoring data collection and stopping only after a significant result is attained, without a pre-defined sample size [8]. | Increases the likelihood of a false positive finding [8]. |
| Collaboration & Authorship | Gift Authorship | Demanding or accepting authorship for work that does not meet established contribution criteria [4] [5]. | Undermines credit and accountability. |
| | Insufficient Supervision | Failing to adequately mentor and oversee junior coworkers [5]. | A highly prevalent mispractice that threatens both trust and truth in science [5]. |
The following diagram maps out how these QRPs can infiltrate different stages of a typical research workflow, highlighting critical points where integrity is at risk.
Surveys indicate QRPs are unfortunately common. A meta-analytic study found about 12.5% of researchers admitted to engaging in at least one QRP, while other surveys have reported much higher prevalence rates, with some estimates suggesting one in two researchers has engaged in a QRP in the last three years [4] [2]. The motivation is often a combination of systemic pressure and individual factors.
Table 3: Factors Contributing to Engagement in Questionable Research Practices
| Factor Category | Specific Factor | Explanation |
|---|---|---|
| Systemic & Institutional | "Publish or Perish" Culture | Pressure to publish frequently in high-impact journals for hiring, promotion, and funding [4] [2]. |
| | Bias Toward Significant Results | Journals preferentially publish novel, statistically significant results, creating a bias against null findings [4] [8]. |
| Individual & Cognitive | Researcher Degrees of Freedom | The many flexible decisions researchers must make during a study (e.g., on data exclusion, analysis choices) can be exploited to obtain desired outcomes [7] [4]. |
| | Competing Contingencies | Researcher behavior is shaped by both the desire for valid scientific contributions and the tangible benefits of publication (prestige, job security). QRPs can emerge when the latter dominates [7]. |
| | Competency Shortfalls | Lack of sufficient training in research ethics, methodology, or data analysis can lead to engagement in QRPs, sometimes unintentionally [4] [2]. |
Detecting QRPs relies on a combination of statistical techniques, methodological scrutiny, and cross-verification. The table below outlines key detection protocols.
Table 4: Experimental Protocols for Detecting Questionable Research Practices
| Detection Method | Protocol Description | QRPs It Can Help Identify |
|---|---|---|
| Statistical Consistency Checks | Examining the distribution of p-values in a literature; a surplus of p-values just below 0.05 (p-value clustering) can indicate p-hacking [8]. | P-hacking, optional stopping. |
| Replication Studies | Directly repeating a prior study's methodology to see if the same results are obtained. The inability to replicate findings can be a red flag [7] [8]. | Selective reporting, HARKing, various QRPs. |
| Comparison with Unpublished Literature | Systematically comparing effect sizes from published studies with those from unpublished theses or dissertations. Larger effects in published work suggest a file drawer problem [7]. | Selective reporting, publication bias. |
| Power Analysis Evaluation | Assessing the statistical power of a series of studies. An unrealistically high rate of significant results across multiple underpowered studies suggests non-reported null findings [8]. | Selective reporting, the file drawer problem. |
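The p-value clustering check in the first row can be sketched as a simple caliper test: count reported p-values falling just below versus just above the significance threshold. The function name and the example values below are illustrative, not drawn from any cited study.

```python
def caliper_test(p_values, threshold=0.05, width=0.01):
    """Count p-values just below vs. just above a significance threshold.

    A surplus just below the threshold can signal p-hacking in a body
    of literature; it is a red flag for scrutiny, not proof of misconduct.
    """
    below = sum(1 for p in p_values if threshold - width <= p < threshold)
    above = sum(1 for p in p_values if threshold < p <= threshold + width)
    return below, above

# Hypothetical reported p-values from a literature under scrutiny:
reported = [0.041, 0.048, 0.049, 0.044, 0.046, 0.057, 0.03, 0.012]
below, above = caliper_test(reported)
print(below, above)  # 5 just below 0.05 vs. 1 just above
```

A formal version of this idea (the caliper test, or p-curve analysis) applies a binomial test to the two counts; under honest reporting, counts on either side of an arbitrary threshold should be roughly comparable.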
Adopting improved research practices and tools is the most effective way to prevent QRPs. The following table details key "reagents" for ensuring robust and transparent science.
Table 5: Research Reagent Solutions for Preventing QRPs
| Tool / Practice | Function | How It Mitigates QRPs |
|---|---|---|
| Pre-registration | Publicly documenting a study's hypotheses, design, and analysis plan before data collection begins [2]. | Directly counters HARKing and p-hacking by creating a time-stamped, unchangeable record of intent. |
| Registered Reports | A publication format where journals peer-review and provisionally accept studies before data is collected, based on the proposed methodology [4]. | Removes publication bias against null results, reducing the incentive for selective reporting and p-hacking. |
| Open Data & Code | Publicly sharing de-identified raw data and analysis scripts alongside the publication. | Allows for full independent verification of results, deters selective reporting and flexible data analysis. |
| Citation Manager | Using software (e.g., Zotero, Mendeley) to organize and format references [2]. | Helps prevent improper referencing and citation plagiarism. |
| Detailed Lab Protocols | Maintaining standardized, detailed records of all research procedures and decisions [2]. | Prevents inadequate record keeping and makes the research process transparent and replicable. |
| Contributorship Model | Using explicit criteria (e.g., CRediT taxonomy) to define and disclose each author's specific contributions [2]. | Helps eliminate gift and ghost authorship by ensuring accountability. |
Moving from questionable to improved research practices requires a conscious shift in methodology. The key is to prioritize transparency and pre-commitment.
Questionable Research Practices (QRPs) are activities that, while not necessarily classified as outright fraud, violate principles of research transparency, ethics, and rigor [2]. They occupy a concerning gray area in scientific research, situated between deliberate misconduct (such as data fabrication) and honest error. The prevalence of QRPs is a significant concern; one estimate suggests that one in two researchers has engaged in at least one QRP over a three-year period [2]. These practices have been identified as a key contributor to the "replication crisis" observed in numerous scientific fields, where subsequent studies fail to reproduce the findings of originally published research [7] [2]. This erosion of reliability threatens the integrity of the scientific record and can lead to wasted resources and misguided clinical or policy decisions.
Navigating the landscape of research integrity requires a clear understanding of the distinctions between different types of problematic practices. The following table outlines the core definitions that form this spectrum.
Table 1: Categorizing Problematic Research Practices
| Category | Definition | Key Characteristics | Examples |
|---|---|---|---|
| Research Misconduct (FFP) | Fabrication, falsification, or plagiarism in proposing, performing, or reviewing research, or in reporting research results [9]. | Defined by U.S. federal policy; involves clear, intentional deception. | Making up data (fabrication); manipulating research materials or omitting data to misrepresent results (falsification); appropriating another's ideas or words without credit (plagiarism) [9]. |
| Questionable Research Practices (QRPs) | Decisions during the research process that raise questions regarding the work's rigor and precision, often motivated by pressure to publish [2]. | Often reside in ethical gray areas; can be challenging to detect; may be unintentional but have detrimental effects. | Selective reporting of results; p-hacking; HARKing; failing to share data; not accurately documenting the research process [7] [2]. |
| Detrimental Research Practices | A broader category of actions that violate research values and can damage the research enterprise [9]. | Encompasses a wider range of behaviors than QRPs, including poor mentorship and abusive work environments. | Refusing to share research materials or data with other researchers; inappropriate authorship; inadequate supervision or mentorship [9]. |
| Honest Error | Unintentional mistakes in the recording, selection, or analysis of data [9]. | Not intentional; does not constitute misconduct. | Errors of judgment; slips in data entry; differences of opinion in data interpretation [9]. |
The diagram below illustrates the functional relationship between external pressures, researcher behavior, and the resulting impact on the scientific literature.
This section is structured as a technical support guide to help researchers identify, diagnose, and resolve common QRPs in their work.
Q1: I've collected some data, but the results are not statistically significant. I noticed one outlier that seems to be skewing the results. Is it acceptable to remove it?
A1: Removing an outlier because its presence blocks a significant result is post-hoc data exclusion, a form of p-hacking [16]. Exclude data only under criteria defined before analysis, and report every exclusion and its rationale.
Q2: My data suggest an interesting relationship I hadn't initially predicted. Can I write my paper's introduction and hypothesis as if I had predicted this all along?
A2: No. Presenting a post-hoc hypothesis as if it were predicted a priori is HARKing [7] [4]. Report the relationship as an exploratory, hypothesis-generating finding and confirm it in a new, ideally pre-registered, study.
Q3: I ran three experiments, but only one produced clear, positive results. Can I just write up and submit the successful one?
A3: Submitting only the successful experiment is selective reporting, which biases the literature toward positive results [7] [2]. Report all three experiments, including the null results.
Q4: My research process evolved during the project. Do I need to document every little change?
A4: Yes. Detailed, timestamped records of procedures, decisions, and protocol changes are foundational for transparency and replication; inadequate record keeping is itself a QRP [2] [5].
The scale of the issue is significant, as evidenced by large-scale studies. The following table summarizes findings from one such analysis of randomized controlled trials.
Table 2: Indicators of Questionable Research Practices in 163,129 Randomized Controlled Trials [10]
| Indicator of QRP | Definition | Prevalence in Studied RCTs |
|---|---|---|
| Implausible Baseline Characteristics | A statistical indicator suggesting that reported baseline data may be too perfect to be realistic. | Identified in a portion of the analyzed trials, suggesting potential selective reporting or data manipulation. |
| Inconsistent Reporting | Discrepancies in data reported between the main text, tables, and figures of a publication. | A common issue found across a significant number of trials, impacting the reliability of the published record. |
Adopting structured frameworks at the outset of a research project is one of the most effective ways to prevent QRPs.
A well-constructed research question is the foundation of rigorous science. Using frameworks like PICO and FINER ensures all critical components are considered a priori [11] [12].
Table 3: The PICO Framework for Structuring a Research Question [11] [12]
| Component | Definition | Example: "Good" | Example: "Best" |
|---|---|---|---|
| P (Patient/Population) | The subjects of interest. | Adult patients with type II diabetes. | Adult patients (18-64 years old) with uncontrolled type II diabetes (A1c >7%) in a primary care setting. |
| I (Intervention/Exposure) | The action or exposure being studied. | Pharmacist-led education. | Three visits over 12 months with an ambulatory care pharmacist providing nutritional guidance and medication adjustments. |
| C (Comparison) | The alternative to compare against. | Usual care. | Three visits over 12 months with a primary care physician. |
| O (Outcome) | The effect being evaluated. | Change in blood glucose. | Percent change in A1c from baseline at 3, 6, and 12 months. |
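As a minimal illustration of how a PICO question can be forced to be complete a priori, the sketch below encodes the four components as a structured record; the class and field names are our own invention, not part of any standard library or guideline.

```python
from dataclasses import dataclass

@dataclass
class PicoQuestion:
    """A research question that cannot be constructed with a missing component."""
    population: str
    intervention: str
    comparison: str
    outcome: str

    def as_question(self) -> str:
        return (f"In {self.population}, does {self.intervention}, "
                f"compared with {self.comparison}, change {self.outcome}?")

# Built from the "best" examples in the table above:
q = PicoQuestion(
    population="adults (18-64) with uncontrolled type II diabetes (A1c >7%)",
    intervention="pharmacist-led education over 12 months",
    comparison="usual care from a primary care physician",
    outcome="A1c (percent change at 3, 6, and 12 months)",
)
print(q.as_question())
```

Because every field is required, an incompletely specified question fails loudly at construction time rather than surfacing later as analytical flexibility.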
The FINER criteria (Feasible, Interesting, Novel, Ethical, Relevant) help evaluate the practical aspects and value of a research question [11]. The diagram below illustrates the workflow for developing a robust research plan using these tools.
Beyond laboratory reagents, a modern scientist's toolkit must include resources that support methodological rigor and transparency.
Table 4: Key Resources for Implementing Improved Research Practices
| Tool / Resource | Category | Primary Function in Mitigating QRPs |
|---|---|---|
| Pre-registration Platforms (e.g., OSF, AsPredicted) | Protocol Planning | Allows researchers to publicly archive their hypotheses, methods, and analysis plan before data collection, combating HARKing and p-hacking [2]. |
| Citation Managers (e.g., Zotero, Mendeley) | Writing & Dissemination | Helps ensure accurate and complete referencing, preventing improper citation and plagiarism [2]. |
| Version Control Systems (e.g., Git, SVN) | Data & Code Management | Tracks all changes to code and documentation, creating a transparent and auditable research trail [2]. |
| Registered Reports | Publishing Format | A journal format where the study design and proposed analyses are peer-reviewed before data collection, reducing publication bias against null results [2]. |
Distinguishing between research misconduct, QRPs, and honest errors is critical for upholding scientific integrity. While misconduct (FFP) represents clear and intentional violations, QRPs often stem from a complex interplay of external pressures and researcher degrees of freedom [7] [2]. The good news is that the scientific community has developed powerful "tools" to combat QRPs, including pre-registration, open science frameworks, and structured guidelines like PICO and FINER. By integrating these improved research practices into their daily workflow, researchers and drug development professionals can protect their work from questionable practices, enhance its validity and reproducibility, and actively contribute to a more trustworthy and reliable scientific literature.
Questionable Research Practices (QRPs) are activities that exist in an ethical gray area between sound scientific conduct and outright scientific misconduct (fabrication, falsification, and plagiarism) [4]. These practices threaten scientific integrity by undermining the reliability and validity of scientific knowledge, contributing to what is often termed the "replication crisis" in science [2] [4]. QRPs are concerning because of their high prevalence: one survey found that 51.3% of academic researchers reported engaging frequently in at least one QRP over a three-year period [13].
The most common QRPs include p-hacking, HARKing, and selective reporting, which collectively distort the scientific literature by inflating false positive rates and creating a skewed representation of research findings [14] [4]. These practices are often driven by a "publish or perish" culture that incentivizes researchers to produce statistically significant, novel results for publication [14] [2]. This technical guide provides identification methods, consequences, and solutions for these QRPs to support research integrity in biomedical sciences.
Definition: p-hacking occurs when researchers repeatedly analyze data in different ways until they obtain a statistically significant result (p < 0.05) [14] [15]. Also known as "data dredging" or "data snooping," this practice involves testing multiple hypotheses without proper statistical correction [16].
Common manifestations, and the statistical consequence of each, are summarized in Table 1.
Table 1: p-Hacking Detection Indicators
| Indicator | Description | Statistical Consequence |
|---|---|---|
| Multiple testing without correction | Running many statistical tests but only reporting significant ones [16] | Increased false positive rate; with α=0.05, 1 in 20 tests will be significant by chance alone [14] |
| Optional stopping | Collecting data incrementally and stopping when significance is reached [16] | Substantial inflation of Type I error rates [14] |
| Post-hoc data exclusion | Removing outliers after seeing their impact on results [16] | Altered statistical significance that doesn't reflect true effect [14] |
| Covariate manipulation | Adding or removing covariates to achieve significance [14] | Increased likelihood of false positive findings [14] |
| Selective reporting of outcomes | Measuring multiple variables but only reporting those with significant results [2] | Skewed literature with inflated effect sizes [14] |
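The first indicator's claim, that with α = 0.05 one in twenty null tests is significant by chance, implies a study running 20 uncorrected tests faces roughly a 1 − 0.95²⁰ ≈ 64% chance of at least one false positive. The simulation below, a sketch using only the Python standard library, checks this with a z-test on data where the null is true by construction.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def null_p_value(n, rng):
    """Two-sided p-value of a z-test on n draws from N(0, 1): the null is true."""
    z = mean(rng.gauss(0, 1) for _ in range(n)) * sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(42)
simulations, tests_per_study, n = 2000, 20, 30
# A "study" commits a false positive if any of its 20 tests reaches p < 0.05.
hits = sum(
    any(null_p_value(n, rng) < 0.05 for _ in range(tests_per_study))
    for _ in range(simulations)
)
rate = hits / simulations
print(round(rate, 2))  # lands near the theoretical 1 - 0.95**20 ≈ 0.64
```

Reporting only the one significant test from such a study presents a 64%-likely chance finding as a 5%-unlikely discovery.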
Experimental Protocol for Identification:
1. Examine the distribution of reported p-values; a surplus clustering just below 0.05 suggests p-hacking [8].
2. Compare the reported analyses against any pre-registered analysis plan and note undisclosed deviations.
3. Check whether the number of reported tests matches the number of outcomes and covariates that were measured.
Definition: HARKing involves presenting a post-hoc hypothesis (developed after seeing the results) as if it were an a priori hypothesis (developed before the study) [15] [4].
Common Manifestations: rewriting a paper's introduction so that an unexpected finding appears to have been predicted a priori; omitting the original hypotheses that the data failed to support; presenting exploratory subgroup results as confirmatory tests [15] [4].
Table 2: HARKing vs. Appropriate Practice
| Aspect | HARKing (QRP) | Appropriate Practice |
|---|---|---|
| Hypothesis timing | Presented as formulated before data collection | Clearly stated as developed after data inspection [15] |
| Interpretation | Findings presented as confirmatory | Findings explicitly labeled as exploratory or hypothesis-generating [15] |
| Context in paper | Introduction written/reworked to imply a priori prediction | Discussion acknowledges post-hoc nature and need for confirmation [15] |
| Statistical interpretation | P-values interpreted as confirming hypothesis | P-values recognized as potentially reflecting chance findings [15] |
Experimental Protocol for Prevention:
1. Pre-register hypotheses and analysis plans before data collection [2].
2. Keep timestamped records documenting when each hypothesis was formulated [15].
3. Explicitly label all post-hoc analyses as exploratory in the manuscript [15].
Definition: Selective reporting involves presenting only favorable results that support the researcher's hypothesis while concealing unfavorable or non-significant findings [2] [15].
Common Manifestations: omitting non-significant outcomes from the write-up; leaving unsuccessful experiments unpublished; reporting only a subset of the variables that were measured [2] [17].
Experimental Protocol for Mitigation:
1. Pre-register all planned outcomes and analyses [2].
2. Report every measured variable and every experiment, including null results [2].
3. Use registered reports, which guarantee publication regardless of findings [4].
Table 3: QRP Prevalence Among Researchers
| QRP Type | Prevalence | Study Context |
|---|---|---|
| Fabrication | 4.3% (95% CI: 2.9, 5.7) [13] | Dutch academic researchers over 3 years |
| Falsification | 4.2% (95% CI: 2.8, 5.6) [13] | Dutch academic researchers over 3 years |
| Any frequent QRP | 51.3% (95% CI: 50.1, 52.5) [13] | Dutch academic researchers over 3 years |
| Selective reporting of outcomes | Up to 66% in specific scenarios [17] | Biomedical doctoral students facing dilemmas |
| Excluding data after analysis | 39% [17] | Psychology researchers |
| Failing to report all variables | Approximately 50% [17] | Psychology researchers |
Table 4: Factors Associated with QRP Engagement
| Factor | Impact on QRPs | Evidence Source |
|---|---|---|
| Career stage | PhD candidates/junior researchers had increased odds of frequent QRP engagement (OR: 1.59, 95% CI: 1.32, 1.92) [13] | Dutch national survey |
| Gender | Male researchers had higher odds of frequent QRP engagement (OR: 1.33, 95% CI: 1.18, 1.50) [13] | Dutch national survey |
| Publication pressure | Associated with more frequent QRPs (OR: 1.22, 95% CI: 1.14, 1.30) [13] | Dutch national survey |
| Scientific norm subscription | Associated with less research misconduct (OR: 0.79; 95% CI: 0.63, 1.00) [13] | Dutch national survey |
| Perceived detection likelihood | Reviewer detection associated with less misconduct (OR: 0.62, 95% CI: 0.44, 0.88) [13] | Dutch national survey |
Q1: What's the difference between exploratory analysis and p-hacking? A1: Exploratory analysis explicitly acknowledges its hypothesis-generating nature and treats findings as preliminary, requiring confirmation. p-hacking conceals the analytical flexibility and presents results as confirmatory without acknowledging multiple testing [15] [16]. The key distinction is transparency about the analytical process and appropriate statistical interpretation.
Q2: Is it ever acceptable to analyze data without pre-specified hypotheses? A2: Yes, exploratory data analysis is valid when properly framed as hypothesis-generating rather than confirmatory [15]. The critical requirement is clear communication that findings are preliminary and require independent validation, with statistical interpretations adjusted for multiple testing [15].
Q3: How can I prevent selective reporting in my lab? A3: Implement study pre-registration, establish standardized data collection protocols, maintain comprehensive lab notebooks documenting all experiments (including failures), and create a culture that values negative results as much as positive findings [2]. Some labs establish "lab journals" where all experimental attempts are recorded regardless of outcome.
Q4: What are the most effective safeguards against HARKing? A4: Study pre-registration (particularly registered reports) is the most effective defense [2]. Additional safeguards include timestamped electronic records of hypothesis generation, explicit labeling of exploratory analyses in manuscripts, and separate sections for confirmatory versus exploratory findings in publications [15].
Q5: Are QRPs always intentional misconduct? A5: No, research suggests QRPs exist on a spectrum from intentional misconduct to unintentional poor practices [18]. Many researchers engage in QRPs without full awareness of their consequences due to inadequate training, cognitive biases, or organizational pressures [14] [4]. However, the consequences are similarly damaging regardless of intent [18].
Table 5: Essential Resources for QRP Identification and Prevention
| Resource Type | Specific Tools | Function and Application |
|---|---|---|
| Pre-registration platforms | Open Science Framework (OSF), ClinicalTrials.gov, BMJ Open | Document hypotheses, methods, and analysis plans before data collection to prevent HARKing and selective reporting [2] |
| Statistical power tools | Superpower, pwr package in R | Conduct a priori power analysis to ensure adequate sample sizes and prevent data collection manipulation [2] |
| Data documentation tools | Electronic lab notebooks, version control (Git) | Maintain comprehensive records of all research decisions and data manipulations [2] |
| Multiple testing correction | Bonferroni, False Discovery Rate, permutation tests | Adjust significance thresholds for multiple comparisons to control false positive rates [14] |
| Data sharing platforms | Dryad, Zenodo, institutional repositories | Share complete datasets to enable verification and transparency [2] |
| Registered Reports | Journal format available at Cortex, Comprehensive Results in Social Psychology | Peer review of methods before results are known, guaranteeing publication regardless of findings [4] |
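As a concrete companion to the multiple-testing row, here is a minimal pure-Python sketch of the Benjamini-Hochberg step-up procedure; production analyses would normally use a vetted statistics library instead.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a parallel list of booleans: True where the hypothesis is
    rejected under the Benjamini-Hochberg false discovery rate procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k/m) * alpha ...
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k = rank
    # ... and reject exactly the k smallest p-values.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            reject[i] = True
    return reject

pvals = [0.001, 0.008, 0.020, 0.041, 0.27, 0.74]
print(benjamini_hochberg(pvals))  # [True, True, True, False, False, False]
```

Note the step-up behavior: 0.020 exceeds its Bonferroni threshold (0.05/6 ≈ 0.0083) but is still rejected at rank 3, which is why BH retains more power than Bonferroni while controlling the false discovery rate.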
p-Hacking, HARKing, and selective reporting represent three prevalent QRPs that significantly impact biomedical research validity. Evidence indicates these practices are widespread, with over 50% of researchers frequently engaging in at least one QRP [13]. The resulting distortion of the scientific literature undermines research reproducibility, wastes resources, and erodes public trust in science [4].
Successful mitigation requires both individual and systemic approaches, including enhanced education about QRPs, widespread adoption of pre-registration, implementation of open science practices, and cultural shifts within research institutions to reward rigorous methodology rather than solely novel, positive results [2] [4] [13]. By implementing the troubleshooting guides and protocols outlined in this document, researchers can significantly reduce QRP prevalence and enhance the reliability of biomedical research.
1. What are Questionable Research Practices (QRPs)? Questionable Research Practices (QRPs) are activities during the research process that are not fully transparent, ethical, or fair, and thus threaten the integrity and reproducibility of scientific findings [2]. They are not always technically illegal or considered outright misconduct, but they dangerously undermine the credibility of research. A 2012 study estimated that one in two researchers had engaged in at least one QRP in a three-year period [2].
2. What is the "replication crisis"? The replication crisis, also known as the reproducibility crisis, refers to the growing observation across many scientific fields that a substantial number of published study results cannot be reproduced or replicated by other researchers [19]. This is a fundamental problem because the reproducibility of empirical results is a cornerstone of the scientific method [19]. High-profile projects, such as the Open Science Collaboration's attempt to replicate 100 psychology studies, found that only about 39% of the replicated effects were consistent with the original claims [7].
3. How do QRPs directly contribute to irreproducible findings? QRPs inflate the rate of false-positive results, making it seem like an effect exists when it does not [2]. When research findings are a product of selective reporting or statistical manipulation rather than a true underlying effect, subsequent studies will inevitably fail to reproduce them. This creates a scientific literature filled with false leads, wasting resources and eroding public trust [20]. A large-scale analysis of 163,129 randomized controlled trials found direct indicators of these questionable practices [10].
4. What is the difference between "reproducibility" and "replicability"? While sometimes used interchangeably, these terms have distinct meanings [21].
| Term | Description | Significance |
|---|---|---|
| Repeatability | The same researchers obtain the same result using the same methods, conditions, and location multiple times. | Measures precision under repeated, identical conditions. |
| Replicability | A different set of researchers arrives at the same results using the same methods and conditions as the original study. | The result is not due to chance or a local experimental artifact. |
| Reproducibility | A different group of researchers arrives at the same results using their own data and methods to verify the original analysis. | Answers whether the data and analysis support the published result and conclusions. |
5. Besides QRPs, what other factors cause the replication crisis? Multiple interconnected factors are at play: publication bias toward novel, positive findings; chronically underpowered study designs; undisclosed flexibility in data analysis; and limited sharing of data, code, and materials, which prevents independent verification [8] [20] [22].
This guide helps you identify common symptoms of QRPs in research outcomes and provides step-by-step protocols to address the root causes.
Symptoms: The body of published literature on a topic shows overwhelmingly positive results, but your own attempts to reproduce them fail. Systematic reviews find that published studies have larger effect sizes than unpublished dissertations or pre-registered reports [7].
Root Cause: The tendency to only submit, or for journals to only accept, studies with "positive" or statistically significant results, while withholding null or negative results [20]. This distorts the scientific record, making an intervention seem more effective than it is.
Experimental Protocol to Mitigate Selective Reporting:
1. Pre-register your study. Publicly document your hypotheses, outcomes, and analysis plan before data collection begins [2].
2. Adopt Registered Reports. Submit your design for peer review before results are known, so acceptance does not depend on the outcome [4].
3. Publish all results. Report null and negative findings rather than leaving them in the file drawer [20].
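The distortion that selective publication creates is easy to demonstrate. The simulation below, a sketch under simple assumed parameters (true standardized effect 0.3, n = 20 per study), "publishes" only studies reaching p < 0.05 and shows that the average published effect overshoots the truth.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

rng = random.Random(7)
true_effect, n, simulations = 0.3, 20, 4000
all_effects, published = [], []
for _ in range(simulations):
    # Each study estimates the effect from n observations with unit variance.
    d_hat = mean(rng.gauss(true_effect, 1) for _ in range(n))
    p = 2 * (1 - NormalDist().cdf(abs(d_hat) * sqrt(n)))
    all_effects.append(d_hat)
    if p < 0.05:                 # the "file drawer": only significant
        published.append(d_hat)  # results get written up
print(round(mean(all_effects), 2),   # close to the true effect, 0.3
      round(mean(published), 2))     # well above it
```

Because only the studies that drew a large sample estimate clear the significance bar, a meta-analysis of the published record alone would substantially overestimate the intervention.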
Symptoms: A study reports a just-significant p-value (e.g., p = 0.048) and the result feels fragile. The authors may have tested multiple variables or analytical paths but only reported the one that worked.
Root Cause: The practice of repeatedly analyzing data in different ways (e.g., excluding outliers, combining variables, trying different covariates) until a statistically significant result is found [20] [2]. This dramatically increases the false-positive rate.
Experimental Protocol to Mitigate P-hacking:
1. Define your analysis plan upfront. Specify outcomes, covariates, and data-exclusion rules before seeing the data [2].
2. Determine sample size a priori. Fix the sample size with a power analysis in advance instead of stopping data collection once significance appears [8].
3. Report all analyses transparently. Disclose every analysis performed, not only those that reached p < 0.05 [20].
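Determining sample size a priori can be made concrete with the standard normal-approximation formula for a two-sided, two-sample comparison of means; the function below is an illustrative sketch, and an exact t-distribution calculation (e.g., the pwr package in R) gives a marginally larger answer.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means at standardized effect size d:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2
    """
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2)

print(n_per_group(0.5))  # 63; the exact t-based answer is 64 per group
```

Committing to this number before collection removes the temptation of optional stopping, since the stopping rule is fixed in advance.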
Symptoms: A research paper's introduction presents a compelling, post-hoc explanation for a complex finding as if it were the driving hypothesis all along. The discussion may over-interpret exploratory results.
Root Cause: Presenting unexpected or post-hoc findings as if they were predicted a priori [7] [20]. This misleads readers about the strength of the evidence and the hypothetico-deductive process that was actually followed.
Experimental Protocol to Mitigate HARKing:
1. Clearly separate hypotheses. State explicitly which hypotheses were specified a priori and which emerged from inspecting the data [15].
2. Use holdout samples. Generate hypotheses on one portion of the data and confirm them on a held-out portion that was not used for exploration.
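The holdout idea can be sketched in a few lines: partition the dataset once, up front, so hypotheses generated on the exploration half are tested exactly once on the confirmation half. The function name below is our own, not from any library.

```python
import random

def exploration_confirmation_split(records, frac=0.5, seed=0):
    """Randomly partition records so hypotheses generated on the
    exploration set can be tested once on the held-out confirmation set."""
    rng = random.Random(seed)  # fixed seed: the split is made once and kept
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * frac)
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))  # stand-in for study records
explore, confirm = exploration_confirmation_split(data)
print(len(explore), len(confirm))        # 50 50
assert not set(explore) & set(confirm)   # the two sets never overlap
```

The discipline lies in touching the confirmation half only once; re-splitting or peeking reintroduces the very flexibility the holdout was meant to remove.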
| Item | Function in Ensuring Rigor & Reproducibility |
|---|---|
| Pre-registration Platforms (e.g., OSF, AsPredicted) | Creates a time-stamped, public record of the research plan to combat HARKing, p-hacking, and selective reporting [2]. |
| Citation Managers (e.g., Zotero, Mendeley) | Helps ensure proper and accurate attribution of ideas and techniques, avoiding improper referencing which is a common QRP [2]. |
| Open Data Repositories (e.g., Zenodo, Figshare) | Provides a platform to share raw data, making findings reproducible and allowing for reanalysis, which helps detect errors and QRPs [22]. |
| Version Control Systems (e.g., Git) | Tracks every change made to code and analysis scripts, providing a complete audit trail and ensuring computational reproducibility [22]. |
| Electronic Lab Notebooks (ELNs) | Facilitates rigorous and detailed record-keeping of all procedures, materials, and decisions, which is foundational for reproducibility [2]. |
The following diagram illustrates the logical relationship between systemic pressures, researcher decisions, and the ultimate crisis of irreproducibility.
Understanding the scope and nature of Questionable Research Practices (QRPs) requires robust quantitative methods. Prevalence studies and surveys are essential tools for systematically measuring how often these practices occur, identifying risk factors, and evaluating the effectiveness of interventions designed to improve research integrity. This guide provides methodologies and resources for conducting such quantitative investigations into researcher behaviors.
You can employ several quantitative designs, each with distinct strengths for investigating QRPs [24]. The choice depends on your research question, resources, and ethical considerations.
The table below summarizes the key designs:
| Research Design | Goal | Best Scenarios for QRPs Research | Key Limitations |
|---|---|---|---|
| Cross-Sectional Survey [24] [25] | To provide a "snapshot" of the prevalence of QRPs at a single point in time. | Gauging self-reported frequencies of QRPs across a wide population of researchers; assessing opinions and experiences surrounding QRPs. | Cannot establish causality; prone to sampling bias and social desirability bias where researchers may under-report QRPs. |
| Longitudinal Research [25] | To track data and behaviors, such as QRP engagement, over an extended period. | Studying the long-term effects of integrity training programs; observing how QRP prevalence shifts throughout researchers' careers or in response to policy changes. | Can be expensive and time-consuming; subject to participant attrition over time. |
| Correlational Research [25] [26] | To examine relationships between variables without implying causation. | Investigating associations between QRP engagement and factors like career stage, publication pressure, research field, or specific laboratory environments. | Correlation does not prove causation; results can be misleading due to confounding variables. |
| Cohort Studies [24] | To understand the causes or temporal associations of an outcome by following groups over time. | Following a cohort of early-career researchers prospectively to identify predictors of later QRP engagement. | Observational evidence cannot establish causality on its own; requires large samples and is vulnerable to attrition. |
| Experimental / Quasi-Experimental [24] [25] | To establish cause-and-effect relationships by manipulating variables. | Testing the efficacy of different educational interventions (e.g., ethics training, blinded data analysis) in reducing QRPs among research labs. | True experiments may not be feasible for ethical or practical reasons; quasi-experiments have threats to internal validity. |
QRPs are behaviors that compromise the replicability and validity of research findings [7]. While not exhaustive, the following table details common QRPs identified in the literature, which can form the basis of your survey items or data extraction criteria.
| Questionable Research Practice | Description | Potential Quantitative Measure |
|---|---|---|
| Selective Data Reporting [7] | Selectively reporting positive or statistically significant results while omitting negative or insignificant results. | Percentage of researchers who admit to excluding non-significant data points from a manuscript. |
| p-hacking [7] | Selectively conducting data analyses to produce or enhance positive/statistically significant outcomes. | Frequency of trying various statistical tests until a significant p-value is obtained. |
| HARKing (Hypothesizing After the Results are Known) [7] | Formulating hypotheses after study outcomes are known to "fit" the data. | Self-reported incidence of presenting a post-hoc hypothesis as if it were a priori. |
| Selective Procedural Reporting [7] | Omitting possible confounds from procedural descriptions that could explain positive outcomes. | Rate of failing to report a key methodological detail that could explain the observed effect. |
| Selective Outcome Reporting [7] | Writing a paper's abstract or discussion to selectively downplay undesirable results and/or emphasize desired results. | Prevalence of downplaying non-significant findings in the discussion section of a paper. |
| Selective Recruiting [7] | Selectively recruiting participants into a treatment condition who are more likely to show positive effects and not reporting this. | Incidence of assigning participants to groups based on their pre-test responses to ensure a desired outcome. |
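When turning the survey counts in the table above into prevalence estimates, an interval estimate is more informative than a bare percentage. A minimal sketch using the Wilson score interval; the example numbers are illustrative, not data from any cited survey:

```python
from math import sqrt

def wilson_ci(admissions, n_respondents, z=1.96):
    """Wilson score interval for a proportion, e.g., the share of respondents
    admitting to a given QRP. z = 1.96 gives an approximate 95% interval."""
    p = admissions / n_respondents
    denom = 1 + z * z / n_respondents
    center = (p + z * z / (2 * n_respondents)) / denom
    half = z * sqrt(p * (1 - p) / n_respondents
                    + z * z / (4 * n_respondents * n_respondents)) / denom
    return center - half, center + half
```

Unlike the naive normal-approximation interval, the Wilson interval behaves sensibly for proportions near 0 or 1, which matters for rarely admitted practices such as selective recruiting.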
Maximizing validity is challenging when researching sensitive topics like QRPs. Key threats and their mitigations are listed below [24].
| Threat to Validity | Definition | Impact on QRPs Research | Recommended Mitigations |
|---|---|---|---|
| Selection Bias | Systematic differences between groups before the study. | Your sample may over-represent certain types of researchers (e.g., those already concerned about integrity), skewing prevalence rates. | Use stratified random sampling to ensure representation across fields, career stages, and regions. Statistically check for baseline differences. |
| Social Desirability Bias | Participants responding in a way they believe is socially acceptable rather than truthfully. | Researchers are likely to under-report their engagement in QRPs, leading to underestimation of prevalence. | Use anonymous, confidential surveys. Phrase questions neutrally and assure participants that honesty is valued for improving science. |
| Attrition (Mortality) | Participant dropout affects group representativeness. | In longitudinal studies, researchers who engage in QRPs may be more likely to drop out, biasing later results. | Maintain strong engagement with participants; use tracking methods and reminders. |
| Instrumentation | Changes in measurement tools or procedures. | If you modify your survey midway through a study, you cannot compare results from the two versions. | Keep survey instruments and procedures consistent throughout the data collection period. |
| History | External events affecting the study. | A high-profile case of research misconduct during your study could temporarily influence how participants respond. | Be aware of the research climate and consider its potential influence when interpreting data. |
For every QRP, there is an improved, "integrity-positive" practice. Your research can measure the adoption of these improved practices as a positive outcome.
Problem: You are not receiving a sufficient number of survey responses, threatening the external validity of your prevalence estimates.
Solution:
Problem: You are unsure how to structure your study to produce reliable, generalizable data on QRPs.
Solution: Follow this workflow to design a robust prevalence study. The process involves defining your scope, choosing a design, developing your instrument, and executing your plan with careful attention to sampling and validity.
Problem: You observe a correlation (e.g., between high publication pressure and QRP engagement) and are tempted to state it implies causation.
Solution:
The following table details key "reagents" or essential components for conducting rigorous quantitative research into QRPs.
| Item / Solution | Function in QRPs Research |
|---|---|
| Validated Survey Instruments | Pre-existing, psychometrically tested questionnaires (e.g., on scientific misconduct, perceived pressure) provide reliable and comparable measures across studies. |
| Statistical Analysis Software (e.g., R, Python, SPSS) | Essential for conducting descriptive statistics (e.g., prevalence rates), inferential tests (e.g., t-tests, ANOVAs to compare groups), and modeling relationships (e.g., regression analysis). |
| Online Survey Platforms (e.g., Qualtrics, REDCap) | Facilitate the efficient distribution of surveys to a wide audience, enable anonymous data collection, and often have built-in tools for basic data analysis. |
| Sample Frame (e.g., Professional Directory) | A comprehensive list of the population from which to draw your sample (e.g., membership lists of scientific societies, university faculty directories) to ensure proper sampling. |
| Data Management Plan | A formal plan outlining how data will be handled during and after the project, ensuring integrity, security, and future reproducibility of your research on research. |
| Preregistration Template | A template for preregistering your study's hypotheses and analysis plan on a platform like the Open Science Framework (OSF), demonstrating your commitment to improved practices. |
1. What are the most common signs that I might be engaging in a Questionable Research Practice (QRP)?
You may be engaging in a QRP if you find yourself:
2. My results are not significant. Is it acceptable to collect more data until they become significant?
No, this is a form of p-hacking and is considered a QRP. Stopping data collection once a desired p-value is reached, or continuing to collect data until significance is achieved, dramatically increases the rate of false positives [2]. The solution is to determine your sample size a priori using a power analysis before you begin data collection and to adhere to this plan [2]. If interim analyses are needed, use established methods such as sequential analyses rather than this ad-hoc approach [29].
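The a priori sample-size calculation can be sketched with a normal-approximation power formula. This is a simplification: dedicated tools such as G*Power or the `pwr` R package use the exact t distribution and return a slightly larger n (64 per group for d = 0.5 at 80% power, two-tailed α = .05, versus 63 here):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample comparison
    of means with standardized effect size d (Cohen's d), normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = z.inv_cdf(power)            # quantile corresponding to desired power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)
```

Committing to the resulting n before data collection, ideally in a preregistration, removes the temptation to "peek and extend."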
3. I had to exclude some data. How can I ensure this is transparent and not a QRP?
The key to transparency is pre-specification. You must create a set of clearly defined, justified exclusion criteria before you begin data collection or analysis [2]. These criteria and the justification for them should be outlined in a pre-registration document or study protocol. When you write up your results, you must explicitly state which data were excluded and reference the pre-specified rule that justified the exclusion [2].
4. What is the single most effective practice to protect my research from QRPs?
Pre-registration is widely regarded as one of the most effective guards against QRPs [2] [28]. By creating a time-stamped, detailed analysis plan publicly available before conducting the study, you commit to your hypotheses and methods. This prevents later, post-hoc decisions from being presented as a priori predictions, thereby combating HARKing, p-hacking, and selective reporting [28].
5. Are there publication formats that reward rigorous methods over significant results?
Yes, Registered Reports (RRs) are a publication format designed specifically for this purpose [27] [28]. In an RR, the study introduction and methods are peer-reviewed before data is collected. If the protocol is sound, the journal provisionally accepts the paper for publication regardless of the eventual results. This format eliminates publication bias against null findings and aligns incentives toward methodological rigor [27].
Issue: Suspected Selective Reporting in a Research Literature
Problem: A literature review in your field reveals an overwhelming proportion of studies with statistically significant results, and you suspect that studies with null findings are not being published.
Diagnosis and Resolution Protocol:
Issue: Combating P-hacking in Data Analysis
Problem: A researcher runs multiple statistical tests on a dataset, only reporting the one that yielded a significant p-value, thereby inflating the false positive rate.
Diagnosis and Resolution Protocol:
Table 1: Documented Prevalence and Impact of Questionable Research Practices
| QRP | Documented Prevalence | Primary Impact on Research |
|---|---|---|
| Selective Reporting | Up to 94% of psychologists admit to some form [27]. >25% of pre-registered clinical trials in psychiatry show evidence of it [27]. | Distorts the evidence base; creates an inflated sense of effect certainty; contributes to replication failure [27] [28]. |
| P-hacking | A survey found one in two researchers engaged in at least one QRP in the last 3 years [2]. | Inflates false-positive rates; makes effects appear stronger than they are [2] [28]. |
| HARKing | 96% of standard articles report supported hypotheses vs. 44% of pre-registered studies, suggesting widespread HARKing [27]. | Creates a literature of false, post-hoc hypotheses that cannot be reliably tested, hindering theoretical progress [27]. |
| Publication Bias | 98% of positive antidepressant trials were published vs. 48% of negative trials [27]. Statistically significant findings in psychiatry receive >2x the citations [27]. | Renders the published literature unrepresentative of actual research findings; misinforms meta-analyses and policy [27] [28]. |
Table 2: Effectiveness of Open Science Interventions
| Intervention | Key Documented Outcome | Advantage for Researcher |
|---|---|---|
| Registered Reports (RRs) | 60% of RRs report null results (5x the rate in regular articles) [27]. RRs are perceived as higher quality and are cited similarly to or more than standard articles [27]. | Eliminates publication bias; guarantees publication; reduces anxiety about results; focuses peer review on methodology [27]. |
| Pre-registration | Creates a public record of hypotheses and analysis plans, making it easy to detect HARKing and deviations from the plan [2] [28]. | Defends against accusations of QRPs; strengthens the credibility of your findings; improves study design. |
| Replication Studies | In economics, 61% of replication effect sizes were within a 95% prediction interval, but only 20% of replications had a p-value <0.05 for the original effect [28]. | Provides a direct measure of the reliability of foundational findings in a field; corrects the scientific record. |
Protocol 1: Assessing the Prevalence of HARKing in a Literature
Protocol 2: A P-curve Analysis to Detect P-hacking in a Meta-Analysis
Conduct the analysis with dedicated software (e.g., the pcurve package in R).
Diagram 1: How incentives drive QRPs and potential solutions.
Diagram 2: Registered Reports workflow.
Table 3: Key Research Reagent Solutions for Preventing QRPs
| Tool / Reagent | Primary Function | Role in QRP Mitigation |
|---|---|---|
| Pre-registration Platforms (e.g., OSF, AsPredicted, ClinicalTrials.gov) | Provides a time-stamped, public record of a study's hypotheses, design, and analysis plan. | Directly combats HARKing and p-hacking by creating an irreversible record of intent [2] [28]. |
| Registered Report Format | A journal publication format where the study protocol is peer-reviewed and accepted before data collection. | Eliminates publication bias and selective reporting by guaranteeing publication based on methodological rigor, not results [27] [28]. |
| Power Analysis Software (e.g., pwr package in R, G*Power) | Calculates the minimum sample size required to detect an effect with a given probability (power). | Prevents underpowered studies, which are a major contributor to false negatives and a motivator for p-hacking [2]. |
| Data & Code Repositories (e.g., OSF, Zenodo, GitHub) | Provides a platform for publicly sharing research data and analysis code. | Enables transparency, reproducibility, and post-publication review, making it harder to hide selective reporting or analytical flexibility [27]. |
| Citation Managers (e.g., Zotero, Mendeley) | Software to organize references and automatically format bibliographies. | Helps avoid improper referencing, a QRP that can constitute a form of plagiarism [2]. |
Problem: A meta-analysis shows conflicting results between studies, making it difficult to draw a reliable overall conclusion.
Diagnosis: Significant between-study inconsistency not explained by sampling error alone.
Solution:
Problem: A large longitudinal dataset, such as child growth records, contains values that are technically possible but highly unlikely given an individual's previous measurements and known biology.
Diagnosis: Presence of population outliers and/or longitudinal outliers.
Solution:
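One component of such a solution can be sketched directly: population outliers are flagged against an external reference standard via z-score cutoffs. The cutoff value here is illustrative (WHO procedures use fixed flags per indicator), and the longitudinal step would additionally require residuals from a mixed-effects model, which this sketch omits:

```python
def flag_population_outliers(values, ref_mean, ref_sd, cutoff=5.0):
    """Return indices of values that are biologically implausible relative to
    a reference population (e.g., WHO growth charts): |z| > cutoff."""
    return [i for i, v in enumerate(values)
            if abs((v - ref_mean) / ref_sd) > cutoff]
```

Flagged records are candidates for review or exclusion under pre-specified rules, not automatic deletion.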
Problem: A cancer registry or other database containing categorical patient information needs to verify record plausibility without feasible manual checks.
Diagnosis: Implausible records due to reporting or data entry errors.
Solution:
FAQ 1: What is the difference between heterogeneity and inconsistency in meta-analysis?
While often used interchangeably, heterogeneity is frequently paired with the random-effects model assumption that true effect sizes vary normally around a grand mean. Inconsistency is a broader term used to inclusively cover all types of between-study discrepancies, including those arising from subgroup effects or a few outlying studies, without assuming a normal distribution [30].
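Both Cochran's Q and I² can be computed directly from study effect sizes and their within-study variances. A minimal sketch with inverse-variance weighting; the example inputs in the test are invented:

```python
def cochran_q_i2(effects, variances):
    """Cochran's Q statistic and the I^2 statistic for a set of study effect
    sizes with known within-study variances (inverse-variance weights)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variability attributable to between-study inconsistency
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2
```

Q is compared against a chi-squared distribution with k−1 degrees of freedom; I² rescales the same information into the intuitive percentage described in the table below.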
FAQ 2: Are all statistically significant findings likely to be true?
Not necessarily. Several Questionable Research Practices (QRPs), even if employed honestly, can increase the likelihood of false positives. These include:
FAQ 3: What are the most effective ways to prevent honest yet unacceptable research practices?
Promoting a culture of open science is key. Recommended strategies include [33] [18]:
| Tool/Method | Data Type | Primary Function | Key Strength |
|---|---|---|---|
| Cochran's Q Test [30] | Summary (Meta-analysis) | Tests the null hypothesis that all studies in a meta-analysis share a common effect size. | Standard, widely accepted test for between-study heterogeneity. |
| Alternative Q-like & Hybrid Tests [30] | Summary (Meta-analysis) | Tests for inconsistency with higher power under non-normal between-study distributions (e.g., heavy-tailed, skewed). | More robust and flexible than the conventional Q test in many realistic scenarios. |
| I² Statistic [30] | Summary (Meta-analysis) | Quantifies the percentage of total variability in a meta-analysis due to between-study inconsistency. | Intuitive interpretation (e.g., I² of 50% indicates moderate inconsistency). |
| Population Outlier (PO) Identification [31] | Raw (Anthropometric) | Flags values that are biologically implausible for a reference population using pre-defined z-score cutoffs. | Relies on established, external standards (e.g., WHO growth charts). |
| Longitudinal Outlier (LO) Identification [31] | Raw (Longitudinal) | Flags values that are implausible given an individual's own data trajectory using model residuals. | Captures errors that cross-sectional population methods might miss. |
| FindFPOF [32] | Raw (Categorical) | An unsupervised pattern-based anomaly detection method for categorical data. | Does not require labeled data or pre-defined rules for implausibility. |
| Autoencoder [32] | Raw (Categorical/Numerical) | An unsupervised neural network that detects anomalies based on data compression and reconstruction error. | Can find complex, non-obvious error patterns in high-dimensional data. |
| Concept | Definition | Potential Impact on Research |
|---|---|---|
| Questionable Research Practices (QRPs) [33] [18] | "Ways of producing, maintaining, sharing, analyzing, or interpreting data that are likely to produce misleading conclusions, typically in the interest of the researcher." Often deliberate. | Inflated effect sizes, reduced replicability, biased error rates, and compromised generalizability of findings [33]. |
| Honest Yet Unacceptable Research Practices [18] | Unintentional mistakes or weaknesses in research conception, design, or reporting. | Despite being unintentional, these practices are widespread and can collectively damage scientific credibility and public trust [18]. |
| P-hacking [33] [34] | A family of data manipulation practices (e.g., selectively excluding data, adding covariates) to achieve a statistically significant p-value. | Increases false-positive rates, leading to a literature filled with non-replicable findings. |
| HARKing [33] [34] | Hypothesizing After the Results are Known; presenting a post-hoc hypothesis as if it was defined a priori. | Creates a misleading narrative of strong confirmatory evidence and undermines the hypothetico-deductive process. |
| Publication Bias [18] [34] | The tendency for journals to publish only studies with statistically significant or "positive" results, while "negative" or null results remain unpublished. | Skews meta-analyses and systematic reviews, overestimating the true effect of an intervention. |
This protocol is based on methods proposed in BMC Medical Research Methodology (2025) [30].
This protocol adapts the real-world evaluation from BMC Medical Research Methodology (2023) [32].
| Item | Function in Data Detection |
|---|---|
| Reference Standards (e.g., WHO Growth Charts) [31] | Provides a benchmark of biological plausibility for identifying population outliers in anthropometric and clinical data. |
| Linear Mixed-Effects Models [31] | Statistical models used to analyze longitudinal data by accounting for both fixed effects (e.g., age) and random individual-specific effects, enabling the calculation of residuals to flag longitudinal outliers. |
| Restricted Cubic Splines [31] | A flexible mathematical tool used within regression models to capture non-linear relationships (e.g., between age and growth) without assuming a strict linear or polynomial form. |
| Parametric Resampling (Bootstrapping) [30] | A computational procedure used to simulate the null distribution of a complex test statistic (e.g., for a hybrid test), allowing for accurate calculation of empirical P-values. |
| One-Hot Encoding [32] | A data pre-processing technique that converts categorical variables into a binary (0/1) matrix format, allowing them to be used by machine learning algorithms like autoencoders. |
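The one-hot encoding step from the table above is simple enough to sketch without a machine-learning library; the category list and column order here are illustrative:

```python
def one_hot(records, categories):
    """Convert a list of categorical values into a binary (0/1) matrix whose
    columns follow the order of `categories`."""
    index = {c: j for j, c in enumerate(categories)}
    matrix = []
    for r in records:
        row = [0] * len(categories)
        row[index[r]] = 1                # exactly one column set per record
        matrix.append(row)
    return matrix
```

The resulting matrix is what an autoencoder (or any numeric anomaly detector) consumes in place of the raw categorical registry fields.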
The replication crisis in psychological science and other fields has highlighted the detrimental effects of Questionable Research Practices (QRPs) [7]. QRPs are defined as "ways of producing, maintaining, sharing, analyzing, or interpreting data that are likely to produce misleading conclusions, typically in the interest of the researcher" [33]. In response, the research community has developed a suite of statistical forensic tools to detect inconsistencies in published research, thus helping to identify potential QRPs [35]. This technical support center provides detailed guidance on implementing three key analytical techniques—GRIM, SPRITE, and p-curve—enabling researchers, scientists, and drug development professionals to assess the trustworthiness of scientific findings.
1. What are the main limitations of the p-curve technique? While p-curve was a pioneering forensic tool, recent technical critiques have identified significant statistical weaknesses. The formal hypothesis tests within the p-curve framework (e.g., for "evidential value") can exhibit properties such as inadmissibility and non-monotonicity. Furthermore, p-curve's average power estimator is inconsistent and can be substantially biased upward when the set of studies being analyzed has heterogeneous effect sizes or sample sizes [36]. For most applications, the z-curve method is now recommended as a more robust alternative, as it explicitly models heterogeneity and provides reliable confidence intervals [36].
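The core intuition behind p-curve can be illustrated with a crude binomial "sign test" on the significant p-values. This is a didactic simplification, not the full p-curve or z-curve procedure, and it inherits none of their corrections:

```python
from math import comb

def pcurve_sign_test(p_values, alpha=0.05):
    """Under the null of no true effect, significant p-values are uniform on
    (0, alpha), so roughly half should fall below alpha/2. An excess of very
    small p-values (right skew) suggests evidential value."""
    sig = [p for p in p_values if 0 < p < alpha]
    low = sum(1 for p in sig if p < alpha / 2)
    n = len(sig)
    # one-sided binomial tail P(X >= low) with success probability 0.5
    tail = sum(comb(n, k) for k in range(low, n + 1)) / 2 ** n
    return n, low, tail
```

As the FAQ above notes, heterogeneity undermines inferences built on this kind of uniformity assumption, which is why z-curve is the recommended tool for real analyses.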
2. The GRIM test shows an inconsistency. Does this prove data fabrication? No, an inconsistency discovered by the GRIM test does not, on its own, prove fabrication. The GRIM test is designed to evaluate the consistency of reported summary statistics [37]. A failed test means the reported mean is mathematically impossible given the stated sample size and scale granularity. This indicates a reporting error, which could range from a simple typo or rounding error to more serious issues like selective reporting or data manipulation [38] [37]. It is a "flag" that warrants further investigation and clarification from the original authors [38].
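The GRIM check itself is a few lines of arithmetic: a mean of n integer responses must equal k/n for some integer k, so a consistent reported mean should survive a round trip through the nearest integer sum. A sketch; real analyses also consider alternative rounding conventions and composite scales:

```python
def grim_consistent(reported_mean, n, decimals=2):
    """Return True if `reported_mean` (reported to `decimals` places) is
    attainable as the mean of n integer-valued responses."""
    nearest_sum = round(reported_mean * n)           # sums of integers are integers
    reconstructed = round(nearest_sum / n, decimals)
    return abs(reconstructed - reported_mean) < 10 ** -(decimals + 2)
```

For example, with n = 17 a reported mean of 3.47 is attainable (sum 59), whereas 3.48 is not, which is exactly the kind of flag that warrants follow-up rather than an accusation.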
3. When should I use SPRITE over the GRIM test? Use the GRIM test as an initial, quick check when you only have access to a reported mean and sample size. The SPRITE technique is a more powerful follow-up when you have additional summary statistics, such as a standard deviation (SD). While GRIM can only determine if a specific mean is possible, SPRITE can be used to explore whether any plausible dataset exists that could simultaneously satisfy the reported mean, SD, and sample size [37]. SPRITE is therefore more comprehensive for identifying deeper inconsistencies.
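For very small samples the SPRITE idea can even be run exhaustively: enumerate every multiset of scale values and keep those whose mean and SD round to the reported statistics. The real SPRITE uses heuristic search to handle realistic sample sizes; the scale bounds below are illustrative:

```python
from itertools import combinations_with_replacement
from statistics import mean, stdev

def sprite_exhaustive(target_mean, target_sd, n, scale=(1, 7), decimals=2):
    """Return all sorted integer datasets on the given scale whose mean and
    sample SD round to the reported values. Feasible only for small n."""
    lo, hi = scale
    hits = []
    for combo in combinations_with_replacement(range(lo, hi + 1), n):
        if (round(mean(combo), decimals) == round(target_mean, decimals)
                and round(stdev(combo), decimals) == round(target_sd, decimals)):
            hits.append(combo)
    return hits
```

An empty result for a plausible scale is the red flag: no dataset of the stated kind could have produced the reported summary statistics.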
4. Are these forensic techniques only applicable to psychology? Not at all. While they were largely developed and popularized within social psychology, the underlying logic is based on mathematical principles that are universal. The GRIM test, for instance, applies to any research involving small samples and data composed of whole numbers or Likert-type scales [37]. These methods have been applied to research in medicine, biology, and other fields where summary statistics are reported [10].
Problem: The calculated GRIM value is inconclusive.
Problem: I am unsure if the GRIM test is appropriate for the data type.
| Data Type | Suitable for GRIM? | Notes |
|---|---|---|
| Likert Scales (e.g., 1-7) | Yes | The primary use case. Data are integers. |
| Counts (e.g., cells, incidents) | Yes | Data are whole numbers. |
| Age (reported in whole years) | Yes | Data are integers. |
| Continuous Measures (e.g., height, concentration) | No | Data are not necessarily integers, so the mean can have any value. |
| Percentages | Yes, with care | Can be treated as a mean on a 0-100 scale. |
Problem: SPRITE cannot reconstruct a plausible dataset from the reported summary statistics.
Problem: The SPRITE analysis is running slowly.
Problem: p-curve results are unreliable due to heterogeneous studies.
Problem: I have multiple p-values per study; which one should I include in a p-curve/z-curve?
The following diagram illustrates the general decision-making process for a forensic metascientific analysis, from initial flag to final conclusion.
Forensic Metascience Analysis Workflow [38]
The table below lists key statistical tools and resources essential for conducting forensic metascientific analyses.
| Tool Name | Type | Primary Function | Key Considerations |
|---|---|---|---|
| GRIM | Statistical Test | Checks if a reported mean is mathematically consistent with its sample size [37]. | Best for small-N studies with integer data (e.g., Likert scales). A simple, first-pass check. |
| SPRITE | Statistical Test | Generates plausible datasets that fit reported mean, SD, and N. Checks if any such dataset can exist [37]. | More powerful than GRIM. Used when SD is available to test deeper consistency. |
| P-Curve | Meta-Analytic Tool | Analyzes the distribution of statistically significant p-values in a set of studies to assess evidential value and estimate average power [36]. | Has known statistical weaknesses; sensitive to heterogeneity. Use with caution [39] [36]. |
| Z-Curve | Meta-Analytic Tool | Models the distribution of z-statistics to estimate expected replication rates and average power, accounting for heterogeneity [36]. | Recommended modern alternative to p-curve. Provides robust estimates with confidence intervals [36]. |
| StatCheck | Software Tool | Scans documents for statistical reporting inconsistencies, particularly mismatches between p-values and test statistics [40]. | Useful for automated, large-scale checks of many papers at once. |
| R/Python | Programming Language | Provides environments for implementing custom forensic analyses and using specialized packages. | Essential for flexibility. Many forensic techniques (like SPRITE) require custom scripting [37]. |
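StatCheck's core consistency check, reduced to z-tests for illustration, amounts to "recompute the p-value from the reported statistic and compare." The real tool additionally parses APA-formatted t, F, and χ² results from documents; this sketch assumes the statistic has already been extracted:

```python
from statistics import NormalDist

def p_matches_statistic(z, reported_p, decimals=3):
    """Recompute a two-sided p-value from a z statistic and check agreement
    with the reported p at the stated precision."""
    recomputed = 2 * (1 - NormalDist().cdf(abs(z)))
    return round(recomputed, decimals) == round(reported_p, decimals)
```

Mismatches are usually typos or rounding slips, but systematic mismatches in one direction across a paper are the pattern forensic reviewers look for.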
FAQ 1: Our AI tool flagged a journal as 'questionable,' but it is listed in a major database. What could be the reason for this discrepancy? This discrepancy can arise from several factors. The AI model may have detected specific features on the journal's website that are strong indicators of questionable practices, even if the journal has managed to get listed in a database. These features can include a very high volume of published articles, an overly broad scope, or an editorial board listing that includes reputable researchers without their consent [41]. It is crucial to use the AI's output as a reference for further investigation, not as an absolute judgment. The final decision should involve a human expert who can manually verify the journal's peer-review policy, editorial board authenticity, and other quality markers [42].
FAQ 2: We are encountering a high rate of false positives with our current journal screening tool. How can we improve its accuracy? High false-positive rates often indicate that the model's training data is not representative or its feature extraction is too sensitive. To improve accuracy:
FAQ 3: How can we validate the performance of a new AI-based screening system before full deployment? A robust validation protocol should be followed:
Protocol 1: Automated Identification of Questionable Journals
This protocol outlines the methodology for building an AI system to screen journals, based on systems described in recent research [41] [42].
Diagram: Workflow for Automated Journal Screening
Protocol 2: Detecting AI-Generated Text in Manuscript Submissions
This protocol addresses the challenge of identifying machine-generated text, a growing concern for research integrity [44].
The following table summarizes quantitative data on the performance of various AI tools used in research integrity and screening tasks, as reported in the search results.
Table: Performance Metrics of AI Tools for Research Integrity
| Tool / System Name | Primary Function | Reported Performance / Key Metric | Source / Context |
|---|---|---|---|
| CU Boulder AI System | Identifies questionable journals | Flagged ~1,400 journals as potentially problematic from a list of ~15,200. After human review, over 1,000 were confirmed questionable [41]. | Academic Study |
| Elicit | Data extraction for systematic reviews | Achieved 99.4% accuracy (1,502 correct out of 1,511 data points) in one systematic review [43]. | Company Website / User Report |
| Proofig AI | Image duplication and manipulation detection | Adopted by major journals/publishers (e.g., American Association for Cancer Research, Science family of journals) for pre-publication checks [45]. | News Article |
| AI Text Detectors | Differentiate human-written text (HWT) from machine-generated text (MGT) | Performance is highly variable. One detector (Copyleaks) showed relative resistance to adversarial attacks, but no tool is completely reliable. Accuracy can be genre-dependent [44]. | Research Perspective |
Table: Research Reagent Solutions for AI-Driven Screening
| Tool / Resource | Type / Category | Primary Function in Screening |
|---|---|---|
| Directory of Open Access Journals (DOAJ) | Data Source / Whitelist | Provides a vetted list of legitimate open-access journals for training AI models and for manual verification [42]. |
| Beall's List / Stop Predatory Journals | Data Source / Blacklist | Historical and community-maintained lists of predatory publishers and journals; used as a source of negative examples for model training [42]. |
| Bag-of-Words & TF-IDF | Algorithm / Feature Extraction | Converts unstructured text from journal websites into a structured, quantifiable format for machine learning models [42]. |
| Diff Score Calculation | Algorithm / Feature Enhancement | Identifies words and phrases that are statistically overrepresented in questionable journals, sharpening the model's predictive power [42]. |
| Random Forest / Naive Bayes | Algorithm / Classifier | Machine learning models that use the extracted features to perform the final classification of a journal as legitimate or questionable [42]. |
| Proofig AI | Software Tool | An AI-powered platform that checks for image duplication, manipulation, and reuse within scientific papers [45]. |
| Scite | Software Tool | Uses AI to analyze how scientific papers are cited (e.g., as supporting or contrasting evidence), helping to assess the reliability of claims [46]. |
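The bag-of-words/TF-IDF step from the table above can be sketched without any ML library. The example tokens are invented; a production pipeline would use something like scikit-learn's TfidfVectorizer and feed the vectors to the Random Forest or Naive Bayes classifiers listed in the table:

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists (e.g., one per journal web page). Returns,
    per document, a dict mapping token -> TF-IDF weight (idf = ln(N / df))."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))               # document frequency per token
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weighted.append({t: (c / total) * math.log(n_docs / df[t])
                         for t, c in tf.items()})
    return weighted
```

Note that tokens appearing on every page (idf = 0) carry no weight, which is why the diff-score step in the table is useful for surfacing terms overrepresented specifically in questionable journals.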
The following diagram illustrates the logical decision pathway an AI model might follow when analyzing a journal, based on the key features identified in the research [41] [42].
Diagram: AI Journal Screening Decision Logic
Single-case experimental designs (SCEDs) represent a family of research methods that use experimental procedures to study the effects of interventions on individual cases. Unlike group comparison research, SCEDs rely on repeated measurements over time with the individual case serving as its own control [47] [48]. The replication of intervention effects within and/or across cases provides the foundation for establishing causal inferences [47].
Questionable research practices (QRPs) have been identified as a significant contributor to the replication crisis across multiple scientific fields [7] [49]. While initially discussed primarily in the context of group comparison research with null-hypothesis statistical testing, QRPs present equally serious concerns for SCED research, though they may manifest differently due to the methodological distinctions of single-case methodology [7]. Researchers have identified the need to specifically examine how QRPs occur in SCEDs and to develop improved research practices as alternatives [7] [49].
Table 1: Common Questionable Research Practices in Single-Case Experimental Designs
| QRP Category | Description | SCED-Specific Manifestations |
|---|---|---|
| Selective Data Reporting | Omitting data that do not support hypotheses or show weak effects [7] | Excluding entire participants, specific dependent variables, or data points that demonstrate unstable responding or weak treatment effects [7] [49] |
| Graphical Manipulation | Altering visual representation of data to enhance apparent effects | Modifying scaling of x- or y-axes, omitting data points indicating instability, selectively combining dependent variables in graphs [7] |
| Procedural Omission | Failing to report potential confounds that could explain effects | Not documenting implementation fidelity, environmental variables, or concurrent treatments that might influence outcomes [7] |
| Selective Outcome Reporting | Emphasizing desirable results while downplaying undesirable findings in abstracts and discussions [7] | Writing discussions that highlight positive visual analysis while minimizing interpretations of unstable data or limited functional relations |
| Flexible Design Execution | Exploiting researcher degrees of freedom during study implementation | Making unplanned changes to phase change criteria, altering intervention intensity without documentation, changing measurement procedures mid-study |
Multiple lines of evidence suggest QRPs occur in SCED research. Systematic reviews comparing published and unpublished SCED studies have found that published studies typically show larger effect sizes than unpublished studies [7] [49]. For example, Sham and Smith (2014) found published studies on pivotal response treatment had larger treatment effects than unpublished studies [7]. Similarly, Dowdy et al. (2020) reported larger effect sizes for published studies of response interruption and redirection compared to unpublished works [49].
Comparative analyses of dissertations and their corresponding journal articles provide further evidence. In one examination of 124 dissertation-article pairs, 12.4% of articles omitted one or more participants and/or dependent variables from the corresponding dissertation [49]. Published studies also showed a higher proportion of experimental effects to non-effects and larger effect sizes compared to dissertations [49].
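The publication-bias comparison described above reduces to contrasting mean effect sizes across published and unpublished study sets. A minimal sketch follows; the effect-size values are made up for illustration and are not drawn from the cited reviews.

```python
from statistics import mean

def publication_gap(published, unpublished):
    """Difference in mean effect size between published and unpublished studies.

    A large positive gap is one indicator consistent with selective reporting
    or publication bias; on its own it is suggestive, not proof.
    """
    return mean(published) - mean(unpublished)

# Hypothetical non-overlap-style effect sizes for two sets of SCED studies.
gap = publication_gap(published=[0.85, 0.90, 0.78, 0.92],
                      unpublished=[0.55, 0.60, 0.48])
```

In practice the two sets would come from a systematic search that deliberately includes dissertations, theses, and conference reports alongside journal articles.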
Table 2: Methodological Approaches for Detecting QRPs in SCEDs
| Detection Method | Implementation Protocol | Indicators of Potential QRPs |
|---|---|---|
| Publication Bias Analysis | Compare effect sizes between published and unpublished studies on similar topics using systematic review methods [7] [49] | Significant discrepancy between published and unpublished effect sizes; absence of small-effect studies in literature |
| Source Document Comparison | Compare dissertations/theses with resulting publications for completeness of reporting [49] | Omitted participants, conditions, or dependent variables; enhanced effect sizes in publications |
| Visual Analysis Verification | Apply standardized visual analysis criteria to published graphs; check for graphical integrity [7] [48] | Inconsistent scaling; missing data points; altered axis proportions; discrepancies between visual analysis statements and graphed data |
| Methodological Consistency Assessment | Compare reported methods with SCED quality standards and check for internal consistency [7] [48] | Unreported changes in procedures; insufficient data points per phase; lack of demonstrated experimental control |
| Replication Failure Analysis | Examine direct and systematic replication attempts for consistency of effects [7] | Inconsistent outcomes across similar participants, settings, or implementations |
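Visual-analysis verification is often supplemented with simple non-overlap statistics that are harder to manipulate than graph scaling. Below is a sketch of one classic metric, the percentage of non-overlapping data (PND), under the assumption that the intervention is expected to increase the target behavior.

```python
def pnd(baseline, treatment):
    """Percentage of treatment-phase points exceeding the highest baseline point.

    PND is a classic non-overlap effect measure for single-case designs;
    this version assumes higher values of the dependent variable are the
    desired direction of change.
    """
    ceiling = max(baseline)
    exceeding = sum(1 for x in treatment if x > ceiling)
    return 100 * exceeding / len(treatment)

# Three of four treatment points exceed the baseline maximum of 4.
score = pnd(baseline=[2, 3, 4], treatment=[5, 6, 4, 7])  # 75.0
```

Because PND depends only on the raw data points, recomputing it from digitized graphs offers a quick cross-check against narrative claims of strong visual effects.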
Aim: To develop a standardized protocol for identifying potential QRPs in published SCED research.
Materials: Sample of published SCED studies, corresponding unpublished documents (dissertations, conference presentations, registrations) when available, standardized coding manual, multiple trained coders.
Procedure:
Validation: Establish inter-rater reliability among coders; validate findings through comparison with author self-reports when possible; triangulate across multiple detection methods.
Table 3: Research Reagent Solutions for QRP Detection and Prevention
| Tool Category | Specific Tools/Resources | Function and Application |
|---|---|---|
| Reporting Standards | SCRIBE, CONSORT extensions for SCEDs [48] | Standardized reporting checklists to improve transparency and completeness |
| Design Quality Assessment | What Works Clearinghouse Standards, RoBiNT Scale [48] | Quality appraisal tools to evaluate methodological rigor of SCEDs |
| Data Analysis Tools | Visual analysis protocols, effect size calculators, statistical packages for SCEDs [47] [50] | Complementary analysis methods to reduce reliance on subjective interpretation |
| Transparency Enhancements | Open Science Framework repositories, preregistration templates [7] | Platforms for sharing protocols, data, and materials to enable verification |
| Methodological Guides | Design standards texts, methodological tutorials [47] [48] | Resources for proper design implementation and reporting |
Q1: How can I distinguish between intentional selective reporting and practical space limitations in publications?
A1: Intentional selective reporting typically follows a systematic pattern where omitted data consistently show weaker effects. Practical space limitations should result in random or justified omissions. Check whether authors mention complete data availability elsewhere or provide justification for omissions. Corresponding dissertations often provide the clearest comparison [49].
Q2: What are the most sensitive indicators of graphical manipulation in SCED displays?
A2: Key indicators include: (1) inconsistent axis scaling across similar graphs in the same paper, (2) y-axes that do not begin at zero without clear justification, (3) missing data points that aren't explicitly acknowledged, and (4) disproportionate spacing of time units that may visually exaggerate effects [7].
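Several of these indicators can be screened mechanically when axis metadata is available (for example, extracted from figure source files). The dict-based graph representation and the 2x span threshold below are assumptions chosen for the sketch, not an established standard.

```python
def flag_graph_issues(graphs):
    """Flag SCED graphs whose axes or missing data may exaggerate effects.

    Each graph is a dict with 'y_min', 'y_max', and 'n_missing'
    (count of unacknowledged missing data points); format is illustrative.
    """
    flags = []
    y_spans = [g["y_max"] - g["y_min"] for g in graphs]
    # Heuristic: flag when one graph's y-range is more than twice another's.
    if max(y_spans) > 2 * min(y_spans):
        flags.append("inconsistent y-axis scaling across graphs")
    for i, g in enumerate(graphs):
        if g["y_min"] > 0:
            flags.append(f"graph {i}: y-axis does not start at zero")
        if g.get("n_missing", 0) > 0:
            flags.append(f"graph {i}: missing data points not acknowledged")
    return flags

issues = flag_graph_issues([
    {"y_min": 0, "y_max": 10, "n_missing": 0},
    {"y_min": 4, "y_max": 8, "n_missing": 1},
])
```

Any flag raised this way is a prompt for human inspection, not a verdict: a truncated y-axis can be legitimate when clearly justified.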
Q3: How can researchers balance flexibility in SCED implementation with prevention of QRPs?
A3: Pre-registration of design decisions and phase change criteria provides the optimal balance. Document all a priori decisions about conditions for phase changes, number of data points, and handling of unstable data. Any deviations should be explicitly documented with justifications [7] [48].
Q4: What methodological features provide the strongest protection against unintentional QRPs in SCEDs?
A4: Multiple baseline designs across participants, settings, or behaviors provide inherent protection through staggered intervention implementation. Randomization of phase start points, blind data collectors, and interobserver agreement assessment further strengthen validity [47] [48].
Q5: How can the field distinguish between researcher degrees of freedom and legitimate adaptive design changes?
A5: Legitimate adaptations typically: (1) respond to ethical concerns, (2) address unexpected participant needs, (3) are thoroughly documented with rationale, and (4) maintain the experimental integrity of the design. Changes made solely to enhance effects without these characteristics may represent QRPs [7].
The identification of QRPs in SCEDs has prompted the development of improved research practices as alternatives. Recent initiatives have identified 64 pairs of questionable and improved research practices in SCED across different stages of the research process [7] [49]. These improved practices emphasize:
Future directions include developing standardized QRP detection protocols, creating automated tools for identifying graphical inconsistencies, establishing certification processes for SCED methodology, and enhancing educational curricula to emphasize improved research practices. The integration of causal mediation analysis methods represents another promising avenue for understanding how interventions produce change in SCEDs [50].
FAQ 1: What is the core difference between z-curve and the criticized "post-hoc power analysis"? This is a fundamental conceptual distinction. Z-curve does not estimate effect sizes or produce power estimates for individual studies, which is the flaw in "observed" power methods. Instead, z-curve estimates the population mean power of a set of studies that have been selected for statistical significance. It uses the distribution of p-values to estimate the mean probability that these significant studies would successfully reject the null hypothesis again if replicated, without ever calculating an effect size for a single study [51].
FAQ 2: My dataset contains a mix of t-tests, F-tests, and correlations. Can I use z-curve? Yes, but you must be aware of the approximation involved. Z-curve transforms all test statistics into two-sided p-values and then into equivalent z-scores, treating every test as if it were a two-sided z-test. This works well when per-study sample sizes are sufficiently large (typically N > 30), as the t- and F-distributions then approximate the normal distribution. However, if your dataset has many studies with small sample sizes (N < 20-30) and a small total number of studies (k < 20-30), this transformation can bias power estimates, typically leading to underestimation when true power is high [52].
FAQ 3: The reviewer said "power is not a property of a completed study." Is z-curve based on a misunderstanding? No, this criticism stems from a misconception of the z-curve model. In the z-curve framework, every study in a population of studies has a true power, defined by its design, procedure, and subject population. This power is a frequentist probability—the long-term relative frequency of rejection based on hypothetical repeated sampling. Z-curve aims to estimate the mean of these true power values specifically for the sub-population of studies that obtained statistically significant results, which roughly corresponds to the mean power of published results in a field [51].
FAQ 4: What is the minimum number of studies needed for a reliable z-curve analysis? While there is no absolute minimum, simulation results indicate that the approximation used in the p-value to z-score transformation becomes more reliable as the number of studies increases. Datasets with a small number of studies (approximately ≤ 30) combined with small per-study sample sizes are particularly problematic and may produce biased estimates. For more robust results, aim for larger sets of studies (k ≥ 100), which help smooth out the effect of the approximation [52].
Problem: Low Estimated Mean Power and Suspected Bias
Problem: Bias from Small Sample Sizes and Statistical Transformation
Solution: Consider t-curve, which fits a mixture model using non-central t-distributions instead of normal distributions [52]. If selection bias is the primary concern, selection models (e.g., the weightr package) might be more appropriate [52].
Problem: Inconsistent Test Statistics and Incompatible Input Data
Solution: Convert all test statistics to two-sided p-values and then to z-scores via Z = qnorm(1 - p/2) [52].
Core Conceptual Model: Z-curve operates on a "coin-tossing" model. Designing a study is like manufacturing a biased coin with a probability of "heads" (success) equal to its true power. Running the study is tossing the coin. Studies that show "tails" (non-significant results) are discarded. Z-curve's goal is to estimate the average bias (mean power) of the coins that showed "heads" in the file-drawer-aware population of published, significant results [51].
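The coin-tossing model can be made concrete with a small simulation. The uniform power distribution below is an arbitrary assumption for illustration; the point is that selecting for significance enriches high-power studies, and that enriched mean is exactly the quantity z-curve estimates.

```python
import random

random.seed(0)

# Each "study" is a coin whose bias equals its true power.
powers = [random.uniform(0.1, 0.9) for _ in range(100_000)]

# Tossing the coin: the study is "significant" with probability = its power.
# Non-significant studies go into the file drawer and are discarded.
significant = [p for p in powers if random.random() < p]

mean_all = sum(powers) / len(powers)
mean_selected = sum(significant) / len(significant)
# Selection for significance enriches high-power studies, so
# mean_selected exceeds mean_all: the published literature looks
# more powerful than the studies actually run.
```

This also shows why naively averaging "observed power" over published studies overstates replicability: the file drawer is invisible in the selected set.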
Data Preparation Protocol:
Convert each p-value to a z-score via Z = qnorm(1 - p/2); this uses the quantile function of the standard normal distribution.
Analysis Execution: The following table summarizes the core quantitative outputs from a z-curve analysis and their interpretation.
Table 1: Key Output Metrics from Z-Curve Analysis
| Metric | Description | Interpretation |
|---|---|---|
| Mean Power (Expected Replication Rate) | The estimated average probability that the significant studies in the set would again produce a significant result in an exact replication [51]. | The primary indicator of literature reliability. Lower values (< 0.5) indicate a high risk of false positives and unreliable findings. |
| File Drawer Ratio | An estimate of the ratio of non-significant studies filed away to the significant studies published. | A higher ratio suggests more severe publication bias. |
| The Z-Curve Plot | A density plot of the significant z-scores, overlaid with the modeled distribution. | Visualizes the distribution of evidence. A "lump" of z-scores just above 1.96 suggests p-hacking. |
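The p-value-to-z-score conversion used in the data preparation step (R's `qnorm(1 - p/2)`) can be reproduced with Python's standard library, where `statistics.NormalDist().inv_cdf` is the equivalent quantile function.

```python
from statistics import NormalDist

def p_to_z(p):
    """Convert a two-sided p-value to its z-score (R: qnorm(1 - p/2))."""
    return NormalDist().inv_cdf(1 - p / 2)

z = p_to_z(0.05)  # ~1.96, the familiar two-sided 5% cutoff
```

Because the conversion discards the sign of the original statistic, studies with effects in opposite directions land on the same positive z-scale, which is what lets z-curve pool heterogeneous literatures.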
The following diagram illustrates the sequential steps for performing a z-curve analysis, from data collection to interpretation.
Table 2: Essential Software and Conceptual Tools for Credibility Assessment
| Tool / Resource | Function | Relevance to QRPs Identification |
|---|---|---|
| Z-Curve R Package (`zcurve`) | Implements the z-curve algorithm for estimating mean power and other diagnostics from a sample of p-values [51]. | The primary software for conducting the analysis described here. |
| Two-Sided P-Values | The standardized input for z-curve, derived from various test statistics [52]. | Ensures consistency when analyzing heterogeneous literature. Ignoring sign allows for analysis across studies with effects in opposing directions. |
| File Drawer Model | The conceptual model that accounts for the selection process where non-significant results are not published [51]. | Central to understanding and quantifying publication bias, a major category of QRPs. |
| Statistical Power (True Power) | A property of a study's design and population, defined as the probability of rejecting a false null hypothesis [51]. | The core quantity z-curve seeks to estimate for the population of published studies. Distinguishing this from "observed power" is critical. |
| Inverse Normal CDF | The mathematical function (`qnorm` in R) that converts a p-value to its corresponding z-score [52]. | The technical mechanism for creating a common scale (z-scores) from diverse statistical tests. |
Q1: What are Questionable Research Practices (QRPs) and why is detecting them during peer review critical? Questionable Research Practices (QRPs) are actions that undermine research integrity but often occupy an ethical gray area, falling short of outright fraud like fabrication or plagiarism [53] [4]. Common examples include data manipulation, selective reporting of outcomes, hypothesizing after results are known (HARKing), and problematic authorship [53] [4]. Detecting them during peer review is crucial because these practices, while sometimes seemingly minor in isolation, cumulatively skew the scientific literature, undermine its reliability, and can lead to the canonization of erroneous claims, wasted resources, and eroded public trust in science [53] [4].
Q2: What are the most common factors that lead researchers to engage in QRPs? The engagement in QRPs is influenced by a combination of individual and systemic factors [4]. Key drivers include:
Q3: What is the reviewer's fundamental responsibility when checking for QRPs? The reviewer's primary duty is to act as a gatekeeper for scientific quality and integrity. This involves moving beyond simply assessing the novelty and apparent impact of a study to critically evaluating the methodological rigor, data analysis, and interpretation to ensure the findings are valid, reliable, and honestly presented [54]. This service is essential for maintaining trust in the scientific literature [54].
Q4: Can you provide specific red flags for selective reporting (cherry-picking) in a manuscript? Yes, several red flags can indicate selective reporting:
Problem: A reviewer suspects that data may have been manipulated or fabricated, but has no direct access to the original raw data to confirm.
Solution: A Step-by-Step Diagnostic Protocol
Problem: A manuscript reports a positive result, but the statistical approach or choice of reported results appears designed to find statistical significance (p-hacking).
Solution: A Methodological Interrogation Workflow
Table: Diagnostic Checklist for Suspected p-Hacking
| Checkpoint | Question for the Reviewer to Ask | Action if Anomaly is Detected |
|---|---|---|
| Pre-registration | Is an analysis plan publicly available and followed? | Note deviations in review; downgrade confidence in results. |
| Multiple Testing | Are many comparisons made without appropriate corrections (e.g., Bonferroni)? | Request statistical correction; interpret "significant" p-values with caution. |
| Outcome Switching | Are the analyzed outcomes different from those stated in the introduction or methods? | Query authors on the change; suspect selective reporting. |
| Data Peeking | Does the stopping rule for data collection seem arbitrary or undefined? | Question the sampling procedure; request a justification. |
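The multiple-testing checkpoint in the table follows from simple probability: with m independent tests at level α, the chance of at least one false positive is 1 − (1 − α)^m, which the Bonferroni correction (testing each at α/m) pulls back to roughly α. A quick check, assuming independent tests:

```python
def familywise_error(alpha, m):
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

uncorrected = familywise_error(0.05, 20)      # ~0.64: a 64% chance of a spurious "hit"
bonferroni = familywise_error(0.05 / 20, 20)  # ~0.05: back near the nominal level
```

This is why a manuscript reporting twenty uncorrected comparisons with one or two "significant" results warrants a reviewer query rather than acceptance at face value.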
Problem: The author list includes individuals who may not have contributed substantially (honorary/gift authorship) or omits individuals who should be included (ghost authorship).
Solution: An Attribution Verification Protocol
This section provides a detailed methodology for integrating QRP checks into the standard peer review workflow.
Protocol 1: The QRP Screening Workflow for Reviewers
Objective: To systematically screen a manuscript for signs of Questionable Research Practices during the initial review phase.
Materials:
Procedure:
Table: QRP Screening Checklist for Peer Review
| Category | Item to Check | Status (Pass/Flag/NA) | Notes for Editor/Authors |
|---|---|---|---|
| Transparency | Pre-registration documented and adhered to? | ||
| Data/Code availability statement provided? | |||
| Methods | Sample size justification provided? | ||
| All described measures/conditions reported? | |||
| Data exclusion criteria pre-defined and followed? | |||
| Results | All primary and secondary outcomes reported? | ||
| Statistical tests appropriate and correctly applied? | |||
| Figures and tables consistent with text? | |||
| Authorship | Author contributions seem appropriate and justified? | ||
| Overall | Any suspicion of data fabrication/falsification? |
Protocol 2: A Pre-Submission Self-Audit for Authors to Prevent QRPs
Objective: To provide authors with a concrete procedure for auditing their own manuscript prior to submission, reducing the likelihood of unintentional QRPs and strengthening the manuscript's integrity.
Materials:
Procedure:
The following diagram illustrates a practical workflow for integrating QRP checks into the standard peer review process, providing a visual guide for both editors and reviewers.
This table details key resources and tools that researchers and reviewers can use to identify and prevent Questionable Research Practices.
Table: Research Reagent Solutions for QRP Identification and Prevention
| Tool / Resource Name | Type | Primary Function in QRP Context | Relevance to Protocol |
|---|---|---|---|
| Pre-Registration Platforms (e.g., OSF, AsPredicted, ClinicalTrials.gov) | Protocol Repository | Provides a time-stamped, public record of hypotheses, methods, and analysis plans before data collection, combating HARKing and selective reporting. | Critical for Protocol 1 (Alignment Check) and Protocol 2 (Self-Audit). |
| Statistical Screening Tools (e.g., statcheck for p-values, GRIM Test) | Software Tool | Automatically scans manuscripts for inconsistencies in reported statistics (e.g., p-values that don't match test statistics) or possible data-level impossibilities. | Aids in Protocol 1, Results-First Analysis and Consistency Verification. |
| Image Forensics Software (e.g., ImageTwin, Forensics) | Software Tool | Detects image duplications, manipulations, and splicing in figures, helping to identify data fabrication/falsification. | Used in Scenario 1 (Data Manipulation), Step 3. |
| Reporting Guidelines (e.g., CONSORT, ARRIVE, PRISMA) | Reporting Standard | Provides a checklist of essential information that must be included in a manuscript to ensure methodological transparency and completeness. | Forms the basis for the QRP Screening Checklist in Protocol 1. |
| Contributorship Taxonomies (e.g., CRediT) | Classification System | Provides a standardized list of 14 roles to describe each author's specific contribution, clarifying authorship and reducing gift/ghost authorship. | Central to Scenario 3 (Authorship Issues) and Protocol 2 (Self-Audit). |
| Data & Code Repositories (e.g., Zenodo, Dryad, GitHub) | Data Repository | Enables public sharing of raw data and analysis code, allowing for independent verification of results and enhancing transparency. | Checked in Protocol 1, Transparency category of the screening checklist. |
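The GRIM test listed above is simple enough to run by hand: a mean of integer-valued data with sample size n must equal k/n for some integer total k, so many reported means are arithmetically impossible. A minimal sketch:

```python
def grim_consistent(reported_mean, n, decimals=2):
    """Check whether a reported mean is possible for integer data of size n
    (the GRIM test). True if some integer total k yields the reported mean
    after rounding to the given number of decimals."""
    k = round(reported_mean * n)
    # Only the candidate totals nearest to reported_mean * n can match.
    return any(round(c / n, decimals) == round(reported_mean, decimals)
               for c in (k - 1, k, k + 1))

# With n = 28, a reported mean of 1.19 is impossible for integer items:
# 33/28 rounds to 1.18 and 34/28 rounds to 1.21, with nothing in between.
grim_consistent(1.19, 28)  # False
grim_consistent(1.18, 28)  # True
```

A single GRIM failure can be an honest reporting slip; clusters of failures across a paper's tables are the pattern that justifies requesting raw data.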
Questionable Research Practices (QRPs) are a range of actions that undermine research integrity without always constituting outright fraud. They often thrive in low-transparency environments and contribute to the reproducibility crisis [55]. The table below summarizes common QRPs, their descriptions, and prevalence based on researcher self-reports.
Table 1: Common Questionable Research Practices (QRPs) and Their Prevalence
| QRP Category | Specific Practice | Description | Reported Prevalence |
|---|---|---|---|
| Data Collection & Analysis | P-hacking [56] [55] | Manipulating data analysis until a statistically significant result (p < 0.05) is achieved. | Over 30% of psychologists admitted to QRPs [56]. |
| | Selective Reporting [53] | Reporting only favorable results while omitting negative or inconclusive findings. | Nearly 20% of researchers admitted to modifying data for presentation [53]. |
| Hypothesizing & Reporting | HARKing [56] [55] | Hypothesizing After the Results are Known; presenting a post-hoc hypothesis as if it were a priori. | A survey in applied linguistics found 94% engaged in one or more QRPs [55]. |
| | Overhyping Results [18] | Exaggerating the significance or impact of research findings in written reports. | Prevalent in health disciplines due to cultural pressures [18]. |
| Publication & Authorship | Salami Slicing [18] | Unjustifiably splitting results from a single study into multiple papers to increase publication count. | Often incentivized and rewarded in academic systems [18]. |
| | Ghost or Honorary Authorship [53] | Denying authorship to contributors (ghost) or granting authorship to non-contributors (honorary). | A common form of QRP across disciplines [53]. |
Open Science practices introduce transparency and rigor at key stages of the research lifecycle, directly countering QRPs. The workflow below illustrates how these practices integrate into a robust research process.
This section addresses common challenges researchers face when implementing Open Science practices.
Q1: My preregistration has a serious error, or my plan changed after I started. What should I do?
Q2: Does preregistration mean I cannot do any unplanned, exploratory analyses? No. A core purpose of preregistration is to distinguish between confirmatory and exploratory analyses, not to eliminate exploration [57]. Exploratory analysis is crucial for discovery and hypothesis generation. The key is to clearly label which analyses were planned (confirmatory) and which were unplanned (exploratory) in your final report, so readers can interpret the evidence appropriately [57].
Q3: Can I preregister a study if I am using a pre-existing dataset? Yes, but with strict conditions to preserve the confirmatory nature of the analysis. You must justify how you will avoid bias from prior knowledge of the data [57]. The levels of eligibility are:
Q4: I have concerns about sharing data due to participant confidentiality or commercial sensitivity. How can I still be transparent?
Q5: My code is messy and I'm embarrassed to share it. What are my options?
Q6: How can I practice Open Science when my lab, supervisor, or field does not support it?
This table lists key digital and methodological "reagents" needed to conduct transparent, reproducible research.
Table 2: Research Reagent Solutions for Open Science
| Tool / Resource | Category | Primary Function | Key Antidote to QRPs |
|---|---|---|---|
| Open Science Framework (OSF) [58] | Registry Platform | A free, open-source project management repository to preregister studies, share data, code, and materials. | Centralizes all practices; counters opacity, selective reporting. |
| Preregistration Templates [57] | Methodology | Standardized forms (e.g., from OSF, AsPredicted) to guide the creation of a comprehensive research plan. | Prevents HARKing and P-hacking by locking in hypotheses and analysis plans. |
| R / Python with RMarkdown / Jupyter | Computational Tool | Languages and literate programming environments that combine code, output, and narrative in a single document. | Makes data analysis fully reproducible, countering analytical flexibility. |
| Git / GitHub | Version Control | A system to track changes in code and manuscripts, facilitating collaboration and documenting the evolution of a project. | Creates an audit trail, counters post-hoc decisions and file management issues. |
| Figshare / Zenodo | Data Repository | General-purpose public data repositories for publishing and sharing research datasets with a persistent identifier (DOI). | Enables data sharing mandates, counters low scrutiny and prevents data loss. |
Preregistration is a powerful tool to counter HARKing and p-hacking. The diagram below details the key steps and decision points in creating an effective preregistration.
Experimental Protocol: Creating a Preregistration
Objective: To create a time-stamped, uneditable research plan that clearly distinguishes between confirmatory and exploratory research components [57] [59].
Materials:
Methodology:
Registration Phase:
Post-Registration Phase:
What are Questionable Research Practices (QRPs) in experimental design? Questionable Research Practices (QRPs) are procedures that compromise the replicability, validity, and integrity of research conclusions. Unlike outright fraud, QRPs often occupy a gray area but significantly threaten the reliability of scientific findings. Examples include selective reporting of results, hypothesizing after the results are known (HARKing), and p-hacking [7] [60]. In single-case experimental designs (SCED), this might manifest as altering graph scales to make visual effects appear more robust [7].
Why is it critical to replace QRPs with improved practices? QRPs can lead to a literature filled with non-replicable findings, wasted resources, and potentially countertherapeutic (iatrogenic) effects if clinical treatments are based on unreliable evidence [7]. Adopting improved practices strengthens the scientific process, enhances the credibility of your work, and ensures that conclusions are valid and dependable.
What are some common motivations behind researchers using QRPs? Researcher behavior is influenced by multiple contingencies. While one motivation is to contribute to a valid and robust research literature, another powerful motivator is the pressure to publish, which confers career advancement, prestige, and funding. When the latter dominates, QRPs become more likely [7].
The table below outlines specific QRPs paired with actionable, improved research practices you can implement immediately.
Table 1: Questionable and Improved Research Practices in Data Collection and Analysis
| Issue Area | Questionable Research Practice (QRP) | Improved Research Practice | Key Benefit |
|---|---|---|---|
| Data Reporting | Selective Data Reporting: Omitting negative or statistically insignificant results [7]. | Pre-register analysis plans and commit to reporting all collected data, regardless of outcome [7]. | Reduces file drawer effect, ensures a complete picture of the research. |
| Data Analysis | p-hacking: Collecting more data after seeing if results are significant or selectively conducting analyses to produce positive results [7]. | Establish a fixed data collection stopping rule and pre-specify all primary data analyses before examining the data [7]. | Prevents artificial inflation of effect sizes and false positives. |
| Hypothesizing | HARKing (Hypothesizing After the Results are Known): Presenting unexpected findings as if they were predicted all along [7]. | Clearly distinguish between confirmatory (hypothesis-driven) and exploratory (data-driven) analyses in the research report [7]. | Maintains intellectual honesty and helps others correctly interpret findings. |
| Graphical Presentation (SCED) | Manipulating graph axes or omitting data points to enhance the appearance of a visual effect [7]. | Adhere to standard graphing conventions, maintain consistent axis scales, and include all data points in visual analyses [7]. | Ensures visual analysis is objective and not misleading. |
| Procedural Reporting | Omitting possible confounds from the methodology section that could explain the results [7]. | Provide a complete and transparent account of the procedure, including any potential limitations or confounding variables [7]. | Allows for accurate replication and critical evaluation of the study's validity. |
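The value of a fixed stopping rule can be demonstrated with a small simulation under the null hypothesis. The batch sizes are arbitrary, and a one-sample z-test on standard-normal data (known σ = 1) is used purely to keep the sketch standard-library only.

```python
import random
from statistics import NormalDist, mean

random.seed(1)
CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% cutoff, ~1.96

def z_stat(xs):
    """One-sample z statistic for H0: mu = 0 with known sigma = 1."""
    return mean(xs) * len(xs) ** 0.5

def peeking_trial(batches=5, batch_size=10):
    """Collect data in batches, testing after each; stop at significance.

    Under H0 every rejection is a false positive, so the return rate
    estimates the realized alpha of this optional-stopping procedure.
    """
    xs = []
    for _ in range(batches):
        xs += [random.gauss(0, 1) for _ in range(batch_size)]
        if abs(z_stat(xs)) > CRIT:
            return True
    return False

trials = 2000
peeking_rate = sum(peeking_trial() for _ in range(trials)) / trials
# Testing after every batch inflates the false-positive rate well above
# the nominal 5% that a single fixed-n test would deliver.
```

Legitimate interim analyses exist, but they require pre-specified group-sequential corrections; undisclosed peeking is the QRP the fixed stopping rule prevents.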
Table 2: Questionable and Improved Research Practices in Study Design and Publication
| Issue Area | Questionable Research Practice (QRP) | Improved Research Practice | Key Benefit |
|---|---|---|---|
| Participant Recruitment | Selectively recruiting participants into a treatment condition who are more likely to show positive effects [7]. | Use random assignment to conditions and report the recruitment process and participant characteristics in full [7]. | Minimizes selection bias and increases the generalizability of findings. |
| Outcome Reporting | Writing abstracts or discussions to downplay undesirable results and overemphasize desired findings [7]. | Present results in a balanced manner, giving appropriate weight to all primary outcomes, including those that are null or contrary to the hypothesis [7]. | Provides a truthful summary of the research and prevents misleading readers. |
| Survey Design & Measurement | Using poorly constructed surveys with leading questions, double-barreled items, or unclear measurement scales [61] [62]. | Follow systematic questionnaire development: define the purpose, pilot-test items, ensure clarity, and establish reliability and validity [62] [63]. | Reduces measurement error and increases the accuracy of collected data. |
| Coverage & Sampling | Using an unrepresentative sample or an inaccurate list (frame) to draw respondents, limiting generalizability [61] [62]. | Carefully define the target population and use a large, random sampling method from a current and accurate list to minimize frame and selection error [62]. | Improves external validity, meaning results are more likely to apply to the broader population of interest. |
Pre-registration is a foundational improved practice that involves publicly documenting your research plan before you begin collecting or analyzing data. This combats QRPs like HARKing and p-hacking.
Detailed Methodology:
Table 3: Key Resources for Implementing Improved Research Practices
| Item or Resource | Function |
|---|---|
| Pre-registration Platforms (e.g., OSF, AsPredicted) | Provides a time-stamped, public record of your research plan to distinguish confirmatory from exploratory work [7]. |
| Reporting Guidelines (e.g., CONSORT, COREQ) | Checklists to ensure complete and transparent reporting of study details, which is crucial for replication and evaluation [60]. |
| Open Data Repositories (e.g., OSF, Zenodo) | Platforms to share de-identified research data, enabling verification of results and secondary analysis. |
| Statistical Software with Scripting (e.g., R, Python) | Using scripted analyses ensures that all analytical steps are documented and reproducible, as opposed to opaque point-and-click methods. |
| Digital Lab Notebooks | Securely records procedures, observations, and data in a time-stamped manner, improving the transparency and traceability of the research process. |
The following diagram outlines a robust workflow that integrates improved research practices to mitigate QRPs at every stage.
FAQ 1: What are the primary goals of combining blinded data analysis with pre-registration?
The primary goals are to mitigate confirmation bias and restrict analytical flexibility, which are key drivers of Questionable Research Practices (QRPs) [64] [2]. Pre-registration involves specifying your research questions, hypotheses, methodology, and analysis plan before you observe the research data [65]. This creates a clear, time-stamped record that separates confirmatory (hypothesis-testing) from exploratory (hypothesis-generating) research [66]. Blinded data analysis goes a step further by ensuring that the researcher conducting the analyses is unaware of which experimental group the data belong to, and of the outcomes of initial analyses [67]. Together, these practices reduce the opportunity for p-hacking (exploiting analytical flexibility to obtain significant results) and HARKing (presenting unexpected results as if they were predicted) [64] [2].
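The blinding step can be sketched in code. In this hypothetical example, a data manager who is not the analyst recodes the treatment labels under a random key; the analyst finalizes all cleaning and modeling choices on the masked data, and the key is consulted only afterwards:

```python
import random

# Data manager's step (not the analyst's): mask the real group labels.
def mask_labels(records, seed):
    rng = random.Random(seed)            # the seed acts as the unblinding key
    groups = sorted({r["group"] for r in records})
    codes = [f"arm_{i}" for i in range(len(groups))]
    rng.shuffle(codes)
    key = dict(zip(groups, codes))
    masked = [{**r, "group": key[r["group"]]} for r in records]
    return masked, key

# Toy dataset for illustration only.
records = [
    {"id": 1, "group": "treatment", "outcome": 4.1},
    {"id": 2, "group": "placebo",   "outcome": 5.9},
    {"id": 3, "group": "treatment", "outcome": 3.8},
]

masked, key = mask_labels(records, seed=20240501)

# The analyst works only with `masked`: cleans the data, fixes the model,
# and writes the analysis script. Only then is `key` revealed.
assert all(r["group"].startswith("arm_") for r in masked)
```

In practice the key should be stored where the analyst cannot access it (e.g., with an independent data manager), not merely in a variable.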
FAQ 2: How do confirmation bias and analytical flexibility specifically threaten research integrity?
Confirmation bias is the tendency to seek or interpret evidence in ways that support existing beliefs or expectations [64]. In data analysis, this can manifest as a researcher unconsciously favoring analytical choices that lead to "publishable," statistically significant results. Analytical flexibility, or "researcher degrees of freedom," refers to the many legitimate choices researchers make during data analysis, such as how to handle outliers or which covariates to include [66]. When these choices are made after seeing the data, they can be exploited opportunistically—a practice known as p-hacking [2] [66]. This combination inflates false-positive rates, leads to overestimated effect sizes, and contributes to the replication crisis, ultimately distorting the evidence base [64] [2] [33].
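The false-positive inflation described above is easy to demonstrate by simulation. In this sketch (illustrative parameters; a two-sample z-test on data with no true effect), testing five outcomes and reporting "any significant" pushes the family-wise error rate from the nominal 5% toward roughly 1 − 0.95⁵ ≈ 23%:

```python
import math
import random
from statistics import NormalDist, mean

rng = random.Random(7)
Z = NormalDist()

def p_value(n):
    """Two-sided p for a two-sample z-test under the null (true effect = 0)."""
    g1 = [rng.gauss(0, 1) for _ in range(n)]
    g2 = [rng.gauss(0, 1) for _ in range(n)]
    z = (mean(g1) - mean(g2)) / math.sqrt(2 / n)
    return 2 * (1 - Z.cdf(abs(z)))

n_studies, n_outcomes, alpha = 2000, 5, 0.05
false_positives = sum(
    any(p_value(30) < alpha for _ in range(n_outcomes))
    for _ in range(n_studies)
)
rate = false_positives / n_studies
print(f"Family-wise false-positive rate: {rate:.3f}")  # well above 0.05
assert rate > 3 * alpha
```

The same mechanism applies to any family of post hoc choices (covariate sets, outlier rules, subgroups): each extra "look" is another draw from the null distribution.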
FAQ 3: Aren't these practices only relevant for hypothesis-driven (confirmatory) research?
While pre-registration and blinding are applied most stringently in confirmatory research, they also provide a robust framework for transparent exploratory research [64] [65]. For non-hypothesis-driven work, you can pre-register your research questions and detailed methodology without stating specific hypotheses. This still provides a record of your planned approach before data collection, enhancing transparency. Furthermore, pre-registration does not forbid unplanned, exploratory analyses; it simply requires researchers to clearly label them as post hoc, preventing them from being presented as confirmatory findings [64] [66].
Issue 1: Prior knowledge of the dataset is making true blinding difficult.
This is a common challenge, especially in secondary data analysis where a researcher may have worked with the same dataset before [64].
Issue 2: Designing a pre-registration that is both flexible and sufficiently specific.
A common pitfall is writing a pre-registration that is too vague, leaving too many researcher degrees of freedom open [66].
Table 1: Effectiveness of Structured vs. Unstructured Preregistration Formats
| Preregistration Format | Key Characteristic | Completeness in Restricting Researcher Degrees of Freedom | Key Finding |
|---|---|---|---|
| Structured (e.g., OSF Preregistration) | Detailed instructions with multiple specific questions [66]. | Higher | Restricts opportunistic use of researcher degrees of freedom significantly better than unstructured formats [66]. |
| Unstructured (e.g., Standard Pre-Data Collection) | Minimal guidance, maximum flexibility for researchers [66]. | Lower | Provides less restriction on researcher degrees of freedom, leading to more potential for analytical flexibility [66]. |
Issue 3: The data violate the assumptions of my pre-registered analysis.
A pre-registered analysis plan might not be appropriate for the collected data, for example, if the data are non-normal when a parametric test was planned [64].
Protocol 1: Implementing a Blinded Analysis Workflow for a Clinical Trial
This protocol details the steps for maintaining blinding during data analysis in a clinical trial setting, which is critical for minimizing observer bias [67].
The following diagram illustrates this workflow and its role in mitigating specific QRPs.
Diagram 1: Blinded analysis workflow and QRP mitigation.
Protocol 2: Pre-registration for Secondary Data Analysis
Analyzing existing datasets presents unique challenges for pre-registration, as researchers may have prior knowledge of the data [64].
Table 2: Key Reagent Solutions for Rigorous Research
| Tool / Resource | Category | Primary Function | Example / Link |
|---|---|---|---|
| Pre-registration Templates | Protocol Planning | Provides a structured workflow to create specific, exhaustive pre-registration plans. | OSF Preregistration, AsPredicted [65] [66] |
| Blinded Analysis Protocol | Methodology | A formal SOP for separating data preparation from analysis to prevent confirmation bias. | Internal lab standard operating procedure (SOP) [67] |
| Registered Reports | Publishing Format | A journal article format where the introduction, methods, and proposed analyses are peer-reviewed and accepted before data collection, guarding against publication bias [65]. | Offered by over 200 journals (e.g., from Springer Nature, Elsevier, Taylor & Francis) [65] |
| Clinical Trial Registries | Registry | Mandatory platforms for registering clinical trial protocols to combat selective reporting and publication bias [65]. | ClinicalTrials.gov, WHO International Clinical Trials Registry Platform [65] |
| Statistical Software & Packages | Data Analysis | To conduct pre-specified analyses and power calculations. Using scripted analyses (vs. point-and-click) ensures reproducibility. | R (with pwr package), Python, SAS, Stata [2] |
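The a priori power calculation named in the table is usually done with G*Power or R's `pwr` package; the same calculation can be sketched in a few lines of Python using the normal approximation, which gives a slightly smaller n than the exact t-based answer:

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison
    (normal approximation; pwr::pwr.t.test returns a slightly larger n)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# Medium standardized effect (Cohen's d = 0.5) at 80% power:
print(n_per_group(0.5))   # 63 per group (pwr.t.test gives 64)
```

Committing to this number before data collection, and reporting it, removes the temptation to stop early or keep sampling until significance appears.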
Q1: Our experimental results are inconsistent and hard to reproduce. Where should we focus our troubleshooting efforts?
A: Inconsistent results often stem from incomplete or variable experimental protocols. Your first step should be to verify that your protocol description includes all necessary and sufficient information for another researcher to replicate your work exactly. A detailed checklist is the most effective tool for this. Focus on the following key areas [68]:
Q2: We suspect questionable research practices (QRPs) might be affecting our field. What are the most common QRPs related to methodology?
A: Questionable research practices can compromise the validity of scientific conclusions across various research designs. Being aware of them is the first step toward prevention. The table below summarizes some common QRPs identified in the literature [7]:
| Questionable Research Practice | Description |
|---|---|
| Selective Data Reporting | Reporting only positive or statistically significant results while omitting negative or insignificant findings [7]. |
| Selective Procedural Reporting | Omitting details about possible confounds or procedural deviations that could explain the outcomes [7]. |
| p-hacking | Conducting multiple data analyses or selectively choosing data points to produce or enhance statistically significant outcomes [7]. |
| HARKing | Hypothesizing After the Results are Known; formulating a hypothesis to fit the data after the study is complete [7]. |
| Selective Outcome Reporting | Writing abstracts or discussions to downplay undesirable results and emphasize desired results [7]. |
Q3: A key assay in our drug discovery pipeline is producing unexpected results. What is a systematic approach to troubleshooting this?
A: Follow a disciplined, step-by-step troubleshooting process to efficiently identify the root cause [69] [70]:
Q4: How can we proactively minimize the need for troubleshooting in our laboratory operations?
A: A proactive culture of prevention and continuous improvement is more effective than a reactive one. Key strategies include [70]:
| Item | Critical Function & Rationale |
|---|---|
| Validated Antibodies | Ensure specificity for the target protein. Unvalidated antibodies are a major source of irreproducible results; use resources like the Antibody Registry for unique identification [68]. |
| Unique Device Identifiers (UDI) | For medical devices and key equipment, UDIs from databases like the Global Unique Device Identification Database (GUDID) enable accurate reporting and tracing [68]. |
| Standardized Cell Lines | Use well-characterized and authenticated cell lines to prevent misidentification and contamination, which can invalidate years of research. |
| Pharmacological Standards | Certified reference standards with known purity and potency are non-negotiable for assay validation and ensuring consistent drug discovery efforts [72]. |
Table: Key Data Elements for Reproducible Experimental Protocols [68]
| Protocol Section | Essential Data Elements to Report |
|---|---|
| Objectives & Background | Hypothesis, scientific background, and predefined outcomes. |
| Reagents & Materials | Source, catalog number, lot number, purity, concentration, storage conditions. |
| Equipment & Instruments | Manufacturer, model, software version, calibration status. |
| Step-by-Step Procedure | Unambiguous instructions with precise parameters (time, temperature, volumes). |
| Troubleshooting & Hints | Common problems and their solutions, notes on critical steps. |
| Data Analysis Plan | Pre-specified statistical methods and criteria for analysis. |
This workflow provides a logical sequence for investigating unexpected experimental results, helping to ensure that every plausible cause is examined [69] [70].
This diagram contrasts the detrimental cycle of Questionable Research Practices with the reinforcing benefits of implementing Robust Methodologies, highlighting their opposing impacts on research outcomes [7].
Problem: High volume of low-quality publications is overwhelming the system.
Problem: Misalignment between institutional values and academic rewards.
Problem: Prevalence of Questionable Research Practices compromising research validity.
Q1: What evidence exists that the current academic reward system is failing? A: Surveys indicate widespread recognition of the problem. A Cambridge University Press survey of over 3,000 researchers, publishers, funders, and librarians from 120 countries found that only 33% agreed that academic reward and recognition systems are working well. Furthermore, 64% of respondents believed the current system "fails to fully recognize contributions outside publishing articles in established journals" [73].
Q2: What are common types of Questionable Research Practices (QRPs)? A: Common QRPs have been identified across research fields. The table below summarizes several key QRPs and their descriptions [7]:
| Questionable Research Practice | Description |
|---|---|
| Selective Data Reporting | Selectively reporting positive/statistically significant results while omitting negative/insignificant results. |
| p-hacking | Selectively conducting data analyses to produce and/or enhance positive/statistically significant outcomes. |
| HARKing | Formulating hypotheses after study outcomes are obtained to "fit" the data. |
| Selective Outcome Reporting | Writing a paper's abstract or discussion to selectively downplay undesirable results and/or emphasize desired results. |
Q3: How is the publishing business model linked to these problems? A: The current system creates a conflict of interest. As one analysis notes, "Researchers are incentivised to publish as much as possible and publishers make more money if they publish more papers" [74]. This model diverts public research funds into shareholder profits, with one publisher maintaining a 37% profit margin [76]. The "author pays" open access model has also been co-opted, with authors paying between £2,000 and £10,000 per article, far exceeding the actual production cost [74].
Q4: What concrete reforms are being proposed or tested? A: Several major initiatives are underway:
Q5: What are the key focus areas for modernizing academic rewards? A: Modernization efforts aim to elevate institutional commitment to several key areas [77]:
Objective: To identify and quantify indicators of questionable research practices within a body of literature, such as randomized controlled trials.
Methodology (as implemented in a study of 163,129 RCTs):
Objective: To identify a comprehensive set of questionable and improved research practices within a specific methodological domain (e.g., single-case experimental designs).
Methodology (as implemented in a 2025 study):
The following diagram illustrates the interconnected relationships between institutional pressures, researcher actions, and the resulting impacts on the research ecosystem.
The following table details major programs and frameworks that are essential "reagents" for conducting the experiment of institutional reform.
| Initiative / Framework | Function | Key Features |
|---|---|---|
| MA3 Challenge [75] [77] | Catalyzes institutional change by funding and supporting bold reforms to academic hiring, promotion, and tenure. | - $1.5M in funding from major foundations.- Two funding tiers: $50K (dept.) and $250K (institution).- Focus on implementation, not just planning.- Includes a community of practice for awardees. |
| DORA (Declaration on Research Assessment) [73] | Provides a framework for improving how the quality of research output is evaluated. | - Aims to end the use of journal-based metrics.- Advocates for assessing research on its own merits.- Focuses on reform of research assessment. |
| Ethical Publishing Models [76] [74] | Offers alternative publishing pathways that keep resources within academia and promote open access. | - Non-profit, academic-owned publishers.- Low or reasonable Article Processing Charges (APCs).- Profits are reinvested into the academic community. |
| Open Research Practices [77] | Serves as a core value for realigning incentives towards transparency and collaboration. | - Encourages pre-registration, data sharing.- Rewards transparency and reproducibility.- Aims to make knowledge equitably accessible. |
A1: The most common QRPs are practices that compromise research integrity, often to achieve statistically significant or desired results. The table below details these practices and their impact [7] [2].
| QRP Name | Description | Primary Risk |
|---|---|---|
| Selective Data Reporting | Reporting only positive or statistically significant results while omitting negative or insignificant ones [7]. | Distorts the literature, creates a "file drawer" effect, misleads meta-analyses [7] [2]. |
| P-hacking | Conducting multiple analyses on a dataset to find a statistically significant result, often without a prior hypothesis [2]. | Inflates false positive rates, misrepresents true effects [2]. |
| HARKing (Hypothesizing After the Results are Known) | Presenting a post-hoc hypothesis (created after seeing the results) as if it was an a priori prediction [7]. | Undermines the hypothesis-testing framework, makes findings non-falsifiable [7]. |
| Selective Procedural Reporting | Omitting details of the procedure that could be confounds or could explain the results [7]. | Prevents replication and masks flaws in experimental design [7]. |
| Inadequate Record Keeping | Failing to maintain a detailed, step-by-step record of the research process and analytical decisions [2]. | Makes the work irreproducible and is a gateway to other QRPs [2]. |
A2: The American Statistical Association (ASA) outlines core principles for ethical statistical practice. Adhering to these is a powerful antidote to QRPs [78].
A3: A well-framed question is the first defense against QRPs. For clinical or intervention-based research, the PICO framework is highly effective [79] [80].
Example Research Question: "In children with acute otitis media (P), is cefuroxime (I) more effective than amoxicillin (C) at reducing symptom duration (O)?" [79]
Furthermore, ensure your question meets the FINER criteria: Feasible, Interesting, Novel, Ethical, and Relevant [79].
A4: Proactive planning and transparency are key. The following table outlines common problems and their solutions.
| Research Stage | QRP Risk | Improved Research Practice & Solution |
|---|---|---|
| Study Design | Unplanned analyses leading to p-hacking; unclear hypotheses leading to HARKing. | Pre-registration: Publicly file a detailed research plan, including hypotheses, methods, and analysis strategy, before data collection begins on a platform like the Open Science Framework (OSF) [2]. |
| Data Collection | Stopping data collection early once significance is reached. | A priori Power Analysis: Use tools (e.g., the pwr package in R) before the study to determine the necessary sample size and stick to it [2]. |
| Data Analysis | Excluding data points without justification; running multiple tests. | Pre-specified Analysis Plan: Define all exclusion criteria and primary statistical tests in your pre-registration. Use blinded data analysis where feasible. |
| Reporting | Selective reporting of outcomes, conditions, or studies. | Full Transparency: Report all manipulated variables, all collected measures, and all conducted analyses, regardless of the outcome. Use guidelines like SAMPL for statistical reporting [7]. |
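The "stopping data collection early once significance is reached" risk in the table above can also be quantified by simulation. Under a true null, peeking at the p-value after every 10 observations per group (hypothetical parameters) and stopping at the first p < .05 inflates the Type I error rate well beyond the nominal 5%:

```python
import math
import random
from statistics import NormalDist, mean

rng = random.Random(42)
Z = NormalDist()

def stops_significant(max_n=100, step=10, alpha=0.05):
    """Simulate one null study with optional stopping: test after every
    `step` observations per group and stop at the first p < alpha."""
    g1, g2 = [], []
    for _ in range(max_n // step):
        g1 += [rng.gauss(0, 1) for _ in range(step)]
        g2 += [rng.gauss(0, 1) for _ in range(step)]
        z = (mean(g1) - mean(g2)) / math.sqrt(2 / len(g1))
        if 2 * (1 - Z.cdf(abs(z))) < alpha:
            return True                # "significant" despite no true effect
    return False

n_studies = 2000
rate = sum(stops_significant() for _ in range(n_studies)) / n_studies
print(f"Type I error with optional stopping: {rate:.3f}")  # well above 0.05
assert rate > 0.10
```

This is why the improved practice is to fix the sample size a priori, or to use a formal sequential design with adjusted stopping boundaries rather than ad hoc peeking.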
Solution:
Solution:
This table summarizes quantitative data on the prevalence and perceived impact of QRPs, which highlights the importance of rigorous training [7] [34] [2].
| Questionable Research Practice | Reported Prevalence | Key Impact on Literature |
|---|---|---|
| Selective Data Reporting | Estimated that one in two researchers has engaged in at least one QRP in the last three years [2]. | Published studies of an intervention (e.g., Pivotal Response Treatment) show larger effects than unpublished studies, indicating a bias in the available evidence [7]. |
| P-hacking | Some studies suggest prevalence may be smaller than feared; one study of 8,000 psychology articles found only a "small amount" of selective reporting bias [34]. | Creates an inflation of false positives, leading to a literature that overstates true effect sizes [2]. |
| HARKing | Common enough to be a major topic of discussion in the replication crisis; precise prevalence is difficult to ascertain [7]. | Leads to a proliferation of seemingly supported but actually post-hoc hypotheses, making the theoretical landscape fragile [7]. |
This table details key conceptual "reagents" and resources necessary for conducting rigorous and reproducible research.
| Tool / Resource | Category | Function & Purpose |
|---|---|---|
| Pre-registration Platform (e.g., OSF, ClinicalTrials.gov) | Study Design | To pre-specify hypotheses, methods, and analysis plans, preventing p-hacking and HARKing [2]. |
| Statistical Power Software (e.g., G*Power, pwr package in R) | Study Design | To calculate the required sample size a priori, ensuring the study is feasible and has a high probability of detecting a true effect if it exists [2]. |
| Citation Manager (e.g., Zotero, Mendeley) | Reporting | To organize references and ensure proper, accurate attribution of others' work, avoiding plagiarism [2]. |
| Data & Code Repository (e.g., OSF, GitHub) | Dissemination | To share data, code, and materials, enabling replication and scrutiny, which is a core ethical responsibility [78]. |
| Ethical Guidelines (e.g., ASA Ethical Guidelines) | Foundational | To provide a framework for professional integrity, accountability, and responsibilities to all stakeholders in the research process [78]. |
This technical support center provides troubleshooting guides and FAQs to help researchers, scientists, and drug development professionals identify, understand, and address Questionable Research Practices (QRPs) in their work. QRPs are defined as "ways of producing, maintaining, sharing, analyzing, or interpreting data that are likely to produce misleading conclusions, typically in the interest of the researcher" [33].
Q: What exactly are Questionable Research Practices (QRPs), and how do they differ from outright research misconduct?
A: QRPs occupy an ethical gray area between sound scientific conduct and outright scientific misconduct (fabrication, falsification, and plagiarism) [4]. They are often technically permissible but ethically and methodologically risky behaviors that can skew results and inflate significance [33]. While research misconduct like data fabrication is universally condemned and often punishable, QRPs offer considerable latitude for rationalization and may even be accepted in some disciplines, despite producing misleading or false research [81].
Q: Why do researchers engage in QRPs, and what are the main contributing factors?
A: Researchers may engage in QRPs due to a complex interplay of individual, institutional, and systemic pressures [4]. A key driver is the competitive "publish or perish" culture, which rewards volume of publication and statistically significant findings over scientific rigor [2] [4]. Contingencies related to career advancement, funding, and prestige can powerfully shape researcher behavior [49]. At an individual level, factors such as cognitive biases, competency shortfalls in methodology, and lower commitment to the normative ideals of science (e.g., organized skepticism, disinterestedness) have been identified as predictors [4].
Q: How can I detect potential QRPs in my own work or during peer review?
A: Detection can be challenging, but here are key indicators:
Q: What are the concrete consequences of QRPs for scientific progress and public trust?
A: The consequences are severe and far-reaching:
Q: What protocols and tools can I use to prevent QRPs in my research workflow?
A: Adopting open science practices is the most effective way to prevent QRPs [33]. Key protocols include:
Issue: Suspected P-hacking in Data Analysis
Diagnosis: P-hacking, or p-value hacking, occurs when many different analyses are carried out to discover statistically significant results when no real effect exists [2]. Symptoms include repeatedly running analyses with different covariates, excluding outliers without justification, or stopping data collection once significance is reached.
Resolution Protocol:
Issue: Hypothesizing After Results are Known (HARKing)
Diagnosis: This QRP involves presenting a post-hoc hypothesis (developed after seeing the data) as if it were an a priori prediction. This misleads readers about the confirmatory nature of the study [33] [4].
Resolution Protocol:
Issue: Selective Reporting of Outcomes or Studies
Diagnosis: Also known as "cherry-picking," this involves only reporting results, variables, or entire studies that are significant or consistent with predictions, while withholding others [2] [49]. This creates a "file drawer effect" and biases the literature [49].
Resolution Protocol:
The table below summarizes prevalence data for QRPs across various disciplines, illustrating that these practices are a widespread concern [81].
Table 1: Prevalence of Self-Reported Engagement in Questionable Research Practices
| Prevalence | Researcher Population | Reference |
|---|---|---|
| 51% | 807 ecologists and evolutionary biologists | Fraser et al. (2018) [81] |
| 27% | 2,155 U.S. psychologists | John et al. (2012) [81] |
| 58% | 1,166 U.S., European and Australian psychologists | Motyl et al. (2017) [81] |
| 37% | 277 Italian psychologists | Agnoli et al. (2017) [81] |
| 50% | 746 management researchers | Banks et al. (2016) [81] |
| 51% | 6,813 Dutch scientists (across disciplines) | Gopalakrishna et al. (2022) [81] |
The following table contrasts deliberate QRPs with "honest yet unacceptable" research practices, which are unintentional mistakes or weaknesses that are nonetheless damaging due to their wide prevalence [18].
Table 2: Questionable Research Practices vs. Honest Yet Unacceptable Research Practices
| Questionable Research Practices (Deliberate) | Honest Yet Unacceptable Research Practices (Unintentional) |
|---|---|
| Data manipulation or study fabrication [18] | Not submitting negative results for publication [18] |
| P-hacking to lower p-values [18] | Failure to preregister studies [18] |
| Changing hypotheses based on results obtained (HARKing) [18] | Downplaying or omitting references to study limitations [18] |
| Post-hoc deletion of data [18] | Overhyping of results or their significance [18] |
| Salami slicing of papers (dividing results to increase publication count) [18] | Weak attention to statistical power [18] |
| Artificially selecting controls to produce statistical significance [18] | Failure to report study weaknesses [18] |
This table details key solutions and practices to uphold rigor and transparency in research.
Table 3: Research Reagent Solutions for Combating QRPs
| Tool / Solution | Function & Purpose | Example/Platform |
|---|---|---|
| Pre-registration | Time-stamps hypotheses and analysis plans, preventing HARKing and p-hacking. Distinguishes confirmatory from exploratory research. | Open Science Framework (OSF), ClinicalTrials.gov, AsPredicted |
| Registered Reports | A publishing format where peer review happens before data collection. Accepts articles based on scientific rigor, not result significance. | Cortex, Comprehensive Results in Social Psychology [4] |
| Data & Code Sharing | Enables full transparency, allows others to verify and build upon findings, and facilitates meta-analyses. | OSF, Zenodo, Dataverse, GitHub |
| Citation Manager | Helps organize references and ensures accurate attribution to original sources, avoiding improper referencing. | Zotero, Mendeley [2] |
| Power Analysis Software | Determines the sample size needed to detect an effect a priori, reducing underpowered studies and the incentive for p-hacking. | pwr package in R, G*Power, Superpower [2] |
| Blind Analysis Protocols | A methodology where data analysis is conducted blinded to experimental conditions to reduce confirmation bias. | Internal lab standard operating procedures (SOPs) [33] |
The diagram below visualizes a robust research workflow integrated with safeguards against QRPs, from initial planning to final publication.
Diagram 1: Integrity-focused research workflow with QRP safeguards.
This diagram outlines the logical process for diagnosing a potential QRP and implementing the correct mitigating solution.
Diagram 2: Diagnostic logic for identifying and mitigating common QRPs.
Questionable Research Practices (QRPs) are methodological behaviors that, while not necessarily constituting outright fraud, are likely to produce misleading conclusions, typically in the interest of the researcher [33]. These practices threaten the integrity of scientific findings by compromising the replicability and validity of research [7]. The discussion of QRPs has been prominent in fields relying on group comparison research, but they manifest differently in Single-Case Experimental Designs (SCED) due to substantially different research methods and data analysis strategies [7]. This guide provides a comparative troubleshooting FAQ to help researchers identify, avoid, and remedy common QRPs in both research paradigms.
The table below summarizes how common QRPs manifest differently in group and single-case research designs, aiding in the identification of potential issues in your work.
Table 1: Troubleshooting Guide: QRPs in Group vs. Single-Case Designs
| Research Phase | Questionable Research Practice (QRP) | Common in Group Designs | Common in Single-Case Designs | Primary Consequence |
|---|---|---|---|---|
| Planning | Selective Sampling | Recruiting participants more likely to show positive effects into a treatment group [7]. | Selecting a participant known to be highly responsive to the intervention. | Biased estimate of treatment effect; reduced generalizability. |
| Data Collection | Adding Data After Results | Collecting more data after seeing results to achieve statistical significance [2] [82]. | Adding more intervention sessions until a desired visual effect is achieved. | Inflated Type I error rates; false positives. |
| Data Analysis | P-hacking | Running multiple analyses to find a statistically significant result (e.g., outlier exclusion, model specification) [7] [2] [33]. | Manipulating graphical display (e.g., axis scaling) to enhance the appearance of a visual effect [7]. | Misleading representation of the effect's strength and consistency. |
| Data Analysis | Selective Data Reporting | Reporting only studies, conditions, or dependent variables with positive/statistically significant results [7] [2]. | Omitting data points from graphs that indicate instability or weak effects [7]. | File drawer effect; distorted meta-analytic findings. |
| Writing | HARKing (Hypothesizing After Results are Known) | Formulating or presenting a hypothesis after the results are known to fit the data [7] [2]. | Developing a post-hoc explanation for a functional relation observed in the data. | Compromised theory testing; overfitting of explanations to noise. |
| Publication | Selective Outcome Reporting | Writing abstracts/discussions to downplay undesirable results and emphasize desired ones [7]. | Overinterpreting weak or unstable effects in the visual analysis as being clinically significant. | Misleading readers about the robustness and applicability of findings. |
Visual analysis is not immune to bias. Unlike statistical analysis, it has few universal standards, which creates "researcher degrees of freedom." Common issues include:
Improved Practice: Establish and pre-specify visual analysis criteria (e.g., What constitutes a change in level? How many data points define a trend?). Use statistical analysis as a supplement to visual inspection to increase objectivity [83].
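One way to supplement visual inspection with a pre-specified quantitative criterion is a nonoverlap metric. The sketch below computes the Percentage of Non-overlapping Data (PND), a common SCED effect metric; the toy session data and the decision bands in the comment are illustrative assumptions that should themselves be pre-registered:

```python
def pnd(baseline, intervention, higher_is_better=True):
    """Percentage of intervention-phase points that exceed the most
    extreme baseline point (Percentage of Non-overlapping Data)."""
    if higher_is_better:
        ceiling = max(baseline)
        nonoverlap = sum(1 for x in intervention if x > ceiling)
    else:
        floor = min(baseline)
        nonoverlap = sum(1 for x in intervention if x < floor)
    return 100.0 * nonoverlap / len(intervention)

# Hypothetical session data for one participant:
baseline = [2, 3, 2, 4]
treatment = [5, 6, 4, 7, 8]

score = pnd(baseline, treatment)
print(score)   # 80.0: four of five treatment points exceed the baseline max
# An illustrative pre-registered rule might map PND bands to effect labels
# (e.g., >= 90 "highly effective", 70-89 "effective", < 50 "unreliable").
```

Because the criterion is fixed before the data are seen, it cannot be tuned after the fact to make a weak visual effect look convincing.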
Not necessarily. Exclusion becomes a QRP when it is done selectively, post-hoc, and without transparent justification to make the results look more favorable.
Improved Practice: Before collecting data, establish and document clear, justified a priori criteria for data exclusion (e.g., pre-defined adherence thresholds, protocol deviations). Report all exclusions transparently in the manuscript, including for participants who showed no effect [2] [82].
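The a priori exclusion rule can be made concrete as code written before data collection. In this sketch (criteria, field names, and threshold are hypothetical), the documented rules are applied mechanically and every exclusion is logged so the manuscript can report all of them:

```python
# Pre-registered exclusion criteria, written before data collection.
MIN_ADHERENCE = 0.80          # hypothetical pre-specified threshold

def apply_exclusions(participants):
    """Apply documented a priori criteria; return retained records plus a
    log of every exclusion for transparent reporting."""
    retained, log = [], []
    for p in participants:
        if p["adherence"] < MIN_ADHERENCE:
            log.append((p["id"], f"adherence {p['adherence']:.2f} < {MIN_ADHERENCE}"))
        elif p["protocol_deviation"]:
            log.append((p["id"], "protocol deviation"))
        else:
            retained.append(p)
    return retained, log

# Toy records for illustration only.
participants = [
    {"id": "P01", "adherence": 0.95, "protocol_deviation": False},
    {"id": "P02", "adherence": 0.60, "protocol_deviation": False},
    {"id": "P03", "adherence": 0.90, "protocol_deviation": True},
]

kept, exclusion_log = apply_exclusions(participants)
print([p["id"] for p in kept])        # ['P01']
for pid, reason in exclusion_log:
    print(pid, reason)
```

Running the exclusions as a script, rather than deciding case by case after inspecting outcomes, makes the rule auditable and removes room for selective, post hoc removal.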
This practice, known as p-hacking in group designs, is a serious QRP. It inflates the false positive rate.
Improved Practice: Preregister your analysis plan, including precise specifications of your primary outcome, how it will be measured, and the exact statistical or visual analysis strategy you will use [2] [82] [33]. For exploratory analyses, clearly label them as such.
This is a common situation. The problem is not the discovery, but how it is presented.
The table below outlines key remedies and improved research practices that serve as alternatives to QRPs, applicable to both group and single-case designs.
Table 2: The Scientist's Toolkit: Reagent Solutions for Improved Research Integrity
| Tool / Solution | Function / Purpose | Applies To |
|---|---|---|
| Pre-registration | Documents hypotheses, methods, and analysis plans before a study is conducted. Limits flexibility in analysis and reporting. [2] [82] [33] | Group & Single-Case |
| Blind Data Analysis | Analyzing data with hidden condition labels to reduce confirmation bias. The "answer" is revealed only after analysis choices are finalized. [82] [33] | Group & Single-Case |
| Power Analysis / Sensitivity Analysis | (Group) Determines sample size needed to detect an effect. (Single-Case) Determines the number of measurements needed to detect an effect with a given design. [82] | Primarily Group |
| Data & Code Sharing | Making raw data and analysis code publicly available. Allows for independent verification and reproducibility checks. [33] | Group & Single-Case |
| Registered Reports | A publishing format where peer review happens before data collection. Acceptance is based on the question and methods, not the results. [82] | Group & Single-Case |
| Standard Operating Procedures (SOPs) | A document detailing procedures for common research actions (e.g., outlier handling) to ensure consistency and avoid ad-hoc decisions. [82] | Group & Single-Case |
| The "21-Word Solution" | A statement in the method section: "We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study." [82] | Group & Single-Case |
| Replication | Conducting direct or conceptual replication studies to determine the reliability of findings. [82] | Group & Single-Case |
The following diagram visualizes a recommended workflow for integrating improved research practices into your project to mitigate QRPs from start to finish.
Q1: What are Questionable Research Practices (QRPs) and how do they directly affect replication? QRPs are suboptimal research practices, such as selective reporting of outcomes and hypothesizing after results are known, that occupy an ethical gray area between sound science and outright misconduct (e.g., data fabrication). They directly reduce the likelihood that a study's findings will replicate because they increase the rate of false positive results and undermine the reliability and validity of the scientific record. [84] [4]
Q2: Why is the base rate of true effects important for replication? The base rate of true effects within a research domain is a major factor determining replication rates. In fields where true effects are rare (e.g., early drug discovery), the relative proportion of false positives will be high, leading to lower replication rates for purely statistical reasons, even in the absence of QRPs. [84]
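The base-rate argument in Q2 can be made concrete with a short calculation. The sketch below (function name and example values are illustrative; the base-rate and power figures are the ~9% and ~36% estimates for social psychology cited later in this article) computes the share of statistically significant findings that reflect true effects:

```python
def positive_predictive_value(base_rate, power, alpha=0.05):
    """Ioannidis-style PPV: the fraction of significant findings that are
    true, given the prior probability (base rate) of a true effect."""
    true_positives = base_rate * power          # true effects detected
    false_positives = (1 - base_rate) * alpha   # null effects crossing alpha
    return true_positives / (true_positives + false_positives)

# With a ~9% base rate and ~36% median power, fewer than half of the
# significant results in the literature are expected to be true effects,
# before any QRPs are even considered.
ppv = positive_predictive_value(base_rate=0.09, power=0.36)
```

Even with flawless practices, a field with these parameters would see most "discoveries" fail to replicate, which is the purely statistical mechanism Q2 describes.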
Q3: What is the difference between a replication experiment and a direct replication? A replication experiment repeats a measurement under similar conditions to estimate the imprecision, or random error, of an analytical method; it is a fundamental practice for verifying the reliability of a measurement procedure. [85] A direct replication, by contrast, is an independent study that uses the same methods and procedures to verify a previously published result.
Q4: How can I estimate the imprecision of my method? Perform a replication experiment by analyzing a minimum of 20 samples of the same test material. Calculate the mean, standard deviation (SD), and coefficient of variation (CV). The SD represents the random error or imprecision of your method. [85]
Q5: What constitutes acceptable performance for imprecision? For short-term imprecision (within-run or within-day), the standard deviation should be less than a quarter of the defined total allowable error. For long-term imprecision (total), the standard deviation should be less than one-third of the total allowable error. [85]
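The calculations in Q4 and Q5 can be sketched in a few lines of Python. The replicate values below are hypothetical (invented glucose measurements in mg/dL with an assumed total allowable error of 10 mg/dL), and the function names are ours; the acceptance criteria are the SD < TEa/4 (short-term) and SD < TEa/3 (long-term) rules stated above:

```python
import statistics

def imprecision_stats(replicates):
    """Summarize a replication experiment: mean, sample SD, and CV (%)."""
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)  # estimate of random error (imprecision)
    cv = 100 * sd / mean
    return mean, sd, cv

def within_allowable_error(sd, tea, short_term=True):
    """Apply the criteria above: SD < TEa/4 (short-term) or SD < TEa/3 (long-term)."""
    return sd < (tea / 4 if short_term else tea / 3)

# Hypothetical n = 20 replicates of the same test material:
reps = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101,
        99, 102, 98, 100, 101, 99, 100, 102, 98, 100]
mean, sd, cv = imprecision_stats(reps)
acceptable = within_allowable_error(sd, tea=10.0, short_term=True)
```

Here the SD (~1.6 mg/dL) falls under the short-term limit of 2.5 mg/dL, so the method's imprecision would be judged acceptable under the stated criterion.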
Problem: An independent study fails to find a statistically significant effect that was previously reported.
Investigation & Solutions:
| Potential Cause | Diagnostic Checks | Corrective Actions |
|---|---|---|
| Low Statistical Power [84] | Calculate the power of the original and replication studies. | Always conduct an a priori power analysis before data collection. For the replication, use a larger sample size than the original study. |
| Questionable Research Practices (QRPs) in original study [84] [86] | Check for signs of p-hacking (e.g., multiple testing, selective outlier removal). Check if the result is part of a "too good to be true" series of positive findings. | Perform a reanalysis of the original data if available. Pre-register the replication study's hypothesis and analysis plan to avoid the same pitfalls. |
| Low Base Rate of True Effects [84] | Evaluate the prior probability for the field (e.g., via meta-analyses or prediction markets). | Interpret findings with caution in low base-rate fields. Use Bayesian methods to calculate the posterior probability of the effect being true. |
| Methodological Discrepancies | Carefully compare lab protocols, reagents, equipment, and sample populations between original and replication study. | Directly collaborate with the original authors to align methods. Conduct a "differential replication" to test the effect under varied conditions. |
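The a priori power analysis recommended in the first row of the table can be sketched without external packages using a normal approximation to the two-sample t-test (function names are ours; dedicated tools such as G*Power use the exact t distribution and will return a slightly larger n):

```python
from statistics import NormalDist

_norm = NormalDist()

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test for standardized
    effect size d (normal approximation to the t-test)."""
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5  # noncentrality parameter
    return (1 - _norm.cdf(z_crit - ncp)) + _norm.cdf(-z_crit - ncp)

def required_n_per_group(d, target_power=0.80, alpha=0.05):
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while power_two_sample(d, n, alpha) < target_power:
        n += 1
    return n
```

For a medium effect (d = 0.5) at 80% power and a two-sided alpha of .05, this approximation gives roughly 63 participants per group; the exact t-based calculation gives a slightly larger figure (about 64 per group).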
Problem: A paper reports a series of experiments that all successfully replicate a key finding, but the individual studies have low statistical power, making this series of successes improbable.
Investigation & Solutions:
| Potential Cause | Diagnostic Checks | Corrective Actions |
|---|---|---|
| Selective Reporting (File Drawer Effect) [84] | Calculate the probability of all studies being significant given their individual power. An improbably high success rate suggests unreported failed studies. | Request raw data for all conducted experiments related to the research question. Look for pre-registered study designs as a sign of completeness. |
| Use of QRPs Across Studies [84] | Check for flexibility in data collection and analysis across the studies (e.g., different outliers removed, changes in dependent variables). | Advocate for the publication of all research outcomes, regardless of statistical significance. Use study pre-registration to lock in analysis plans. |
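The diagnostic check in the first row ("calculate the probability of all studies being significant given their individual power") reduces to a simple product under an independence assumption. A minimal sketch, with the function name and example powers our own, following the logic of "too good to be true" diagnostics such as the incredibility index:

```python
from math import prod

def prob_all_significant(powers):
    """Probability that every study in a series reaches significance,
    assuming the studies are independent and each tests a true effect
    with the listed power. A very small value suggests unreported
    failed studies (the file drawer effect)."""
    return prod(powers)

# Five studies at 60% power each: even if the effect is real,
# a perfect 5-for-5 significant record occurs < 8% of the time.
p_series = prob_all_significant([0.6] * 5)
```

If a paper reports such a streak, the improbability itself is evidence that failed attempts may sit in the file drawer.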
Problem: Data patterns appear unnatural, or findings seem implausible, leading to suspicion of data manipulation.
Investigation & Solutions:
| Potential Cause | Diagnostic Checks | Corrective Actions |
|---|---|---|
| Data Fabrication [86] | Perform data forensics (e.g., check for digit preference, anomalies in distributions). Attempt to replicate the data collection process. | Report concerns to relevant institutional integrity bodies. Independent replication by a separate lab is the strongest test. |
| Data Falsification [86] | Scrutinize lab notebooks and original data records for inconsistencies. Check for selective omission of data points to achieve significance. | Foster an open science culture where data and code are shared. This enables peer scrutiny and deters misconduct. |
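One of the data-forensic checks named in the table, digit preference, can be screened for with a terminal-digit uniformity test. The sketch below (function name and example data are ours; it assumes integer-valued measurements) computes a chi-square statistic against a uniform distribution of last digits:

```python
from collections import Counter

def terminal_digit_chi2(values):
    """Chi-square statistic (df = 9) for uniformity of terminal digits.
    Genuine measurement noise tends to produce near-uniform last digits;
    a large statistic flags possible digit preference for closer review."""
    digits = [int(str(v)[-1]) for v in values]
    n = len(digits)
    expected = n / 10
    counts = Counter(digits)
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))

CHI2_CRIT_DF9 = 16.919  # 5% critical value for df = 9

# Fabricated-looking data: every value ends in 0 or 5.
suspicious = [x * 5 for x in range(1, 41)]
```

A statistic above the critical value is not proof of fabrication, only a prompt for the scrutiny of original records described in the table.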
Data synthesized from the Open Science Collaboration (2015) and other large-scale replication projects. [84]
| Discipline | Estimated Replication Rate | Key Contributing Factors |
|---|---|---|
| Social Psychology | < 30% | Low base rate of true effects, QRPs, low statistical power. [84] |
| Cognitive Psychology | ~50% | Slightly higher base rate of true effects compared to social psychology. [84] |
| Preclinical Cancer Research | ~11% (6 out of 53 landmark studies) | Low base rate, complex biological systems, potential for QRPs. [84] |
| Economics | ~62% | - |
Data from various anonymous survey studies. [4]
| QRP Example | Estimated Prevalence Range | Notes |
|---|---|---|
| Failing to report all of a study's conditions or dependent measures | Up to 100% for some QRPs in psychology [4] | Based on an anonymous elicitation survey with incentives for truth-telling. [84] |
| HARKing (Hypothesizing After Results are Known) | Commonly self-reported [4] | Considered by some as defensible in certain contexts. [4] |
| Selectively reporting studies that "worked" | Common [84] | Contributes to the "file drawer problem." |
| Rounding off p-values (e.g., from .054 to .05) | - | - |
| At least one QRP in the past three years | ~34% [4] | Based on a meta-analytic study by Fanelli (2009). |
Purpose: To estimate the imprecision (random error) of an analytical method. [85]
Materials:
Procedure:
Purpose: To verify a previously published finding using the same methods and procedures.
Materials:
Procedure:
| Item | Function/Benefit |
|---|---|
| Pre-Registration Templates | Provides a structured format for detailing hypotheses, methods, and analysis plans before data collection, preventing QRPs like HARKing and p-hacking. |
| Open Data Repositories | Platforms for sharing raw research data, enabling independent verification of results and detection of errors or misconduct. |
| Power Analysis Software | Tools to calculate the necessary sample size to detect an effect, ensuring studies are adequately powered and reducing false negatives. |
| Electronic Lab Notebooks | Securely records research procedures and data in a time-stamped, tamper-evident format, providing a clear audit trail. |
| Statistical Software with Robust Methods | Enables the application of appropriate statistical tests and Bayesian methods to assess the strength of evidence for hypotheses. |
Questionable Research Practices (QRPs) are activities that exist in an ethical grey area between sound scientific conduct and outright misconduct (e.g., fabrication, falsification, and plagiarism) [4]. They are problematic because they undermine the reliability and validity of scientific knowledge, leading to a skewed scientific literature that prolongs support for empirically untenable theories [4]. Common QRPs include selective reporting of outcomes, p-hacking, and HARKing [87].
These practices are concerning due to their prevalence, with an estimated one in two researchers having engaged in at least one QRP over the last three years [87]. When assessing evidence, these practices can lead to false positives and distorted conclusions, making it crucial to use robust assessment tools that can identify potential QRPs [87].
Different quality assessment tools are designed to evaluate specific study methodologies. Using the appropriate tool for each study design is essential for properly assessing potential QRPs [88]. The table below summarizes recommended tools for various study types:
Table: Evidence Assessment Tools for Different Study Designs
| Study Design | Recommended Assessment Tools | Key Elements Assessed |
|---|---|---|
| Randomized Controlled Trials (RCTs) | Cochrane Risk of Bias (ROB) 2.0 [88], CASP RCT Appraisal Tool [88], Jadad Scale [88] | Randomization process, allocation concealment, blinding, outcome data completeness, selective reporting |
| Cohort Studies | Newcastle-Ottawa Scale (NOS) [88], CASP Cohort Studies Checklist [88] | Group selection, comparability, exposure/outcome assessment, follow-up adequacy |
| Case-Control Studies | Newcastle-Ottawa Scale (NOS) [88], CASP Case Control Study [88] | Case and control selection, comparability, exposure measurement |
| Systematic Reviews | AMSTAR Checklist [88], CASP Systematic Review [88] | Search comprehensiveness, study selection criteria, risk of bias assessment, meta-analysis methods |
| Diagnostic Studies | QUADAS-2 [88], CASP Diagnostic Studies [88] | Patient selection, index test, reference standard, flow and timing |
| Qualitative Studies | CASP Qualitative Studies [88], McGill MMAT [88] | Research aims, methodology, design, recruitment, data collection |
When assessing literature for potential QRPs, several methodological warning signs should prompt more careful scrutiny:
Evaluating a study's applicability involves assessing whether results can be validly applied to your specific organization, population, or research context [88]. Key considerations include:
Problem: Different reviewers assign different quality ratings to the same study using the same assessment tool.
Solution:
Table: Protocol for Resolving Assessment Discrepancies
| Discrepancy Level | Resolution Process | Documentation Requirement |
|---|---|---|
| Minor (e.g., 1-point difference on scale) | Discussion between two original reviewers | Note initial scores and rationale for final score |
| Moderate (e.g., different risk of bias categories) | Discussion with reference to codebook examples | Document specific criteria interpretation |
| Major (fundamental disagreement on study validity) | Adjudication by third reviewer with methodology expertise | Record all perspectives and final decision rationale |
Problem: A study appears to report only positive findings, with incomplete outcome data or missing analyses.
Solution:
Flowchart: Addressing Suspected Selective Reporting
Problem: Included studies have substantially different methodological quality, making overall conclusions challenging.
Solution:
Table: Key Resources for Evidence Assessment and QRP Identification
| Tool/Resource | Primary Function | Application Context |
|---|---|---|
| Cochrane Risk of Bias 2.0 | Assess methodological quality of randomized trials | Gold standard for RCT quality assessment; evaluates randomization, deviations, missing data, outcome measurement, selective reporting [88] |
| Newcastle-Ottawa Scale (NOS) | Quality assessment of non-randomized studies | Standard tool for case-control and cohort studies; evaluates selection, comparability, outcome/exposure [88] |
| AMSTAR Checklist | Appraise systematic reviews and meta-analyses | 16-item tool to evaluate systematic review methodology, including search strategy, study selection, data extraction, and synthesis methods [88] |
| GRADE approach | Rate quality of evidence across studies | System for grading confidence in estimates; considers study design, limitations, inconsistency, indirectness, imprecision, publication bias [89] |
| Open Science Framework (OSF) | Protocol registration and data sharing | Platform for pre-registering analysis plans, sharing data and materials; helps identify selective reporting and p-hacking [87] |
| QUADAS-2 | Assess quality of diagnostic accuracy studies | Tool specifically designed for diagnostic studies; evaluates patient selection, index test, reference standard, flow and timing [88] |
Workflow: Evidence Assessment with QRP Screening
Purpose: To minimize bias and errors in quality assessment of included studies through structured independent review.
Materials:
Methodology:
Purpose: To systematically identify potential questionable research practices in the statistical reporting of studies.
Materials:
Methodology:
Q1: Our research team followed all standard methodologies, yet our replication attempts frequently fail. Are we committing Questionable Research Practices (QRPs)?
Failure to replicate is not always a sign of misconduct. The root cause may lie in Questionable Research Fundamentals (QRFs)—the underlying philosophical assumptions of your theories, concepts, and methods [90]. Before suspecting QRPs, investigate your field's base rate of true effects; domains where true effects are rare inherently have lower statistical replicability, even with perfect practices [84]. We recommend a paradigm check of your ontological and epistemological foundations [90].
Q2: We have obtained a statistically significant result (p < 0.05). How can we be more confident it will replicate?
A single p-value is an unreliable indicator of replicability [84]. Focus on strengthening your research fundamentals. This includes pre-registering your study design and analysis plan to curb p-hacking, using larger sample sizes to achieve higher statistical power, and employing more robust statistical methods. Fundamentally, you must critically evaluate whether your measurement approach genuinely captures the psychological phenomenon you intend to study [90].
Q3: What is the most critical but often overlooked step in designing a replicable study?
The most critical step is clearly distinguishing between the study phenomena (e.g., a participant's internal belief) and the means used to explore it (e.g., the questionnaire items). A failure to separate these is a cardinal error in psychology that leads to confusion between what exists and how we measure it, fundamentally undermining replicability [90].
Q4: Our large-scale, multi-site replication study produced ambiguous results. What went wrong?
The problem may be the ergodic fallacy. You are likely applying group-level (nomothetic) findings to explain individual-level processes, which is often statistically invalid [90]. For phenomena that are non-ergodic (where group averages do not represent individuals), consider shifting to case-by-case, person-oriented analyses to establish epistemically justified generalizations [90].
Issue: Suspected Low Replicability in a Research Domain
This guide helps diagnose systemic factors affecting replicability in your field.
Step 1: Gather Information & Identify Symptoms
Step 2: Establish Probable Cause
Step 3: Test a Solution
Step 4: Implement the Solution
Step 5: Verify Functionality
Table: Diagnostic Table for Replication Failure
| Potential Cause | Key Indicators | Supporting Evidence from Literature |
|---|---|---|
| Low Base Rate of True Effects [84] | Low replication rates despite high methodological rigor; discovery-oriented (vs. theory-testing) research. | Base rates estimated at ~9% in social psychology and ~20% in cognitive psychology [84]. |
| Low Statistical Power [84] | Small sample sizes; small effect sizes; wide confidence intervals. | Median statistical power in psychology has been estimated at ~36% [84]. |
| Questionable Research Fundamentals (QRFs) [90] | Reliance on variable-oriented approaches; confusion between ontological concepts and their measurement. | A core argument is that QRFs, not just QRPs, are the root cause of psychology's crises [90]. |
| Inappropriate Generalization (Ergodic Fallacy) [90] | Applying group-level findings to individuals; high intra-individual variability. | Many psychological processes are non-ergodic, making sample-to-individual inferences invalid [90]. |
Issue: Implementing a Robust Multi-Site Replication Project
This guide provides a workflow for conducting large-scale, collaborative replication projects, an established structural solution to the replication crisis [40].
Table: Essential Methodological "Reagents" for Robust Research
| Item / Concept | Function / Explanation | Field Application |
|---|---|---|
| Pre-registration | A time-stamped, immutable plan stating hypotheses, methods, and analysis strategy before data collection. | Mitigates QRPs like HARKing and p-hacking; distinguishes confirmatory from exploratory research [40]. |
| Collaborative Replication Networks | Consortia of labs (e.g., Collaborative Replications and Education Project) that conduct large-scale replications [40]. | Increases sample size and generalizability; provides robust replicability estimates for a field [40]. |
| Person-Oriented Analysis | Analytical approaches that focus on the individual as a functioning whole, rather than on isolated variables [90]. | Avoids the ergodic fallacy; essential for studying non-ergodic, dynamic psychological processes [90]. |
| Contributor Roles Taxonomy (CRediT) | A standardized taxonomy for transparently documenting each author's contributions [40]. | Clarifies authorship and accountability, especially in large collaborative projects [40]. |
| Philosophy of Science Elaboration | The critical process of making explicit the ontological and epistemological foundations of one's research paradigm [90]. | Addresses QRFs by ensuring theories and methods are built on coherent, justified fundamentals [90]. |
1.0 Objective: To independently verify the findings of a high-impact study (the "target") through a pre-registered, multi-laboratory replication effort [40].
2.0 Methodology:
3.0 Diagram: Replication Project Workflow
This technical support center provides resources to help researchers identify and address common issues in studies focused on Questionable Research Practices (QRPs). The guides below offer systematic approaches to troubleshooting methodological challenges.
Problem: Low response rates and potential for socially desirable answers in surveys measuring QRPs. Application: This guide is for researchers encountering poor data quality during investigations into research integrity using self-report questionnaires [92].
| Problem Symptom | Likely Cause | Prerequisites to Check | Resolution Steps |
|---|---|---|---|
| Low survey response rate [93] | Long, cumbersome questionnaire; low perceived anonymity; poorly targeted sample. | Ensure the questionnaire can be completed in an appropriate time frame [93]. | 1. Run a pilot study to get feedback on design and length [93]. 2. Simplify and shorten the questionnaire [93]. 3. Use multiple recruitment channels and send polite reminders. |
| Evidence of Social Desirability Bias (e.g., unrealistically low reports of common QRPs) [92] | Respondents providing socially acceptable rather than truthful answers, especially on sensitive topics [93]. | Check if questions on sensitive behaviors (e.g., data manipulation) are leading or assumptive [92]. | 1. Reassure participants of anonymity and confidentiality [93]. 2. Use neutral, non-judgmental language and avoid leading questions [92] [93]. 3. Consider using indirect questioning techniques. |
| High item non-response or ambiguous answers | Poorly worded questions; complex terminology; confusing format [93]. | Check for technical jargon, ambiguous terms, or double-barreled questions [92] [93]. | 1. Conduct a pilot to check question clarity [93]. 2. Use simple, straightforward language tailored to the audience [93]. 3. Revise or drop problematic items. |
| Inconsistent responses within the same survey | Question order effects where earlier answers influence later ones [93]. | Review the sequence of questions. | 1. Ensure questions flow logically from least to most sensitive [93]. 2. Separate potentially reactive questions [93]. 3. Randomize question blocks where possible. |
Expected Results: Implementation of these steps should lead to improved response rates and data quality, yielding more valid and reliable metrics on QRPs. If the issue persists: Consider using methodological triangulation (e.g., combining survey data with data audits) to validate findings [92].
Problem: An instrument developed to detect QRPs (e.g., in text or data) performs poorly during initial validation. Application: This guide assists researchers during the development and validation phase of a new QRP assessment scale or algorithmic tool [92].
| Problem Symptom | Likely Cause | Prerequisites to Check | Resolution Steps |
|---|---|---|---|
| Low inter-rater reliability (for manual tools) or low accuracy (for algorithmic tools) | Poorly defined operational criteria for QRPs; inadequate training for coders; flawed model training data. | Confirm that the conceptual framework for all QRPs is clear and comprehensive [92]. | 1. Gather content through literature review and expert consultation to refine criteria [92]. 2. Develop a detailed codebook with clear examples. 3. Re-train coders or re-train the algorithm with improved data. |
| Tool fails to generalize to new datasets | Overfitting during development; the tool is too specific to the original sample or context. | Check if the population/context for the original development is similar to the new study [92]. | 1. Collect a more diverse dataset for development. 2. Apply cross-validation techniques. 3. Recalibrate or adapt the tool for the new context [92]. |
| Poor construct validity | The tool does not adequately measure the theoretical construct of a QRP. | Verify that the tool's items/features map directly onto the theoretical construct. | 1. Conduct pilot focus groups to confirm themes and understanding [92]. 2. Perform statistical tests for validity (e.g., convergent, discriminant). 3. Revise the tool's items/features based on analysis. |
Expected Results: A more robust, reliable, and valid tool capable of consistently identifying QRPs across different contexts. If the issue persists: Re-evaluate the underlying definition of the QRP being targeted and consider a fundamental redesign of the detection methodology.
Q1: What is the most effective way to structure a questionnaire to minimize bias when asking about sensitive QRPs? A1: To minimize bias, reassure participants of anonymity and confidentiality, use neutral and non-judgmental wording, order questions from least to most sensitive, and pilot-test the questionnaire before full deployment [93].
Q2: How can I improve the reliability of a protocol for manually coding published papers for QRPs? A2: Improve reliability through rigorous coder training and calibration: develop a detailed codebook with clear operational criteria and worked examples, train coders against it, and run calibration rounds until inter-rater agreement reaches an acceptable level [92].
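Inter-rater agreement for this kind of manual coding is typically quantified with Cohen's kappa (listed in the research reagent table below). A minimal, dependency-free sketch, with the function name and example ratings our own:

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two coders who
    rated the same items (e.g., 'QRP present' vs 'QRP absent').
    Expects equal-length lists of category labels."""
    n = len(ratings_a)
    assert n == len(ratings_b) and n > 0
    categories = set(ratings_a) | set(ratings_b)
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    p_exp = sum((ratings_a.count(c) / n) * (ratings_b.count(c) / n)
                for c in categories)
    return 1.0 if p_exp == 1 else (p_obs - p_exp) / (1 - p_exp)

# Two coders rating ten papers for the presence of a QRP:
coder_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
coder_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
kappa = cohens_kappa(coder_1, coder_2)
```

Here raw agreement is 80%, but kappa (~0.58) corrects for chance agreement; conventions vary, but values below roughly 0.6 would usually trigger another calibration round.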
Q3: Our research involves analyzing signaling pathways potentially impacted by QRPs. How can we visually represent these complex relationships clearly? A3: Use standardized diagrams to map out pathways and workflows. Graphviz is a useful tool for generating clear, reproducible diagrams from text-based code.
This protocol outlines the steps for developing and validating a new questionnaire designed to assess researchers' awareness and engagement in QRPs [92].
1. Conceptualization and Item Generation
2. Questionnaire Design
3. Pilot Testing
4. Finalization and Validation
This table details key resources and methodologies essential for conducting rigorous research into Questionable Research Practices.
| Item / Solution | Function in QRP Research |
|---|---|
| Validated Questionnaires (e.g., based on established scales) | Provides a reliable and standardized instrument for measuring self-reported attitudes and engagement in QRPs across different populations, allowing for comparability between studies [92] [93]. |
| Pre-Registration Protocol | A detailed plan for study hypotheses, methods, and analysis decisions filed before data collection begins. Serves as a benchmark to detect and prevent HARKing (Hypothesizing After the Results are Known) and selective reporting [94]. |
| Data Auditing Scripts (e.g., in R or Python) | Automated scripts used to screen datasets for statistical anomalies, inconsistencies, or patterns indicative of p-hacking or data fabrication (e.g., digit preference, implausible p-value distributions). |
| Inter-Rater Reliability (IRR) Framework (e.g., Cohen's Kappa calculation) | A statistical method to ensure consistency and agreement between multiple researchers when manually coding qualitative data or published manuscripts for the presence of QRPs. |
| Open Science Framework (OSF) | A collaborative platform to share pre-registrations, data, materials, and code. Promotes transparency and allows for direct examination of the research process, mitigating several QRPs. |
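As one concrete example of the "Data Auditing Scripts" entry in the table above, the GRIM test checks whether a reported mean is arithmetically possible given the sample size. The sketch below is ours and simplified; production implementations (following Brown and Heathers' 2016 method) must handle rounding conventions and multi-item scales more carefully:

```python
def grim_consistent(reported_mean, n, decimals=2):
    """GRIM check: can a mean reported to `decimals` places arise from
    n integer responses? An inconsistent mean flags a reporting error
    or possible fabrication worth a closer look."""
    target = round(reported_mean, decimals)
    k = round(reported_mean * n)  # nearest achievable integer total
    # Achievable means are k/n; test the candidates adjacent to k.
    return any(
        cand >= 0 and round(cand / n, decimals) == target
        for cand in (k - 1, k, k + 1)
    )

# With n = 25 integer responses, a mean of 3.48 is achievable (87/25),
# but a reported mean of 3.49 is not.
```

Screens like this cannot prove misconduct; an inconsistent value is simply a lead to pursue through the data-request and audit procedures described earlier.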
Addressing Questionable Research Practices requires a multifaceted approach that combines clear definitions, robust detection methodologies, proactive prevention strategies, and critical validation of scientific literature. The recent development of a comprehensive inventory of 40 QRPs provides a crucial foundation for standardized identification, while technological advances like AI screening tools offer promising detection capabilities. For biomedical and clinical research, implementing open science practices, preregistration, and transparent reporting represents the most effective path toward mitigating QRPs' damaging effects on scientific credibility and drug development. Future efforts must focus on cultural and institutional reforms that reduce perverse publication incentives while promoting research quality over quantity. By adopting these integrated strategies, the research community can strengthen the integrity of the scientific record, enhance the replicability of findings, and ultimately accelerate the development of reliable medical treatments and interventions.