Questionable Research Practices (QRPs) in Science: A Comprehensive Guide to Identification, Prevention, and Solutions for Researchers

Dylan Peterson, Nov 26, 2025

This article provides a comprehensive guide for researchers and drug development professionals on identifying, addressing, and preventing Questionable Research Practices (QRPs).

Abstract

This article provides a comprehensive guide for researchers and drug development professionals on identifying, addressing, and preventing Questionable Research Practices (QRPs). Covering the latest research, including a newly defined inventory of 40 QRPs, the article explores foundational concepts, advanced detection methodologies, practical prevention strategies, and validation techniques. It addresses pressing issues like p-hacking, HARKing, and publication bias, while offering actionable solutions such as preregistration, open data, and AI-assisted screening tools. Designed for the biomedical and clinical research community, this guide synthesizes current evidence to foster research integrity, improve replicability, and build a more trustworthy scientific foundation for drug development and clinical applications.

Defining the Problem: Understanding Questionable Research Practices and Their Impact on Scientific Integrity

What Are QRPs? From the 'Bestiary of Questionable Research Practices' to a Standardized Definition

FAQ 1: What are Questionable Research Practices (QRPs)?

Answer: Questionable Research Practices (QRPs) are procedures or decisions in the research process that are not transparent, ethical, or fair, and are likely to produce misleading conclusions, typically in the interest of the researcher [1] [2] [3].

It is crucial to distinguish QRPs from other forms of poor scientific practice. The table below clarifies these differences.

Table 1: Defining QRPs in the Context of Research Misconduct

| Category | Definition | Key Differentiator |
| --- | --- | --- |
| QRPs | Methodologically unsound practices that threaten the validity and reliability of science, often motivated by a desire for positive results [1] [4] [5]. | Ethical ambiguity; often not officially prohibited but pose a major threat to cumulative science [4] [5]. |
| Research Misconduct | Clearly prohibited and proscribed practices, such as fabrication, falsification, and plagiarism [1] [3] [5]. | Universally recognized as unacceptable and unethical. |
| Researcher Error | Non-motivated, accidental mistakes (e.g., accidental data loss) [1] [3]. | Lack of intent to mislead. |

FAQ 2: What are the most common QRPs I should look out for?

Answer: QRPs can occur at nearly every stage of the research lifecycle. The "Bestiary of Questionable Research Practices," a community-consensus project, has identified and categorized 40 different QRPs [1] [3] [6]. The most common and impactful ones are listed below.

Table 2: Common Questionable Research Practices (QRPs) and Their Impact

| QRP Category | Specific QRP | Description | Impact on Research |
| --- | --- | --- | --- |
| Reporting & Analysis | HARKing (Hypothesizing After the Results are Known) | Formulating hypotheses after results are known to fit the data [7] [4] [2]. | Undermines the hypothesis-testing nature of science; creates false positive findings [7] [2]. |
| | P-hacking | Conducting multiple analyses or selectively reporting outcomes to produce a statistically significant result (p < 0.05) [7] [2] [8]. | Inflates false-positive rates, skewing the scientific literature [2] [8]. |
| | Selective Reporting / "Cherry-picking" | Reporting only results or studies that are significant or consistent with predictions, while omitting negative or non-significant results [7] [4] [2]. | Presents a biased picture of an intervention's true effectiveness; known as the "file drawer problem" [7]. |
| Data Collection & Management | Inadequate Record Keeping | Failing to keep careful, detailed records of the research process, including decisions and protocols [2] [5]. | Makes replication impossible and obscures the research trail [2]. |
| | Optional Stopping | Monitoring data collection and stopping only after a significant result is attained, without a pre-defined sample size [8]. | Increases the likelihood of a false positive finding [8]. |
| Collaboration & Authorship | Gift Authorship | Demanding or accepting authorship for work that does not meet established contribution criteria [4] [5]. | Undermines credit and accountability. |
| | Insufficient Supervision | Failing to adequately mentor and oversee junior coworkers [5]. | A highly prevalent mispractice that threatens both trust and truth in science [5]. |

The following diagram maps out how these QRPs can infiltrate different stages of a typical research workflow, highlighting critical points where integrity is at risk.

[Diagram: Research workflow with QRP risk points. Study Design → Data Collection → Data Analysis → Reporting → Publication & Collaboration; HARKing (formulating hypotheses after results are known) branches from Study Design, Optional Stopping (stopping data collection after a significant result) from Data Collection, P-hacking (running multiple analyses until significance is found) from Data Analysis, Selective Reporting ("cherry-picking" only positive results) from Reporting, and Gift Authorship (awarding authorship to non-contributors) from Publication & Collaboration.]


FAQ 3: How prevalent are QRPs, and what motivates researchers to use them?

Answer: Surveys indicate QRPs are unfortunately common. A meta-analytic study found about 12.5% of researchers admitted to engaging in at least one QRP, while other surveys have reported much higher prevalence rates, with some estimates suggesting one in two researchers has engaged in a QRP in the last three years [4] [2]. The motivation is often a combination of systemic pressure and individual factors.

Table 3: Factors Contributing to Engagement in Questionable Research Practices

| Factor Category | Specific Factor | Explanation |
| --- | --- | --- |
| Systemic & Institutional | "Publish or Perish" Culture | Pressure to publish frequently in high-impact journals for hiring, promotion, and funding [4] [2]. |
| | Bias Toward Significant Results | Journals preferentially publish novel, statistically significant results, creating a bias against null findings [4] [8]. |
| Individual & Cognitive | Researcher Degrees of Freedom | The many flexible decisions researchers must make during a study (e.g., on data exclusion, analysis choices) can be exploited to obtain desired outcomes [7] [4]. |
| | Competing Contingencies | Researcher behavior is shaped by both the desire for valid scientific contributions and the tangible benefits of publication (prestige, job security). QRPs can emerge when the latter dominates [7]. |
| | Competency Shortfalls | Lack of sufficient training in research ethics, methodology, or data analysis can lead to engagement in QRPs, sometimes unintentionally [4] [2]. |

FAQ 4: What are the methodologies for detecting QRPs in research?

Answer: Detecting QRPs relies on a combination of statistical techniques, methodological scrutiny, and cross-verification. The table below outlines key detection protocols.

Table 4: Experimental Protocols for Detecting Questionable Research Practices

| Detection Method | Protocol Description | QRPs It Can Help Identify |
| --- | --- | --- |
| Statistical Consistency Checks | Examining the distribution of p-values in a literature; a surplus of p-values just below 0.05 (p-value clustering) can indicate p-hacking [8]. | P-hacking, optional stopping. |
| Replication Studies | Directly repeating a prior study's methodology to see if the same results are obtained. The inability to replicate findings can be a red flag [7] [8]. | Selective reporting, HARKing, various QRPs. |
| Comparison with Unpublished Literature | Systematically comparing effect sizes from published studies with those from unpublished theses or dissertations. Larger effects in published work suggest a file drawer problem [7]. | Selective reporting, publication bias. |
| Power Analysis Evaluation | Assessing the statistical power of a series of studies. An unrealistically high rate of significant results across multiple underpowered studies suggests non-reported null findings [8]. | Selective reporting, the file drawer problem. |
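The p-value clustering check above can be sketched as a simple "caliper test": count reported p-values just below the significance threshold versus just above it, where a marked surplus below is a p-hacking red flag. A minimal Python sketch (the p-values and band width are illustrative, not drawn from any cited study):

```python
from math import comb

def caliper_test(p_values, threshold=0.05, band=0.01):
    """Compare counts of p-values just below vs. just above a threshold.

    Absent p-hacking, p-values within a narrow band should fall roughly
    evenly on either side of the threshold; a large surplus just below
    it is suspicious. Returns both counts plus the one-sided exact
    binomial probability of a surplus at least this large under a
    50/50 split.
    """
    below = sum(1 for p in p_values if threshold - band <= p < threshold)
    above = sum(1 for p in p_values if threshold <= p < threshold + band)
    n = below + above
    # P(X >= below) with X ~ Binomial(n, 0.5)
    binom_p = sum(comb(n, k) for k in range(below, n + 1)) / 2**n if n else 1.0
    return below, above, binom_p

# Illustrative "literature" with many p-values piled up just under .05
reported = [0.041, 0.043, 0.044, 0.046, 0.047, 0.048, 0.049, 0.049,
            0.052, 0.058, 0.012, 0.003, 0.21, 0.34]
below, above, binom_p = caliper_test(reported)
```

Real caliper tests are run over hundreds of extracted p-values; with only a handful, as here, the check is suggestive rather than conclusive.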

The Scientist's Toolkit: Essential Reagents for Research Integrity

Adopting improved research practices and tools is the most effective way to prevent QRPs. The following table details key "reagents" for ensuring robust and transparent science.

Table 5: Research Reagent Solutions for Preventing QRPs

| Tool / Practice | Function | How It Mitigates QRPs |
| --- | --- | --- |
| Pre-registration | Publicly documenting a study's hypotheses, design, and analysis plan before data collection begins [2]. | Directly counters HARKing and p-hacking by creating a time-stamped, unchangeable record of intent. |
| Registered Reports | A publication format where journals peer-review and provisionally accept studies before data is collected, based on the proposed methodology [4]. | Removes publication bias against null results, reducing the incentive for selective reporting and p-hacking. |
| Open Data & Code | Publicly sharing de-identified raw data and analysis scripts alongside the publication. | Allows for full independent verification of results; deters selective reporting and flexible data analysis. |
| Citation Manager | Using software (e.g., Zotero, Mendeley) to organize and format references [2]. | Helps prevent improper referencing and citation plagiarism. |
| Detailed Lab Protocols | Maintaining standardized, detailed records of all research procedures and decisions [2]. | Prevents inadequate record keeping and makes the research process transparent and replicable. |
| Contributorship Model | Using explicit criteria (e.g., CRediT taxonomy) to define and disclose each author's specific contributions [2]. | Helps eliminate gift and ghost authorship by ensuring accountability. |

FAQ 5: What improved practices can I implement to avoid QRPs?

Answer: Moving from questionable to improved research practices requires a conscious shift in methodology. The key is to prioritize transparency and pre-commitment.

  • For Study Design: Pre-register your study and analysis plan. This simple act locks in your hypotheses and methods, making HARKing impossible and constraining p-hacking [2]. When possible, opt for a Registered Report to ensure your work is judged on its methodological rigor, not its results [4].
  • For Data Collection & Analysis: Adhere to your pre-registered plan. Pre-define your sample size using an a priori power analysis and stick to it, avoiding optional stopping [2]. Create and follow a pre-defined set of outlier exclusion criteria to prevent selective data cleaning [2] [8].
  • For Reporting & Collaboration: Report everything transparently, including all manipulated conditions, collected measures, and statistically non-significant results [2]. Share your data and code openly to enable verification. Use a contributorship model to assign authorship fairly and accurately [2] [5].
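The a priori power analysis recommended above can be approximated in a few lines via the normal approximation to a two-sided, two-sample t-test (a sketch; dedicated tools such as G*Power or R's pwr package give the exact t-based answer, and the effect size used here is only an example):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison, using the normal approximation to the t-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size_d) ** 2)

# A medium effect (Cohen's d = 0.5) needs 63 per group under this
# approximation; exact t-based tools give about 64.
n = n_per_group(0.5)
```

Committing to a number like this before data collection, and stating it in the preregistration, is what makes optional stopping detectable as a deviation.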

Questionable Research Practices (QRPs) are activities that, while not necessarily classified as outright fraud, violate principles of research transparency, ethics, and rigor [2]. They occupy a concerning gray area in scientific research, situated between deliberate misconduct (such as data fabrication) and honest error. The prevalence of QRPs is a significant concern; one estimate suggests that one in two researchers has engaged in at least one QRP over a three-year period [2]. These practices have been identified as a key contributor to the "replication crisis" observed in numerous scientific fields, where subsequent studies fail to reproduce the findings of originally published research [7] [2]. This erosion of reliability threatens the integrity of the scientific record and can lead to wasted resources and misguided clinical or policy decisions.

Defining the Spectrum: From Misconduct to Honest Error

Navigating the landscape of research integrity requires a clear understanding of the distinctions between different types of problematic practices. The following table outlines the core definitions that form this spectrum.

Table 1: Categorizing Problematic Research Practices

| Category | Definition | Key Characteristics | Examples |
| --- | --- | --- | --- |
| Research Misconduct (FFP) | Fabrication, falsification, or plagiarism in proposing, performing, or reviewing research, or in reporting research results [9]. | Defined by U.S. federal policy; involves clear, intentional deception. | Making up data (fabrication); manipulating research materials or omitting data to misrepresent results (falsification); appropriating another's ideas or words without credit (plagiarism) [9]. |
| Questionable Research Practices (QRPs) | Decisions during the research process that raise questions regarding the work's rigor and precision, often motivated by pressure to publish [2]. | Often reside in ethical gray areas; can be challenging to detect; may be unintentional but have detrimental effects. | Selective reporting of results; p-hacking; HARKing; failing to share data; not accurately documenting the research process [7] [2]. |
| Detrimental Research Practices | A broader category of actions that violate research values and can damage the research enterprise [9]. | Encompasses a wider range of behaviors than QRPs, including poor mentorship and abusive work environments. | Refusing to share research materials or data with other researchers; inappropriate authorship; inadequate supervision or mentorship [9]. |
| Honest Error | Unintentional mistakes in the recording, selection, or analysis of data [9]. | Not intentional; does not constitute misconduct. | Errors of judgment; slips in data entry; differences of opinion in data interpretation [9]. |

The diagram below illustrates the functional relationship between external pressures, researcher behavior, and the resulting impact on the scientific literature.

[Diagram: External pressures ("publish or perish," funding, promotion) → researcher decisions (degrees of freedom in data collection, analysis, and reporting) → research behavior, which yields either a robust, valid scientific literature (reinforced by scientific contribution) or Questionable Research Practices (reinforced by publication metrics).]

Troubleshooting Guide: Common QRPs and Solutions

This section is structured as a technical support guide to help researchers identify, diagnose, and resolve common QRPs in their work.

FAQ: What are the most common QRPs and how can I avoid them?

Q1: I've collected some data, but the results are not statistically significant. I noticed one outlier that seems to be skewing the results. Is it acceptable to remove it?

  • Problem Identified: This is a form of p-hacking, which involves running multiple analyses or manipulating data until a statistically significant result is achieved [2].
  • Root Cause: The pressure to publish statistically significant findings, which are often perceived as more valuable and publishable than null results [2].
  • Solution:
    • Pre-registration: Create a detailed analysis plan before collecting data. This plan should include pre-defined, justified criteria for excluding data points (e.g., based on pre-specified technical errors) [2].
    • Transparency: If you remove data post-hoc, you must report this transparently in your manuscript, explicitly state the reasoning, and conduct sensitivity analyses to show how the inclusion of the data affects the results.
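The sensitivity analysis mentioned in the solution can be as simple as reporting the key statistics both with and without the contested point, so readers can judge how much the exclusion drives the conclusion (a sketch with made-up numbers; the function name is illustrative):

```python
from statistics import mean, stdev

def sensitivity_report(values, suspected_outlier):
    """Summarize a sample with and without a contested data point,
    making the impact of the exclusion explicit in the report."""
    trimmed = [v for v in values if v != suspected_outlier]
    return {
        "mean_all": mean(values),
        "mean_excluding": mean(trimmed),
        "sd_all": stdev(values),
        "sd_excluding": stdev(trimmed),
        "n_removed": len(values) - len(trimmed),
    }

# Illustrative sample with one extreme value
scores = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 9.7]
report = sensitivity_report(scores, 9.7)
```

If the substantive conclusion survives both rows of the report, the outlier question is moot; if it does not, that fragility itself belongs in the manuscript.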

Q2: My data suggest an interesting relationship I hadn't initially predicted. Can I write my paper's introduction and hypothesis as if I had predicted this all along?

  • Problem Identified: This is known as HARKing (Hypothesizing After the Results are Known) [7].
  • Root Cause: A desire to present a clean, compelling narrative that makes the research appear more confirmatory and insightful than it actually was [7].
  • Solution:
    • Clearly distinguish between a priori hypotheses and post hoc explanations in your manuscript.
    • Frame exploratory findings as such. It is scientifically valid to report unexpected discoveries, but they must be presented as hypothesis-generating rather than hypothesis-testing.

Q3: I ran three experiments, but only one produced clear, positive results. Can I just write up and submit the successful one?

  • Problem Identified: This is selective reporting or "cherry-picking" [2].
  • Root Cause: The file drawer effect, where studies with null or negative results are less likely to be submitted for publication, leading to a biased literature [7].
  • Solution:
    • Report on all studies conducted, including failed ones. You can note them in the manuscript or as supplementary information.
    • Consider platforms that publish null or negative results.
    • For a series of experiments, the entire process should be reported to give an accurate picture of the research trajectory.

Q4: My research process evolved during the project. Do I need to document every little change?

  • Problem Identified: Not accurately recording the research process [2].
  • Root Cause: A lack of time, perceived inefficiency, or not recognizing the importance of detailed documentation.
  • Solution:
    • Write detailed protocols: Maintain a detailed lab notebook or digital log that records all steps, decisions, and changes throughout the project [2].
    • Use version control: For code and analysis scripts, use version control systems (e.g., Git) to track all changes.
    • The standard for documentation is that someone else should be able to understand your process and replicate it exactly.

Quantitative Data on QRP Prevalence

The scale of the issue is significant, as evidenced by large-scale studies. The following table summarizes findings from one such analysis of randomized controlled trials.

Table 2: Indicators of Questionable Research Practices in 163,129 Randomized Controlled Trials [10]

| Indicator of QRP | Definition | Prevalence in Studied RCTs |
| --- | --- | --- |
| Implausible Baseline Characteristics | A statistical indicator suggesting that reported baseline data may be too perfect to be realistic. | Identified in a portion of the analyzed trials, suggesting potential selective reporting or data manipulation. |
| Inconsistent Reporting | Discrepancies in data reported between the main text, tables, and figures of a publication. | A common issue found across a significant number of trials, impacting the reliability of the published record. |

The Scientist's Toolkit: Frameworks for Improved Research Practices

Adopting structured frameworks at the outset of a research project is one of the most effective ways to prevent QRPs.

Research Question Formulation: The PICO & FINER Frameworks

A well-constructed research question is the foundation of rigorous science. Using frameworks like PICO and FINER ensures all critical components are considered a priori [11] [12].

Table 3: The PICO Framework for Structuring a Research Question [11] [12]

| Component | Definition | Example: "Good" | Example: "Best" |
| --- | --- | --- | --- |
| P (Patient/Population) | The subjects of interest. | Adult patients with type II diabetes. | Adult patients (18-64 years old) with uncontrolled type II diabetes (A1c >7%) in a primary care setting. |
| I (Intervention/Exposure) | The action or exposure being studied. | Pharmacist-led education. | Three visits over 12 months with an ambulatory care pharmacist providing nutritional guidance and medication adjustments. |
| C (Comparison) | The alternative to compare against. | Usual care. | Three visits over 12 months with a primary care physician. |
| O (Outcome) | The effect being evaluated. | Change in blood glucose. | Percent change in A1c from baseline at 3, 6, and 12 months. |

The FINER criteria (Feasible, Interesting, Novel, Ethical, Relevant) help evaluate the practical aspects and value of a research question [11]. The diagram below illustrates the workflow for developing a robust research plan using these tools.

[Diagram: Initial research idea → apply PICO framework → evaluate with FINER criteria → select study design → pre-register protocol & analysis plan → robust, defensible research plan.]
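At the planning stage, the PICO components can be captured as a simple structure so that the question is pinned down before any data are collected (a sketch; the class and example text, adapted from Table 3, are illustrative):

```python
from dataclasses import dataclass, fields

@dataclass
class PicoQuestion:
    population: str
    intervention: str
    comparison: str
    outcome: str

    def is_complete(self):
        # A usable PICO question has every component filled in
        return all(getattr(self, f.name).strip() for f in fields(self))

question = PicoQuestion(
    population="Adults (18-64) with uncontrolled type II diabetes (A1c >7%) in primary care",
    intervention="Pharmacist-led visits over 12 months with nutritional and medication guidance",
    comparison="Three visits over 12 months with a primary care physician",
    outcome="Percent change in A1c from baseline at 3, 6, and 12 months",
)
```

A structure like this slots directly into a preregistration form, where each field becomes an auditable commitment.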

Research Reagent Solutions: Essential Tools for Research Integrity

Beyond laboratory reagents, a modern scientist's toolkit must include resources that support methodological rigor and transparency.

Table 4: Key Resources for Implementing Improved Research Practices

| Tool / Resource | Category | Primary Function in Mitigating QRPs |
| --- | --- | --- |
| Pre-registration Platforms (e.g., OSF, AsPredicted) | Protocol Planning | Allows researchers to publicly archive their hypotheses, methods, and analysis plan before data collection, combating HARKing and p-hacking [2]. |
| Citation Managers (e.g., Zotero, Mendeley) | Writing & Dissemination | Helps ensure accurate and complete referencing, preventing improper citation and plagiarism [2]. |
| Version Control Systems (e.g., Git, SVN) | Data & Code Management | Tracks all changes to code and documentation, creating a transparent and auditable research trail [2]. |
| Registered Reports | Publishing Format | A journal format where the study design and proposed analyses are peer-reviewed before data collection, reducing publication bias against null results [2]. |

Distinguishing between research misconduct, QRPs, and honest errors is critical for upholding scientific integrity. While misconduct (FFP) represents clear and intentional violations, QRPs often stem from a complex interplay of external pressures and researcher degrees of freedom [7] [2]. The good news is that the scientific community has developed powerful "tools" to combat QRPs, including pre-registration, open science frameworks, and structured guidelines like PICO and FINER. By integrating these improved research practices into their daily workflow, researchers and drug development professionals can protect their work from questionable practices, enhance its validity and reproducibility, and actively contribute to a more trustworthy and reliable scientific literature.

Questionable Research Practices (QRPs) are activities that exist in an ethical grey area between sound scientific conduct and outright scientific misconduct (fabrication, falsification, and plagiarism) [4]. These practices threaten scientific integrity by undermining the reliability and validity of scientific knowledge, contributing to what is often termed the "replication crisis" in science [2] [4]. QRPs are concerning due to their high prevalence; one survey found that 51.3% of academic researchers engaged in at least one QRP frequently over a three-year period [13].

The most common QRPs include p-hacking, HARKing, and selective reporting, which collectively distort the scientific literature by inflating false positive rates and creating a skewed representation of research findings [14] [4]. These practices are often driven by a "publish or perish" culture that incentivizes researchers to produce statistically significant, novel results for publication [14] [2]. This technical guide provides identification methods, consequences, and solutions for these QRPs to support research integrity in biomedical sciences.

Troubleshooting Guide: Identifying Common QRPs

p-Hacking (Data Dredging)

Definition: p-hacking occurs when researchers repeatedly analyze data in different ways until they obtain a statistically significant result (p < 0.05) [14] [15]. Also known as "data dredging" or "data snooping," this practice involves testing multiple hypotheses without proper statistical correction [16].

Common Manifestations:

  • Trying different statistical analyses until significance is achieved [15]
  • Including/excluding covariates to push p-values below 0.05 [14]
  • Collecting more data after initial analysis shows non-significant results [2]
  • Experimenting with different cut-off values or outlier removal criteria [15]
  • Studying different subgroups until a significant effect is found [14]

Table 1: p-Hacking Detection Indicators

| Indicator | Description | Statistical Consequence |
| --- | --- | --- |
| Multiple testing without correction | Running many statistical tests but only reporting significant ones [16] | Increased false positive rate; with α=0.05, 1 in 20 tests will be significant by chance alone [14] |
| Optional stopping | Collecting data incrementally and stopping when significance is reached [16] | Substantial inflation of Type I error rates [14] |
| Post-hoc data exclusion | Removing outliers after seeing their impact on results [16] | Altered statistical significance that doesn't reflect the true effect [14] |
| Covariate manipulation | Adding or removing covariates to achieve significance [14] | Increased likelihood of false positive findings [14] |
| Selective reporting of outcomes | Measuring multiple variables but only reporting those with significant results [2] | Skewed literature with inflated effect sizes [14] |
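The inflation caused by optional stopping can be demonstrated with a short simulation in which data are generated under the null and the analysis "peeks" after every batch, stopping at the first p < 0.05 (a sketch assuming normally distributed data and a known-variance z-test; parameter values are illustrative):

```python
import random
from math import sqrt
from statistics import NormalDist

def optional_stopping_fpr(n_sims=2000, batch=10, max_n=100, alpha=0.05, seed=1):
    """Estimate the realized false positive rate when testing after
    every batch and stopping at the first p < alpha. The true effect
    is zero, so every 'discovery' is a false positive."""
    rng = random.Random(seed)
    norm = NormalDist()
    false_positives = 0
    for _ in range(n_sims):
        data = []
        while len(data) < max_n:
            data.extend(rng.gauss(0.0, 1.0) for _ in range(batch))
            # Two-sided z-test of the mean against 0 with known sd = 1
            z = (sum(data) / len(data)) * sqrt(len(data))
            if 2 * (1 - norm.cdf(abs(z))) < alpha:
                false_positives += 1  # "significant": stop and publish
                break
    return false_positives / n_sims

rate = optional_stopping_fpr()
# With ten peeks, the realized Type I error rate is well above the nominal 5%
```

Group-sequential designs with alpha-spending make interim looks legitimate; it is the undisclosed, uncorrected peeking that this simulation indicts.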

Experimental Protocol for Identification:

  • Assess analytical flexibility: Document all potential analytical choices available before data collection [14]
  • Track decision changes: Record any deviations from planned analysis and their justification [2]
  • Apply statistical corrections: Use Bonferroni, False Discovery Rate, or other multiple testing corrections for exploratory analyses [14]
  • Simulate null effects: Conduct simulations where no true effect exists to estimate false positive rates in your analytical approach [14]
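The "simulate null effects" step can be sketched as follows: generate studies with no true effect, test several outcome measures per study, and count how often at least one reaches significance (assumptions: five independent, normally distributed outcomes and a two-sided z-test; all parameters are illustrative):

```python
import random
from math import sqrt
from statistics import NormalDist

def familywise_fpr(n_outcomes=5, n=30, n_sims=2000, alpha=0.05, seed=7):
    """Fraction of null 'studies' in which at least one of several
    outcome measures reaches p < alpha, i.e. the rate at which a
    selective reporter would claim a positive finding."""
    rng = random.Random(seed)
    norm = NormalDist()
    hits = 0
    for _ in range(n_sims):
        for _ in range(n_outcomes):
            xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
            z = (sum(xs) / n) * sqrt(n)  # z-test against mean 0, sd 1
            if 2 * (1 - norm.cdf(abs(z))) < alpha:
                hits += 1
                break
    return hits / n_sims

rate = familywise_fpr()
# Theory predicts roughly 1 - 0.95**5 ≈ 0.23, far above the nominal 0.05
```

The same scaffold estimates the false positive rate of any analytic pipeline: substitute your own flexible choices into the inner loop and keep the null data generation.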

[Diagram: Non-significant initial analysis → p-hacking process (running multiple tests, removing outliers without criteria, adding/removing covariates, collecting more data until significant, switching outcome measures) → significant result → false positive finding.]

HARKing (Hypothesizing After Results are Known)

Definition: HARKing involves presenting a post-hoc hypothesis (developed after seeing the results) as if it were an a priori hypothesis (developed before the study) [15] [4].

Common Manifestations:

  • Formulating hypotheses based on unexpected significant findings [15]
  • Reinterpreting study aims after data analysis to align with significant results [4]
  • Presenting exploratory findings as confirmatory without proper acknowledgment [15]

Table 2: HARKing vs. Appropriate Practice

| Aspect | HARKing (QRP) | Appropriate Practice |
| --- | --- | --- |
| Hypothesis timing | Presented as formulated before data collection | Clearly stated as developed after data inspection [15] |
| Interpretation | Findings presented as confirmatory | Findings explicitly labeled as exploratory or hypothesis-generating [15] |
| Context in paper | Introduction written/reworked to imply a priori prediction | Discussion acknowledges post-hoc nature and need for confirmation [15] |
| Statistical interpretation | P-values interpreted as confirming hypothesis | P-values recognized as potentially reflecting chance findings [15] |

Experimental Protocol for Prevention:

  • Pre-registration: Register hypotheses, primary outcomes, and analysis plans before data collection [2]
  • Timestamp documentation: Maintain dated records of hypothesis formulation [2]
  • Explicit labeling: Clearly distinguish between confirmatory and exploratory analyses in publications [15]
  • Cross-validation: Test post-hoc hypotheses in independent datasets [15]
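The cross-validation step can be sketched as a split-half procedure: hypotheses are generated on one half of the data and tested on the untouched half (a sketch with simulated data; the correlation statistic and the 50/50 split are illustrative choices, not a prescribed method):

```python
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation, implemented from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def split_half(pairs, seed=0):
    """Shuffle once, then split into an exploration half (for generating
    post-hoc hypotheses) and a held-out confirmation half (for testing)."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Illustrative paired observations with a genuine linear relationship
data = [(x, 0.5 * x + random.Random(x).gauss(0, 1)) for x in range(40)]
explore, confirm = split_half(data)
r_explore = pearson_r(*zip(*explore))
r_confirm = pearson_r(*zip(*confirm))
# A pattern spotted in `explore` only counts if it also holds in `confirm`
```

Holding out the confirmation half before any exploration begins is what converts a post-hoc observation into a legitimately tested hypothesis.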

Selective Reporting (Cherry-Picking)

Definition: Selective reporting involves presenting only favorable results that support the researcher's hypothesis while concealing unfavorable or non-significant findings [2] [15].

Common Manifestations:

  • Reporting only statistically significant outcomes from multiple measured variables [2]
  • Omitting studies or experiments that failed or showed null results [2]
  • Failing to report conditions that didn't work as intended [2]
  • Discarding data subsets that contradict desired conclusions [15]

Experimental Protocol for Mitigation:

  • Comprehensive outcome registration: Pre-specify all primary and secondary outcomes [2]
  • Data collection standards: Establish clear data inclusion/exclusion criteria before data collection [2]
  • Full disclosure reporting: Report all conducted analyses and experiments, regardless of outcome [2]
  • Negative results repository: Utilize platforms for publishing or depositing null findings [2]
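The "comprehensive outcome registration" check above can be automated trivially: compare the outcome list in the preregistration against the outcomes that appear in the manuscript (a sketch; the function and example outcome names are hypothetical):

```python
def unreported_outcomes(preregistered, reported):
    """Outcomes promised in the preregistration but absent from the
    manuscript; each one is a selective-reporting red flag."""
    return sorted(set(preregistered) - set(reported))

# Hypothetical preregistered vs. reported outcome lists
prereg = ["A1c change", "fasting glucose", "weight", "quality of life"]
paper = ["A1c change", "weight"]
missing = unreported_outcomes(prereg, paper)
```

This is essentially what outcome-switching audits such as journal registered-protocol checks do at scale, line by line against trial registries.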

Quantitative Evidence on QRP Prevalence

Table 3: QRP Prevalence Among Researchers

| QRP Type | Prevalence | Study Context |
| --- | --- | --- |
| Fabrication | 4.3% (95% CI: 2.9, 5.7) [13] | Dutch academic researchers over 3 years |
| Falsification | 4.2% (95% CI: 2.8, 5.6) [13] | Dutch academic researchers over 3 years |
| Any frequent QRP | 51.3% (95% CI: 50.1, 52.5) [13] | Dutch academic researchers over 3 years |
| Selective reporting of outcomes | Up to 66% in specific scenarios [17] | Biomedical doctoral students facing dilemmas |
| Excluding data after analysis | 39% [17] | Psychology researchers |
| Failing to report all variables | Approximately 50% [17] | Psychology researchers |

Table 4: Factors Associated with QRP Engagement

| Factor | Impact on QRPs | Evidence Source |
| --- | --- | --- |
| Career stage | PhD candidates/junior researchers had increased odds of frequent QRP engagement (OR: 1.59, 95% CI: 1.32, 1.92) [13] | Dutch national survey |
| Gender | Male researchers had higher odds of frequent QRP engagement (OR: 1.33, 95% CI: 1.18, 1.50) [13] | Dutch national survey |
| Publication pressure | Associated with more frequent QRPs (OR: 1.22, 95% CI: 1.14, 1.30) [13] | Dutch national survey |
| Scientific norm subscription | Associated with less research misconduct (OR: 0.79, 95% CI: 0.63, 1.00) [13] | Dutch national survey |
| Perceived detection likelihood | Reviewer detection associated with less misconduct (OR: 0.62, 95% CI: 0.44, 0.88) [13] | Dutch national survey |

[Diagram: Contributing factors (publication pressure, "publish or perish" culture, early career stage, male gender, low norm subscription, low fear of detection) → QRP engagement → scientific consequences: false positive findings, irreproducible results, skewed literature, wasted research resources.]

Frequently Asked Questions (FAQs)

Q1: What's the difference between exploratory analysis and p-hacking?

A1: Exploratory analysis explicitly acknowledges its hypothesis-generating nature and treats findings as preliminary, requiring confirmation. p-hacking conceals the analytical flexibility and presents results as confirmatory without acknowledging multiple testing [15] [16]. The key distinction is transparency about the analytical process and appropriate statistical interpretation.

Q2: Is it ever acceptable to analyze data without pre-specified hypotheses?

A2: Yes, exploratory data analysis is valid when properly framed as hypothesis-generating rather than confirmatory [15]. The critical requirement is clear communication that findings are preliminary and require independent validation, with statistical interpretations adjusted for multiple testing [15].

Q3: How can I prevent selective reporting in my lab?

A3: Implement study pre-registration, establish standardized data collection protocols, maintain comprehensive lab notebooks documenting all experiments (including failures), and create a culture that values negative results as much as positive findings [2]. Some labs establish "lab journals" where all experimental attempts are recorded regardless of outcome.

Q4: What are the most effective safeguards against HARKing?

A4: Study pre-registration (particularly registered reports) is the most effective defense [2]. Additional safeguards include timestamped electronic records of hypothesis generation, explicit labeling of exploratory analyses in manuscripts, and separate sections for confirmatory versus exploratory findings in publications [15].

Q5: Are QRPs always intentional misconduct?

A5: No, research suggests QRPs exist on a spectrum from intentional misconduct to unintentional poor practices [18]. Many researchers engage in QRPs without full awareness of their consequences due to inadequate training, cognitive biases, or organizational pressures [14] [4]. However, the consequences are similarly damaging regardless of intent [18].

Research Reagent Solutions

Table 5: Essential Resources for QRP Identification and Prevention

| Resource Type | Specific Tools | Function and Application |
| --- | --- | --- |
| Pre-registration platforms | Open Science Framework (OSF), ClinicalTrials.gov, BMJ Open | Document hypotheses, methods, and analysis plans before data collection to prevent HARKing and selective reporting [2] |
| Statistical power tools | Superpower, pwr package in R | Conduct a priori power analysis to ensure adequate sample sizes and prevent data collection manipulation [2] |
| Data documentation tools | Electronic lab notebooks, version control (Git) | Maintain comprehensive records of all research decisions and data manipulations [2] |
| Multiple testing correction | Bonferroni, False Discovery Rate, permutation tests | Adjust significance thresholds for multiple comparisons to control false positive rates [14] |
| Data sharing platforms | Dryad, Zenodo, institutional repositories | Share complete datasets to enable verification and transparency [2] |
| Registered Reports | Journal format available at Cortex, Comprehensive Results in Social Psychology | Peer review of methods before results are known, guaranteeing publication regardless of findings [4] |
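To make the "multiple testing correction" row concrete, here is a minimal sketch of the Bonferroni and Benjamini-Hochberg (FDR) procedures named in the table, implemented in plain Python. The p-values are made-up illustrative numbers, not data from any study discussed here.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 only when p * m <= alpha, where m is the number of tests."""
    m = len(pvals)
    return [p * m <= alpha for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure, controlling the false discovery rate."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value clears its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            reject[i] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print("Bonferroni rejects:", sum(bonferroni(pvals)), "of", len(pvals))
print("BH (FDR) rejects:  ", sum(benjamini_hochberg(pvals)), "of", len(pvals))
```

Note how Bonferroni, the more conservative method, keeps fewer of the eight nominally interesting results than the FDR procedure; either is preferable to reporting all raw p-values below 0.05 as "significant".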

Diagram: Prevention safeguards mapped onto the research workflow. From the research question, pre-registration (OSF, ClinicalTrials.gov), a detailed protocol, and an a priori power analysis precede data collection; a statistical analysis plan and blinded analysis govern data analysis; a data sharing plan and Registered Reports govern reporting. Together these yield the QRP-prevention outcome.

p-Hacking, HARKing, and selective reporting represent three prevalent QRPs that significantly impact biomedical research validity. Evidence indicates these practices are widespread, with surveys suggesting that more than 50% of researchers have engaged in at least one QRP [13]. The resulting distortion of the scientific literature undermines research reproducibility, wastes resources, and erodes public trust in science [4].

Successful mitigation requires both individual and systemic approaches, including enhanced education about QRPs, widespread adoption of pre-registration, implementation of open science practices, and cultural shifts within research institutions to reward rigorous methodology rather than solely novel, positive results [2] [4] [13]. By implementing the troubleshooting guides and protocols outlined in this document, researchers can significantly reduce QRP prevalence and enhance the reliability of biomedical research.

Frequently Asked Questions (FAQs)

1. What are Questionable Research Practices (QRPs)? Questionable Research Practices (QRPs) are activities during the research process that are not fully transparent, ethical, or fair, and thus threaten the integrity and reproducibility of scientific findings [2]. They are not always technically illegal or considered outright misconduct, but they dangerously undermine the credibility of research. A 2012 study suggested that an estimated one in two researchers has engaged in at least one QRP in a three-year period [2].

2. What is the "replication crisis"? The replication crisis, also known as the reproducibility crisis, refers to the growing observation across many scientific fields that a substantial number of published study results cannot be reproduced or replicated by other researchers [19]. This is a fundamental problem because the reproducibility of empirical results is a cornerstone of the scientific method [19]. High-profile projects, such as the Open Science Collaboration's attempt to replicate 100 psychology studies, found that only about 39% of the replicated effects were consistent with the original claims [7].

3. How do QRPs directly contribute to irreproducible findings? QRPs inflate the rate of false-positive results, making it seem like an effect exists when it does not [2]. When research findings are a product of selective reporting or statistical manipulation rather than a true underlying effect, subsequent studies will inevitably fail to reproduce them. This creates a scientific literature filled with false leads, wasting resources and eroding public trust [20]. A large-scale analysis of 163,129 randomized controlled trials found direct indicators of these questionable practices [10].

4. What is the difference between "reproducibility" and "replicability"? While sometimes used interchangeably, these terms have distinct meanings [21].

| Term | Description | Significance |
| --- | --- | --- |
| Repeatability | The same researchers obtain the same result using the same methods, conditions, and location multiple times. | Measures precision under repeated, identical conditions. |
| Replicability | A different set of researchers arrives at the same results using the same methods and conditions as the original study. | The result is not due to chance or a local experimental artifact. |
| Reproducibility | A different group of researchers arrives at the same results using the original study's data and their own analysis to verify the published findings. | Answers whether the data and analysis support the published result and conclusions. |

5. Besides QRPs, what other factors cause the replication crisis? Multiple interconnected factors are at play:

  • Perverse Incentives ("Publish or Perish"): A research culture that rewards publication in high-impact journals above all else creates pressure to produce novel, statistically significant results, which can incentivize QRPs [20].
  • Lack of Transparency: Insufficient detail in published methods, unavailable data and code, and poorly characterized reagents (like antibodies) make it impossible for others to reproduce the work exactly [20] [22].
  • Misunderstanding of Statistics: A widespread misconception is that a p-value (e.g., p < 0.05) measures the probability that a result is true or repeatable. In reality, it provides no direct information about the likelihood of replication [23].
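The statistical misconception above can be made tangible with a short simulation, a sketch with illustrative parameter choices (a true effect of d = 0.5, 20 participants per group, a normal-approximation test). Among simulated "original" studies that reached p < 0.05, an exact replication succeeds only about as often as the study's statistical power, not 95% of the time:

```python
# Sketch: p < 0.05 does not imply a high probability of replication.
# Effect size, sample size, and the z-approximation are illustrative choices.
import math
import random
import statistics

random.seed(1)

def one_study(n=20, effect=0.5):
    """Two-group study; True if significant at p < .05 (normal approximation)."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(effect, 1) for _ in range(n)]
    se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
    z = (statistics.mean(b) - statistics.mean(a)) / se
    return abs(z) > 1.96  # approximate two-sided test

replicated = attempts = 0
for _ in range(4000):
    if one_study():                  # original study was "significant"
        attempts += 1
        replicated += one_study()    # identical, independent replication

print(f"Replication rate among significant originals: {replicated / attempts:.2f}")
```

The replication rate lands near this design's power (roughly a third), far below what the "p measures repeatability" intuition would predict.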

Troubleshooting Guide: Diagnosing and Fixing QRPs in Your Lab

This guide helps you identify common symptoms of QRPs in research outcomes and provides step-by-step protocols to address the root causes.

Problem 1: Selective Reporting and the File Drawer Effect

Symptoms: The body of published literature on a topic shows overwhelmingly positive results, but your own attempts to reproduce them fail. Systematic reviews find that published studies have larger effect sizes than unpublished dissertations or pre-registered reports [7].

Root Cause: The tendency to only submit, or for journals to only accept, studies with "positive" or statistically significant results, while withholding null or negative results [20]. This distorts the scientific record, making an intervention seem more effective than it is.

Experimental Protocol to Mitigate Selective Reporting:

  • Pre-register your study.

    • Action: Before data collection, submit your research hypotheses, primary and secondary outcome measures, experimental design, and planned analysis strategy to a publicly accessible registry (e.g., OSF Registries, ClinicalTrials.gov) [2].
    • Purpose: Creates an irreversible, time-stamped record of your intentions, distinguishing between confirmatory and exploratory analyses.
  • Adopt Registered Reports.

    • Action: Submit your introduction, methods, and proposed analysis plan to a journal for peer-review before you conduct the study.
    • Purpose: If the study protocol is sound, the journal commits to publishing the final paper regardless of the outcome (positive, null, or negative), effectively eliminating the file drawer effect [2].
  • Publish all results.

    • Action: Make a concerted effort to publish or share null and negative results through appropriate channels, such as preprint servers, journals dedicated to null results, or institutional repositories.
    • Purpose: Provides a complete and honest picture of the research landscape, preventing other scientists from wasting time and resources.

Problem 2: P-hacking and Data Dredging

Symptoms: A study reports a just-significant p-value (e.g., p = 0.048) and the result feels fragile. The authors may have tested multiple variables or analytical paths but only reported the one that worked.

Root Cause: The practice of repeatedly analyzing data in different ways (e.g., excluding outliers, combining variables, trying different covariates) until a statistically significant result is found [20] [2]. This dramatically increases the false-positive rate.
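The inflation described above is easy to demonstrate by simulation. The sketch below, with entirely illustrative "analysis paths" (dropping extreme points, analysing a subgroup), compares the false-positive rate of a single pre-specified test on null data against the rate when a researcher reports the best of three analyses:

```python
# Sketch: each extra analytical path gives another chance at p < 0.05
# on data with no true effect. Paths and parameters are illustrative.
import math
import random
import statistics

random.seed(7)

def two_sided_p(z):
    """Two-sided p-value from a z statistic (normal approximation)."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def z_test(a, b):
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return two_sided_p((statistics.mean(b) - statistics.mean(a)) / se)

def flexible_analysis(n=30):
    """Smallest p-value across several post-hoc analysis paths on null data."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]   # no true group difference
    candidates = [z_test(a, b)]                                 # planned test
    candidates.append(z_test(sorted(a, key=abs)[:-1],           # drop "outliers"
                             sorted(b, key=abs)[:-1]))
    candidates.append(z_test(a[:n // 2], b[:n // 2]))           # "subgroup" only
    return min(candidates)

sims = 4000
strict = sum(z_test([random.gauss(0, 1) for _ in range(30)],
                    [random.gauss(0, 1) for _ in range(30)]) < 0.05
             for _ in range(sims)) / sims
hacked = sum(flexible_analysis() < 0.05 for _ in range(sims)) / sims
print(f"False-positive rate, one pre-specified test: {strict:.3f}")
print(f"False-positive rate, best of three analyses: {hacked:.3f}")
```

Even three modestly correlated analysis paths push the false-positive rate well past the nominal 5%; real analytical flexibility typically involves far more paths.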

Experimental Protocol to Mitigate P-hacking:

  • Define your analysis plan upfront.

    • Action: As part of your pre-registration, specify exactly how you will handle the following:
      • Primary Outcome: The single most important measure for testing your hypothesis.
      • Data Exclusion Rules: Pre-defined, justified criteria for excluding data points (e.g., technical failure).
      • Covariates: Which covariates will be included and why.
    • Purpose: Removes "researcher degrees of freedom" after the data are seen [2].
  • Determine sample size a priori.

    • Action: Conduct a statistical power analysis before beginning your study to determine the sample size needed to detect a realistic effect size with adequate power (typically 80% or more).
    • Purpose: Prevents the practice of collecting data until a significant p-value is reached or stopping early because a result is significant [2].
  • Report all analyses transparently.

    • Action: In your manuscript, disclose all statistical tests and variables that were considered, even those that were non-significant.
    • Purpose: Allows readers to assess the true robustness of the findings and avoids presenting a cherry-picked story.

Problem 3: HARKing (Hypothesizing After the Results are Known)

Symptoms: A research paper's introduction presents a compelling, post-hoc explanation for a complex finding as if it were the driving hypothesis all along. The discussion may over-interpret exploratory results.

Root Cause: Presenting unexpected or post-hoc findings as if they were predicted a priori [7] [20]. This misleads readers about the strength of the evidence and the hypothetico-deductive process that was actually followed.

Experimental Protocol to Mitigate HARKing:

  • Clearly separate hypotheses.

    • Action: In your writing, use clear language to distinguish between pre-registered, confirmatory hypotheses and exploratory, data-driven hypotheses that emerged from the analysis.
    • Purpose: Maintains intellectual honesty and allows readers to appropriately weight the evidence for each claim.
  • Use holdout samples.

    • Action: If you discover an unexpected but interesting pattern in your initial dataset (Dataset A), formulate a new hypothesis and then test it on a completely new, independent dataset (Dataset B).
    • Purpose: Validates whether an exploratory finding is a genuine discovery or a statistical fluke.
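The holdout logic above can be sketched in simulation. Here a researcher scans 20 unrelated candidate predictors in Dataset A, picks the one with the smallest correlation p-value, and then re-tests that predictor in an independent Dataset B. All data, the number of predictors, and the Fisher-z approximation are illustrative assumptions:

```python
# Sketch: exploratory "discoveries" from scanning many null predictors
# rarely survive an independent holdout test.
import math
import random
import statistics

random.seed(3)

def corr_p(x, y):
    """Two-sided p-value for a Pearson correlation via the Fisher z approximation."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - 3)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

n, k, rounds = 50, 20, 300
exploratory_hits = confirmed = 0
for _ in range(rounds):
    # Dataset A: outcome plus k candidate predictors, none truly related.
    y_a = [random.gauss(0, 1) for _ in range(n)]
    xs = [[random.gauss(0, 1) for _ in range(n)] for _ in range(k)]
    best = min(range(k), key=lambda j: corr_p(xs[j], y_a))
    exploratory_hits += corr_p(xs[best], y_a) < 0.05
    # Dataset B: fresh, independent measurement of the chosen predictor.
    y_b = [random.gauss(0, 1) for _ in range(n)]
    x_b = [random.gauss(0, 1) for _ in range(n)]
    confirmed += corr_p(x_b, y_b) < 0.05

print(f"'Significant' in exploration (best of {k}): {exploratory_hits / rounds:.2f}")
print(f"Confirmed on independent holdout data:     {confirmed / rounds:.2f}")
```

Exploration "finds" a significant predictor in well over half of rounds, but the holdout confirms it at roughly the nominal 5% rate, which is exactly why the two stages must be kept separate.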

The Scientist's Toolkit: Key Reagent Solutions for Rigorous Research

| Item | Function in Ensuring Rigor & Reproducibility |
| --- | --- |
| Pre-registration Platforms (e.g., OSF, AsPredicted) | Creates a time-stamped, public record of the research plan to combat HARKing, p-hacking, and selective reporting [2]. |
| Citation Managers (e.g., Zotero, Mendeley) | Helps ensure proper and accurate attribution of ideas and techniques, avoiding improper referencing, which is a common QRP [2]. |
| Open Data Repositories (e.g., Zenodo, Figshare) | Provides a platform to share raw data, making findings reproducible and allowing for reanalysis, which helps detect errors and QRPs [22]. |
| Version Control Systems (e.g., Git) | Tracks every change made to code and analysis scripts, providing a complete audit trail and ensuring computational reproducibility [22]. |
| Electronic Lab Notebooks (ELNs) | Facilitates rigorous and detailed record-keeping of all procedures, materials, and decisions, which is foundational for reproducibility [2]. |

Visualizing the Pathway from QRPs to the Replication Crisis

The following diagram illustrates the logical relationship between systemic pressures, researcher decisions, and the ultimate crisis of irreproducibility.

Diagram: Publish or Perish Culture → Pressure for Significant Results → Engagement in QRPs → (Selective Reporting, P-hacking, HARKing) → Inflated False Positive Rate → Literature with False Findings → Replication Failures → Erosion of Trust in Science.

Understanding the scope and nature of Questionable Research Practices (QRPs) requires robust quantitative methods. Prevalence studies and surveys are essential tools for systematically measuring how often these practices occur, identifying risk factors, and evaluating the effectiveness of interventions designed to improve research integrity. This guide provides methodologies and resources for conducting such quantitative investigations into researcher behaviors.


FAQ: Quantitative Methods in QRPs Research

What are the primary quantitative research designs for studying QRPs prevalence?

You can employ several quantitative designs, each with distinct strengths for investigating QRPs [24]. The choice depends on your research question, resources, and ethical considerations.

The table below summarizes the key designs:

| Research Design | Goal | Best Scenarios for QRPs Research | Key Limitations |
| --- | --- | --- | --- |
| Cross-Sectional Survey [24] [25] | To provide a "snapshot" of the prevalence of QRPs at a single point in time. | Gauging self-reported frequencies of QRPs across a wide population of researchers; assessing opinions and experiences surrounding QRPs. | Cannot establish causality; prone to sampling bias and social desirability bias where researchers may under-report QRPs. |
| Longitudinal Research [25] | To track data and behaviors, such as QRP engagement, over an extended period. | Studying the long-term effects of integrity training programs; observing how QRP prevalence shifts throughout researchers' careers or in response to policy changes. | Can be expensive and time-consuming; subject to participant attrition over time. |
| Correlational Research [25] [26] | To examine relationships between variables without implying causation. | Investigating associations between QRP engagement and factors like career stage, publication pressure, research field, or specific laboratory environments. | Correlation does not prove causation; results can be misleading due to confounding variables. |
| Cohort Studies [24] | To understand the causes or temporal associations of an outcome by following groups over time. | Following a cohort of early-career researchers prospectively to identify predictors of later QRP engagement. | Evidence does not ensure causality; requires large samples and is vulnerable to attrition. |
| Experimental / Quasi-Experimental [24] [25] | To establish cause-and-effect relationships by manipulating variables. | Testing the efficacy of different educational interventions (e.g., ethics training, blinded data analysis) in reducing QRPs among research labs. | True experiments may not be feasible for ethical or practical reasons; quasi-experiments have threats to internal validity. |

What are common questionable research practices (QRPs) I should measure?

QRPs are behaviors that compromise the replicability and validity of research findings [7]. While not exhaustive, the following table details common QRPs identified in the literature, which can form the basis of your survey items or data extraction criteria.

| Questionable Research Practice | Description | Potential Quantitative Measure |
| --- | --- | --- |
| Selective Data Reporting [7] | Selectively reporting positive or statistically significant results while omitting negative or insignificant results. | Percentage of researchers who admit to excluding non-significant data points from a manuscript. |
| p-hacking [7] | Selectively conducting data analyses to produce or enhance positive/statistically significant outcomes. | Frequency of trying various statistical tests until a significant p-value is obtained. |
| HARKing (Hypothesizing After the Results are Known) [7] | Formulating hypotheses after study outcomes are known to "fit" the data. | Self-reported incidence of presenting a post-hoc hypothesis as if it were a priori. |
| Selective Procedural Reporting [7] | Omitting possible confounds from procedural descriptions that could explain positive outcomes. | Rate of failing to report a key methodological detail that could explain the observed effect. |
| Selective Outcome Reporting [7] | Writing a paper's abstract or discussion to selectively downplay undesirable results and/or emphasize desired results. | Prevalence of downplaying non-significant findings in the discussion section of a paper. |
| Selective Recruiting [7] | Selectively recruiting participants into a treatment condition who are more likely to show positive effects and not reporting this. | Incidence of assigning participants to groups based on their pre-test responses to ensure a desired outcome. |

How can I ensure the validity of my survey on sensitive QRPs?

Maximizing validity is challenging when researching sensitive topics like QRPs. Key threats and their mitigations are listed below [24].

| Threat to Validity | Definition | Impact on QRPs Research | Recommended Mitigations |
| --- | --- | --- | --- |
| Selection Bias | Systematic differences between groups before the study. | Your sample may over-represent certain types of researchers (e.g., those already concerned about integrity), skewing prevalence rates. | Use stratified random sampling to ensure representation across fields, career stages, and regions. Statistically check for baseline differences. |
| Social Desirability Bias | Participants responding in a way they believe is socially acceptable rather than truthfully. | Researchers are likely to under-report their engagement in QRPs, leading to underestimation of prevalence. | Use anonymous, confidential surveys. Phrase questions neutrally and assure participants that honesty is valued for improving science. |
| Attrition (Mortality) | Participant dropout affects group representativeness. | In longitudinal studies, researchers who engage in QRPs may be more likely to drop out, biasing later results. | Maintain strong engagement with participants; use tracking methods and reminders. |
| Instrumentation | Changes in measurement tools or procedures. | If you modify your survey midway through a study, you cannot compare results from the two versions. | Keep survey instruments and procedures consistent throughout the data collection period. |
| History | External events affecting the study. | A high-profile case of research misconduct during your study could temporarily influence how participants respond. | Be aware of the research climate and consider its potential influence when interpreting data. |

What improved research practices can I recommend as alternatives to QRPs?

For every QRP, there is an improved, "integrity-positive" practice. Your research can measure the adoption of these improved practices as a positive outcome.

  • Preregistration: Publicly registering your hypotheses, methods, and analysis plan before conducting the study to prevent HARKing and p-hacking [7].
  • Data Blinding: During initial data collection and analysis, keeping the conditions blind to avoid conscious or unconscious bias [7].
  • Open Data & Code: Sharing raw data and analysis code publicly to allow for verification and reproducibility, countering selective data reporting.
  • Open Materials: Sharing detailed protocols and materials to combat selective procedural reporting.
  • Registered Reports: A publication format where peer review happens before results are known, focusing on the importance of the research question and soundness of the method, thereby reducing publication bias.
  • Sample Size Planning: Using power analysis or other methods to determine sample size a priori to avoid collecting data until a significant result is found.

Troubleshooting Guides

Issue: Low Response Rates on QRPs Surveys

Problem: You are not receiving a sufficient number of survey responses, threatening the external validity of your prevalence estimates.

Solution:

  • Pilot Your Survey: Test your survey with a small group to identify confusing questions, technical issues, and estimate completion time. A shorter, well-designed survey improves participation [25].
  • Ensure Anonymity: Clearly and prominently state that responses are anonymous and confidential. This is critical for overcoming social desirability bias and encouraging honest reporting of sensitive behaviors.
  • Use Multiple Channels: Distribute your survey through professional associations, university mailing lists, and relevant social media groups (e.g., on Twitter/X or LinkedIn) to reach a diverse audience.
  • Offer Incentives: If possible, offer incentives for participation, such as entry into a lottery for a gift card or a summary of the aggregate findings.
  • Send Polite Reminders: Send follow-up emails or announcements to non-respondents, reminding them of the importance of the study.

Issue: Designing a Methodologically Sound Prevalence Study

Problem: You are unsure how to structure your study to produce reliable, generalizable data on QRPs.

Solution: Follow this workflow to design a robust prevalence study. The process involves defining your scope, choosing a design, developing your instrument, and executing your plan with careful attention to sampling and validity.

Diagram: Define research scope → select research design (cross-sectional, longitudinal, correlational) → develop measurement instrument (survey questions, data extraction sheet) → plan sampling and recruitment (random or stratified sampling) → implement data collection (ensure anonymity, mitigate bias) → analyze data and report (descriptive statistics, inferential tests) → disseminate findings.

Issue: Differentiating Between Correlation and Causation in Findings

Problem: You observe a correlation (e.g., between high publication pressure and QRP engagement) and are tempted to state it implies causation.

Solution:

  • Explicitly Acknowledge Limitation: In your report or paper, clearly state that your correlational findings do not prove causation [25] [26].
  • Consider Alternative Explanations: Actively discuss confounding variables that could explain the observed relationship. For example, is "research environment" a factor that influences both pressure and QRPs?
  • Use Language Carefully: Use terms like "associated with," "linked to," or "related to," rather than "causes," "leads to," or "results in."
  • Suggest Future Research: Recommend longitudinal or experimental studies that could better test the causal relationship you identified.

The Scientist's Toolkit: Research Reagent Solutions for QRPs Identification Research

The following table details key "reagents" or essential components for conducting rigorous quantitative research into QRPs.

| Item / Solution | Function in QRPs Research |
| --- | --- |
| Validated Survey Instruments | Pre-existing, psychometrically tested questionnaires (e.g., on scientific misconduct, perceived pressure) provide reliable and comparable measures across studies. |
| Statistical Analysis Software (e.g., R, Python, SPSS) | Essential for conducting descriptive statistics (e.g., prevalence rates), inferential tests (e.g., t-tests, ANOVAs to compare groups), and modeling relationships (e.g., regression analysis). |
| Online Survey Platforms (e.g., Qualtrics, REDCap) | Facilitate the efficient distribution of surveys to a wide audience, enable anonymous data collection, and often have built-in tools for basic data analysis. |
| Sample Frame (e.g., Professional Directory) | A comprehensive list of the population from which to draw your sample (e.g., membership lists of scientific societies, university faculty directories) to ensure proper sampling. |
| Data Management Plan | A formal plan outlining how data will be handled during and after the project, ensuring integrity, security, and future reproducibility of your research on research. |
| Preregistration Template | A template for preregistering your study's hypotheses and analysis plan on a platform like the Open Science Framework (OSF), demonstrating your commitment to improved practices. |
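When reporting prevalence rates from such a survey, a point estimate alone is uninformative; a confidence interval should accompany it. The sketch below computes a Wilson score interval in plain Python, using hypothetical counts (255 of 500 respondents admitting at least one QRP) purely for illustration:

```python
# Sketch: prevalence estimate with a Wilson 95% confidence interval.
# The survey counts are hypothetical, not real data.
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

admitted, respondents = 255, 500   # hypothetical survey counts
lo, hi = wilson_ci(admitted, respondents)
print(f"Prevalence: {admitted / respondents:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The Wilson interval behaves better than the naive normal interval near 0% or 100%, which matters for rarer QRPs.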

Technical Support Center: QRP Identification & Resolution

Frequently Asked Questions (FAQs)

1. What are the most common signs that I might be engaging in a Questionable Research Practice (QRP)?

You may be engaging in a QRP if you find yourself:

  • Running multiple statistical analyses on a dataset until you obtain a statistically significant result (p < 0.05) [2].
  • Excluding data points (e.g., outliers) from your analysis without a pre-established, justified criterion for doing so [2].
  • Selectively reporting only the studies or experimental conditions in your research that "worked" or produced positive results, while omitting those that failed or showed null results [2] [27].
  • Formulating your hypothesis after the results are known (a practice known as HARKing) [2] [27].
  • Feeling pressure to produce novel, clean, or statistically significant findings to increase your chances of publication, funding, or career advancement [27] [28].

2. My results are not significant. Is it acceptable to collect more data until they become significant?

No, this is a form of p-hacking and is considered a QRP. Stopping data collection as soon as a desired p-value is reached, or continuing to collect data until significance is achieved, dramatically increases the rate of false positives [2]. The solution is to determine your sample size a priori using a power analysis before data collection begins and to adhere to that plan [2]. If interim analyses are genuinely needed, use established sequential analysis methods, which adjust significance thresholds for repeated looks at the data, rather than this ad hoc approach [29].
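The inflation from "peeking" can be shown in a few lines of simulation. In this sketch, with illustrative choices (null effect, testing after every batch of 10 observations per group, maximum 100 per group, normal-approximation test), stopping at the first p < 0.05 raises the false-positive rate several-fold above the nominal 5%:

```python
# Sketch: optional stopping ("collect until significant") inflates
# the false-positive rate. Batch size and maximum n are illustrative.
import math
import random
import statistics

random.seed(11)

def z_p(a, b):
    """Two-sided p-value for a two-group comparison (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(b) - statistics.mean(a)) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def optional_stopping(batch=10, max_n=100):
    """Add data in batches, test after each batch, stop at the first p < .05."""
    a, b = [], []
    while len(a) < max_n:
        a += [random.gauss(0, 1) for _ in range(batch)]
        b += [random.gauss(0, 1) for _ in range(batch)]   # no true effect
        if z_p(a, b) < 0.05:
            return True    # "significant" -> stop and report
    return False

sims = 2000
fp = sum(optional_stopping() for _ in range(sims)) / sims
print(f"False-positive rate with optional stopping: {fp:.3f} (nominal .05)")
```

Proper sequential designs avoid this by pre-specifying the interim looks and spending the 5% error budget across them.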

3. I had to exclude some data. How can I ensure this is transparent and not a QRP?

The key to transparency is pre-specification. You must create a set of clearly defined, justified exclusion criteria before you begin data collection or analysis [2]. These criteria and the justification for them should be outlined in a pre-registration document or study protocol. When you write up your results, you must explicitly state which data were excluded and reference the pre-specified rule that justified the exclusion [2].

4. What is the single most effective practice to protect my research from QRPs?

Pre-registration is widely regarded as one of the most effective guards against QRPs [2] [28]. By creating a time-stamped, detailed analysis plan publicly available before conducting the study, you commit to your hypotheses and methods. This prevents later, post-hoc decisions from being presented as a priori predictions, thereby combating HARKing, p-hacking, and selective reporting [28].

5. Are there publication formats that reward rigorous methods over significant results?

Yes, Registered Reports (RRs) are a publication format designed specifically for this purpose [27] [28]. In an RR, the study introduction and methods are peer-reviewed before data is collected. If the protocol is sound, the journal provisionally accepts the paper for publication regardless of the eventual results. This format eliminates publication bias against null findings and aligns incentives toward methodological rigor [27].

Troubleshooting Guides

Issue: Suspected Selective Reporting in a Research Literature

Problem: A literature review in your field reveals an overwhelming proportion of studies with statistically significant results, and you suspect that studies with null findings are not being published.

Diagnosis and Resolution Protocol:

  • Identify the Symptom: Note an excess of statistically significant findings in published meta-analyses or literature reviews. For example, in one survey, 96% of standard psychology/psychiatry articles reported results supporting their hypothesis, compared to only 44% of pre-registered studies [27].
  • Run a Diagnostic Test: Conduct a funnel plot analysis to visually inspect for asymmetry, which can indicate publication bias. Statistical tests like Egger's regression can formally test for funnel plot asymmetry.
  • Implement the Fix:
    • For your own research: Commit to publishing or archiving all study results, regardless of outcome, in preprint servers or institutional repositories [2].
    • As a consumer of research: Critically evaluate meta-analyses that include unpublished data ("grey literature") and pre-registered studies, as they provide a more complete picture.
    • As a field: Advocate for the adoption of Registered Reports by journals in your discipline to eliminate publication bias at its source [27].

Issue: Combating P-hacking in Data Analysis

Problem: A researcher runs multiple statistical tests on a dataset, only reporting the one that yielded a significant p-value, thereby inflating the false positive rate.

Diagnosis and Resolution Protocol:

  • Identify the Symptom: Unusually high rates of marginally significant results (e.g., p-values just below 0.05) in a literature, or a single study reporting a significant effect for one specific analysis while remaining silent on other, similar tests that were conducted.
  • Run a Diagnostic Test: Use p-curve analysis to detect the presence of p-hacking in a set of studies. A right-skewed p-curve indicates the presence of true effects, while a flat or left-skewed p-curve can indicate p-hacking or a lack of evidential value.
  • Implement the Fix:
    • Pre-register your data analysis plan in detail before observing the data [2] [28].
    • Use blind data analysis, where the data is manipulated (e.g., variables are renamed) before analysis to prevent conscious or unconscious bias.
    • Correct for multiple comparisons using established methods (e.g., Bonferroni, False Discovery Rate) when conducting multiple hypothesis tests [2].
    • Report all analyses conducted in a study, not just the significant ones, to provide full transparency.
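The p-curve diagnostic mentioned above can be illustrated with a simplified sign-test version of its right-skew test (the full method uses continuous pp-values; this binomial shortcut and both p-value sets are illustrative assumptions, not real literatures). Among significant results, true effects yield an excess of very small p-values, while a set of results bunched just under .05 is a warning sign:

```python
# Sketch: simplified p-curve right-skew check. Among p < .05 results,
# count how many fall below .025 and test against a 50/50 split.
from math import comb

def binom_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def p_curve_skew(sig_pvals):
    """One-sided test that small p's (< .025) outnumber large ones (.025-.05)."""
    small = sum(p < 0.025 for p in sig_pvals)
    return binom_sf(small, len(sig_pvals))

evidential = [0.002, 0.004, 0.009, 0.011, 0.018, 0.031, 0.046]  # right-skewed
suspicious = [0.028, 0.033, 0.038, 0.041, 0.044, 0.047, 0.049]  # left-skewed

print(f"Right-skew test, evidential set: p = {p_curve_skew(evidential):.3f}")
print(f"Right-skew test, suspicious set: p = {p_curve_skew(suspicious):.3f}")
```

With only seven studies, neither toy set yields a decisive test, which reflects a real limitation: p-curve needs a reasonable number of significant results to have diagnostic power.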

Quantitative Data on QRPs and Solutions

Table 1: Documented Prevalence and Impact of Questionable Research Practices

| QRP | Documented Prevalence | Primary Impact on Research |
| --- | --- | --- |
| Selective Reporting | Up to 94% of psychologists admit to some form [27]. >25% of pre-registered clinical trials in psychiatry show evidence of it [27]. | Distorts the evidence base; creates an inflated sense of effect certainty; contributes to replication failure [27] [28]. |
| P-hacking | A survey found one in two researchers engaged in at least one QRP in the last 3 years [2]. | Inflates false-positive rates; makes effects appear stronger than they are [2] [28]. |
| HARKing | 96% of standard articles report supported hypotheses vs. 44% of pre-registered studies, suggesting widespread HARKing [27]. | Creates a literature of false, post-hoc hypotheses that cannot be reliably tested, hindering theoretical progress [27]. |
| Publication Bias | 98% of positive antidepressant trials were published vs. 48% of negative trials [27]. Statistically significant findings in psychiatry receive >2x the citations [27]. | Renders the published literature unrepresentative of actual research findings; misinforms meta-analyses and policy [27] [28]. |

Table 2: Effectiveness of Open Science Interventions

| Intervention | Key Documented Outcome | Advantage for Researcher |
| --- | --- | --- |
| Registered Reports (RRs) | 60% of RRs report null results (5x the rate in regular articles) [27]. RRs are perceived as higher quality and are cited similarly to or more than standard articles [27]. | Eliminates publication bias; guarantees publication; reduces anxiety about results; focuses peer review on methodology [27]. |
| Pre-registration | Creates a public record of hypotheses and analysis plans, making it easy to detect HARKing and deviations from the plan [2] [28]. | Defends against accusations of QRPs; strengthens the credibility of your findings; improves study design. |
| Replication Studies | In economics, 61% of replication effect sizes were within a 95% prediction interval, but only 20% of replications had a p-value <0.05 for the original effect [28]. | Provides a direct measure of the reliability of foundational findings in a field; corrects the scientific record. |

Experimental Protocols for QRP Identification Research

Protocol 1: Assessing the Prevalence of HARKing in a Literature

  • Objective: To quantify the rate of HARKing in a specific research domain by comparing hypotheses stated in pre-registered studies versus non-pre-registered studies.
  • Materials:
    • A sample of published research articles from the target domain from the last 5 years.
    • Access to a pre-registration registry (e.g., OSF, ClinicalTrials.gov).
  • Methodology:
    • Step 1: Identify and code a set of pre-registered studies. Record the primary hypothesis as stated in the pre-registration and in the final published paper.
    • Step 2: Randomly select a matched set of non-pre-registered studies from the same journals and time period.
    • Step 3: For all studies (pre-registered and non-), code whether the main results presented in the paper are presented as a direct test of an a priori hypothesis or as a post-hoc exploration.
    • Step 4: Compare the rate of confirmed a priori hypothesis testing between the pre-registered and non-pre-registered groups. A significantly lower rate in the non-pre-registered group indicates probable HARKing.
  • Analysis: Use a chi-square test to compare the proportion of studies with confirmed a priori hypotheses between the two groups.
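
Step 4's comparison reduces to a chi-square test on a 2x2 table, sketched here in plain Python with hypothetical counts (loosely inspired by the 96% vs. 44% rates cited elsewhere in this guide); `scipy.stats.chi2_contingency` is the standard library route.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 contingency table:
    rows = study type (pre-registered vs. not),
    cols = a priori hypothesis confirmed (yes vs. no).
    With df = 1, the chi-square survival function reduces to
    erfc(sqrt(x / 2)), so no stats library is needed."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e
               for o, e in zip([a, b, c, d], expected))
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical counts: 44/100 pre-registered vs. 96/100 standard
# articles report a supported a priori hypothesis
stat, p = chi2_2x2(44, 56, 96, 4)
print(f"chi2 = {stat:.1f}, p = {p:.1g}")
```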

Protocol 2: A P-curve Analysis to Detect P-hacking in a Meta-Analysis

  • Objective: To assess the evidential value and likelihood of p-hacking in a collection of studies on a specific research question.
  • Materials:
    • A set of statistically significant (p < .05) p-values from published studies on the chosen topic.
    • Access to p-curve analysis software or code (e.g., the pcurve package in R).
  • Methodology:
    • Step 1: Collect the test statistics and p-values for the key findings of each study in your sample.
    • Step 2: Input the significant p-values (e.g., all p-values < .05) into the p-curve analysis tool.
    • Step 3: Interpret the resulting p-curve:
      • Right-skewed curve: Suggests the presence of true underlying effects.
      • Flat curve: Suggests no evidential value, and that significant results may be due to p-hacking or selective reporting.
      • Left-skewed curve: Suggests the use of p-hacking or other QRPs.
  • Analysis: The p-curve tool provides formal tests for right-skewness (evidence of true effects) and flatness (evidence of no evidential value).
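
The interpretation step can be illustrated with a deliberately crude skew check: under a true null, significant p-values are uniform on (0, .05), so about half should fall below .025. The real p-curve app tests skew differently (via pp-values and Stouffer's method), and the p-values below are hypothetical.

```python
import math

def binom_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), via the exact sum."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

def pcurve_right_skew(pvals, alpha=0.05):
    """Crude skew check: keep significant p-values, count how many fall
    below alpha/2, and ask how surprising that count is under the
    uniform (no-effect) null.  A small binomial p suggests right skew."""
    sig = [p for p in pvals if p < alpha]
    low = sum(p < alpha / 2 for p in sig)
    return low, len(sig), binom_sf(low, len(sig))

# Hypothetical significant p-values from six studies
low, n, p = pcurve_right_skew([0.001, 0.003, 0.004, 0.01, 0.02, 0.049])
print(f"{low}/{n} below .025; binomial p = {p:.3f}")
```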

Visualizing the Academic Incentive Structure and Solutions

[Flowchart: the publish-or-perish culture drives HARKing; journal preference for novel, significant results drives p-hacking; funding and career advancement tied to publication drives selective reporting. All three feed a distorted evidence base, which fuels the replication crisis and erodes scientific trust. Pre-registration counters HARKing and p-hacking, Registered Reports counter journal incentives and selective reporting, and open data and materials counter the distorted evidence base.]

Diagram 1: How incentives drive QRPs and potential solutions.

[Flowchart: authors submit a study plan (introduction and methods) for Stage 1 peer review; after any revisions, the journal grants in-principle acceptance (IPA). The authors then conduct the research, analyze the data, and submit the full manuscript for Stage 2 peer review, after which it is published regardless of results.]

Diagram 2: Registered Reports workflow.

The Scientist's Toolkit: Essential Reagents for Rigorous Research

Table 3: Key Research Reagent Solutions for Preventing QRPs

| Tool / Reagent | Primary Function | Role in QRP Mitigation |
| --- | --- | --- |
| Pre-registration Platforms (e.g., OSF, AsPredicted, ClinicalTrials.gov) | Provides a time-stamped, public record of a study's hypotheses, design, and analysis plan. | Directly combats HARKing and p-hacking by creating an irreversible record of intent [2] [28]. |
| Registered Report Format | A journal publication format where the study protocol is peer-reviewed and accepted before data collection. | Eliminates publication bias and selective reporting by guaranteeing publication based on methodological rigor, not results [27] [28]. |
| Power Analysis Software (e.g., pwr package in R, G*Power) | Calculates the minimum sample size required to detect an effect with a given probability (power). | Prevents underpowered studies, which are a major contributor to false negatives and a motivator for p-hacking [2]. |
| Data & Code Repositories (e.g., OSF, Zenodo, GitHub) | Provides a platform for publicly sharing research data and analysis code. | Enables transparency, reproducibility, and post-publication review, making it harder to hide selective reporting or analytical flexibility [27]. |
| Citation Managers (e.g., Zotero, Mendeley) | Software to organize references and automatically format bibliographies. | Helps avoid improper referencing, a QRP that can constitute a form of plagiarism [2]. |

The Detection Toolkit: Methodologies and Techniques for Identifying QRPs in Research

Troubleshooting Guides

Guide 1: Handling Between-Study Inconsistency in Meta-Analysis

Problem: A meta-analysis shows conflicting results between studies, making it difficult to draw a reliable overall conclusion.

Diagnosis: Significant between-study inconsistency not explained by sampling error alone.

Solution:

  • Calculate Traditional Measures: Begin with Cochran's Q statistic and the I² statistic to quantify the percentage of total variation across studies that is due to heterogeneity rather than chance [30].
  • Apply Advanced Tests: If the number of studies is small or the distribution of effects is suspected to be non-normal (e.g., skewed or heavy-tailed), use alternative Q-like statistics or a hybrid test that adaptively combines their strengths for more robust power [30].
  • Check for Outliers: Visually inspect forest plots and use statistical methods to identify if inconsistency is driven by one or a few outlying studies. The maximum of the standardized deviates can efficiently capture such inconsistency [30].
  • Model the Data: If heterogeneity is confirmed, use a random-effects model instead of a common-effect model to account for the between-study variance [30].
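
The first step can be computed directly from study effect sizes and standard errors. The numbers below are hypothetical; R's `metafor` package is the usual production route.

```python
def cochran_q_i2(effects, ses):
    """Cochran's Q with inverse-variance weights and the I^2 statistic:
    Q   = sum over studies of w_i * (y_i - mu_hat)^2,
    I^2 = max(0, (Q - df) / Q) * 100, with df = k - 1."""
    w = [1 / s ** 2 for s in ses]
    mu = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Hypothetical meta-analysis of five studies
q, i2 = cochran_q_i2([0.10, 0.35, 0.20, 0.55, 0.05],
                     [0.10, 0.12, 0.08, 0.15, 0.11])
print(f"Q = {q:.2f} on 4 df, I^2 = {i2:.0f}%")
```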

Guide 2: Identifying Biologically Implausible Values in Longitudinal Data

Problem: A large longitudinal dataset, such as child growth records, contains values that are technically possible but highly unlikely given an individual's previous measurements and known biology.

Diagnosis: Presence of population outliers and/or longitudinal outliers.

Solution:

  • Flag Population Outliers (POs): Use established external cut-offs (e.g., WHO z-score criteria for child growth data: L/HAZ < -6 and > +6; WAZ < -6 and > +5) to flag and remove values that are implausible for the reference population [31].
  • Flag Longitudinal Outliers (LOs):
    • Action: After removing POs, fit a linear mixed-effects model with restricted cubic splines for age to the longitudinal data [31].
    • Action: Calculate the scaled residuals (difference between observed and model-fitted values) [31].
    • Action: Flag measurements where the absolute value of the scaled residual exceeds a predefined cutoff (e.g., 3, 4, 5, or 6 standard deviations) [31].
  • Clean Data: Remove both POs and LOs before final analysis. Note that in large datasets, removing POs often has a more substantial impact on summary statistics, while LOs are critical for accurate trajectory analysis [31].
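
The flagging logic can be sketched with a deliberately simplified model: a per-child least-squares line with a robust residual scale, rather than the mixed-effects spline model of [31]. The growth record below is hypothetical.

```python
import statistics

def flag_longitudinal_outliers(ages, values, cutoff=4.0):
    """Fit a least-squares line to one individual's trajectory and flag
    points whose residuals exceed `cutoff` robust SDs (median absolute
    deviation scaled by 1.4826, the consistency factor for normal data).
    The published protocol fits a linear mixed-effects model with
    restricted cubic splines across all children; this is a sketch."""
    n = len(ages)
    mx, my = sum(ages) / n, sum(values) / n
    sxx = sum((a - mx) ** 2 for a in ages)
    slope = sum((a - mx) * (v - my) for a, v in zip(ages, values)) / sxx
    resid = [v - (my + slope * (a - mx)) for a, v in zip(ages, values)]
    scale = 1.4826 * statistics.median(abs(r) for r in resid)
    return [abs(r) / scale > cutoff for r in resid]

# Hypothetical growth record with one data-entry error at 36 months
ages = [0, 6, 12, 18, 24, 30, 36, 42, 48]        # months
height = [50, 66, 75, 81, 86, 90, 140, 97, 100]  # cm; 140 is implausible
print(flag_longitudinal_outliers(ages, height))
# only the 140 cm point is flagged
```

The robust scale matters here: the outlier itself would inflate an ordinary residual SD enough to hide it.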

Guide 3: Detecting Implausible Records in Categorical Data

Problem: A cancer registry or other database containing categorical patient information needs to verify record plausibility without feasible manual checks.

Diagnosis: Implausible records due to reporting or data entry errors.

Solution:

  • Choose an Unsupervised Anomaly Detection Method:
    • Option A (Pattern-based): Use the FindFPOF algorithm, which identifies records that lack frequently occurring patterns in the dataset [32].
    • Option B (Compression-based): Use an autoencoder, a neural network that compresses and then reconstructs data; records with high reconstruction error are flagged as anomalous [32].
  • Validate and Review:
    • Action: Apply the chosen method to the full dataset to generate an anomaly score for each record.
    • Action: Manually review a sample of the highest-scoring records with a domain expert to confirm implausibility. This can reduce manual effort by approximately 70% compared to random sampling [32].
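
The pattern-based option can be illustrated with a simplified FPOF-style score (itemset supports up to size 2); the records are hypothetical and far smaller than a real registry.

```python
from collections import Counter
from itertools import combinations

def fpof_scores(records, max_len=2):
    """Simplified FPOF-style scoring for categorical records: count the
    support of every attribute-value itemset up to size max_len, then
    score each record as 1 minus the mean support of its own itemsets.
    Records built from rare value combinations score highest."""
    n = len(records)
    support = Counter()
    for rec in records:
        items = sorted(rec.items())
        for size in range(1, max_len + 1):
            support.update(combinations(items, size))
    scores = []
    for rec in records:
        items = sorted(rec.items())
        combos = [c for size in range(1, max_len + 1)
                  for c in combinations(items, size)]
        scores.append(1 - sum(support[c] for c in combos) / (len(combos) * n))
    return scores

# Hypothetical registry with one implausible sex/diagnosis combination
records = ([{"sex": "F", "dx": "breast"}] * 8
           + [{"sex": "M", "dx": "prostate"}] * 8
           + [{"sex": "M", "dx": "breast"}])
scores = fpof_scores(records)
print(scores.index(max(scores)))  # index 16: the odd record scores highest
```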

Frequently Asked Questions (FAQs)

FAQ 1: What is the difference between heterogeneity and inconsistency in meta-analysis?

While often used interchangeably, heterogeneity is frequently paired with the random-effects model assumption that true effect sizes vary normally around a grand mean. Inconsistency is a broader term used to inclusively cover all types of between-study discrepancies, including those arising from subgroup effects or a few outlying studies, without assuming a normal distribution [30].

FAQ 2: Are all statistically significant findings likely to be true?

Not necessarily. Several Questionable Research Practices (QRPs), even if employed honestly, can increase the likelihood of false positives. These include:

  • P-hacking: Manipulating data or analyses until a statistically significant result is achieved [33] [34].
  • HARKing: Hypothesizing After the Results are Known, presenting a post-hoc hypothesis as if it were a priori [33].
  • Selective Reporting: Publishing only studies or outcomes with positive or significant results, also known as publication bias [18] [34].

FAQ 3: What are the most effective ways to prevent honest yet unacceptable research practices?

Promoting a culture of open science is key. Recommended strategies include [33] [18]:

  • Preregistration: Publishing your research hypotheses, methods, and analysis plan before data collection begins.
  • Blind Data Analysis: Analyzing data without knowing which group received which intervention to reduce confirmation bias.
  • Transparent Reporting: Clearly reporting all data exclusions, manipulations, measured variables, and study conditions.
  • Data and Code Sharing: Making raw data and analysis code publicly available for verification.

Data Detective Toolkit: Methods and Reagents

Table 1: Statistical Tools for Detecting Data Anomalies

| Tool/Method | Data Type | Primary Function | Key Strength |
| --- | --- | --- | --- |
| Cochran's Q Test [30] | Summary (Meta-analysis) | Tests the null hypothesis that all studies in a meta-analysis share a common effect size. | Standard, widely accepted test for between-study heterogeneity. |
| Alternative Q-like & Hybrid Tests [30] | Summary (Meta-analysis) | Tests for inconsistency with higher power under non-normal between-study distributions (e.g., heavy-tailed, skewed). | More robust and flexible than the conventional Q test in many realistic scenarios. |
| I² Statistic [30] | Summary (Meta-analysis) | Quantifies the percentage of total variability in a meta-analysis due to between-study inconsistency. | Intuitive interpretation (e.g., I² of 50% indicates moderate inconsistency). |
| Population Outlier (PO) Identification [31] | Raw (Anthropometric) | Flags values that are biologically implausible for a reference population using pre-defined z-score cutoffs. | Relies on established, external standards (e.g., WHO growth charts). |
| Longitudinal Outlier (LO) Identification [31] | Raw (Longitudinal) | Flags values that are implausible given an individual's own data trajectory using model residuals. | Captures errors that cross-sectional population methods might miss. |
| FindFPOF [32] | Raw (Categorical) | An unsupervised pattern-based anomaly detection method for categorical data. | Does not require labeled data or pre-defined rules for implausibility. |
| Autoencoder [32] | Raw (Categorical/Numerical) | An unsupervised neural network that detects anomalies based on data compression and reconstruction error. | Can find complex, non-obvious error patterns in high-dimensional data. |

Table 2: Key Concepts in Questionable Research Practices (QRPs) Identification

| Concept | Definition | Potential Impact on Research |
| --- | --- | --- |
| Questionable Research Practices (QRPs) [33] [18] | "Ways of producing, maintaining, sharing, analyzing, or interpreting data that are likely to produce misleading conclusions, typically in the interest of the researcher." Often deliberate. | Inflated effect sizes, reduced replicability, biased error rates, and compromised generalizability of findings [33]. |
| Honest Yet Unacceptable Research Practices [18] | Unintentional mistakes or weaknesses in research conception, design, or reporting. | Despite being unintentional, these practices are widespread and can collectively damage scientific credibility and public trust [18]. |
| P-hacking [33] [34] | A family of data manipulation practices (e.g., selectively excluding data, adding covariates) to achieve a statistically significant p-value. | Increases false-positive rates, leading to a literature filled with non-replicable findings. |
| HARKing [33] [34] | Hypothesizing After the Results are Known; presenting a post-hoc hypothesis as if it was defined a priori. | Creates a misleading narrative of strong confirmatory evidence and undermines the hypothetico-deductive process. |
| Publication Bias [18] [34] | The tendency for journals to publish only studies with statistically significant or "positive" results, while "negative" or null results remain unpublished. | Skews meta-analyses and systematic reviews, overestimating the true effect of an intervention. |

Experimental Protocols for Data Detection

Protocol 1: Implementing a Hybrid Test for Inconsistency in Meta-Analysis

This protocol is based on methods proposed in BMC Medical Research Methodology (2025) [30].

  • Calculate Standardized Deviates: For each of the k studies in the meta-analysis, compute the standardized deviate d_i = (y_i − μ̂_CE) / s_i, where y_i is the observed effect size, s_i is its standard error, and μ̂_CE is the common-effect estimate (Eq. 1 of [30]).
  • Compute Multiple Test Statistics: Calculate a family of alternative Q-like statistics from the absolute values of d_i, for example their sum of squares, sum of absolute cubes, or maximum [30].
  • Compute P-values: Derive the P-value for each test statistic under its respective null distribution [30].
  • Perform the Hybrid Test: The hybrid test statistic is the minimum P-value from the various tests. To control the Type I error rate, use a parametric resampling procedure to derive its null distribution and obtain an empirical P-value [30].
  • Interpretation: A significant hybrid test result indicates the presence of between-study inconsistency that is reliably detected by at least one of the component tests.
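
Steps 1 and 2 can be sketched as follows. The effect sizes are hypothetical, and the parametric resampling needed for the hybrid test's null distribution and P-values is omitted.

```python
def standardized_deviates(effects, ses):
    """Step 1: common-effect estimate mu_CE and standardized deviates
    d_i = (y_i - mu_CE) / s_i.  Step 2: three Q-like summaries (sum of
    squares, sum of absolute cubes, maximum) that serve as components
    of the hybrid test; resampling for the hybrid P-value is omitted."""
    w = [1 / s ** 2 for s in ses]
    mu_ce = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    d = [(y - mu_ce) / s for y, s in zip(effects, ses)]
    components = {
        "sum_sq": sum(x ** 2 for x in d),        # the classic Q statistic
        "sum_abs_cubed": sum(abs(x) ** 3 for x in d),
        "max_abs": max(abs(x) for x in d),       # sensitive to one outlier
    }
    return mu_ce, d, components

# Hypothetical meta-analysis of five studies
mu, d, comps = standardized_deviates([0.10, 0.35, 0.20, 0.55, 0.05],
                                     [0.10, 0.12, 0.08, 0.15, 0.11])
print(round(mu, 3), {k: round(v, 2) for k, v in comps.items()})
```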

Protocol 2: Workflow for Identifying Implausible Electronic Health Records

This protocol adapts the real-world evaluation from BMC Medical Research Methodology (2023) [32].

[Flowchart: starting from a categorical EHR dataset, preprocess the data (e.g., one-hot encoding), apply an anomaly detection method (Option A: FindFPOF, pattern-based; Option B: autoencoder, compression-based), calculate an anomaly score for each record, rank records by score, and select the top-ranking records for expert review, yielding a verified plausible dataset.]

Research Reagent Solutions

Table 3: Essential Analytical "Reagents" for Data Detection

| Item | Function in Data Detection |
| --- | --- |
| Reference Standards (e.g., WHO Growth Charts) [31] | Provides a benchmark of biological plausibility for identifying population outliers in anthropometric and clinical data. |
| Linear Mixed-Effects Models [31] | Statistical models used to analyze longitudinal data by accounting for both fixed effects (e.g., age) and random individual-specific effects, enabling the calculation of residuals to flag longitudinal outliers. |
| Restricted Cubic Splines [31] | A flexible mathematical tool used within regression models to capture non-linear relationships (e.g., between age and growth) without assuming a strict linear or polynomial form. |
| Parametric Resampling (Bootstrapping) [30] | A computational procedure used to simulate the null distribution of a complex test statistic (e.g., for a hybrid test), allowing for accurate calculation of empirical P-values. |
| One-Hot Encoding [32] | A data pre-processing technique that converts categorical variables into a binary (0/1) matrix format, allowing them to be used by machine learning algorithms like autoencoders. |

The replication crisis in psychological science and other fields has highlighted the detrimental effects of Questionable Research Practices (QRPs) [7]. QRPs are defined as "ways of producing, maintaining, sharing, analyzing, or interpreting data that are likely to produce misleading conclusions, typically in the interest of the researcher" [33]. In response, the research community has developed a suite of statistical forensic tools to detect inconsistencies in published research, thus helping to identify potential QRPs [35]. This technical support center provides detailed guidance on implementing three key analytical techniques—GRIM, SPRITE, and p-curve—enabling researchers, scientists, and drug development professionals to assess the trustworthiness of scientific findings.

Frequently Asked Questions (FAQs)

1. What are the main limitations of the p-curve technique? While p-curve was a pioneering forensic tool, recent technical critiques have identified significant statistical weaknesses. The formal hypothesis tests within the p-curve framework (e.g., for "evidential value") can exhibit properties such as inadmissibility and non-monotonicity. Furthermore, p-curve's average power estimator is inconsistent and can be substantially biased upward when the set of studies being analyzed has heterogeneous effect sizes or sample sizes [36]. For most applications, the z-curve method is now recommended as a more robust alternative, as it explicitly models heterogeneity and provides reliable confidence intervals [36].

2. The GRIM test shows an inconsistency. Does this prove data fabrication? No, an inconsistency discovered by the GRIM test does not, on its own, prove fabrication. The GRIM test is designed to evaluate the consistency of reported summary statistics [37]. A failed test means the reported mean is mathematically impossible given the stated sample size and scale granularity. This indicates a reporting error, which could range from a simple typo or rounding error to more serious issues like selective reporting or data manipulation [38] [37]. It is a "flag" that warrants further investigation and clarification from the original authors [38].

3. When should I use SPRITE over the GRIM test? Use the GRIM test as an initial, quick check when you only have access to a reported mean and sample size. The SPRITE technique is a more powerful follow-up when you have additional summary statistics, such as a standard deviation (SD). While GRIM can only determine if a specific mean is possible, SPRITE can be used to explore whether any plausible dataset exists that could simultaneously satisfy the reported mean, SD, and sample size [37]. SPRITE is therefore more comprehensive for identifying deeper inconsistencies.

4. Are these forensic techniques only applicable to psychology? Not at all. While they were largely developed and popularized within social psychology, the underlying logic is based on mathematical principles that are universal. The GRIM test, for instance, applies to any research involving small samples and data composed of whole numbers or Likert-type scales [37]. These methods have been applied to research in medicine, biology, and other fields where summary statistics are reported [10].

Troubleshooting Guides

GRIM Test Implementation

  • Problem: The calculated GRIM value is inconclusive.

    • Solution: The GRIM test's conclusiveness depends on the number of decimal places in the reported mean and the sample size. For a given sample size N, only certain mean values are mathematically possible. Ensure you are using the correct level of precision (e.g., 2 decimal places). If the result is still inconclusive, it may be that the reported value is one of the many possible means. In this case, the GRIM test cannot flag an inconsistency.
  • Problem: I am unsure if the GRIM test is appropriate for the data type.

    • Solution: Refer to the table below. The GRIM test is most effective for data that is composed of integer values.
| Data Type | Suitable for GRIM? | Notes |
| --- | --- | --- |
| Likert Scales (e.g., 1-7) | Yes | The primary use case. Data are integers. |
| Counts (e.g., cells, incidents) | Yes | Data are whole numbers. |
| Age (reported in whole years) | Yes | Data are integers. |
| Continuous Measures (e.g., height, concentration) | No | Data are not necessarily integers, so the mean can have any value. |
| Percentages | Yes, with care | Can be treated as a mean on a 0-100 scale. |
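
A minimal GRIM check fits in a few lines. This sketch assumes integer-valued data and uses Python's default rounding; published means are sometimes rounded half-up, so borderline cases deserve a manual check.

```python
import math

def grim_consistent(mean, n, decimals=2):
    """GRIM check: integer-valued data of size n can only yield means
    that are multiples of 1/n.  Scan every integer total whose mean
    could round to the reported value and see whether any matches."""
    target = round(mean, decimals)
    half = 0.5 / 10 ** decimals
    lo = math.floor((target - half) * n)
    hi = math.ceil((target + half) * n)
    return any(round(t / n, decimals) == target for t in range(lo, hi + 1))

print(grim_consistent(3.48, 25))  # 87/25 = 3.48 exactly -> True
print(grim_consistent(3.51, 25))  # no integer total works -> False
```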

SPRITE Test Implementation

  • Problem: SPRITE cannot reconstruct a plausible dataset from the reported summary statistics.

    • Solution: This is the core function of SPRITE—to detect inconsistencies. If no dataset can be found, it strongly suggests an error in the reported mean, SD, sample size, or a violation of the assumption that the data are integer-based. Double-check the input values. If they are correct, this result is a major "flag" indicating that the reported statistics are mutually incompatible [37].
  • Problem: The SPRITE analysis is running slowly.

    • Solution: Computational time increases with sample size. For larger samples (e.g., N > 100), the search space for possible datasets becomes enormous. Consider using a more efficient optimization algorithm or constraining the possible values of the data points based on logical minimums and maximums (e.g., a 1-7 scale cannot have values of 0 or 8).
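
The search idea behind SPRITE can be sketched as a simple hill-climb. This is an illustrative stand-in, not the published implementation: build an integer dataset with the reported mean, then shuffle single units between entries (which preserves the mean) until the SD matches.

```python
import random

def sprite_search(mean, sd, n, lo, hi, tol=0.01, iters=20000, seed=1):
    """SPRITE-style search (sketch): construct an integer sample on
    [lo, hi] with the reported mean (assumes the mean passes GRIM),
    then accept +1/-1 pair moves that bring the sample SD closer to
    the reported SD.  Returns a plausible dataset, or None."""
    total = round(mean * n)
    if not (lo * n <= total <= hi * n):
        return None
    base, extra = divmod(total, n)
    data = [base + 1] * extra + [base] * (n - extra)  # mean matches exactly
    rng = random.Random(seed)

    def sd_of(xs):
        m = sum(xs) / len(xs)
        return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

    for _ in range(iters):
        cur = sd_of(data)
        if abs(cur - sd) <= tol:
            return sorted(data)
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j and data[i] < hi and data[j] > lo:
            data[i] += 1          # +1/-1 keeps the mean unchanged
            data[j] -= 1
            if abs(sd_of(data) - sd) > abs(cur - sd):
                data[i] -= 1      # reject moves that don't help
                data[j] += 1
    return None

# Hypothetical report: mean 3.50, SD 1.20, N = 20 on a 1-7 scale
print(sprite_search(3.50, 1.20, 20, 1, 7))
```

If the search returns None for correct inputs and a reasonable tolerance, that is exactly the "flag" described above: no integer dataset can satisfy the reported statistics.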

p-Curve and Z-Curve Implementation

  • Problem: p-curve results are unreliable due to heterogeneous studies.

    • Solution: This is a known statistical limitation of the p-curve method [36]. It is recommended to transition to using z-curve. Z-curve uses a mixture model to account for heterogeneity in effect sizes across studies, providing more accurate estimates of average power and expected replication rates [36].
  • Problem: I have multiple p-values per study; which one should I include in a p-curve/z-curve?

    • Solution: To maintain independence of data points, include only the most focal test from each study (e.g., the primary interaction effect in a factorial design, or the key comparison between the main conditions of interest). Including multiple dependent p-values from the same study can distort the results [36].

Analytical Workflows

The following diagram illustrates the general decision-making process for a forensic metascientific analysis, from initial flag to final conclusion.

[Flowchart: after encountering a target paper, identify a flag (e.g., odd statistic, high effect size) and triage the analysis scope. A limited scope calls for initial analysis with GRIM/SPRITE; if a problem is found, escalate to formal analysis (full documentation, p-curve/z-curve). An extensive scope expands to other papers by the same author or group ("scouring"). Formal analysis and scouring both feed a modified trustworthiness conclusion.]

Forensic Metascience Analysis Workflow [38]

The Scientist's Toolkit: Research Reagent Solutions

The table below lists key statistical tools and resources essential for conducting forensic metascientific analyses.

| Tool Name | Type | Primary Function | Key Considerations |
| --- | --- | --- | --- |
| GRIM | Statistical Test | Checks if a reported mean is mathematically consistent with its sample size [37]. | Best for small-N studies with integer data (e.g., Likert scales). A simple, first-pass check. |
| SPRITE | Statistical Test | Generates plausible datasets that fit reported mean, SD, and N. Checks if any such dataset can exist [37]. | More powerful than GRIM. Used when SD is available to test deeper consistency. |
| P-Curve | Meta-Analytic Tool | Analyzes the distribution of statistically significant p-values in a set of studies to assess evidential value and estimate average power [36]. | Has known statistical weaknesses; sensitive to heterogeneity. Use with caution [39] [36]. |
| Z-Curve | Meta-Analytic Tool | Models the distribution of z-statistics to estimate expected replication rates and average power, accounting for heterogeneity [36]. | Recommended modern alternative to p-curve. Provides robust estimates with confidence intervals [36]. |
| StatCheck | Software Tool | Scans documents for statistical reporting inconsistencies, particularly mismatches between p-values and test statistics [40]. | Useful for automated, large-scale checks of many papers at once. |
| R/Python | Programming Language | Provides environments for implementing custom forensic analyses and using specialized packages. | Essential for flexibility. Many forensic techniques (like SPRITE) require custom scripting [37]. |
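
The kind of check StatCheck automates can be sketched for the simplest case. This toy version handles only two-sided z tests; the real tool parses t, F, chi-square, and correlation statistics from APA-formatted text.

```python
import math
import re

def check_reported_p(text, tol=0.005):
    """Recompute two-sided p-values from reported z statistics and flag
    pairs where the reported p differs from the recomputed p by more
    than `tol`.  Returns a list of (z, reported_p, recomputed_p)."""
    pattern = re.compile(r"z\s*=\s*(\d+\.?\d*)\s*,\s*p\s*=\s*(0?\.\d+)")
    flags = []
    for z_s, p_s in pattern.findall(text):
        z, p_rep = float(z_s), float(p_s)
        p_calc = math.erfc(z / math.sqrt(2))  # two-sided p for a z test
        if abs(p_calc - p_rep) > tol:
            flags.append((z, p_rep, round(p_calc, 4)))
    return flags

sample = ("Group A outperformed B, z = 2.10, p = .036; "
          "the follow-up test gave z = 2.80, p = .020.")
print(check_reported_p(sample))  # only the second pair is inconsistent
```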

Troubleshooting Guide: FAQs on AI-Driven Journal Screening

FAQ 1: Our AI tool flagged a journal as 'questionable,' but it is listed in a major database. What could be the reason for this discrepancy? This discrepancy can arise from several factors. The AI model may have detected specific features on the journal's website that are strong indicators of questionable practices, even if the journal has managed to get listed in a database. These features can include a very high volume of published articles, an overly broad scope, or an editorial board listing that includes reputable researchers without their consent [41]. It is crucial to use the AI's output as a reference for further investigation, not as an absolute judgment. The final decision should involve a human expert who can manually verify the journal's peer-review policy, editorial board authenticity, and other quality markers [42].

FAQ 2: We are encountering a high rate of false positives with our current journal screening tool. How can we improve its accuracy? High false-positive rates often indicate that the model's training data is not representative or its feature extraction is too sensitive. To improve accuracy:

  • Retrain with Balanced Data: Ensure the model is trained on a balanced dataset of both known legitimate and known predatory journals, using sources like the Directory of Open Access Journals (DOAJ) for legitimate journals and blacklists for predatory ones [42].
  • Refine Feature Extraction: Enhance the model's ability to identify meaningful feature words. Techniques like using differential scores (diff scores) to pinpoint words that appear with significantly different frequencies in predatory versus legitimate journals can improve precision [42].
  • Implement Model Interpretability: Use models that provide explanations for their predictions. This allows your team to understand why a journal was flagged and to refine the criteria accordingly [41].

FAQ 3: How can we validate the performance of a new AI-based screening system before full deployment? A robust validation protocol should be followed:

  • Split Your Data: Divide your labeled dataset of journals into a training set (e.g., 70%) and a testing set (e.g., 30%).
  • Establish Benchmarks: Test the AI system against a hold-out test set that it has not seen during training.
  • Measure Key Metrics: Evaluate performance using standard metrics. For example, one AI system achieved an accuracy of 99.4% in data extraction tasks in a controlled test, correctly identifying 1,502 out of 1,511 data points [43]. Another system was tested by human experts who reviewed its output, finding it correctly identified over 1,000 questionable journals, though it also made mistakes by flagging an estimated 350 legitimate ones [41].
  • Compare with Human Performance: Conduct a parallel analysis where both the AI and human experts screen the same set of journals to compare results and identify the AI's strengths and weaknesses [41].

Experimental Protocols for AI-Based Screening

Protocol 1: Automated Identification of Questionable Journals

This protocol outlines the methodology for building an AI system to screen journals, based on systems described in recent research [41] [42].

  • Objective: To automatically screen scientific journals and flag those with characteristics of being 'questionable' or 'predatory'.
  • Materials: A list of journal URLs or names; A training dataset from sources like the Directory of Open Access Journals (DOAJ) for legitimate journals and known blacklists for predatory journals [42].
  • Methodology:
    • Data Collection: Automatically crawl and extract text content from the websites of the target journals.
    • Feature Extraction: Convert the website text into quantifiable features. This includes:
      • Bag-of-Words & TF-IDF: Identify and weight the importance of specific words and phrases [42].
      • Diff Score Calculation: Enhance the model by calculating "diff scores" for words—a measure of the difference in their frequency between predatory and legitimate journal websites. Words with high diff scores are strong indicators [42].
      • Structural Features: Extract metadata such as the number of articles published, the number of author affiliations, the presence of a detailed peer-review policy, and the rate of self-citation by authors [41].
    • Model Training & Prediction: Train a machine learning classifier (e.g., Random Forest, Naive Bayes) on the extracted features to distinguish between legitimate and questionable journals. The study by Acuña et al. used eight different classification algorithms for this purpose [41] [42].
    • Validation: The output of the AI system should be validated by human experts to make the final determination on a journal's status [41].
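
The diff-score step can be sketched on toy corpora. The word lists below are hypothetical, and a production system would combine these scores with TF-IDF and structural features before classification.

```python
from collections import Counter

def diff_scores(predatory_docs, legit_docs):
    """Diff score per word: relative document frequency among predatory
    journal sites minus relative document frequency among legitimate
    ones.  Large positive scores mark candidate predatory indicators."""
    def doc_freq(docs):
        c = Counter()
        for doc in docs:
            c.update(set(doc.lower().split()))
        return {w: c[w] / len(docs) for w in c}

    fp, fl = doc_freq(predatory_docs), doc_freq(legit_docs)
    return {w: fp.get(w, 0.0) - fl.get(w, 0.0) for w in set(fp) | set(fl)}

# Tiny hypothetical corpora of scraped site text
pred = ["rapid publication guaranteed low fee", "rapid review low fee"]
legit = ["rigorous peer review policy", "peer review and ethics policy"]
scores = diff_scores(pred, legit)
top = sorted(scores, key=scores.get, reverse=True)[:3]
print(top)  # words like 'rapid', 'low', 'fee' rank highest
```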

Diagram: Workflow for Automated Journal Screening

[Flowchart: journal list → data collection (crawl journal websites) → feature extraction (bag-of-words, TF-IDF, diff scores) → model prediction via ML classification → flagged journals → human expert verification → validated list.]

Protocol 2: Detecting AI-Generated Text in Manuscript Submissions

This protocol addresses the challenge of identifying machine-generated text, a growing concern for research integrity [44].

  • Objective: To differentiate between human-written text (HWT) and machine-generated text (MGT) in submitted manuscripts.
  • Materials: A sample of text from a manuscript; An AI detection tool (e.g., those based on models like GPT-2); A dataset of confirmed HWT and MGT for training/validation.
  • Methodology:
    • Text Preprocessing: Clean and prepare the text for analysis.
    • Feature Analysis: The detector analyzes textual features. A key metric is perplexity, which measures how "predictable" or "surprising" the text is to a language model. MGT typically has lower perplexity as it is generated from a model's learned probabilities [44].
    • Classification: The detection algorithm classifies the text as HWT or MGT based on the extracted features. Be aware that the performance of detectors can vary significantly. They are sensitive to the genre of text and the specific LLM used for generation. Their accuracy can drop when faced with obfuscation techniques like paraphrasing [44].
  • Troubleshooting Note: Be cautious of high false positive rates, particularly on texts written by non-native English speakers. While one study suggested this was an issue, a more recent and rigorous study found that non-native texts actually had higher perplexity, making them more distinguishable from MGT [44]. Always use detection results as one piece of evidence among many.
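Perplexity itself is easy to illustrate with a toy model. The sketch below fits an add-one-smoothed unigram model to a reference corpus and scores how "surprising" a text is under it; real detectors use large language models (e.g., GPT-2) rather than unigram counts, so treat this purely as an illustration of the metric.

```python
import math
from collections import Counter

def unigram_perplexity(text, reference_corpus):
    # Toy perplexity: lower values mean the text is more predictable under
    # a unigram model fit on reference_corpus (add-one smoothing).
    ref_tokens = reference_corpus.lower().split()
    counts = Counter(ref_tokens)
    total = len(ref_tokens)
    vocab = len(counts) + 1  # extra bucket for unseen words
    tokens = text.lower().split()
    log_prob = sum(
        math.log((counts.get(w, 0) + 1) / (total + vocab)) for w in tokens
    )
    return math.exp(-log_prob / len(tokens))
```

Text that closely matches the reference distribution scores a lower perplexity than text full of words the model has never seen, which is the intuition behind flagging low-perplexity passages as possibly machine-generated.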

Performance Data & Research Reagents

The following table summarizes quantitative data on the performance of various AI tools used in research integrity and screening tasks, as reported in the cited sources.

Table: Performance Metrics of AI Tools for Research Integrity

| Tool / System Name | Primary Function | Reported Performance / Key Metric | Source / Context |
| --- | --- | --- | --- |
| CU Boulder AI System | Identifies questionable journals | Flagged ~1,400 journals as potentially problematic from a list of ~15,200; after human review, over 1,000 were confirmed questionable [41]. | Academic Study |
| Elicit | Data extraction for systematic reviews | Achieved 99.4% accuracy (1,502 correct out of 1,511 data points) in one systematic review [43]. | Company Website / User Report |
| Proofig AI | Image duplication and manipulation detection | Adopted by major journals/publishers (e.g., American Association for Cancer Research, Science family of journals) for pre-publication checks [45]. | News Article |
| AI Text Detectors | Differentiate HWT from MGT | Performance is highly variable. One detector (Copyleaks) showed relative resistance to adversarial attacks, but no tool is completely reliable. Accuracy can be genre-dependent [44]. | Research Perspective |

Table: Research Reagent Solutions for AI-Driven Screening

| Tool / Resource | Type / Category | Primary Function in Screening |
| --- | --- | --- |
| Directory of Open Access Journals (DOAJ) | Data Source / Whitelist | Provides a vetted list of legitimate open-access journals for training AI models and for manual verification [42] |
| Beall's List / Stop Predatory Journals | Data Source / Blacklist | Historical and community-maintained lists of predatory publishers and journals; used as a source of negative examples for model training [42] |
| Bag-of-Words & TF-IDF | Algorithm / Feature Extraction | Converts unstructured text from journal websites into a structured, quantifiable format for machine learning models [42] |
| Diff Score Calculation | Algorithm / Feature Enhancement | Identifies words and phrases that are statistically overrepresented in questionable journals, sharpening the model's predictive power [42] |
| Random Forest / Naive Bayes | Algorithm / Classifier | Machine learning models that use the extracted features to perform the final classification of a journal as legitimate or questionable [42] |
| Proofig AI | Software Tool | An AI-powered platform that checks for image duplication, manipulation, and reuse within scientific papers [45] |
| Scite | Software Tool | Uses AI to analyze how scientific papers are cited (e.g., as supporting or contrasting evidence), helping to assess the reliability of claims [46] |

Signaling Pathway for Journal Screening

The following diagram illustrates the logical decision pathway an AI model might follow when analyzing a journal, based on the key features identified in the research [41] [42].

Diagram: AI Journal Screening Decision Logic

The journal website is analyzed through a sequence of checks; a "No" at any step yields the classification "Questionable Journal," while passing all checks yields "Legitimate Journal":

  • Clear peer review policy published?
  • Editorial board with verified experts?
  • Low number of grammatical errors?
  • Volume of articles within normal range?
  • Low self-citation rate by authors?

Single-case experimental designs (SCEDs) represent a family of research methods that use experimental procedures to study the effects of interventions on individual cases. Unlike group comparison research, SCEDs rely on repeated measurements over time with the individual case serving as its own control [47] [48]. The replication of intervention effects within and/or across cases provides the foundation for establishing causal inferences [47].

Questionable research practices (QRPs) have been identified as a significant contributor to the replication crisis across multiple scientific fields [7] [49]. While initially discussed primarily in the context of group comparison research with null-hypothesis statistical testing, QRPs present equally serious concerns for SCED research, though they may manifest differently due to the methodological distinctions of single-case methodology [7]. Researchers have identified the need to specifically examine how QRPs occur in SCEDs and to develop improved research practices as alternatives [7] [49].

Common QRPs in SCEDs: Identification and Examples

Table 1: Common Questionable Research Practices in Single-Case Experimental Designs

| QRP Category | Description | SCED-Specific Manifestations |
| --- | --- | --- |
| Selective Data Reporting | Omitting data that do not support hypotheses or show weak effects [7] | Excluding entire participants, specific dependent variables, or data points that demonstrate unstable responding or weak treatment effects [7] [49] |
| Graphical Manipulation | Altering visual representation of data to enhance apparent effects | Modifying scaling of x- or y-axes, omitting data points indicating instability, selectively combining dependent variables in graphs [7] |
| Procedural Omission | Failing to report potential confounds that could explain effects | Not documenting implementation fidelity, environmental variables, or concurrent treatments that might influence outcomes [7] |
| Selective Outcome Reporting | Emphasizing desirable results while downplaying undesirable findings in abstracts and discussions [7] | Writing discussions that highlight positive visual analysis while minimizing interpretations of unstable data or limited functional relations |
| Flexible Design Execution | Exploiting researcher degrees of freedom during study implementation | Making unplanned changes to phase change criteria, altering intervention intensity without documentation, changing measurement procedures mid-study |

Evidence for QRPs in SCED Research

Multiple lines of evidence suggest QRPs occur in SCED research. Systematic reviews comparing published and unpublished SCED studies have found that published studies typically show larger effect sizes than unpublished studies [7] [49]. For example, Sham and Smith (2014) found published studies on pivotal response treatment had larger treatment effects than unpublished studies [7]. Similarly, Dowdy et al. (2020) reported larger effect sizes for published studies of response interruption and redirection compared to unpublished works [49].

Comparative analyses of dissertations and their corresponding journal articles provide further evidence. In one examination of 124 dissertation-article pairs, 12.4% of articles omitted one or more participants and/or dependent variables from the corresponding dissertation [49]. Published studies also showed a higher proportion of experimental effects to non-effects and larger effect sizes compared to dissertations [49].
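As a first-pass screen, the published-versus-unpublished comparison described above can be reduced to a difference in mean effect sizes. The function name and inputs below are illustrative; formal reviews use meta-analytic models with study weights rather than raw means.

```python
from statistics import mean

def publication_gap(published_es, unpublished_es):
    # Crude publication-bias screen: mean effect size of published studies
    # minus that of unpublished studies on the same topic. A clearly
    # positive gap is consistent with selective publication of strong effects.
    return mean(published_es) - mean(unpublished_es)
```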

Detection Methodologies for SCED QRPs

Analytical Framework for QRP Detection

Table 2: Methodological Approaches for Detecting QRPs in SCEDs

| Detection Method | Implementation Protocol | Indicators of Potential QRPs |
| --- | --- | --- |
| Publication Bias Analysis | Compare effect sizes between published and unpublished studies on similar topics using systematic review methods [7] [49] | Significant discrepancy between published and unpublished effect sizes; absence of small-effect studies in literature |
| Source Document Comparison | Compare dissertations/theses with resulting publications for completeness of reporting [49] | Omitted participants, conditions, or dependent variables; enhanced effect sizes in publications |
| Visual Analysis Verification | Apply standardized visual analysis criteria to published graphs; check for graphical integrity [7] [48] | Inconsistent scaling; missing data points; altered axis proportions; discrepancies between visual analysis statements and graphed data |
| Methodological Consistency Assessment | Compare reported methods with SCED quality standards and check for internal consistency [7] [48] | Unreported changes in procedures; insufficient data points per phase; lack of demonstrated experimental control |
| Replication Failure Analysis | Examine direct and systematic replication attempts for consistency of effects [7] | Inconsistent outcomes across similar participants, settings, or implementations |

Experimental Protocol for Systematic QRP Detection

Aim: To develop a standardized protocol for identifying potential QRPs in published SCED research.

Materials: Sample of published SCED studies, corresponding unpublished documents (dissertations, conference presentations, registrations) when available, standardized coding manual, multiple trained coders.

Procedure:

  • Identification Phase: Conduct systematic literature search to identify all SCED studies on target topic, including published and unpublished works [49]
  • Document Retrieval: Obtain complete research documents, including supplemental materials and protocols when available
  • Blinded Coding: Train multiple coders to assess studies using standardized coding system with demonstrated reliability
  • Data Extraction: Systematically extract data on:
    • Participant characteristics and numbers
    • Dependent variables measured and reported
    • Design structure and phase changes
    • Graphical presentation features
    • Statistical and visual analysis results
    • Effect sizes and interpretations
  • Comparative Analysis: Conduct quantitative and qualitative comparisons across published and unpublished works, and within published works across time and journals
  • Sensitivity Analysis: Assess potential mechanisms for observed discrepancies beyond intentional QRPs

Validation: Establish inter-rater reliability among coders; validate findings through comparison with author self-reports when possible; triangulate across multiple detection methods.
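The inter-rater reliability step can be quantified with Cohen's kappa. Below is a minimal sketch for two coders assigning categorical codes to the same set of studies; the helper name is an assumption for illustration.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    # Cohen's kappa: chance-corrected agreement between two coders who
    # assigned one categorical code per study to the same study list.
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    ca, cb = Counter(coder_a), Counter(coder_b)
    # Expected agreement under independence of the two coders' marginals.
    expected = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)
```

Values near 1 indicate strong agreement; values near 0 indicate agreement no better than chance, signaling that the coding manual or training needs revision before proceeding.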

Table 3: Research Reagent Solutions for QRP Detection and Prevention

| Tool Category | Specific Tools/Resources | Function and Application |
| --- | --- | --- |
| Reporting Standards | SCRIBE, CONSORT extensions for SCEDs [48] | Standardized reporting checklists to improve transparency and completeness |
| Design Quality Assessment | What Works Clearinghouse Standards, RoBiNT Scale [48] | Quality appraisal tools to evaluate methodological rigor of SCEDs |
| Data Analysis Tools | Visual analysis protocols, effect size calculators, statistical packages for SCEDs [47] [50] | Complementary analysis methods to reduce reliance on subjective interpretation |
| Transparency Enhancements | Open Science Framework repositories, preregistration templates [7] | Platforms for sharing protocols, data, and materials to enable verification |
| Methodological Guides | Design standards texts, methodological tutorials [47] [48] | Resources for proper design implementation and reporting |

Troubleshooting Guide: FAQs on QRP Detection

Q1: How can I distinguish between intentional selective reporting and practical space limitations in publications?

A1: Intentional selective reporting typically follows a systematic pattern where omitted data consistently show weaker effects. Practical space limitations should result in random or justified omissions. Check whether authors mention complete data availability elsewhere or provide justification for omissions. Corresponding dissertations often provide the clearest comparison [49].

Q2: What are the most sensitive indicators of graphical manipulation in SCED displays?

A2: Key indicators include: (1) inconsistent axis scaling across similar graphs in the same paper, (2) y-axes that do not begin at zero without clear justification, (3) missing data points that aren't explicitly acknowledged, and (4) disproportionate spacing of time units that may visually exaggerate effects [7].

Q3: How can researchers balance flexibility in SCED implementation with prevention of QRPs?

A3: Pre-registration of design decisions and phase change criteria provides the optimal balance. Document all a priori decisions about conditions for phase changes, number of data points, and handling of unstable data. Any deviations should be explicitly documented with justifications [7] [48].

Q4: What methodological features provide the strongest protection against unintentional QRPs in SCEDs?

A4: Multiple baseline designs across participants, settings, or behaviors provide inherent protection through staggered intervention implementation. Randomization of phase start points, blind data collectors, and interobserver agreement assessment further strengthen validity [47] [48].

Q5: How can the field distinguish between researcher degrees of freedom and legitimate adaptive design changes?

A5: Legitimate adaptations typically: (1) respond to ethical concerns, (2) address unexpected participant needs, (3) are thoroughly documented with rationale, and (4) maintain the experimental integrity of the design. Changes made solely to enhance effects without these characteristics may represent QRPs [7].

Conceptual Framework and Visual Representations

Diagram: Conceptual Framework of QRP Detection in SCED Research

  • Contextual factors → QRPs: publication pressure and career contingencies → selective data reporting; methodological complexity of SCEDs → graphical manipulation; training limitations in SCED methodology → procedural omission.
  • QRPs → detection methods: selective data reporting → publication bias analysis; graphical manipulation → visual analysis verification; procedural omission → source document comparison; flexible design execution → methodological consistency check.
  • Detection methods → outcomes: publication bias analysis → improved internal and external validity; source document comparison → enhanced replicability; visual analysis verification → refined methodological standards; methodological consistency check → accurate evidence for practice.

Improved Research Practices and Future Directions

The identification of QRPs in SCEDs has prompted the development of improved research practices as alternatives. Recent initiatives have identified 64 pairs of questionable and improved research practices in SCED research across different stages of the research process [7] [49]. These improved practices emphasize:

  • Transparency: Comprehensive reporting of all methodological details, including participant characteristics, design decisions, and data collection procedures
  • Preregistration: Documenting hypotheses, design plans, and analysis strategies prior to data collection
  • Data Sharing: Making complete datasets available to enable verification and secondary analysis
  • Methodological Rigor: Adherence to established quality standards for SCED implementation [48]
  • Complementary Analysis: Combining visual analysis with statistical methods to provide multiple perspectives on effects [47] [50]

Future directions include developing standardized QRP detection protocols, creating automated tools for identifying graphical inconsistencies, establishing certification processes for SCED methodology, and enhancing educational curricula to emphasize improved research practices. The integration of causal mediation analysis methods represents another promising avenue for understanding how interventions produce change in SCEDs [50].

Frequently Asked Questions (FAQs)

FAQ 1: What is the core difference between z-curve and the criticized "post-hoc power analysis"? This is a fundamental conceptual distinction. Z-curve does not estimate effect sizes or produce power estimates for individual studies, which is the flaw in "observed" power methods. Instead, z-curve estimates the population mean power of a set of studies that have been selected for statistical significance. It uses the distribution of p-values to estimate the mean probability that these significant studies would successfully reject the null hypothesis again if replicated, without ever calculating an effect size for a single study [51].

FAQ 2: My dataset contains a mix of t-tests, F-tests, and correlations. Can I use z-curve? Yes, but you must be aware of the approximation involved. Z-curve transforms all test statistics into two-sided p-values and then into equivalent z-scores, treating every test as if it were a two-sided z-test. This works well when per-study sample sizes are sufficiently large (typically N > 30), as the t- and F-distributions then approximate the normal distribution. However, if your dataset has many studies with small sample sizes (N < 20-30) and a small total number of studies (k < 20-30), this transformation can bias power estimates, typically leading to underestimation when true power is high [52].

FAQ 3: The reviewer said "power is not a property of a completed study." Is z-curve based on a misunderstanding? No, this criticism stems from a misconception of the z-curve model. In the z-curve framework, every study in a population of studies has a true power, defined by its design, procedure, and subject population. This power is a frequentist probability—the long-term relative frequency of rejection based on hypothetical repeated sampling. Z-curve aims to estimate the mean of these true power values specifically for the sub-population of studies that obtained statistically significant results, which roughly corresponds to the mean power of published results in a field [51].

FAQ 4: What is the minimum number of studies needed for a reliable z-curve analysis? While there is no absolute minimum, simulation results indicate that the approximation used in the p-value to z-score transformation becomes more reliable as the number of studies increases. Datasets with a small number of studies (approximately ≤ 30) combined with small per-study sample sizes are particularly problematic and may produce biased estimates. For more robust results, aim for larger sets of studies (k ≥ 100), which help smooth out the effect of the approximation [52].

Troubleshooting Common Experimental Issues

Problem: Low Estimated Mean Power and Suspected Bias

  • Symptoms: Z-curve analysis returns a low estimate of mean power (e.g., below 0.5), suggesting the literature may be contaminated by questionable research practices or publication bias.
  • Diagnosis Steps:
    • Check the distribution of your input p-values. A cluster of p-values just barely crossing the significance threshold (e.g., p-values between 0.04 and 0.05) is a classic indicator of p-hacking or selection bias.
    • Verify the file-drawer effect is considered. Z-curve explicitly models the selection that occurs when non-significant results are "filed away" and not published [51].
  • Solution:
    • Interpret the result as evidence that the significant findings in your literature set are, on average, not very reliable.
    • Report the "expected replication rate" alongside the mean power estimate as a more intuitive metric for your audience.
    • Consider a sensitivity analysis by removing a few studies with p-values very close to 0.05 to see how stable the estimate is.
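The first diagnosis step can be automated as a quick heuristic. In the sketch below, the function name and the 0.04–0.05 window are illustrative choices; it reports what share of the significant p-values sit just below the threshold.

```python
def near_threshold_fraction(p_values, lo=0.04, hi=0.05):
    # Share of significant p-values (p < hi) that fall in the narrow band
    # just below the threshold. A large share is a red flag for p-hacking
    # or selection bias — a prompt for scrutiny, not proof.
    significant = [p for p in p_values if p < hi]
    if not significant:
        return 0.0
    return sum(lo <= p for p in significant) / len(significant)
```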

Problem: Bias from Small Sample Sizes and Statistical Transformation

  • Symptoms: You are analyzing a field where studies typically have small sample sizes (e.g., N < 20 per cell/group). Your z-curve estimate may be artificially low.
  • Diagnosis Steps:
    • Calculate the average sample size of the studies in your dataset.
    • If the average is low (< 30) and the number of studies is also small, the transformation of test statistics into z-scores is likely introducing bias [52].
  • Solution:
    • For a few studies with similar, small N: Consider using an alternative method like t-curve, which fits a mixture model using non-central t-distributions instead of normal distributions [52].
    • For a diverse set of studies: A direct modeling approach for effect sizes that accounts for selection (e.g., weightr package) might be more appropriate [52].
    • As a sensitivity check: Compare your z-curve results with those from other bias-correction methods.

Problem: Inconsistent Test Statistics and Incompatible Input Data

  • Symptoms: You have a collection of test statistics (t, F, χ², r) and are unsure how to prepare them for analysis.
  • Diagnosis Steps:
    • Ensure all p-values are calculated as two-sided. Z-curve ignores the sign of effects, as it is designed for heterogeneous sets of studies where effects may be in different directions [52].
    • Check that you are using the correct p-values from the original studies (not the one-sided versions).
  • Solution:
    • Follow the standard transformation workflow: Convert all test statistics to two-sided p-values, then convert these p-values to z-scores using the inverse standard normal cumulative distribution function: Z = qnorm(1 - p/2) [52].
    • The resulting z-scores will all be positive. The z-curve algorithm then models the distribution of these positive z-scores.

Experimental Protocols & Data Presentation

Z-Curve Analysis Methodology

Core Conceptual Model: Z-curve operates on a "coin-tossing" model. Designing a study is like manufacturing a biased coin whose probability of "heads" (a significant result) equals the study's true power. Running the study is tossing the coin, and studies that show "tails" (non-significant results) are filed away. Z-curve's goal is to estimate the average bias (mean power) of the coins that came up "heads" — that is, of the population of published, significant results, after accounting for the file drawer [51].

Data Preparation Protocol:

  • Extraction: For each study, extract the exact test statistic (t, F, χ², correlation) and its degrees of freedom or sample size.
  • P-value Conversion: Convert all test statistics to two-sided p-values.
  • Z-score Transformation: Transform the two-sided p-values (p) into z-scores using the formula: Z = qnorm(1 - p/2). This uses the quantile function of the standard normal distribution.
  • Input Vector: Create a vector of these z-scores for input into the z-curve software. Only statistically significant results (typically p < .05, meaning Z > 1.96) should be included, as the model is for the selected population of significant findings.
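The transformation and filtering steps map directly to code. Below is a minimal Python sketch using the standard library's NormalDist, whose inv_cdf is the equivalent of R's qnorm; the function name is an assumption for illustration.

```python
from statistics import NormalDist

def pvals_to_zcurve_input(p_values, alpha=0.05):
    # Convert two-sided p-values to z-scores, Z = qnorm(1 - p/2), then keep
    # only significant results (Z above the alpha cutoff, ~1.96 for .05),
    # since z-curve models the selected population of significant findings.
    nd = NormalDist()
    cutoff = nd.inv_cdf(1 - alpha / 2)
    z_scores = [nd.inv_cdf(1 - p / 2) for p in p_values]
    return [z for z in z_scores if z > cutoff]
```

The resulting vector is what would be passed to z-curve software such as the zcurve R package.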

Analysis Execution: The following table summarizes the core quantitative outputs from a z-curve analysis and their interpretation.

Table 1: Key Output Metrics from Z-Curve Analysis

| Metric | Description | Interpretation |
| --- | --- | --- |
| Mean Power (Expected Replication Rate) | The estimated average probability that the significant studies in the set would again produce a significant result in an exact replication [51]. | The primary indicator of literature reliability. Lower values (< 0.5) indicate a high risk of false positives and unreliable findings. |
| File Drawer Ratio | An estimate of the ratio of non-significant studies filed away to the significant studies published. | A higher ratio suggests more severe publication bias. |
| The Z-Curve Plot | A density plot of the significant z-scores, overlaid with the modeled distribution. | Visualizes the distribution of evidence. A "lump" of z-scores just above 1.96 suggests p-hacking. |

Workflow Visualization

The following diagram illustrates the sequential steps for performing a z-curve analysis, from data collection to interpretation.

Start: Gather Studies → Extract Test Statistics (t, F, χ², r, p-values) → Convert to Two-Sided P-Values → Transform to Z-Scores (Z = qnorm(1 - p/2)) → Filter for Significant Results (Z > 1.96) → Run Z-Curve Algorithm → Interpret Key Metrics (Mean Power, File Drawer Ratio) → Report & Conclusion

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Software and Conceptual Tools for Credibility Assessment

| Tool / Resource | Function | Relevance to QRP Identification |
| --- | --- | --- |
| Z-Curve R Package (zcurve) | Implements the z-curve algorithm for estimating mean power and other diagnostics from a sample of p-values [51]. | The primary software for conducting the analysis described here. |
| Two-Sided P-Values | The standardized input for z-curve, derived from various test statistics [52]. | Ensures consistency when analyzing heterogeneous literature. Ignoring sign allows for analysis across studies with effects in opposing directions. |
| File Drawer Model | The conceptual model that accounts for the selection process where non-significant results are not published [51]. | Central to understanding and quantifying publication bias, a major category of QRPs. |
| Statistical Power (True Power) | A property of a study's design and population, defined as the probability of rejecting a false null hypothesis [51]. | The core quantity z-curve seeks to estimate for the population of published studies. Distinguishing this from "observed power" is critical. |
| Inverse Normal CDF | The mathematical function (qnorm in R) that converts a p-value to its corresponding z-score [52]. | The technical mechanism for creating a common scale (z-scores) from diverse statistical tests. |

Frequently Asked Questions (FAQs)

Q1: What are Questionable Research Practices (QRPs) and why is detecting them during peer review critical? Questionable Research Practices (QRPs) are actions that undermine research integrity but often occupy an ethical gray area, falling short of outright fraud like fabrication or plagiarism [53] [4]. Common examples include data manipulation, selective reporting of outcomes, hypothesizing after results are known (HARKing), and problematic authorship [53] [4]. Detecting them during peer review is crucial because these practices, while sometimes seemingly minor in isolation, cumulatively skew the scientific literature, undermine its reliability, and can lead to the canonization of erroneous claims, wasted resources, and eroded public trust in science [53] [4].

Q2: What are the most common factors that lead researchers to engage in QRPs? The engagement in QRPs is influenced by a combination of individual and systemic factors [4]. Key drivers include:

  • "Publish or Perish" Culture: A strong systemic pressure to publish frequently and in high-impact journals incentivizes corner-cutting [53] [4].
  • Individual Factors: Studies show that a lower commitment to scientific norms, certain personality traits, and specific career stages (e.g., being a PhD candidate) can predict higher QRP engagement [4].
  • Organizational Culture: A non-collegial research environment and perceived pressure to publish within an institution are associated with QRP engagement [4].
  • Insufficient Training: A lack of formal education in research ethics, methodology, and data analysis leaves researchers ill-equipped to navigate ethical dilemmas [53] [4].

Q3: What is the reviewer's fundamental responsibility when checking for QRPs? The reviewer's primary duty is to act as a gatekeeper for scientific quality and integrity. This involves moving beyond simply assessing the novelty and apparent impact of a study to critically evaluating the methodological rigor, data analysis, and interpretation to ensure the findings are valid, reliable, and honestly presented [54]. This service is essential for maintaining trust in the scientific literature [54].

Q4: Can you provide specific red flags for selective reporting (cherry-picking) in a manuscript? Yes, several red flags can indicate selective reporting:

  • Unexplained Omissions: The Methods section describes collecting multiple outcome measures or conducting several analyses, but the Results section reports only a subset without justification.
  • Misalignment with the Pre-Registration: For a pre-registered study (e.g., on ClinicalTrials.gov), the outcomes and analyses reported in the manuscript differ significantly from the pre-registered plan.
  • Overemphasis on Secondary Analyses: The main, pre-specified primary outcomes are null or weak, but the discussion focuses heavily on a significant finding from a post-hoc or subgroup analysis.
  • Missing Standard Results: Absence of standard data summaries, such as not reporting all conditions in a multi-group experiment or omitting basic descriptive statistics for key variables.

Troubleshooting Guides: Common QRP Scenarios

Scenario 1: Suspected Data Manipulation or Fabrication

Problem: A reviewer suspects that data may have been manipulated or fabricated, but has no direct access to the original raw data to confirm.

Solution: A Step-by-Step Diagnostic Protocol

  • Internal Consistency Check: Scrutinize the manuscript for internal inconsistencies. Check if totals add up correctly, if statistics match the presented data (e.g., means and standard deviations are plausible), and if data points in figures align with values described in the text or tables.
  • Methodological Plausibility Assessment: Evaluate whether the methods described could realistically produce the data presented. Consider the sample size, equipment sensitivity, and procedural details. For example, an extremely low standard error from a very small sample size can be a red flag.
  • Image Analysis: Use tools like ImageTwin, Forensics, or even careful inspection in standard image viewers to check for duplications, splicing, or inappropriate manipulations in Western blots, micrographs, or other figures.
  • Statistical Integrity Check: Use statistical tests to screen for anomalies. While not conclusive, tests for digit preference (e.g., Benford's Law) or unnatural granularity in reported p-values can indicate issues.
  • Escalate to the Editor: If suspicions remain after these checks, do not accuse the authors directly. Write a confidential report to the handling editor detailing your specific concerns and the evidence. The editor can then initiate a formal process, which may include requesting the original raw data from the authors.
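Step 4's digit-preference screen can be sketched with a first-digit Benford check. The function name and the summed-absolute-deviation statistic below are illustrative; a formal screen would use a chi-square or similar goodness-of-fit test.

```python
import math
from collections import Counter

def benford_deviation(values):
    # Sum of absolute differences between observed leading-digit
    # frequencies and the Benford's law expectation log10(1 + 1/d).
    # High values only flag data for closer inspection; they are
    # never proof of fabrication.
    leading = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v]
    observed = Counter(leading)
    n = len(leading)
    return sum(
        abs(observed.get(d, 0) / n - math.log10(1 + 1 / d))
        for d in range(1, 10)
    )
```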

Scenario 2: Suspected p-Hacking or Selective Reporting

Problem: A manuscript reports a positive result, but the statistical approach or choice of reported results appears designed to find statistical significance (p-hacking).

Solution: A Methodological Interrogation Workflow

  • Check for Pre-Registration: First, determine if the study was pre-registered. If yes, perform a line-by-line comparison between the pre-registered analysis plan and the reported results. Any major deviation without a compelling, pre-stated reason is a significant red flag.
  • Interrogate Data Analysis Flexibility: If no pre-registration exists, assess the "researcher degrees of freedom." Ask:
    • Were outlier tests defined a priori? How were outliers handled?
    • Were covariates selected before or after looking at the data?
    • Could the outcome variable have been defined in multiple ways?
    • Were multiple statistical models tested with only the "best" one reported?
  • Request a Self-Audit: In your review, you can ask the authors to complete a transparency checklist. For example, request they describe all data exclusions, all manipulations, and all measures/conditions collected in the study. Their ability to provide a clear and comprehensive account is itself a test of robustness.
  • Evaluate Result Robustness: Suggest (or the editor can require) that the authors demonstrate the robustness of their key finding by showing it holds under different reasonable analytical choices (e.g., with and without covariates, using a different outlier rule).

Table: Diagnostic Checklist for Suspected p-Hacking

| Checkpoint | Question for the Reviewer to Ask | Action if Anomaly is Detected |
| --- | --- | --- |
| Pre-registration | Is an analysis plan publicly available and followed? | Note deviations in review; downgrade confidence in results. |
| Multiple Testing | Are many comparisons made without appropriate corrections (e.g., Bonferroni)? | Request statistical correction; interpret "significant" p-values with caution. |
| Outcome Switching | Are the analyzed outcomes different from those stated in the introduction or methods? | Query authors on the change; suspect selective reporting. |
| Data Peeking | Does the stopping rule for data collection seem arbitrary or undefined? | Question the sampling procedure; request a justification. |

Scenario 3: Authorship Issues

Problem: The author list includes individuals who may not have contributed substantially (honorary/gift authorship) or omits individuals who should be included (ghost authorship).

Solution: An Attribution Verification Protocol

  • Apply Contributorship Taxonomies: Use a formal taxonomy like CRediT (Contributor Roles Taxonomy) as a framework for evaluation. In your review, ask the corresponding author to specify each author's contribution according to such a standard (many journals now require this).
  • Assess Coherence: Check if the claimed contributions align logically with the author's listed affiliation and expertise. An individual listed as a key methodology contributor, for example, should have a track record in that method.
  • Review the Acknowledgments: Sometimes, individuals who performed crucial technical or writing work are relegated to the acknowledgments section, which may indicate ghost authorship.
  • Recommend Journal Policy: If the journal does not already require a contributions statement, recommend that the editor request one from the authors to clarify the basis for authorship.

Experimental Protocols for Systematic QRP Screening

This section provides a detailed methodology for integrating QRP checks into the standard peer review workflow.

Protocol 1: The QRP Screening Workflow for Reviewers

Objective: To systematically screen a manuscript for signs of Questionable Research Practices during the initial review phase.

Materials:

  • Manuscript under review
  • Access to pre-registration repositories (e.g., ClinicalTrials.gov, OSF)
  • Journal's author guidelines and reporting standards (e.g., CONSORT, ARRIVE)
  • QRP Screening Checklist (see Table below)

Procedure:

  • Pre-reading: Before a deep read, check for a pre-registration statement and ID. Note the author list and affiliations.
  • Alignment Check: Read the Introduction and Methods. Identify the stated hypotheses, primary outcomes, and analysis plan. Cross-reference these with the pre-registration document (if available) or note if key methodological details are absent.
  • Results-First Analysis: Read the Results section independently. Note which outcomes and analyses are presented. Check for alignment with the methods and for any unexplained omissions or additions.
  • Consistency Verification: Compare figures, tables, and text for numerical consistency. Check if error bars and statistical tests are appropriate for the data presented.
  • Checklist Completion: Complete the QRP Screening Checklist below to document potential issues.

Table: QRP Screening Checklist for Peer Review

| Category | Item to Check | Status (Pass/Flag/NA) | Notes for Editor/Authors |
| --- | --- | --- | --- |
| Transparency | Pre-registration documented and adhered to? | | |
| | Data/Code availability statement provided? | | |
| Methods | Sample size justification provided? | | |
| | All described measures/conditions reported? | | |
| | Data exclusion criteria pre-defined and followed? | | |
| Results | All primary and secondary outcomes reported? | | |
| | Statistical tests appropriate and correctly applied? | | |
| | Figures and tables consistent with text? | | |
| Authorship | Author contributions seem appropriate and justified? | | |
| Overall | Any suspicion of data fabrication/falsification? | | |

Protocol 2: A Pre-Submission Self-Audit for Authors to Prevent QRPs

Objective: To provide authors with a concrete procedure for auditing their own manuscript prior to submission, reducing the likelihood of unintentional QRPs and strengthening the manuscript's integrity.

Materials:

  • Complete draft of the manuscript
  • All raw data, analysis code, and output
  • Pre-registration document (if applicable)
  • Laboratory notebook or project documentation

Procedure:

  • Data and Code Audit: Re-run all primary analyses from the raw data using the final analysis code. Confirm that the results in the manuscript match the output of this process exactly.
  • Manuscript Alignment Audit: Create a three-column table. In the first column, list all hypotheses and analysis plans from the introduction and methods section (or pre-registration). In the second column, list every analysis and result presented in the results section. In the third column, note any discrepancies and provide a brief justification for any change from the plan.
  • Authorship Contribution Audit: List every author. Next to each name, write their specific contribution using a standard taxonomy like CRediT. Verify that everyone listed meets the journal's authorship criteria and that no one who meets the criteria has been omitted.
  • Transparency Audit: Ensure the manuscript includes clear statements on data and code availability, funding sources, and conflicts of interest. Confirm that all necessary details for replication are present.

Workflow Visualization: Integrating QRP Checks into Peer Review

The following diagram illustrates a practical workflow for integrating QRP checks into the standard peer review process, providing a visual guide for both editors and reviewers.

[Workflow diagram: QRP Screening Integration in Peer Review]

  • Editorial intake: Manuscript Received → EIC/Associate Editor Initial Screening → Send for Peer Review → Reviewer Accepts Invitation.
  • Reviewer workflow with QRP checks: Pre-Reading & Pre-Registration Check → Methods & Protocol Alignment Check → Results & Statistics Consistency Check → Authorship & Transparency Check → Compile Review Report & QRP Checklist.
  • Editorial decision: AE Evaluates Reviews & Makes Recommendation → EIC Makes Final Decision → Are QRP concerns substantial? If yes, query the authors for clarification/data, then reject or request major revision; if no, accept or request minor revisions.

This table details key resources and tools that researchers and reviewers can use to identify and prevent Questionable Research Practices.

Table: Research Reagent Solutions for QRP Identification and Prevention

| Tool / Resource Name | Type | Primary Function in QRP Context | Relevance to Protocol |
| --- | --- | --- | --- |
| Pre-Registration Platforms (e.g., OSF, AsPredicted, ClinicalTrials.gov) | Protocol Repository | Provides a time-stamped, public record of hypotheses, methods, and analysis plans before data collection, combating HARKing and selective reporting. | Critical for Protocol 1 (Alignment Check) and Protocol 2 (Self-Audit). |
| Statistical Screening Tools (e.g., statcheck for p-values, GRIM Test) | Software Tool | Automatically scans manuscripts for inconsistencies in reported statistics (e.g., p-values that don't match test statistics) or possible data-level impossibilities. | Aids in Protocol 1, Results-First Analysis and Consistency Verification. |
| Image Forensics Software (e.g., ImageTwin, Forensically) | Software Tool | Detects image duplications, manipulations, and splicing in figures, helping to identify data fabrication/falsification. | Used in Scenario 1 (Data Manipulation), Step 3. |
| Reporting Guidelines (e.g., CONSORT, ARRIVE, PRISMA) | Reporting Standard | Provides a checklist of essential information that must be included in a manuscript to ensure methodological transparency and completeness. | Forms the basis for the QRP Screening Checklist in Protocol 1. |
| Contributorship Taxonomies (e.g., CRediT) | Classification System | Provides a standardized list of 14 roles to describe each author's specific contribution, clarifying authorship and reducing gift/ghost authorship. | Central to Scenario 3 (Authorship Issues) and Protocol 2 (Self-Audit). |
| Data & Code Repositories (e.g., Zenodo, Dryad, GitHub) | Data Repository | Enables public sharing of raw data and analysis code, allowing for independent verification of results and enhancing transparency. | Checked in Protocol 1, Transparency category of the screening checklist. |
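The GRIM test listed among the statistical screening tools above is simple enough to sketch in full: it asks whether a reported mean, at its stated precision, could possibly arise from integer-valued data (e.g., Likert items) with the reported sample size. The specific numbers below are hypothetical illustrations.

```python
def grim_consistent(reported_mean, n, decimals=2):
    """GRIM test: can `reported_mean`, rounded to `decimals` places, arise
    from integer-valued data with sample size `n`? Only meaningful when
    the underlying scale is genuinely integer-valued."""
    target = round(reported_mean, decimals)
    # With integer data, the only achievable means are k / n for integer k.
    # Check the candidate sums nearest to reported_mean * n.
    center = round(reported_mean * n)
    for k in range(center - 2, center + 3):
        if round(k / n, decimals) == target:
            return True
    return False

# A mean of 5.19 from 28 integer responses is GRIM-inconsistent: the
# achievable means nearby are 145/28 = 5.18 and 146/28 = 5.21.
```

A single inconsistency can be a typo; a pattern of them across a results section is a flag worth raising confidentially with the editor.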

Solutions and Safeguards: Strategies for Preventing QRPs and Optimizing Research Quality

Understanding Questionable Research Practices (QRPs)

Questionable Research Practices (QRPs) are a range of actions that undermine research integrity without always constituting outright fraud. They often thrive in low-transparency environments and contribute to the reproducibility crisis [55]. The table below summarizes common QRPs, their descriptions, and prevalence based on researcher self-reports.

Table 1: Common Questionable Research Practices (QRPs) and Their Prevalence

| QRP Category | Specific Practice | Description | Reported Prevalence |
| --- | --- | --- | --- |
| Data Collection & Analysis | P-hacking [56] [55] | Manipulating data analysis until a statistically significant result (p < 0.05) is achieved. | Over 30% of psychologists admitted to QRPs [56]. |
| | Selective Reporting [53] | Reporting only favorable results while omitting negative or inconclusive findings. | Nearly 20% of researchers admitted to modifying data for presentation [53]. |
| Hypothesizing & Reporting | HARKing [56] [55] | Hypothesizing After the Results are Known; presenting a post-hoc hypothesis as if it were a priori. | A survey in applied linguistics found 94% engaged in one or more QRPs [55]. |
| | Overhyping Results [18] | Exaggerating the significance or impact of research findings in written reports. | Prevalent in health disciplines due to cultural pressures [18]. |
| Publication & Authorship | Salami Slicing [18] | Unjustifiably splitting results from a single study into multiple papers to increase publication count. | Often incentivized and rewarded in academic systems [18]. |
| | Ghost or Honorary Authorship [53] | Denying authorship to contributors (ghost) or granting authorship to non-contributors (honorary). | A common form of QRP across disciplines [53]. |

The Open Science Workflow: An Antidote to QRPs

Open Science practices introduce transparency and rigor at key stages of the research lifecycle, directly countering QRPs. The workflow below illustrates how these practices integrate into a robust research process.

[Workflow diagram: the Open Science workflow and the QRPs each stage counters]

Research Question → Preregistration (Plan & Lock-in) → Data Collection → Data & Code Sharing (Open Repository) → Manuscript Preparation → Open Dissemination (Preprints, Open Access). Preregistration counters HARKing and p-hacking; data and code sharing counter selective reporting and low scrutiny; open dissemination counters low accessibility.

Open Science Troubleshooting Guide & FAQs

This section addresses common challenges researchers face when implementing Open Science practices.

Preregistration

Q1: My preregistration has a serious error, or my plan changed after I started. What should I do?

  • If you have not started data collection: Create a new, corrected preregistration. Withdraw the original one and include a note with the URL of the new registration [57].
  • If you have already begun the study: Create a "Transparent Changes" document. Upload this to your project repository and explicitly refer to it when writing up your results to maintain transparency about any deviations [57].

Q2: Does preregistration mean I cannot do any unplanned, exploratory analyses?

No. A core purpose of preregistration is to distinguish between confirmatory and exploratory analyses, not to eliminate exploration [57]. Exploratory analysis is crucial for discovery and hypothesis generation. The key is to clearly label which analyses were planned (confirmatory) and which were unplanned (exploratory) in your final report, so readers can interpret the evidence appropriately [57].

Q3: Can I preregister a study if I am using a pre-existing dataset?

Yes, but with strict conditions to preserve the confirmatory nature of the analysis. You must justify how you will avoid bias from prior knowledge of the data [57]. The levels of eligibility are:

  • Prior to analysis: Data exists and has been accessed, but no analysis related to the research plan has been conducted [57].
  • Prior to access: Data exists, but neither you nor your collaborators have accessed it [57].
  • Prior to observation: Data exists but has not been quantified or observed by anyone [57].

Data and Code Sharing

Q4: I have concerns about sharing data due to participant confidentiality or commercial sensitivity. How can I still be transparent?

  • For confidential data: Create a "de-identified" or "anonymized" version of the dataset for sharing. Share detailed data analysis scripts and code, which provides transparency into your methods. Provide a comprehensive codebook describing the variables and methodology [58].
  • For commercially sensitive data: Consider sharing synthetic data that mimics the statistical properties of your real data. Use embargo periods to delay public release. Apply controlled-access protocols where users must request permission.
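The de-identification step for confidential data can be sketched as follows. The column names, salt, and hash-truncation length are illustrative assumptions, not a prescribed schema; note also that a salted hash of a small ID space can still be brute-forced, so for highly sensitive studies prefer random pseudonyms held in a separate, access-controlled key file.

```python
import csv
import hashlib
import io

SALT = "project-specific-secret"  # hypothetical; keep this out of the shared repository
DIRECT_IDENTIFIERS = {"name", "email", "date_of_birth"}  # assumed column names

def deidentify_row(row):
    """Drop direct identifiers and replace the participant ID with a salted
    one-way hash, so rows stay linkable across files but not re-identifiable."""
    out = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    out["participant_id"] = hashlib.sha256(
        (SALT + row["participant_id"]).encode()
    ).hexdigest()[:12]
    return out

# Toy example standing in for the real CSV export.
raw = io.StringIO("participant_id,name,email,score\n001,Ada,a@x.org,42\n")
shared = [deidentify_row(r) for r in csv.DictReader(raw)]
```

Sharing the de-identification script itself alongside the codebook is part of the transparency: reviewers can see exactly which fields were removed and how.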

Q5: My code is messy and I'm embarrassed to share it. What are my options?

  • Use an automatic formatter (e.g., black for Python, styler for R) to standardize your code style, and a linter to flag potential problems.
  • Add comments throughout your script to explain each step.
  • Create a README file that guides a user through running the code and explains the file structure.
  • Use version control (e.g., Git) to demonstrate your workflow. Remember, usable code is more valuable than perfect code.

Cultural and Systemic Challenges

Q6: How can I practice Open Science when my lab, supervisor, or field does not support it?

  • Start small: Preregister one study or share the code for a single analysis.
  • Lead by example: Demonstrate the benefits, such as how preregistration streamlines the writing process.
  • Find a community: Join Open Science communities (e.g., the Center for Open Science Ambassador program [58]) or online groups for support and resources.
  • Frame it strategically: Explain how these practices can prevent future criticism and increase the long-term credibility and impact of your work.

The Scientist's Toolkit: Essential Reagents for Open Science

This table lists key digital and methodological "reagents" needed to conduct transparent, reproducible research.

Table 2: Research Reagent Solutions for Open Science

| Tool / Resource | Category | Primary Function | Key Antidote to QRPs |
| --- | --- | --- | --- |
| Open Science Framework (OSF) [58] | Registry Platform | A free, open-source project management repository to preregister studies, share data, code, and materials. | Centralizes all practices; counters opacity, selective reporting. |
| Preregistration Templates [57] | Methodology | Standardized forms (e.g., from OSF, AsPredicted) to guide the creation of a comprehensive research plan. | Prevents HARKing and P-hacking by locking in hypotheses and analysis plans. |
| R / Python with RMarkdown / Jupyter | Computational Tool | Languages and literate programming environments that combine code, output, and narrative in a single document. | Makes data analysis fully reproducible, countering analytical flexibility. |
| Git / GitHub | Version Control | A system to track changes in code and manuscripts, facilitating collaboration and documenting the evolution of a project. | Creates an audit trail, counters post-hoc decisions and file management issues. |
| Figshare / Zenodo | Data Repository | General-purpose public data repositories for publishing and sharing research datasets with a persistent identifier (DOI). | Enables data sharing mandates, counters low scrutiny and prevents data loss. |

The Preregistration Protocol: A Detailed Methodology

Preregistration is a powerful tool to counter HARKing and p-hacking. The protocol below details the key steps and decision points in creating an effective preregistration.

Experimental Protocol: Creating a Preregistration

Objective: To create a time-stamped, uneditable research plan that clearly distinguishes between confirmatory and exploratory research components [57] [59].

Materials:

  • Computer with internet access.
  • Account on a preregistration server (e.g., OSF.io [58]).
  • Completed research protocol (hypotheses, methods, analysis plan).

Methodology:

  • Preparatory Phase:
    • Finalize Research Question and Hypotheses: Clearly state your primary research question and all primary and secondary hypotheses. These must be defined before any data are collected or analyzed [57].
    • Specify Methodology: Detail the study design, participant eligibility criteria, sampling plan, and procedures for data collection. Justify the sample size with a power analysis [59].
    • Define Variables and Measures: List all variables (independent, dependent, covariates) and describe exactly how they will be measured or manipulated.
    • Outline Analysis Plan: Pre-specify the exact statistical models and tests you will use to test each hypothesis. Define criteria for data inclusion/exclusion, any data transformations, and how you will handle missing data [57] [59].
  • Registration Phase:

    • Choose a Registration Type: Decide between a Standard Preregistration (submitting your plan to a registry) or a Registered Report (undergoing peer review of your introduction and methods before data collection, leading to in-principle acceptance regardless of the results) [56].
    • Submit to a Registry: Use a platform like the OSF to complete and submit your preregistration form. This creates a permanent, time-stamped record [58] [57].
  • Post-Registration Phase:

    • Conduct the Study: Adhere to the preregistered protocol during data collection and analysis.
    • Report Transparently: In the final manuscript, clearly report all preregistered analyses. Any deviations from the plan or unplanned exploratory analyses must be explicitly identified as such. This allows readers to accurately assess the credibility of the findings [57].
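The power analysis called for in the preparatory phase can be approximated with the standard normal-approximation formula for a two-sided, two-sample comparison. This sketch uses only the Python standard library; it slightly underestimates the exact t-based answer at small samples, so dedicated tools (e.g., G*Power, statsmodels) are preferable for the registered plan itself.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample test:
    n = 2 * ((z_{1 - alpha/2} + z_{power}) / d)^2, where d is Cohen's d."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # e.g., 1.96 for alpha = .05 two-sided
    z_beta = z(power)           # e.g., 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A medium effect (d = 0.5) at alpha = .05 and 80% power needs ~63 per group.
```

Writing the calculation into the preregistration (rather than only the resulting n) makes the sample-size justification itself auditable.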

Frequently Asked Questions

What are Questionable Research Practices (QRPs) in experimental design? Questionable Research Practices (QRPs) are procedures that compromise the replicability, validity, and integrity of research conclusions. Unlike outright fraud, QRPs often occupy a gray area but significantly threaten the reliability of scientific findings. Examples include selective reporting of results, hypothesizing after the results are known (HARKing), and p-hacking [7] [60]. In single-case experimental designs (SCED), this might manifest as altering graph scales to make visual effects appear more robust [7].

Why is it critical to replace QRPs with improved practices? QRPs can lead to a literature filled with non-replicable findings, wasted resources, and potentially countertherapeutic (iatrogenic) effects if clinical treatments are based on unreliable evidence [7]. Adopting improved practices strengthens the scientific process, enhances the credibility of your work, and ensures that conclusions are valid and dependable.

What are some common motivations behind researchers using QRPs? Researcher behavior is influenced by multiple contingencies. While one motivation is to contribute to a valid and robust research literature, another powerful motivator is the pressure to publish, which confers career advancement, prestige, and funding. When the latter dominates, QRPs become more likely [7].

Troubleshooting Guide: Common QRPs and Improved Alternatives

The table below outlines specific QRPs paired with actionable, improved research practices you can implement immediately.

Table 1: Questionable and Improved Research Practices in Data Collection and Analysis

| Issue Area | Questionable Research Practice (QRP) | Improved Research Practice | Key Benefit |
| --- | --- | --- | --- |
| Data Reporting | Selective Data Reporting: Omitting negative or statistically insignificant results [7]. | Pre-register analysis plans and commit to reporting all collected data, regardless of outcome [7]. | Reduces file drawer effect, ensures a complete picture of the research. |
| Data Analysis | p-hacking: Collecting more data after seeing if results are significant or selectively conducting analyses to produce positive results [7]. | Establish a fixed data collection stopping rule and pre-specify all primary data analyses before examining the data [7]. | Prevents artificial inflation of effect sizes and false positives. |
| Hypothesizing | HARKing (Hypothesizing After the Results are Known): Presenting unexpected findings as if they were predicted all along [7]. | Clearly distinguish between confirmatory (hypothesis-driven) and exploratory (data-driven) analyses in the research report [7]. | Maintains intellectual honesty and helps others correctly interpret findings. |
| Graphical Presentation (SCED) | Manipulating graph axes or omitting data points to enhance the appearance of a visual effect [7]. | Adhere to standard graphing conventions, maintain consistent axis scales, and include all data points in visual analyses [7]. | Ensures visual analysis is objective and not misleading. |
| Procedural Reporting | Omitting possible confounds from the methodology section that could explain the results [7]. | Provide a complete and transparent account of the procedure, including any potential limitations or confounding variables [7]. | Allows for accurate replication and critical evaluation of the study's validity. |

Table 2: Questionable and Improved Research Practices in Study Design and Publication

| Issue Area | Questionable Research Practice (QRP) | Improved Research Practice | Key Benefit |
| --- | --- | --- | --- |
| Participant Recruitment | Selectively recruiting participants into a treatment condition who are more likely to show positive effects [7]. | Use random assignment to conditions and report the recruitment process and participant characteristics in full [7]. | Minimizes selection bias and increases the generalizability of findings. |
| Outcome Reporting | Writing abstracts or discussions to downplay undesirable results and overemphasize desired findings [7]. | Present results in a balanced manner, giving appropriate weight to all primary outcomes, including those that are null or contrary to the hypothesis [7]. | Provides a truthful summary of the research and prevents misleading readers. |
| Survey Design & Measurement | Using poorly constructed surveys with leading questions, double-barreled items, or unclear measurement scales [61] [62]. | Follow systematic questionnaire development: define the purpose, pilot-test items, ensure clarity, and establish reliability and validity [62] [63]. | Reduces measurement error and increases the accuracy of collected data. |
| Coverage & Sampling | Using an unrepresentative sample or an inaccurate list (frame) to draw respondents, limiting generalizability [61] [62]. | Carefully define the target population and use a large, random sampling method from a current and accurate list to minimize frame and selection error [62]. | Improves external validity, meaning results are more likely to apply to the broader population of interest. |

Experimental Protocol: Pre-Registering Your Study

Pre-registration is a foundational improved practice that involves publicly documenting your research plan before you begin collecting or analyzing data. This combats QRPs like HARKing and p-hacking.

Detailed Methodology:

  • Define Research Questions & Hypotheses: Precisely state your primary research questions and specific, testable hypotheses [62].
  • Specify Study Design: Describe the experimental design (e.g., RCT, single-case). For single-case designs, detail the phases and planned transition points [7].
  • Outline Participants & Sampling: Define eligibility criteria, sample size, and the recruitment method. Justify how the sample size will provide adequate power [62].
  • Detail Measures & Materials: List all dependent and independent variables. For questionnaires, specify the full instrument and not just a subset of items [63].
  • Plan Data Analysis: Describe the exact statistical or visual analyses you will use to test your primary hypotheses. This includes specifying any covariates and the criteria for significance [7].
  • Submit to a Registry: Upload this plan to a public, time-stamped registry such as the Open Science Framework (OSF) or ClinicalTrials.gov.

The Scientist's Toolkit: Essential Research Reagent Solutions

Table 3: Key Resources for Implementing Improved Research Practices

| Item or Resource | Function |
| --- | --- |
| Pre-registration Platforms (e.g., OSF, AsPredicted) | Provides a time-stamped, public record of your research plan to distinguish confirmatory from exploratory work [7]. |
| Reporting Guidelines (e.g., CONSORT, COREQ) | Checklists to ensure complete and transparent reporting of study details, which is crucial for replication and evaluation [60]. |
| Open Data Repositories (e.g., OSF, Zenodo) | Platforms to share de-identified research data, enabling verification of results and secondary analysis. |
| Statistical Software with Scripting (e.g., R, Python) | Scripted analyses ensure that all analytical steps are documented and reproducible, as opposed to opaque point-and-click methods. |
| Digital Lab Notebooks | Securely records procedures, observations, and data in a time-stamped manner, improving the transparency and traceability of the research process. |

Workflow Diagram: From Research Question to Publication

The following diagram outlines a robust workflow that integrates improved research practices to mitigate QRPs at every stage.

[Workflow diagram: from research question to publication]

Define Research Question → Pre-register Study Plan & Analysis → Collect Data per Plan → Execute Pre-specified Analysis → Write Manuscript with Full Transparency → Publish & Share Data. Optionally, clearly labeled exploratory analysis follows the pre-specified analysis and also feeds into the manuscript.

FAQs: Core Concepts

FAQ 1: What are the primary goals of combining blinded data analysis with pre-registration?

The primary goals are to mitigate confirmation bias and restrict analytical flexibility, which are key drivers of Questionable Research Practices (QRPs) [64] [2]. Pre-registration involves specifying your research questions, hypotheses, methodology, and analysis plan before you observe the research data [65]. This creates a clear, time-stamped record that separates confirmatory (hypothesis-testing) from exploratory (hypothesis-generating) research [66]. Blinded data analysis takes this a step further by ensuring that the researcher conducting the analyses is unaware of which experimental group the data belongs to or the outcomes of initial analyses [67]. Together, these practices reduce the opportunity for p-hacking (exploiting analytical flexibility to obtain significant results) and HARKing (presenting unexpected results as if they were predicted) [64] [2].

FAQ 2: How do confirmation bias and analytical flexibility specifically threaten research integrity?

Confirmation bias is the tendency to seek or interpret evidence in ways that support existing beliefs or expectations [64]. In data analysis, this can manifest as a researcher unconsciously favoring analytical choices that lead to "publishable," statistically significant results. Analytical flexibility, or "researcher degrees of freedom," refers to the many legitimate choices researchers make during data analysis, such as how to handle outliers or which covariates to include [66]. When these choices are made after seeing the data, they can be exploited opportunistically—a practice known as p-hacking [2] [66]. This combination inflates false-positive rates, leads to overestimated effect sizes, and contributes to the replication crisis, ultimately distorting the evidence base [64] [2] [33].

FAQ 3: Aren't these practices only relevant for hypothesis-driven (confirmatory) research?

While pre-registration and blinding are most stringent for confirmatory research, they also provide a robust framework for transparent exploratory research [64] [65]. For non-hypothesis-driven work, you can pre-register your research questions and detailed methodology without stating specific hypotheses. This still provides a record of your planned approach before data collection, enhancing transparency. Furthermore, pre-registration does not forbid unplanned, exploratory analyses; it simply requires researchers to clearly label them as post hoc, preventing them from being presented as confirmatory findings [64] [66].

Troubleshooting Guides

Issue 1: Prior knowledge of the dataset is making true blinding difficult.

This is a common challenge, especially in secondary data analysis where a researcher may have worked with the same dataset before [64].

  • Solution A: Implement a "Blinded Analyst" Protocol. Separate the roles within your research team. Have one team member who is familiar with the data handle the data cleaning and preparation according to a pre-registered plan, and then pass the anonymized dataset to a second analyst who is completely blind to the group assignments and study hypotheses. This blinded analyst then executes the pre-registered analyses [67].
  • Solution B: Utilize Data Simulation and Masking. Before receiving the real data, the analysis team can develop and test their code on simulated data that mirrors the expected structure of the real dataset but contains no real effects. Once the code is finalized, it can be run on the real data by a separate team member.
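Solution B can be sketched as follows; the field names and the null Gaussian outcome are placeholders standing in for the real study's structure, and the point is that analysis code is finalized on data that cannot contain the hypothesized effect.

```python
import random

def simulate_null_dataset(n_per_group, seed=0):
    """Generate a placeholder dataset mirroring the real study's structure
    but with no true group effect, so the analysis pipeline can be written
    and tested before anyone sees the actual data."""
    rng = random.Random(seed)  # fixed seed: the simulated data are reproducible
    return [
        {"id": i, "group": g, "outcome": rng.gauss(0, 1)}  # same null distribution for both groups
        for g in ("A", "B")
        for i in range(n_per_group)
    ]

dev_data = simulate_null_dataset(50)
# Finalize the analysis code against dev_data; only then is the frozen,
# label-masked real dataset handed to the blinded analyst.
```

Because the development data are null by construction, any "significant" result during code development immediately signals a bug or an analytical artifact rather than a discovery.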

Issue 2: Designing a pre-registration that is both flexible and sufficiently specific.

A common pitfall is writing a pre-registration that is too vague, leaving too many researcher degrees of freedom open [66].

  • Solution: Use a Structured Pre-registration Template. Opt for a structured template like the OSF Preregistration, which guides you through specifying key elements [66]. The table below summarizes the quantitative evidence supporting the use of structured templates.

Table 1: Effectiveness of Structured vs. Unstructured Preregistration Formats

| Preregistration Format | Key Characteristic | Completeness in Restricting Researcher Degrees of Freedom | Key Finding |
| --- | --- | --- | --- |
| Structured (e.g., OSF Preregistration) | Detailed instructions with multiple specific questions [66]. | Higher | Restricts opportunistic use of researcher degrees of freedom significantly better than unstructured formats [66]. |
| Unstructured (e.g., Standard Pre-Data Collection) | Minimal guidance, maximum flexibility for researchers [66]. | Lower | Provides less restriction on researcher degrees of freedom, leading to more potential for analytical flexibility [66]. |
  • Guidance for Specificity:
    • Hypotheses: State them as specific, concise, and testable predictions [64].
    • Variables: For each measured variable, specify the exact instrument and how composite scores will be calculated [66].
    • Data Exclusion: Pre-define objective criteria for excluding data points (e.g., based on pre-specified quality thresholds) [2].
    • Analysis Plan: Name the specific statistical tests and software you will use. Pre-specify how you will handle missing data and which covariates will be included in models [66].
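The value of a structured template can be sketched in code: treat each required section as a field that must be filled in, and flag any field left open. The field names below are illustrative, not the OSF template's actual section titles:

```python
# Sections a structured template forces you to pin down before data collection.
# Field names are illustrative, not the OSF template's actual section titles.
REQUIRED_FIELDS = [
    "hypotheses",
    "variables",
    "exclusion_criteria",
    "analysis_plan",
    "missing_data_handling",
]

def check_preregistration(prereg):
    """Raise if any required section is missing or empty, i.e. a researcher
    degree of freedom has been left open."""
    open_dof = [field for field in REQUIRED_FIELDS if not prereg.get(field)]
    if open_dof:
        raise ValueError(f"Degrees of freedom left open: {open_dof}")
    return True
```

A plan that passes such a check is not guaranteed to be specific enough, but one that fails it certainly is not.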

Issue 3: The data violate the assumptions of my pre-registered analysis.

A pre-registered analysis plan might not be appropriate for the collected data, for example, if the data are non-normal when a parametric test was planned [64].

  • Solution: Follow a Pre-registered Contingency Plan. The best practice is to anticipate potential issues in your pre-registration. You can include a "conditional analysis plan" that states, for example, "If the data violate the assumption of normality, we will use [alternative non-parametric test] instead." If an unforeseen issue arises, transparently report any deviation from the pre-registered plan, justify the change, and if possible, demonstrate that your conclusions are robust by also presenting the results of the pre-registered analysis [64].
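A contingency plan like this can be written directly into the pre-registered analysis script. The sketch below uses a simple skewness cutoff as an illustrative stand-in for whatever normality check the plan actually names:

```python
import statistics

def sample_skewness(xs):
    """Adjusted Fisher-Pearson sample skewness."""
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)
    return sum(((x - mean) / sd) ** 3 for x in xs) * n / ((n - 1) * (n - 2))

def select_test(outcome, skew_cutoff=1.0):
    """Pre-registered contingency: run the planned parametric test unless the
    outcome is clearly skewed, in which case fall back to the named
    non-parametric alternative. Any such switch is reported as a deviation."""
    if abs(sample_skewness(outcome)) <= skew_cutoff:
        return "independent-samples t-test (pre-registered primary analysis)"
    return "Mann-Whitney U test (pre-registered fallback)"
```

Because both branches and the cutoff are fixed before the data are seen, the decision cannot be made opportunistically after inspecting the results.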

Experimental Protocols

Protocol 1: Implementing a Blinded Analysis Workflow for a Clinical Trial

This protocol details the steps for maintaining blinding during data analysis in a clinical trial setting, which is critical for minimizing observer bias [67].

  • Pre-Registration: Before database lock, finalize and publicly post the statistical analysis plan (SAP) on a registry like ClinicalTrials.gov or the OSF. This plan must detail all primary and secondary outcomes, statistical models, and handling of missing data [65].
  • Data Preparation and Freezing: A data manager, who is independent of the analysis team, cleans and freezes the final dataset. This dataset is then stripped of all direct identifiers of treatment group assignment.
  • Code Finalization on Dummy Data: The statistician or data analyst develops and validates the analysis scripts using a dummy dataset that has the same structure as the real data but with randomly assigned group labels.
  • Execution on Blinded Data: The finalized scripts are executed on the prepared, blinded dataset by the analyst. The output (tables, figures) is generated without knowledge of which group is which.
  • Unblinding: According to a pre-specified plan, the blind is broken, and the group labels are applied to the analysis outputs to create the final reports and manuscripts.
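Step 3 (code finalization on dummy data) might look like the following sketch, in which the dummy dataset mirrors the trial's structure but the group labels carry no information. Column names are illustrative:

```python
import random

def make_dummy_dataset(n_subjects=20, seed=0):
    """Build a dummy dataset with the real trial's structure (illustrative
    column names) but randomly permuted group labels and no built-in effect,
    for finalizing analysis scripts before touching the real data."""
    rng = random.Random(seed)
    labels = ["A"] * (n_subjects // 2) + ["B"] * (n_subjects - n_subjects // 2)
    rng.shuffle(labels)  # assignment is pure noise by construction
    return [
        {"subject_id": i, "group": labels[i], "outcome": rng.gauss(0.0, 1.0)}
        for i in range(n_subjects)
    ]
```

Any "effect" the analysis scripts find on this dataset is a bug, which is exactly why it is safe to debug them here.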

The following diagram illustrates this workflow and its role in mitigating specific QRPs.

[Diagram] Pre-register Analysis Plan → Prepare & Blind Dataset → Finalize Code on Dummy Data → Execute Analysis on Blinded Data → Unblind and Interpret. Pre-registration mitigates HARKing and selective reporting; preparing and analyzing blinded data mitigates confirmation bias; finalizing code on dummy data mitigates p-hacking.

Diagram 1: Blinded analysis workflow and QRP mitigation.

Protocol 2: Pre-registration for Secondary Data Analysis

Analyzing existing datasets presents unique challenges for pre-registration, as researchers may have prior knowledge of the data [64].

  • Acknowledge and Document Prior Knowledge: In the pre-registration, explicitly state your familiarity with the dataset and any previous analyses you have conducted. This promotes transparency [64].
  • Split the Dataset: If the dataset is large enough, split it into an exploratory (or "training") subset and a confirmatory (or "holdout") subset. Pre-register your analysis plan for the confirmatory subset only. The exploratory subset can be used for model development and hypothesis generation.
  • Focus on Specificity: Given that the data already exist, there is no excuse for vagueness. Pre-register exactly which variables from the dataset will be used, how they will be coded, and the complete analysis code if possible.
  • Address Analytical Flexibility Directly: Pre-emptively close researcher degrees of freedom by specifying:
    • The exact model specification.
    • The precise method for handling missing data (e.g., multiple imputation with specific variables).
    • Criteria for outlier inclusion/exclusion.
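The exploratory/confirmatory split described above can be made reproducible with a seeded shuffle, as in this sketch; the split ratio and seed are arbitrary choices that should themselves be documented in the pre-registration:

```python
import random

def split_for_preregistration(record_ids, holdout_frac=0.5, seed=42):
    """Split record IDs into an exploratory set (for model development and
    hypothesis generation) and a confirmatory holdout whose analysis plan is
    pre-registered before it is ever inspected."""
    ids = list(record_ids)
    rng = random.Random(seed)  # fixed seed makes the split auditable
    rng.shuffle(ids)
    cut = int(round(len(ids) * (1 - holdout_frac)))
    return ids[:cut], ids[cut:]  # (exploratory, confirmatory)
```

Recording the seed in the pre-registration lets reviewers verify that the holdout was fixed before the confirmatory analysis was specified.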

Table 2: Key Reagent Solutions for Rigorous Research

| Tool / Resource | Category | Primary Function | Example / Link |
| --- | --- | --- | --- |
| Pre-registration Templates | Protocol Planning | Provides a structured workflow to create specific, exhaustive pre-registration plans. | OSF Preregistration, AsPredicted [65] [66] |
| Blinded Analysis Protocol | Methodology | A formal SOP for separating data preparation from analysis to prevent confirmation bias. | Internal lab standard operating procedure (SOP) [67] |
| Registered Reports | Publishing Format | A journal article format where the introduction, methods, and proposed analyses are peer-reviewed and accepted before data collection, guarding against publication bias [65]. | Offered by over 200 journals (e.g., from Springer Nature, Elsevier, Taylor & Francis) [65] |
| Clinical Trial Registries | Registry | Mandatory platforms for registering clinical trial protocols to combat selective reporting and publication bias [65]. | ClinicalTrials.gov, WHO International Clinical Trials Registry Platform [65] |
| Statistical Software & Packages | Data Analysis | To conduct pre-specified analyses and power calculations; scripted analyses (vs. point-and-click) ensure reproducibility. | R (with pwr package), Python, SAS, Stata [2] |

Frequently Asked Questions: Troubleshooting Your Research Methodology

Q1: Our experimental results are inconsistent and hard to reproduce. Where should we focus our troubleshooting efforts?

A: Inconsistent results often stem from incomplete or variable experimental protocols. Your first step should be to verify that your protocol description includes all necessary and sufficient information for another researcher to replicate your work exactly. A detailed checklist is the most effective tool for this. Focus on the following key areas [68]:

  • Reagents and Materials: Specify the source, catalog number, and lot number for reagents. Note their storage conditions and expiration dates, as using expired reagents is a common source of failure [68] [69].
  • Equipment: Report the equipment model, manufacturer, and any specific calibration or maintenance procedures performed [68].
  • Step-by-Step Procedures: Eliminate ambiguities. For example, instead of "store at room temperature," specify the exact temperature (e.g., 22°C). Instead of "centrifuge briefly," state the exact speed, time, and temperature [68].
  • Controls: Ensure you have included appropriate positive and negative controls, and confirm they are performing as expected [69].

Q2: We suspect questionable research practices (QRPs) might be affecting our field. What are the most common QRPs related to methodology?

A: Questionable research practices can compromise the validity of scientific conclusions across various research designs. Being aware of them is the first step toward prevention. The table below summarizes some common QRPs identified in the literature [7]:

| Questionable Research Practice | Description |
| --- | --- |
| Selective Data Reporting | Reporting only positive or statistically significant results while omitting negative or insignificant findings [7]. |
| Selective Procedural Reporting | Omitting details about possible confounds or procedural deviations that could explain the outcomes [7]. |
| p-hacking | Conducting multiple data analyses or selectively choosing data points to produce or enhance statistically significant outcomes [7]. |
| HARKing | Hypothesizing After the Results are Known; formulating a hypothesis to fit the data after the study is complete [7]. |
| Selective Outcome Reporting | Writing abstracts or discussions to downplay undesirable results and emphasize desired results [7]. |

Q3: A key assay in our drug discovery pipeline is producing unexpected results. What is a systematic approach to troubleshooting this?

A: Follow a disciplined, step-by-step troubleshooting process to efficiently identify the root cause [69] [70]:

  • Check Your Assumptions: First, confirm whether the unexpected result is a true scientific finding or an error. Review your hypothesis and experimental design for flaws [69].
  • Review Your Methods Meticulously: This is a crucial step. Check for errors or variability in equipment calibration, reagent purity and expiration, sample integrity, and control validity [69].
  • Compare with Literature: Compare your results with those from previous studies or databases to see if your findings can be validated or explained [69].
  • Test Alternative Hypotheses: Design new experiments to test other possible explanations for your results. Consider using different methods or techniques to measure the same outcome [69].
  • Document Everything: Keep a detailed record of every troubleshooting step, result, and change made. This is essential for tracking progress and communicating your process [69] [70].
  • Seek Help: Consult with colleagues, collaborators, or experts for fresh perspectives and specialized knowledge [69].

Q4: How can we proactively minimize the need for troubleshooting in our laboratory operations?

A: A proactive culture of prevention and continuous improvement is more effective than a reactive one. Key strategies include [70]:

  • Invest in Robust Quality Systems: Implement well-documented and validated test methods, standard operating procedures (SOPs), and extensive training programs.
  • Adopt Orthogonal Methods: Use different methodologies to measure the same value. This reduces reliance on a single test's interpretation and helps catch errors.
  • Train in Root Cause Analysis: Equip your team with skills in tools like the "Five Whys" and fishbone diagrams to resolve issues faster and create more robust preventive actions.
  • Standardize Processes: Inconsistencies in core processes hold back development. Focus on process excellence to improve the flow of content and data across R&D functions [71].

The Scientist's Toolkit: Essential Research Reagents & Materials

| Item | Critical Function & Rationale |
| --- | --- |
| Validated Antibodies | Ensure specificity for the target protein. Unvalidated antibodies are a major source of irreproducible results; use resources like the Antibody Registry for unique identification [68]. |
| Unique Device Identifiers (UDI) | For medical devices and key equipment, UDIs from databases like the Global Unique Device Identification Database (GUDID) enable accurate reporting and tracing [68]. |
| Standardized Cell Lines | Use well-characterized and authenticated cell lines to prevent misidentification and contamination, which can invalidate years of research. |
| Pharmacological Standards | Certified reference standards with known purity and potency are non-negotiable for assay validation and ensuring consistent drug discovery efforts [72]. |

Methodology at a Glance: Key Data & Workflows

Table: Key Data Elements for Reproducible Experimental Protocols [68]

| Protocol Section | Essential Data Elements to Report |
| --- | --- |
| Objectives & Background | Hypothesis, scientific background, and predefined outcomes. |
| Reagents & Materials | Source, catalog number, lot number, purity, concentration, storage conditions. |
| Equipment & Instruments | Manufacturer, model, software version, calibration status. |
| Step-by-Step Procedure | Unambiguous instructions with precise parameters (time, temperature, volumes). |
| Troubleshooting & Hints | Common problems and their solutions, notes on critical steps. |
| Data Analysis Plan | Pre-specified statistical methods and criteria for analysis. |

[Diagram] Unexpected Result → Check Assumptions & Hypothesis → Review Methods & Reagents → Compare with Literature → Test Alternative Hypotheses → Document Entire Process → Issue Resolved. If a new hypothesis is needed, or no method error is found, seek help from colleagues/experts and then document the process.

Systematic Troubleshooting Workflow

This workflow provides a logical sequence for investigating unexpected experimental results, helping to ensure no stone is left unturned [69] [70].

[Diagram] Questionable Research Practices (selective data reporting, selective procedural reporting, p-hacking, HARKing) lead to compromised replicability and validity, whereas robust methodologies (standardized protocols, validated measures, appropriate statistics) lead to reproducible and valid conclusions.

QRPs vs. Robust Methodologies

This diagram contrasts the detrimental cycle of Questionable Research Practices with the reinforcing benefits of implementing Robust Methodologies, highlighting their opposing impacts on research outcomes [7].

Troubleshooting Guides

Guide 1: Diagnosing Systemic Pressure Points

Problem: High volume of low-quality publications is overwhelming the system.

  • Symptom: An overwhelming number of papers are being published, with one report noting an increase of 897,000 indexed articles between 2016 and 2022 [73]. Submissions are also rising sharply, with one major publisher reporting a 25% increase in submissions in early 2025 [73].
  • Root Cause Analysis: Researchers are incentivized to publish as much as possible, while publishers often make more profit when they publish more papers [74]. This creates a perverse cycle where quantity is prioritized over quality.
  • Solution: Incentives for researchers must be reformed to prioritize quality over quantity and meaning over metrics. Funders (universities, research councils, foundations) have the power to alter the incentives scientists face [74].

Problem: Misalignment between institutional values and academic rewards.

  • Symptom: Academic assessment and advancement systems often work against their own public missions. Instead of rewarding research that addresses real-world problems, they still rely heavily on publications in brand-name journals, citation counts, and grant dollars [75].
  • Root Cause Analysis: Community-engaged scholarship is typically dismissed as "service" and carries little weight in reviews. This model disadvantages faculty who prioritize societal impact, mentorship, and other values-based activities [75].
  • Solution: Modernize academic appointment and advancement to align with stated institutional values. The MA3 Challenge is one initiative seeking "bold, creative strategies to develop academic reward systems that foster a collaborative, responsive, and transparent research environment" [75].

Guide 2: Addressing Questionable Research Practices (QRPs)

Problem: Prevalence of Questionable Research Practices compromising research validity.

  • Symptom: A 2025 study identified indicators of questionable research practices in 163,129 randomized controlled trials [10].
  • Root Cause Analysis: Researcher behavior is shaped by multiple contingencies. When behavior is controlled primarily by contingencies related to publication (promotion, prestige, grant funding) rather than scientific contribution, QRPs can result [7]. Researchers have many "degrees of freedom" when conducting a study, and these decisions can be manipulated to produce publishable results [7].
  • Solution: Adopt improved research practices as alternatives to questionable practices. A 2025 study on single-case experimental designs yielded 64 pairs of questionable and improved research practices, highlighting the need to disseminate and adopt these better alternatives [7].

Frequently Asked Questions (FAQs)

Q1: What evidence exists that the current academic reward system is failing? A: Surveys indicate widespread recognition of the problem. A Cambridge University Press survey of over 3,000 researchers, publishers, funders, and librarians from 120 countries found that only 33% agreed that academic reward and recognition systems are working well. Furthermore, 64% of respondents believed the current system "fails to fully recognize contributions outside publishing articles in established journals" [73].

Q2: What are common types of Questionable Research Practices (QRPs)? A: Common QRPs have been identified across research fields. The table below summarizes several key QRPs and their descriptions [7]:

| Questionable Research Practice | Description |
| --- | --- |
| Selective Data Reporting | Selectively reporting positive/statistically significant results while omitting negative/insignificant results. |
| p-hacking | Selectively conducting data analyses to produce and/or enhance positive/statistically significant outcomes. |
| HARKing | Formulating hypotheses after study outcomes are obtained to "fit" the data. |
| Selective Outcome Reporting | Writing a paper's abstract or discussion to selectively downplay undesirable results and/or emphasize desired results. |

Q3: How is the publishing business model linked to these problems? A: The current system creates a conflict of interest. As one analysis notes, "Researchers are incentivised to publish as much as possible and publishers make more money if they publish more papers" [74]. This model diverts public research funds into shareholder profits, with one publisher maintaining a 37% profit margin [76]. The "author pays" open access model has also been co-opted, with authors paying between £2,000 and £10,000 per article, far exceeding the actual production cost [74].

Q4: What concrete reforms are being proposed or tested? A: Several major initiatives are underway:

  • The MA3 Challenge: A $1.5 million program funded by multiple foundations to fund proposals for changing hiring, promotion, and tenure at U.S. universities. It will offer grants at two levels ($50,000 or $250,000) over two years to institutions for implementing reforms [75] [77].
  • Publisher-Led Calls for Change: Major academic presses like Cambridge University Press are publicly calling for institutions to "weaken the link between academic reward and recognition and journal article output, and to adopt more holistic approaches to evaluating academic performance and contribution" [73].
  • Ethical Publishing: A growing movement advocates for shifting to ethical publishers, which are typically academic non-profit organizations that reinvest profits in academia, as opposed to exploitative publishers who lock publicly-funded research behind paywalls [76].

Q5: What are the key focus areas for modernizing academic rewards? A: Modernization efforts aim to elevate institutional commitment to several key areas [77]:

  • Open Science: Encouraging transparency, reproducibility, and equitable access to knowledge.
  • Public and Civic Science: Rewarding scholars who connect research to real-world challenges.
  • Team Science: Recognizing collaboration and shared credit in complex problem-solving.
  • Diverse Contributions to Knowledge: Valuing varied forms of scholarship and non-traditional research partnerships.

Experimental Protocols: Identifying and Mitigating QRPs

Protocol 1: Systematic Screening for QRPs in Literature

Objective: To identify and quantify indicators of questionable research practices within a body of literature, such as randomized controlled trials.

Methodology (as implemented in a study of 163,129 RCTs):

  • Data Acquisition and Processing: Use customized Python scripts (e.g., via RobotReviewer) for automated risk of bias characterization across a large corpus of trials [10].
  • Data Management and Analysis: Use R for data management and statistical analysis. All data and analysis code should be made publicly available, for example via GitHub, to ensure transparency and reproducibility [10].
  • Indicator Identification: Develop and validate specific markers for known QRPs. This involves programming the scripts to flag patterns consistent with practices like selective reporting, p-hacking, and HARKing [10].
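As a hedged illustration only (not the validated markers used in the cited study), a screening script can flag reported p-values that cluster just below .05, a pattern consistent with, though by no means proof of, p-hacking:

```python
import re

# Matches reported p-values such as "p = .049", "P < 0.001", "p=0.043".
P_VALUE = re.compile(r"[Pp]\s*[=<]\s*(0?\.\d+)")

def flag_borderline_p_values(text, lo=0.04, hi=0.05):
    """Crude screening heuristic: collect p-values reported in `text` that
    fall just below the .05 threshold. Flagged papers warrant closer reading,
    not an accusation."""
    values = [float(m.group(1)) for m in P_VALUE.finditer(text)]
    return [v for v in values if lo <= v < hi]
```

In practice such flags are aggregated across a corpus and combined with other indicators before any inference is drawn about a literature.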

Protocol 2: Eliciting Expert Consensus on QRPs

Objective: To identify a comprehensive set of questionable and improved research practices within a specific methodological domain (e.g., single-case experimental designs).

Methodology (as implemented in a 2025 study):

  • Participant Recruitment: Assemble a panel of expert researchers (e.g., 63 experts) with varying backgrounds and expertise in the target domain [7].
  • Structured Elicitation: Conduct a virtual microconference with focused group discussions. Solicit examples of questionable and improved research practices at all stages of the research process (design, data collection, analysis, reporting) [7].
  • Qualitative Analysis: Systematically analyze the collected data (e.g., over 2,000 participant notes) to identify shared perspectives and themes. This process results in a paired list of questionable and improved practices [7].

Visualizing the Reform Ecosystem

The following diagram illustrates the interconnected relationships between institutional pressures, researcher actions, and the resulting impacts on the research ecosystem.

[Diagram] Problem cycle: Current Academic Incentives → Publication Pressure → Researcher Behavior → Questionable Research Practices (QRPs) → Negative Systemic Outcomes. Solution pathway: Proposed Reforms aim to reform the incentives and reduce QRPs, leading to the Desired Future State.

Research Reagent Solutions: Key Reform Initiatives

The following table details major programs and frameworks that are essential "reagents" for conducting the experiment of institutional reform.

| Initiative / Framework | Function | Key Features |
| --- | --- | --- |
| MA3 Challenge [75] [77] | Catalyzes institutional change by funding and supporting bold reforms to academic hiring, promotion, and tenure. | $1.5M in funding from major foundations; two funding tiers, $50K (department) and $250K (institution); focus on implementation, not just planning; includes a community of practice for awardees. |
| DORA (Declaration on Research Assessment) [73] | Provides a framework for improving how the quality of research output is evaluated. | Aims to end the use of journal-based metrics; advocates for assessing research on its own merits; focuses on reform of research assessment. |
| Ethical Publishing Models [76] [74] | Offers alternative publishing pathways that keep resources within academia and promote open access. | Non-profit, academic-owned publishers; low or reasonable Article Processing Charges (APCs); profits are reinvested into the academic community. |
| Open Research Practices [77] | Serves as a core value for realigning incentives towards transparency and collaboration. | Encourages pre-registration and data sharing; rewards transparency and reproducibility; aims to make knowledge equitably accessible. |

Frequently Asked Questions (FAQs) on Questionable Research Practices

Q1: What are the most common Questionable Research Practices (QRPs) I should avoid?

A1: The most common QRPs are practices that compromise research integrity, often to achieve statistically significant or desired results. The table below details these practices and their impact [7] [2].

| QRP Name | Description | Primary Risk |
| --- | --- | --- |
| Selective Data Reporting | Reporting only positive or statistically significant results while omitting negative or insignificant ones [7]. | Distorts the literature, creates a "file drawer" effect, misleads meta-analyses [7] [2]. |
| P-hacking | Conducting multiple analyses on a dataset to find a statistically significant result, often without a prior hypothesis [2]. | Inflates false positive rates, misrepresents true effects [2]. |
| HARKing (Hypothesizing After the Results are Known) | Presenting a post-hoc hypothesis (created after seeing the results) as if it were an a priori prediction [7]. | Undermines the hypothesis-testing framework, makes findings non-falsifiable [7]. |
| Selective Procedural Reporting | Omitting details of the procedure that could be confounds or could explain the results [7]. | Prevents replication and masks flaws in experimental design [7]. |
| Inadequate Record Keeping | Failing to maintain a detailed, step-by-step record of the research process and analytical decisions [2]. | Makes the work irreproducible and is a gateway to other QRPs [2]. |

Q2: What ethical principles should guide my statistical practice?

A2: The American Statistical Association (ASA) outlines core principles for ethical statistical practice. Adhering to these is a powerful antidote to QRPs [78].

  • Professional Integrity and Accountability: Take responsibility for your work. Use valid and appropriate methodology, resist pressure to selectively interpret data, and disclose any conflicts of interest [78].
  • Integrity of Data and Methods: Be transparent about your data sources, their limitations, and the assumptions behind your methods. Strive to correct errors and share data for peer review where possible [78].
  • Responsibilities to Stakeholders: Ensure your work is suitable for the needs of your clients, funders, or employers, and do not use statistical practices to mislead them [78].
  • Responsibilities to Research Subjects: Protect the privacy, confidentiality, and welfare of all data subjects. Use only the data necessary and as permitted by consent [78].

Q3: How can I formulate a strong, researchable question?

A3: A well-framed question is the first defense against QRPs. For clinical or intervention-based research, the PICO framework is highly effective [79] [80].

  • P (Patient, Population, or Problem): Who or what is the focus of the study? (e.g., "Children with acute otitis media") [79].
  • I (Intervention): What is the treatment, exposure, or diagnostic test you are interested in? (e.g., "cefuroxime") [79].
  • C (Comparison or Control): What are you comparing the intervention to? (e.g., "amoxicillin" or "placebo") [79].
  • O (Outcome): What do you plan to measure, improve, or affect? (e.g., "reduction in symptom duration") [79].

Example Research Question: "In children with acute otitis media (P), is cefuroxime (I) more effective than amoxicillin (C) at reducing symptom duration (O)?" [79]

Furthermore, ensure your question meets the FINER criteria: Feasible, Interesting, Novel, Ethical, and Relevant [79].

Q4: What practical steps can I take to prevent QRPs in my workflow?

A4: Proactive planning and transparency are key. The following table outlines common problems and their solutions.

| Research Stage | QRP Risk | Improved Research Practice & Solution |
| --- | --- | --- |
| Study Design | Unplanned analyses leading to p-hacking; unclear hypotheses leading to HARKing. | Pre-registration: Publicly file a detailed research plan, including hypotheses, methods, and analysis strategy, before data collection begins on a platform like the Open Science Framework (OSF) [2]. |
| Data Collection | Stopping data collection early once significance is reached. | A priori Power Analysis: Use tools (e.g., the pwr package in R) before the study to determine the necessary sample size and stick to it [2]. |
| Data Analysis | Excluding data points without justification; running multiple tests. | Pre-specified Analysis Plan: Define all exclusion criteria and primary statistical tests in your pre-registration. Use blinded data analysis where feasible. |
| Reporting | Selective reporting of outcomes, conditions, or studies. | Full Transparency: Report all manipulated variables, all collected measures, and all conducted analyses, regardless of the outcome. Use guidelines like SAMPL for statistical reporting [7]. |
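The a priori power analysis mentioned above can be approximated without external packages using the normal approximation for a two-sided, two-sample comparison; a t-based tool such as R's pwr::pwr.t.test or G*Power gives a slightly larger n because it accounts for estimating the variance:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means, with `effect_size` as Cohen's d.
    Slightly underestimates the t-based answer; round up accordingly."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = .05
    z_beta = z.inv_cdf(power)           # ~0.84 for power = .80
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)
```

Fixing this number before data collection, and stopping exactly there, removes the temptation to peek at the data and stop once significance appears.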

Troubleshooting Guides

Problem: I am under pressure to produce significant results and feel tempted to try different analyses.

Solution:

  • Return to Your Protocol: Re-read your pre-registration and pre-specified analysis plan. This is your anchor.
  • Document Everything: If an unplanned analysis is truly exploratory and necessary, clearly document it as such. Note that it was post-hoc and should be interpreted with caution, framing it as hypothesis-generating for future research.
  • Seek Peer Consultation: Discuss your dilemma with a trusted colleague or mentor. They can provide an objective perspective on the analytical choices.
  • Embrace Transparency: In your manuscript, be fully transparent. State which analyses were planned (a priori) and which were exploratory (post hoc). This honesty builds credibility [78] [2].

Problem: My experiment did not work, or the results were null/negative. I'm worried it's unpublishable.

Solution:

  • Re-evaluate the Question: A well-designed study that produces a null result is still scientifically valuable. It tells the scientific community that a tested effect may not be real or is smaller than expected.
  • Check for Quality: Ensure the study was well-powered and the methods were sound. A valid negative result is not a failure.
  • Consider Alternative Outlets: Seek journals that specifically publish null or negative results, or repositories for such data.
  • Publish a Replication or Methods Paper: If the study was a replication attempt, it is highly valuable. Alternatively, the methodology itself might be worth sharing. The ethical principle is to not let the "file drawer" bias the scientific record [78] [2].

Experimental Protocols & Data

Table: Prevalence and Impact of Common QRPs

This table summarizes quantitative data on the prevalence and perceived impact of QRPs, which highlights the importance of rigorous training [7] [34] [2].

| Questionable Research Practice | Reported Prevalence | Key Impact on Literature |
| --- | --- | --- |
| Selective Data Reporting | Estimated that one in two researchers has engaged in at least one QRP in the last three years [2]. | Published studies of an intervention (e.g., Pivotal Response Treatment) show larger effects than unpublished studies, indicating a bias in the available evidence [7]. |
| P-hacking | Some studies suggest prevalence may be smaller than feared; one study of 8,000 psychology articles found only a "small amount" of selective reporting bias [34]. | Creates an inflation of false positives, leading to a literature that overstates true effect sizes [2]. |
| HARKing | Common enough to be a major topic of discussion in the replication crisis; precise prevalence is difficult to ascertain [7]. | Leads to a proliferation of seemingly supported but actually post-hoc hypotheses, making the theoretical landscape fragile [7]. |

The Scientist's Toolkit: Essential Reagents for Ethical Research

This table details key conceptual "reagents" and resources necessary for conducting rigorous and reproducible research.

| Tool / Resource | Category | Function & Purpose |
| --- | --- | --- |
| Pre-registration Platform (e.g., OSF, ClinicalTrials.gov) | Study Design | To pre-specify hypotheses, methods, and analysis plans, preventing p-hacking and HARKing [2]. |
| Statistical Power Software (e.g., G*Power, pwr package in R) | Study Design | To calculate the required sample size a priori, ensuring the study is feasible and has a high probability of detecting a true effect if it exists [2]. |
| Citation Manager (e.g., Zotero, Mendeley) | Reporting | To organize references and ensure proper, accurate attribution of others' work, avoiding plagiarism [2]. |
| Data & Code Repository (e.g., OSF, GitHub) | Dissemination | To share data, code, and materials, enabling replication and scrutiny, which is a core ethical responsibility [78]. |
| Ethical Guidelines (e.g., ASA Ethical Guidelines) | Foundational | To provide a framework for professional integrity, accountability, and responsibilities to all stakeholders in the research process [78]. |

Workflow Diagrams

Research Integrity Workflow

Develop Research Question (PICO/FINER) → Pre-register Study & Analysis Plan → Conduct Study & Collect Data → Analyze Data per Pre-registered Plan → Report All Results Transparently

QRP Identification and Mitigation

Potential QRP Identified (e.g., p-hacking, selective reporting) → Assess Impact on Validity and Conclusions → Consult Ethical Guidelines and Peers → Take Corrective Action (e.g., re-analysis, disclosure) → Document Decision and Process

Evaluating Evidence and Impact: Assessing QRP Prevalence, Consequences, and Alternative Viewpoints

Technical Support Center: Diagnosing and Resolving Questionable Research Practices

This technical support center provides troubleshooting guides and FAQs to help researchers, scientists, and drug development professionals identify, understand, and address Questionable Research Practices (QRPs) in their work. QRPs are defined as "ways of producing, maintaining, sharing, analyzing, or interpreting data that are likely to produce misleading conclusions, typically in the interest of the researcher" [33].

Frequently Asked Questions (FAQs)

Q: What exactly are Questionable Research Practices (QRPs), and how do they differ from outright research misconduct?

A: QRPs occupy an ethical gray area between sound scientific conduct and outright scientific misconduct (fabrication, falsification, and plagiarism) [4]. They are often technically permissible but ethically and methodologically risky behaviors that can skew results and inflate significance [33]. While research misconduct like data fabrication is universally condemned and often punishable, QRPs offer considerable latitude for rationalization and may even be accepted in some disciplines, despite producing misleading or false research [81].

Q: Why do researchers engage in QRPs, and what are the main contributing factors?

A: Researchers may engage in QRPs due to a complex interplay of individual, institutional, and systemic pressures [4]. A key driver is the competitive "publish or perish" culture, which rewards volume of publication and statistically significant findings over scientific rigor [2] [4]. Contingencies related to career advancement, funding, and prestige can powerfully shape researcher behavior [49]. At an individual level, factors such as cognitive biases, competency shortfalls in methodology, and lower commitment to the normative ideals of science (e.g., organized skepticism, disinterestedness) have been identified as predictors [4].

Q: How can I detect potential QRPs in my own work or during peer review?

A: Detection can be challenging, but here are key indicators:

  • Check for HARKing: Hypothesizing After the Results are Known is a common QRP. Scrutinize whether the stated hypotheses were genuinely developed a priori [33] [4].
  • Analyze data collection and exclusion: Look for unjustified stopping rules for data collection or post-hoc deletion of data without pre-established, transparent criteria [18] [2].
  • Review analytical flexibility: Be wary of "p-hacking," where researchers run multiple analyses and selectively report those that yield significant results (p < 0.05) [18] [2] [33].
  • Look for cherry-picking: Check if all measured variables, conditions, and studies are reported, or if there is selective reporting of only those that are significant or consistent with predictions [2] [33].

Q: What are the concrete consequences of QRPs for scientific progress and public trust?

A: The consequences are severe and far-reaching:

  • Replication Crisis: QRPs are a key contributor to the replication crisis, where subsequent studies fail to reproduce original findings, wasting resources and impeding scientific progress [49].
  • Skewed Scientific Literature: QRPs lead to a literature filled with inflated effect sizes and false positives, canonizing erroneous claims and compromising science-informed policy and interventions [4].
  • Erosion of Trust: The proliferation of QRPs erodes public trust in science, health professionals, and research institutions, which is particularly damaging in fields like drug development and public health [18] [4].

Q: What protocols and tools can I use to prevent QRPs in my research workflow?

A: Adopting open science practices is the most effective way to prevent QRPs [33]. Key protocols include:

  • Pre-registration: Publicly register your hypotheses, methods, and analysis plan before observing the research outcomes. This prevents HARKing and p-hacking [2] [33].
  • Blind Data Analysis: When possible, have data analysis conducted by researchers blinded to the experimental conditions to reduce confirmation bias [33].
  • Use of Standardized Protocols: Employ validated measures and pre-established statistical models to minimize researcher degrees of freedom [33].
  • Data and Code Sharing: Publicly share raw data, analysis code, and supplementary materials to enable transparency and reproducibility [2] [33].

Troubleshooting Guides: Diagnosing and Fixing Common QRPs

Issue: Suspected P-hacking in Data Analysis

Diagnosis: P-hacking, or p-value hacking, occurs when many different analyses are carried out to discover statistically significant results when no real effect exists [2]. Symptoms include repeatedly running analyses with different covariates, excluding outliers without justification, or stopping data collection once significance is reached.

Resolution Protocol:

  • Pre-register Analysis Plan: Before data collection, detail all intended statistical tests, covariate inclusion rules, and data exclusion criteria in a time-stamped document on a platform like the OSF or BMJ Open [2].
  • Use Blind Analysis: Have a colleague or automated script run the pre-specified analyses on the cleaned dataset without knowledge of the hypotheses.
  • Report Comprehensively: In your manuscript, disclose all analyses conducted, including those that were non-significant, and justify any deviations from the pre-registered plan.
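To see why undisclosed analytic flexibility matters, the following minimal simulation sketch (hypothetical, pure-Python, using a normal-approximation two-sample test rather than any specific package) estimates how often a researcher finds at least one "significant" result when testing several outcomes with no true effect present:

```python
import random
from statistics import NormalDist, mean, stdev

def two_sample_p(a, b):
    """Two-sided p-value from a normal-approximation two-sample test."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_outcomes, n_sims=2000, n_per_group=50):
    """Fraction of null simulations (no true effect) in which at least one
    of n_outcomes tests reaches p < .05, i.e., a 'significant' result that
    could be selectively reported."""
    rng = random.Random(42)
    hits = 0
    for _ in range(n_sims):
        for _ in range(n_outcomes):
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            if two_sample_p(a, b) < 0.05:
                hits += 1
                break
    return hits / n_sims

print(false_positive_rate(1))   # close to the nominal .05
print(false_positive_rate(5))   # close to 1 - 0.95**5, roughly .23
```

With one pre-specified outcome the error rate stays near 5%; with five undisclosed outcomes it roughly quadruples, which is exactly the inflation that a pre-registered analysis plan prevents.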

Issue: Hypothesizing After Results are Known (HARKing)

Diagnosis: This QRP involves presenting a post-hoc hypothesis (developed after seeing the data) as if it were an a priori prediction. This misleads readers about the confirmatory nature of the study [33] [4].

Resolution Protocol:

  • Maintain a Research Log: Keep a detailed, time-stamped record of all research decisions, including initial hypotheses and rationale.
  • Submit a Registered Report: For confirmatory research, consider submitting a Stage 1 Registered Report to a journal. This involves peer review of the introduction, methods, and proposed analysis plan before data are collected [4].
  • Clearly Label Exploratory Analyses: If unexpected findings emerge, clearly identify any subsequent analyses and hypothesis generation as exploratory in the manuscript.

Issue: Selective Reporting of Outcomes or Studies

Diagnosis: Also known as "cherry-picking," this involves only reporting results, variables, or entire studies that are significant or consistent with predictions, while withholding others [2] [49]. This creates a "file drawer effect" and biases the literature [49].

Resolution Protocol:

  • Pre-register All Outcomes: Specify all primary and secondary outcome variables in your pre-registration.
  • A Priori Power Analysis: Conduct a power analysis before the study to ensure sufficient sample size, reducing the likelihood of failed studies [2].
  • Publish All Results: Make a commitment to publish or share results regardless of the outcome. Use platforms for preprints or registered reports to disseminate null or negative findings [4].
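As a rough sketch of the a priori power analysis step, the helper below uses the standard normal-approximation formula for a two-sided, two-sample comparison; it slightly underestimates the exact t-based answer from tools like G*Power, which should be preferred for real studies:

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample test:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2 (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

print(n_per_group(0.5))  # medium effect (d = 0.5): 63 per group
print(n_per_group(0.2))  # small effect (d = 0.2): 393 per group
```

The steep cost of detecting small effects is one reason underpowered studies, and the temptation to rescue them with QRPs, are so common.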

Quantitative Data on QRPs

The table below summarizes prevalence data for QRPs across various disciplines, illustrating that these practices are a widespread concern [81].

Table 1: Prevalence of Self-Reported Engagement in Questionable Research Practices

| Prevalence | Researcher Population | Reference |
| --- | --- | --- |
| 51% | 807 ecologists and evolutionary biologists | Fraser et al. (2018) [81] |
| 27% | 2,155 U.S. psychologists | John et al. (2012) [81] |
| 58% | 1,166 U.S., European and Australian psychologists | Motyl et al. (2017) [81] |
| 37% | 277 Italian psychologists | Agnoli et al. (2017) [81] |
| 50% | 746 management researchers | Banks et al. (2016) [81] |
| 51% | 6,813 Dutch scientists (across disciplines) | Gopalakrishna et al. (2022) [81] |

The following table contrasts deliberate QRPs with "honest yet unacceptable" research practices, which are unintentional mistakes or weaknesses that are nonetheless damaging due to their wide prevalence [18].

Table 2: Questionable Research Practices vs. Honest Yet Unacceptable Research Practices

| Questionable Research Practices (Deliberate) | Honest Yet Unacceptable Research Practices (Unintentional) |
| --- | --- |
| Data manipulation or study fabrication [18] | Not submitting negative results for publication [18] |
| P-hacking to lower p-values [18] | Failure to preregister studies [18] |
| Changing hypotheses based on results obtained (HARKing) [18] | Downplaying or omitting references to study limitations [18] |
| Post-hoc deletion of data [18] | Overhyping of results or their significance [18] |
| Salami slicing of papers (dividing results to increase publication count) [18] | Weak attention to statistical power [18] |
| Artificially selecting controls to produce statistical significance [18] | Failure to report study weaknesses [18] |

The Scientist's Toolkit: Essential Reagents for Research Integrity

This table details key solutions and practices to uphold rigor and transparency in research.

Table 3: Research Reagent Solutions for Combating QRPs

| Tool / Solution | Function & Purpose | Example / Platform |
| --- | --- | --- |
| Pre-registration | Time-stamps hypotheses and analysis plans, preventing HARKing and p-hacking. Distinguishes confirmatory from exploratory research. | Open Science Framework (OSF), ClinicalTrials.gov, AsPredicted |
| Registered Reports | A publishing format where peer review happens before data collection. Accepts articles based on scientific rigor, not result significance. | Journals such as Cortex and Comprehensive Results in Social Psychology [4] |
| Data & Code Sharing | Enables full transparency, allows others to verify and build upon findings, and facilitates meta-analyses. | OSF, Zenodo, Dataverse, GitHub |
| Citation Manager | Helps organize references and ensures accurate attribution to original sources, avoiding improper referencing. | Zotero, Mendeley [2] |
| Power Analysis Software | Determines the sample size needed to detect an effect a priori, reducing underpowered studies and the incentive for p-hacking. | pwr package in R, G*Power, Superpower [2] |
| Blind Analysis Protocols | A methodology where data analysis is conducted blinded to experimental conditions to reduce confirmation bias. | Internal lab standard operating procedures (SOPs) [33] |

Research Integrity Workflow: From Concept to Publication

The diagram below visualizes a robust research workflow integrated with safeguards against QRPs, from initial planning to final publication.

Research Concept → Literature Review → Develop A Priori Hypotheses → Pre-register Study: Hypotheses, Methods, & Analysis Plan → Collect Data → Conduct Pre-registered & Blind Analysis → Interpret Results → Write Manuscript: Report All Outcomes, Including Limitations → Share Data & Code → Publish (Incl. Registered Reports)

Diagram 1: Integrity-focused research workflow with QRP safeguards.

QRP Detection and Mitigation Logic

This diagram outlines the logical process for diagnosing a potential QRP and implementing the correct mitigating solution.

  • Symptom: Hypothesis changed after data collection → Diagnosis: HARKing → Mitigation: Pre-registration, Registered Reports
  • Symptom: Multiple analyses run for a significant p-value → Diagnosis: P-hacking → Mitigation: Pre-registered analysis plan
  • Symptom: Selective reporting of outcomes → Diagnosis: Cherry-picking → Mitigation: Publish all results, data sharing

Diagram 2: Diagnostic logic for identifying and mitigating common QRPs.

Questionable Research Practices (QRPs) are methodological behaviors that, while not necessarily constituting outright fraud, are likely to produce misleading conclusions, typically in the interest of the researcher [33]. These practices threaten the integrity of scientific findings by compromising the replicability and validity of research [7]. The discussion of QRPs has been prominent in fields relying on group comparison research, but they manifest differently in Single-Case Experimental Designs (SCED) due to substantially different research methods and data analysis strategies [7]. This guide provides a comparative troubleshooting FAQ to help researchers identify, avoid, and remedy common QRPs in both research paradigms.

Comparative Analysis of QRPs: A Troubleshooting Guide

The table below summarizes how common QRPs manifest differently in group and single-case research designs, aiding in the identification of potential issues in your work.

Table 1: Troubleshooting Guide: QRPs in Group vs. Single-Case Designs

| Research Phase | Questionable Research Practice (QRP) | Common in Group Designs | Common in Single-Case Designs | Primary Consequence |
| --- | --- | --- | --- | --- |
| Planning | Selective Sampling | Recruiting participants more likely to show positive effects into a treatment group [7]. | Selecting a participant known to be highly responsive to the intervention. | Biased estimate of treatment effect; reduced generalizability. |
| Data Collection | Adding Data After Results | Collecting more data after seeing results to achieve statistical significance [2] [82]. | Adding more intervention sessions until a desired visual effect is achieved. | Inflated Type I error rates; false positives. |
| Data Analysis | P-hacking | Running multiple analyses to find a statistically significant result (e.g., outlier exclusion, model specification) [7] [2] [33]. | Manipulating graphical display (e.g., axis scaling) to enhance the appearance of a visual effect [7]. | Misleading representation of the effect's strength and consistency. |
| Data Analysis | Selective Data Reporting | Reporting only studies, conditions, or dependent variables with positive/statistically significant results [7] [2]. | Omitting data points from graphs that indicate instability or weak effects [7]. | File drawer effect; distorted meta-analytic findings. |
| Writing | HARKing (Hypothesizing After Results are Known) | Formulating or presenting a hypothesis after the results are known to fit the data [7] [2]. | Developing a post-hoc explanation for a functional relation observed in the data. | Compromised theory testing; overfitting of explanations to noise. |
| Publication | Selective Outcome Reporting | Writing abstracts/discussions to downplay undesirable results and emphasize desired ones [7]. | Overinterpreting weak or unstable effects in the visual analysis as being clinically significant. | Misleading readers about the robustness and applicability of findings. |
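The "adding data after results" row can be made concrete with a small simulation. This is a hypothetical sketch (pure Python, normal-approximation test) of optional stopping: the analyst peeks after each batch of observations and stops collecting as soon as p < .05, even though no true effect exists:

```python
import random
from statistics import NormalDist, mean, stdev

def p_value(a, b):
    """Two-sided p-value via a normal-approximation two-sample test."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def optional_stopping_rate(looks=(20, 40, 60, 80, 100), n_sims=2000):
    """Fraction of null simulations declared 'significant' when the analyst
    tests after each batch and stops as soon as p < .05."""
    rng = random.Random(7)
    hits = 0
    for _ in range(n_sims):
        a, b = [], []
        for target in looks:
            while len(a) < target:
                a.append(rng.gauss(0, 1))
                b.append(rng.gauss(0, 1))
            if p_value(a, b) < 0.05:
                hits += 1
                break
    return hits / n_sims

print(optional_stopping_rate())  # well above the nominal .05
```

Five undisclosed "looks" at the data roughly double or triple the false positive rate, which is why stopping rules belong in the pre-registration.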

FAQs on QRP Identification and Prevention

FAQ 1: Our field primarily uses visual analysis of single-case data. How can our analysis be "questionable"?

Visual analysis is not immune to biases. Unlike statistical analysis, there are fewer universal standards, creating "researcher degrees of freedom." Common issues include:

  • Graphical Manipulation: Altering the scale of the y-axis to make small changes look large, or omitting data points that show undesirable variability [7].
  • Selective Combining of Data: Parsing or combining dependent variables in graphs in a way that emphasizes a positive outcome [7].
  • Confirmation Bias in Interpretation: Overinterpreting minor changes in level, trend, or variability as evidence of an effect, or dismissing clear lack of effect due to prior expectations.

Improved Practice: Establish and pre-specify visual analysis criteria (e.g., What constitutes a change in level? How many data points define a trend?). Use statistical analysis as a supplement to visual inspection to increase objectivity [83].
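One widely used statistical supplement to visual inspection in single-case research is the Percentage of Non-overlapping Data (PND). The helper below is a minimal, hypothetical implementation for a behavior expected to increase; the session data are invented for illustration:

```python
def percent_nonoverlapping_data(baseline, intervention):
    """Percentage of Non-overlapping Data (PND): share of intervention-phase
    points exceeding the highest baseline point (for target behaviors
    expected to increase). A simple, pre-specifiable effect metric."""
    ceiling = max(baseline)
    exceeding = sum(1 for y in intervention if y > ceiling)
    return 100 * exceeding / len(intervention)

# Hypothetical session data for one participant
baseline = [2, 3, 2, 4, 3]
intervention = [5, 6, 4, 7, 8, 6]
print(round(percent_nonoverlapping_data(baseline, intervention), 2))  # 83.33
```

Because the metric is fully specified before the data arrive, it removes the graphical "researcher degrees of freedom" (axis scaling, selective plotting) described above.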

FAQ 2: We had to exclude some participant data. Is this always a QRP?

Not necessarily. Exclusion becomes a QRP when it is done selectively, post-hoc, and without transparent justification to make the results look more favorable.

  • In Group Designs: Excluding participants from the treatment condition who did not respond well, while including all participants from the control condition [7].
  • In Single-Case Designs: Deciding to drop an entire case (participant) from the study because the intervention showed no effect for them.

Improved Practice: Before collecting data, establish and document clear, justified a priori criteria for data exclusion (e.g., pre-defined adherence thresholds, protocol deviations). Report all exclusions transparently in the manuscript, including for participants who showed no effect [2] [82].

FAQ 3: Is it acceptable to analyze data one way, then change the analysis to get a better result?

This practice, known as p-hacking in group designs, is a serious QRP. It inflates the false positive rate.

  • In Group Designs: Trying different covariates, outlier handling methods, or statistical tests until a p-value falls below .05 [2] [33].
  • In Single-Case Designs: Trying different visual analysis techniques or statistical metrics until the data appears to demonstrate experimental control.

Improved Practice: Preregister your analysis plan, including precise specifications of your primary outcome, how it will be measured, and the exact statistical or visual analysis strategy you will use [2] [82] [33]. For exploratory analyses, clearly label them as such.

FAQ 4: We discovered an interesting finding we didn't predict. How can we report it without HARKing?

This is a common situation. The problem is not the discovery, but how it is presented.

  • QRP (HARKing): Writing the paper as if you had predicted this finding all along, presenting it as an a priori hypothesis [7] [2].
  • Improved Practice: Be transparent. Frame the finding as exploratory or post-hoc. Clearly state that it was suggested by the data and requires future confirmation through pre-registered replication [34]. This turns a QRP into a generator of legitimate new hypotheses.

Improved Research Practices and Solutions

The table below outlines key remedies and improved research practices that serve as alternatives to QRPs, applicable to both group and single-case designs.

Table 2: The Scientist's Toolkit: Reagent Solutions for Improved Research Integrity

| Tool / Solution | Function / Purpose | Applies To |
| --- | --- | --- |
| Pre-registration | Documents hypotheses, methods, and analysis plans before a study is conducted. Limits flexibility in analysis and reporting. [2] [82] [33] | Group & Single-Case |
| Blind Data Analysis | Analyzing data with hidden condition labels to reduce confirmation bias. The "answer" is revealed only after analysis choices are finalized. [82] [33] | Group & Single-Case |
| Power Analysis / Sensitivity Analysis | (Group) Determines sample size needed to detect an effect. (Single-Case) Determines the number of measurements needed to detect an effect with a given design. [82] | Primarily Group |
| Data & Code Sharing | Making raw data and analysis code publicly available. Allows for independent verification and reproducibility checks. [33] | Group & Single-Case |
| Registered Reports | A publishing format where peer review happens before data collection. Acceptance is based on the question and methods, not the results. [82] | Group & Single-Case |
| Standard Operating Procedures (SOPs) | A document detailing procedures for common research actions (e.g., outlier handling) to ensure consistency and avoid ad-hoc decisions. [82] | Group & Single-Case |
| The "21-Word Solution" | A statement in the method section: "We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study." [82] | Group & Single-Case |
| Replication | Conducting direct or conceptual replication studies to determine the reliability of findings. [82] | Group & Single-Case |

Workflow for Mitigating QRPs in Research

The following diagram visualizes a recommended workflow for integrating improved research practices into your project to mitigate QRPs from start to finish.

Research Conceptualization → Planning & Design (proactive safeguards: pre-register hypotheses, methods, and analysis plan; develop standard operating procedures; conduct a priori sample-size planning) → Data Collection → Data Analysis (blinded and transparent: conduct blind data analysis; perform robustness checks) → Writing & Publication (open reporting: report all conditions, measures, and exclusions; share data and code)

Frequently Asked Questions (FAQs)

Q1: What are Questionable Research Practices (QRPs) and how do they directly affect replication? QRPs are suboptimal research practices that occupy an ethical gray area between sound science and outright misconduct (e.g., fabricating data, selective reporting of outcomes, hypothesizing after results are known). They directly reduce the likelihood that a study's findings will replicate because they increase the rate of false positive results and undermine the reliability and validity of the scientific record. [84] [4]

Q2: Why is the base rate of true effects important for replication? The base rate of true effects within a research domain is a major factor determining replication rates. In fields where true effects are rare (e.g., early drug discovery), the relative proportion of false positives will be high, leading to lower replication rates for purely statistical reasons, even in the absence of QRPs. [84]

Q3: What is the difference between a replication experiment and a direct replication? A replication experiment repeats a measurement under similar conditions to estimate the imprecision, or random error, of an analytical method; it is a fundamental practice for verifying the reliability of findings. [85] A direct replication, by contrast, is an independent study that uses the same methods and procedures to verify a previously published result.

Q4: How can I estimate the imprecision of my method? Perform a replication experiment by analyzing a minimum of 20 samples of the same test material. Calculate the mean, standard deviation (SD), and coefficient of variation (CV). The SD represents the random error or imprecision of your method. [85]

Q5: What constitutes acceptable performance for imprecision? For short-term imprecision (within-run or within-day), the standard deviation should be less than a quarter of the defined total allowable error. For long-term imprecision (total), the standard deviation should be less than one-third of the total allowable error. [85]

Troubleshooting Guides

Issue 1: Failure to Replicate a Statistically Significant Result

Problem: An independent study fails to find a statistically significant effect that was previously reported.

Investigation & Solutions:

| Potential Cause | Diagnostic Checks | Corrective Actions |
| --- | --- | --- |
| Low Statistical Power [84] | Calculate the power of the original and replication studies. | Always conduct an a-priori power analysis before data collection. For the replication, use a larger sample size than the original study. |
| Questionable Research Practices (QRPs) in original study [84] [86] | Check for signs of p-hacking (e.g., multiple testing, selective outlier removal). Check if the result is part of a "too good to be true" series of positive findings. | Perform a reanalysis of the original data if available. Pre-register the replication study's hypothesis and analysis plan to avoid the same pitfalls. |
| Low Base Rate of True Effects [84] | Evaluate the prior probability for the field (e.g., via meta-analyses or prediction markets). | Interpret findings with caution in low base-rate fields. Use Bayesian methods to calculate the posterior probability of the effect being true. |
| Methodological Discrepancies | Carefully compare lab protocols, reagents, equipment, and sample populations between original and replication study. | Directly collaborate with the original authors to align methods. Conduct a "differential replication" to test the effect under varied conditions. |
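The base-rate point can be quantified with the standard positive predictive value formula (in the spirit of Ioannidis-style analyses): the probability that a significant finding reflects a true effect depends on the field's prior, not just the p-value. A minimal sketch:

```python
def positive_predictive_value(prior, power=0.8, alpha=0.05):
    """Probability that a statistically significant finding reflects a true
    effect, given the base rate of true effects in the field (prior):
    PPV = power*prior / (power*prior + alpha*(1 - prior))."""
    true_pos = power * prior
    false_pos = alpha * (1 - prior)
    return true_pos / (true_pos + false_pos)

# In a low base-rate field (e.g., only 10% of tested hypotheses are true),
# even a well-powered significant result is far from certain:
print(round(positive_predictive_value(0.10), 2))  # 0.64
```

With a 50% prior the same calculation exceeds 0.94, which is why replication rates differ so sharply across disciplines even before QRPs enter the picture.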

Issue 2: Unrealistically High Rate of Successful Replications in a Multi-Study Paper

Problem: A paper reports a series of experiments that all successfully replicate a key finding, but the individual studies have low statistical power, making this series of successes improbable.

Investigation & Solutions:

| Potential Cause | Diagnostic Checks | Corrective Actions |
| --- | --- | --- |
| Selective Reporting (File Drawer Effect) [84] | Calculate the probability of all studies being significant given their individual power. An improbably high success rate suggests unreported failed studies. | Request raw data for all conducted experiments related to the research question. Look for pre-registered study designs as a sign of completeness. |
| Use of QRPs Across Studies [84] | Check for flexibility in data collection and analysis across the studies (e.g., different outliers removed, changes in dependent variables). | Advocate for the publication of all research outcomes, regardless of statistical significance. Use study pre-registration to lock in analysis plans. |
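The diagnostic check in the first row is simple arithmetic: assuming the effect is real and the studies are independent, the chance that every study reaches significance is the product of their individual powers. A one-line sketch:

```python
def p_all_significant(power_per_study, n_studies):
    """Probability that every one of n independent studies reaches
    significance, assuming a true effect of the stated size and no
    selective reporting: power ** n."""
    return power_per_study ** n_studies

# Five studies at 35% power each, all 'working':
print(round(p_all_significant(0.35, 5), 4))  # 0.0053
```

A paper reporting five uniformly successful studies at 35% power describes an outcome with about a half-percent probability, a strong hint that failed studies stayed in the file drawer.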

Issue 3: Suspected Data Fabrication or Falsification

Problem: Data patterns appear unnatural, or findings seem implausible, leading to suspicion of data manipulation.

Investigation & Solutions:

| Potential Cause | Diagnostic Checks | Corrective Actions |
| --- | --- | --- |
| Data Fabrication [86] | Perform data forensics (e.g., check for digit preference, anomalies in distributions). Attempt to replicate the data collection process. | Report concerns to relevant institutional integrity bodies. Independent replication by a separate lab is the strongest test. |
| Data Falsification [86] | Scrutinize lab notebooks and original data records for inconsistencies. Check for selective omission of data points to achieve significance. | Foster an open science culture where data and code are shared. This enables peer scrutiny and deters misconduct. |
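One classic digit-preference screen is a chi-square test for uniformity of terminal digits: genuine fine-grained measurements tend to have near-uniform last digits, while invented numbers often do not. The helper below is a hypothetical screening sketch, not proof of misconduct, and assumes nonnegative integer-valued measurements:

```python
from collections import Counter

def terminal_digit_chi2(values):
    """Chi-square statistic for uniformity of terminal (last) digits across
    nonnegative integer measurements. Compare against the df = 9 critical
    value of about 16.92 (alpha = .05) as a screening heuristic only."""
    digits = [str(abs(int(v)))[-1] for v in values]
    counts = Counter(digits)
    n = len(digits)
    expected = n / 10
    return sum((counts.get(str(d), 0) - expected) ** 2 / expected
               for d in range(10))
```

A statistic far above 16.92 flags the dataset for closer scrutiny of original records, in line with the diagnostic checks above.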

Quantitative Data on QRPs and Replication

Table 1: Replication Rates Across Scientific Disciplines

Data synthesized from the Open Science Collaboration (2015) and other large-scale replication projects. [84]

| Discipline | Estimated Replication Rate | Key Contributing Factors |
| --- | --- | --- |
| Social Psychology | < 30% | Low base rate of true effects, QRPs, low statistical power. [84] |
| Cognitive Psychology | ~50% | Slightly higher base rate of true effects compared to social psychology. [84] |
| Preclinical Cancer Research | ~11% (6 out of 53 landmark studies) | Low base rate, complex biological systems, potential for QRPs. [84] |
| Economics | ~62% | - |

Table 2: Prevalence of Self-Reported Engagement in QRPs

Data from various anonymous survey studies. [4]

| QRP Example | Estimated Prevalence Range | Notes |
| --- | --- | --- |
| Failing to report all of a study's conditions or dependent measures | Up to 100% for some QRPs in psychology [4] | Based on an anonymous elicitation survey with incentives for truth-telling. [84] |
| HARKing (Hypothesizing After Results are Known) | Commonly self-reported [4] | Considered by some as defensible in certain contexts. [4] |
| Selectively reporting studies that "worked" | Common [84] | Contributes to the "file drawer problem." |
| Rounding off p-values (e.g., from .054 to .05) | - | - |
| At least one QRP in the past three years | ~34% [4] | Based on a meta-analytic study by Fanelli (2009). |

Experimental Protocols

Protocol 1: Conducting a Basic Replication Experiment for Method Validation

Purpose: To estimate the imprecision (random error) of an analytical method. [85]

Materials:

  • Test samples (minimum of 20) from at least 2 different control materials that represent low and high medical decision concentrations. [85]
  • Standard operating procedure for the method.
  • Appropriate laboratory equipment.

Procedure:

  • Short-Term Imprecision: Analyze 20 samples of each control material within a single run or within one day. [85]
  • Data Collection: Record all individual measurements.
  • Calculation: For each material, calculate the mean, standard deviation (SD), and coefficient of variation (CV).
    • Mean: ( \bar{x} = \frac{\sum x_i}{n} )
    • Standard Deviation: ( s = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}} )
    • Coefficient of Variation: ( CV = \frac{s}{\bar{x}} \times 100\% ) [85]
  • Judgment: Compare the calculated CV to predefined acceptability criteria (e.g., < 0.25 × total allowable error for short-term imprecision). [85]
  • Long-Term Imprecision: Analyze one sample of each control material on 20 different days and repeat the calculations. The total SD should be < 0.33 × total allowable error. [85]
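The calculation and judgment steps above can be sketched in a few lines of Python; the control values and total allowable error below are hypothetical placeholders:

```python
from statistics import mean, stdev

def imprecision_summary(replicates, total_allowable_error, short_term=True):
    """Mean, SD, and CV% for a replication experiment, judged against the
    rule of thumb cited above: SD < TEa/4 for short-term imprecision,
    SD < TEa/3 for long-term (total) imprecision."""
    m = mean(replicates)
    s = stdev(replicates)          # n-1 denominator, matching the formula above
    cv = 100 * s / m
    limit = total_allowable_error / (4 if short_term else 3)
    return {"mean": m, "sd": s, "cv_percent": cv, "acceptable": s < limit}

# Hypothetical within-run replicates of one control material, TEa = 6 units
result = imprecision_summary([100] * 10 + [102] * 10, total_allowable_error=6)
print(result)
```

The same function, called with `short_term=False`, applies the TEa/3 criterion to the 20-day long-term data.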

Protocol 2: Designing a Direct Replication Study

Purpose: To verify a previously published finding using the same methods and procedures.

Materials:

  • Published study to be replicated.
  • Required lab reagents, equipment, and software.

Procedure:

  • Study Pre-registration: Publicly register the study's hypothesis, methods, analysis plan, and sample size justification before beginning data collection. This prevents QRPs like p-hacking and HARKing. [86]
  • Method Alignment: Collaborate with the original authors to obtain protocols, materials, and data analysis scripts to ensure methodological fidelity.
  • Power Analysis: Conduct an a-priori sample size calculation to ensure the replication study has high power (e.g., 90-95%) to detect the original effect size.
  • Blinded Data Collection: Where possible, implement blinding to prevent experimenter bias.
  • Data Analysis: Follow the pre-registered analysis plan exactly. Additionally, conduct robustness checks with alternative statistical models.
  • Reporting: Report the results regardless of the outcome, including all pre-registered and exploratory analyses. Share all data and code publicly.

Visualizing the Replication Workflow

Replication Validation Process

Plan Replication Study → Pre-register Hypothesis & Analysis Plan → Align Methods with Original Study → Conduct A-Priori Power Analysis → Collect Data (Blinded if possible) → Analyze Data per Pre-registered Plan → Report Full Results & Share Data

Researcher Decision Pathway Under Pressure

A researcher experiencing publication or career pressure recognizes an opportunity for a QRP and weighs the action. Two pathways follow:

  • Strong commitment to scientific norms (adherence to Mertonian norms, reinforced by a collegial and supportive research culture as a mitigating factor) → the researcher upholds integrity and avoids the QRP.
  • Weak or ambiguous scientific norms (reflecting a lack of training or role models, with a high-pressure, competitive culture as a contributing factor) → the researcher engages in the QRP.

The Scientist's Toolkit: Research Reagent Solutions

| Item | Function / Benefit |
| --- | --- |
| Pre-Registration Templates | Provides a structured format for detailing hypotheses, methods, and analysis plans before data collection, preventing QRPs like HARKing and p-hacking. |
| Open Data Repositories | Platforms for sharing raw research data, enabling independent verification of results and detection of errors or misconduct. |
| Power Analysis Software | Tools to calculate the necessary sample size to detect an effect, ensuring studies are adequately powered and reducing false negatives. |
| Electronic Lab Notebooks | Securely records research procedures and data in a time-stamped, uneditable format, providing a clear audit trail. |
| Statistical Software with Robust Methods | Enables the application of appropriate statistical tests and Bayesian methods to assess the strength of evidence for hypotheses. |

FAQs: Addressing Key Challenges in Evidence Assessment

What are Questionable Research Practices (QRPs) and why are they a problem for evidence assessment?

Questionable Research Practices (QRPs) are activities that exist in an ethical grey area between sound scientific conduct and outright misconduct (e.g., fabrication, falsification, and plagiarism) [4]. They are problematic because they undermine the reliability and validity of scientific knowledge, leading to a skewed scientific literature that prolongs support for empirically untenable theories [4]. Common QRPs include [87]:

  • HARKing (Hypothesizing After Results are Known)
  • P-hacking: Running multiple statistical analyses until achieving significant results
  • Selective reporting/cherry-picking: Reporting only results that are significant or consistent with predictions
  • Not accurately recording the research process
  • Improper referencing
  • Failing to share data

These practices are concerning due to their prevalence, with an estimated one in two researchers having engaged in at least one QRP over the last three years [87]. When assessing evidence, these practices can lead to false positives and distorted conclusions, making it crucial to use robust assessment tools that can identify potential QRPs [87].

Which evidence assessment tools are most effective for detecting potential QRPs in different study designs?

Different quality assessment tools are designed to evaluate specific study methodologies. Using the appropriate tool for each study design is essential for properly assessing potential QRPs [88]. The table below summarizes recommended tools for various study types:

Table: Evidence Assessment Tools for Different Study Designs

| Study Design | Recommended Assessment Tools | Key Elements Assessed |
| --- | --- | --- |
| Randomized Controlled Trials (RCTs) | Cochrane Risk of Bias (ROB) 2.0 [88], CASP RCT Appraisal Tool [88], Jadad Scale [88] | Randomization process, allocation concealment, blinding, outcome data completeness, selective reporting |
| Cohort Studies | Newcastle-Ottawa Scale (NOS) [88], CASP Cohort Studies Checklist [88] | Group selection, comparability, exposure/outcome assessment, follow-up adequacy |
| Case-Control Studies | Newcastle-Ottawa Scale (NOS) [88], CASP Case Control Study [88] | Case and control selection, comparability, exposure measurement |
| Systematic Reviews | AMSTAR Checklist [88], CASP Systematic Review [88] | Search comprehensiveness, study selection criteria, risk of bias assessment, meta-analysis methods |
| Diagnostic Studies | QUADAS-2 [88], CASP Diagnostic Studies [88] | Patient selection, index test, reference standard, flow and timing |
| Qualitative Studies | CASP Qualitative Studies [88], McGill MMAT [88] | Research aims, methodology, design, recruitment, data collection |

What methodological red flags should I look for when screening studies for potential QRPs?

When assessing literature for potential QRPs, several methodological warning signs should prompt more careful scrutiny:

  • Statistical inconsistencies: Reporting p-values that barely cross the significance threshold (e.g., p = 0.049) without correction for multiple testing may suggest p-hacking [87]
  • Selective outcome reporting: Discrepancies between registered primary outcomes and published results, or focusing only on statistically significant findings while omitting non-significant ones [87]
  • Insufficient methodological detail: Lack of transparent description of research procedures, making replication and verification difficult [87]
  • Inappropriate data exclusion: Removing outliers without pre-specified, justified criteria [87]
  • HARKing: Studies where hypotheses appear tailored to fit unexpected results rather than testing pre-specified predictions [4]
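The first red flag lends itself to a simple automated pre-screen. The sketch below (pure Python; the 0.045–0.05 window and the name `near_threshold` are illustrative choices, not an established cutoff) merely shortlists results for manual scrutiny; a cluster of such values is suggestive, never proof, of p-hacking:

```python
def near_threshold(p_values, lo=0.045, hi=0.05):
    """Return reported p-values falling just under the significance threshold.

    Purely a triage heuristic: flagged values deserve a manual check for
    multiple-testing corrections and pre-registration, nothing more.
    """
    return [p for p in p_values if lo <= p < hi]
```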

How can I assess a study's applicability to my specific research context?

Evaluating a study's applicability involves assessing whether results can be validly applied to your specific organization, population, or research context [88]. Key considerations include:

  • Population similarity: Are the study participants sufficiently similar to your population of interest in terms of demographics, disease characteristics, and other relevant factors?
  • Intervention feasibility: Can the intervention be implemented in your setting with available resources, expertise, and infrastructure?
  • Outcome relevance: Do the measured outcomes align with outcomes important to your context and stakeholders?
  • Practical significance: Is the effect size practically relevant, not just statistically significant? Consider confidence intervals and minimal important differences [88]

Troubleshooting Guides

Issue: Inconsistent quality ratings across team members

Problem: Different reviewers assign different quality ratings to the same study using the same assessment tool.

Solution:

  • Develop a detailed codebook: Create explicit criteria for how each item on your chosen assessment tool should be scored, with examples of what constitutes adequate fulfillment of each criterion [88]
  • Conduct calibration exercises: Before formal assessment, have all team members independently rate the same 2-3 studies, then compare and discuss discrepancies to reach consensus on application of the tool [88]
  • Implement dual independent review: Have at least two reviewers assess each study independently, then resolve discrepancies through discussion or third-party adjudication [88]
  • Calculate inter-rater reliability: Periodically assess agreement between reviewers using statistics like Cohen's kappa to monitor consistency [88]
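Cohen's kappa, mentioned in the last step, corrects raw percent agreement for agreement expected by chance. A minimal pure-Python version (any statistics package offers an equivalent; the function name is illustrative):

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(ratings_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)
```

Values above roughly 0.6–0.8 are conventionally read as substantial agreement; lower values signal that the codebook or reviewer calibration needs revisiting.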

Table: Protocol for Resolving Assessment Discrepancies

| Discrepancy Level | Resolution Process | Documentation Requirement |
| --- | --- | --- |
| Minor (e.g., 1-point difference on scale) | Discussion between two original reviewers | Note initial scores and rationale for final score |
| Moderate (e.g., different risk of bias categories) | Discussion with reference to codebook examples | Document specific criteria interpretation |
| Major (fundamental disagreement on study validity) | Adjudication by third reviewer with methodology expertise | Record all perspectives and final decision rationale |

Issue: Suspected selective reporting in a potentially important study

Problem: A study appears to report only positive findings, with incomplete outcome data or missing analyses.

Solution:

  • Check for protocol registration: Look for pre-registered protocols in clinical trial registries (e.g., ClinicalTrials.gov) or open science platforms (e.g., OSF) [87]
  • Compare methods and results sections: Identify any outcomes mentioned in methods but not reported in results [87]
  • Examine analytical choices: Note whether multiple analytical approaches were attempted but only significant ones reported [87]
  • Contact authors: Request missing information, pre-registered analysis plans, or unreported outcomes [87]
  • Document limitations: Clearly note concerns about selective reporting in your assessment and consider sensitivity analyses excluding the study if concerns are substantial [88]

Identify Suspected Selective Reporting → Check for Protocol Registration → Compare Methods vs. Results Sections → Examine Analytical Choices → Contact Authors for Clarification → Document Limitations in Assessment → Consider Sensitivity Analysis

Flowchart: Addressing Suspected Selective Reporting

Issue: Integrating evidence of varying quality into a coherent assessment

Problem: Included studies have substantially different methodological quality, making overall conclusions challenging.

Solution:

  • Pre-specify quality thresholds: Establish minimum quality standards for studies to inform primary conclusions during protocol development [88]
  • Stratify by quality: Group studies by quality ratings (e.g., high, moderate, low) and analyze effects separately for each group [88]
  • Use GRADE methodology: Apply the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach to rate overall quality of evidence across studies [89]
  • Consider quality in synthesis: Give more weight to higher-quality studies in conclusions, while transparently reporting the contribution of all studies [88]
  • Acknowledge limitations: Clearly state how variation in quality affects confidence in overall findings and highlight need for higher-quality future research [88]

Research Reagent Solutions: Essential Tools for Evidence Assessment

Table: Key Resources for Evidence Assessment and QRP Identification

| Tool/Resource | Primary Function | Application Context |
| --- | --- | --- |
| Cochrane Risk of Bias 2.0 | Assess methodological quality of randomized trials | Gold standard for RCT quality assessment; evaluates randomization, deviations, missing data, outcome measurement, selective reporting [88] |
| Newcastle-Ottawa Scale (NOS) | Quality assessment of non-randomized studies | Standard tool for case-control and cohort studies; evaluates selection, comparability, outcome/exposure [88] |
| AMSTAR Checklist | Appraise systematic reviews and meta-analyses | 16-item tool to evaluate systematic review methodology, including search strategy, study selection, data extraction, and synthesis methods [88] |
| GRADE approach | Rate quality of evidence across studies | System for grading confidence in estimates; considers study design, limitations, inconsistency, indirectness, imprecision, publication bias [89] |
| Open Science Framework (OSF) | Protocol registration and data sharing | Platform for pre-registering analysis plans, sharing data and materials; helps identify selective reporting and p-hacking [87] |
| QUADAS-2 | Assess quality of diagnostic accuracy studies | Tool specifically designed for diagnostic studies; evaluates patient selection, index test, reference standard, flow and timing [88] |

Literature Search & Screening → Study Selection (Inclusion/Exclusion) → Quality Assessment (Tool Selection) → Data Extraction → QRP Screening (Red Flag Identification) → Evidence Synthesis & Grading → Conclusions with Quality Limitations

Workflow: Evidence Assessment with QRP Screening

Experimental Protocols for Evidence Assessment

Protocol 1: Dual Independent Review with Adjudication

Purpose: To minimize bias and errors in quality assessment of included studies through structured independent review.

Materials:

  • Pre-selected evidence assessment tool appropriate to study design [88]
  • Standardized data extraction form
  • Codebook with explicit scoring criteria [88]

Methodology:

  • Training phase: All reviewers complete training on the assessment tool using practice studies not included in the current review [88]
  • Calibration: Reviewers independently assess the same 2-3 included studies and compare ratings, discussing discrepancies to refine consistent application of criteria [88]
  • Independent assessment: Two reviewers independently assess each study using the chosen tool without consultation [88]
  • Comparison of ratings: Reviewers compare initial ratings and note discrepancies
  • Consensus meeting: Reviewers discuss discrepancies with reference to the codebook and attempt to reach agreement [88]
  • Adjudication: Unresolved discrepancies are referred to a third reviewer with methodological expertise for final decision [88]
  • Documentation: Record initial ratings, discussion points, and final agreed ratings for transparency [88]

Protocol 2: Systematic QRP Identification in Statistical Reporting

Purpose: To systematically identify potential questionable research practices in the statistical reporting of studies.

Materials:

  • Statistical reporting checklist (e.g., based on SAMPL guidelines)
  • Access to statistical software for verification of reported results (optional)
  • Protocol registry databases (e.g., ClinicalTrials.gov, OSF)

Methodology:

  • Pre-registration verification: Check for study pre-registration and compare pre-specified hypotheses, outcomes, and analysis plans with published report [87]
  • Selective reporting assessment: Identify any outcomes mentioned in methods but not reported in results, or additional outcomes reported without pre-specification [87]
  • Multiple testing examination: Assess whether appropriate corrections were applied for multiple comparisons when numerous statistical tests were conducted [87]
  • Data peeking assessment: For clinical trials, check whether interim analyses were properly accounted for in statistical significance thresholds [87]
  • Analytical flexibility review: Note whether multiple analytical approaches were used but only significant results reported [87]
  • Data availability check: Assess whether data and analysis code are available for independent verification [87]
  • Document findings: Systematically document potential QRPs and their likely impact on results interpretation
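For the multiple-testing step, reviewers can re-verify reported corrections themselves. As a minimal sketch of the Holm step-down procedure, one common correction (the function name is illustrative):

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down correction: which of m hypotheses can be rejected?

    Sort p-values ascending; the (k+1)-th smallest is compared against
    alpha / (m - k). Testing stops at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (m - rank):
            break  # this and all larger p-values fail
        reject[i] = True
    return reject
```

If a paper reports several "significant" results that would not survive this (or any) correction, that discrepancy belongs in the documented findings.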

Technical Support Center: Troubleshooting Replication Failure

Frequently Asked Questions (FAQs)

Q1: Our research team followed all standard methodologies, yet our replication attempts frequently fail. Are we committing Questionable Research Practices (QRPs)?

Failure to replicate is not always a sign of misconduct. The root cause may lie in Questionable Research Fundamentals (QRFs)—the underlying philosophical assumptions of your theories, concepts, and methods [90]. Before suspecting QRPs, investigate your field's base rate of true effects; domains where true effects are rare inherently have lower statistical replicability, even with perfect practices [84]. We recommend a paradigm check of your ontological and epistemological foundations [90].

Q2: We have obtained a statistically significant result (p < 0.05). How can we be more confident it will replicate?

A single p-value is an unreliable indicator of replicability [84]. Focus on strengthening your research fundamentals. This includes pre-registering your study design and analysis plan to curb p-hacking, using larger sample sizes to achieve higher statistical power, and employing more robust statistical methods. Fundamentally, you must critically evaluate whether your measurement approach genuinely captures the psychological phenomenon you intend to study [90].

Q3: What is the most critical but often overlooked step in designing a replicable study?

The most critical step is clearly distinguishing between the phenomenon under study (e.g., a participant's internal belief) and the means used to explore it (e.g., the questionnaire items). Failing to separate the two is a cardinal error in psychology: it conflates what exists with how we measure it, fundamentally undermining replicability [90].

Q4: Our large-scale, multi-site replication study produced ambiguous results. What went wrong?

The problem may be the ergodic fallacy. You are likely applying group-level (nomothetic) findings to explain individual-level processes, which is often statistically invalid [90]. For phenomena that are non-ergodic (where group averages do not represent individuals), consider shifting to case-by-case, person-oriented analyses to establish epistemically justified generalizations [90].

Troubleshooting Guides

Issue: Suspected Low Replicability in a Research Domain

This guide helps diagnose systemic factors affecting replicability in your field.

  • Step 1: Gather Information & Identify Symptoms

    • Action: Collect quantitative data on replication rates, statistical power of typical studies, and evidence of publication bias in your field [84].
    • Avoid: Focusing solely on single, sensational replication failures. Look for broad patterns [91].
  • Step 2: Establish Probable Cause

    • Action: Analyze the gathered data. Use the table below to evaluate potential causes beyond QRPs.
    • Avoid: Jumping to the conclusion that QRPs are the primary driver without assessing the base rate [84].
  • Step 3: Test a Solution

    • Action: Based on the probable cause, implement a targeted intervention. For low base rates, focus on better theory-building. For low power, advocate for larger samples.
    • Avoid: Implementing multiple, simultaneous changes, as this makes it impossible to identify what worked [91].
  • Step 4: Implement the Solution

    • Action: Formally adopt new practices, such as pre-registration or the use of person-oriented analysis frameworks [90].
    • Avoid: Failing to document the new protocols and train team members.
  • Step 5: Verify Functionality

    • Action: Monitor long-term replication success and research credibility. This is a slow process that requires tracking meta-scientific indicators over time [40].
    • Avoid: Declaring success after a single successful replication.

Table: Diagnostic Table for Replication Failure

| Potential Cause | Key Indicators | Supporting Evidence from Literature |
| --- | --- | --- |
| Low Base Rate of True Effects [84] | Low replication rates despite high methodological rigor; discovery-oriented (vs. theory-testing) research. | Base rates estimated at ~9% in social psychology and ~20% in cognitive psychology [84]. |
| Low Statistical Power [84] | Small sample sizes; small effect sizes; wide confidence intervals. | Median statistical power in psychology has been estimated at ~36% [84]. |
| Questionable Research Fundamentals (QRFs) [90] | Reliance on variable-oriented approaches; confusion between ontological concepts and their measurement. | A core argument is that QRFs, not just QRPs, are the root cause of psychology's crises [90]. |
| Inappropriate Generalization (Ergodic Fallacy) [90] | Applying group-level findings to individuals; high intra-individual variability. | Many psychological processes are non-ergodic, making sample-to-individual inferences invalid [90]. |
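The power figures in the diagnostic table are easy to reproduce. As an illustrative check (normal approximation to the two-sample t-test, ignoring the negligible opposite-tail probability; function name is hypothetical), a study with 20 participants per group chasing a true effect of d = 0.4 has power of only about 0.24:

```python
from statistics import NormalDist

def approx_power(n_per_group, d, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approx.)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return 1 - NormalDist().cdf(z_alpha - d * (n_per_group / 2) ** 0.5)

# approx_power(20, 0.4) is roughly 0.24 -- far below the 80% convention,
# and in line with the ~36% median power estimate cited above.
```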

Issue: Implementing a Robust Multi-Site Replication Project

This guide provides a workflow for conducting large-scale, collaborative replication projects, an established structural solution to the replication crisis [40].

Phase 1 (Planning & Design): Define Replication Scope & Select Target Studies → Develop Unified Protocol & Materials → Secure Partner Labs & Pre-register. Phase 2 (Execution & Control): Standardized Data Collection Across Sites → Centralized Data Management & Checks. Phase 3 (Analysis & Synthesis): Perform Pre-registered Analyses → Conduct Meta-Analysis Across Sites. Phase 4 (Reporting & Integration): Publish Full Report (Including Methodology) → Integrate Findings into Educational Curricula.

The Scientist's Toolkit: Research Reagent Solutions

Table: Essential Methodological "Reagents" for Robust Research

| Item / Concept | Function / Explanation | Field Application |
| --- | --- | --- |
| Pre-registration | A time-stamped, immutable plan stating hypotheses, methods, and analysis strategy before data collection. | Mitigates QRPs like HARKing and p-hacking; distinguishes confirmatory from exploratory research [40]. |
| Collaborative Replication Networks | Consortia of labs (e.g., the Collaborative Replications and Education Project) that conduct large-scale replications [40]. | Increases sample size and generalizability; provides robust replicability estimates for a field [40]. |
| Person-Oriented Analysis | Analytical approaches that focus on the individual as a functioning whole, rather than on isolated variables [90]. | Avoids the ergodic fallacy; essential for studying non-ergodic, dynamic psychological processes [90]. |
| Contributor Roles Taxonomy (CRediT) | A standardized taxonomy to transparently document author contributions [40]. | Clarifies authorship and accountability, especially in large collaborative projects [40]. |
| Philosophy of Science Elaboration | The critical process of making explicit the ontological and epistemological foundations of one's research paradigm [90]. | Addresses QRFs by ensuring theories and methods are built on coherent, justified fundamentals [90]. |

Experimental Protocol: Large-Scale Collaborative Replication

1.0 Objective: To independently verify the findings of a high-impact study (the "target") through a pre-registered, multi-laboratory replication effort [40].

2.0 Methodology:

  • 2.1 Target Selection: Identify the target study from a defined pool (e.g., most cited studies in top journals over the last 3 years). An administrative committee must vet the selected study for feasibility of replication [40].
  • 2.2 Protocol Finalization: Faithfully adapt the original study's methodology. Translate materials if necessary. The final protocol must be approved by all participating sites to ensure standardization [40].
  • 2.3 Pre-registration: Publicly pre-register the study hypothesis, experimental design, sampling plan (including target sample size), and primary statistical analysis procedure on a recognized repository (e.g., OSF, AsPredicted).
  • 2.4 Data Collection: Participating sites (e.g., university courses or research labs) collect data according to the finalized protocol. Data collection is monitored to ensure adherence.
  • 2.5 Data Submission & Aggregation: All data and materials are submitted to a central repository. A lead team checks data for quality and completeness before aggregation [40].
  • 2.6 Analysis: The pre-registered analysis is first conducted. A meta-analysis is then performed across all data collection sites to compute an aggregate effect size estimate and assess heterogeneity [40].
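At its core, the cross-site aggregation in step 2.6 is an inverse-variance weighted average. A minimal fixed-effect sketch (real projects typically also fit a random-effects model and assess heterogeneity with a package such as metafor; names here are illustrative):

```python
def fixed_effect_meta(effects, variances):
    """Inverse-variance weighted pooled effect and its standard error.

    effects:   per-site effect size estimates
    variances: per-site sampling variances of those estimates
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = (1.0 / sum(weights)) ** 0.5
    return pooled, se
```

Substantial between-site heterogeneity would usually motivate switching from this fixed-effect pooling to a random-effects model.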

3.0 Diagram: Replication Project Workflow

Target Study Identified → Protocol Finalized → Study Pre-registered → Multi-Site Data Collection → Centralized Data Aggregation → Pre-registered Analysis & Meta-Analysis → Findings Published

Troubleshooting Guides & FAQs

This technical support center provides resources to help researchers identify and address common issues in studies focused on Questionable Research Practices (QRPs). The guides below offer systematic approaches to troubleshooting methodological challenges.

Troubleshooting Guide: Self-Report Data Collection in QRP Research

Problem: Low response rates and potential for socially desirable answers in surveys measuring QRPs. Application: This guide is for researchers encountering poor data quality during investigations into research integrity using self-report questionnaires [92].

| Problem Symptom | Likely Cause | Prerequisites to Check | Resolution Steps |
| --- | --- | --- | --- |
| Low survey response rate [93] | Long, cumbersome questionnaire; low perceived anonymity; poorly targeted sample. | Ensure the questionnaire can be completed in an appropriate time frame [93]. | 1. Run a pilot study to get feedback on design and length [93]. 2. Simplify and shorten the questionnaire [93]. 3. Use multiple recruitment channels and send polite reminders. |
| Evidence of social desirability bias (e.g., unrealistically low reports of common QRPs) [92] | Respondents providing socially acceptable rather than truthful answers, especially on sensitive topics [93]. | Check if questions on sensitive behaviors (e.g., data manipulation) are leading or assumptive [92]. | 1. Reassure participants of anonymity and confidentiality [93]. 2. Use neutral, non-judgmental language and avoid leading questions [92] [93]. 3. Consider using indirect questioning techniques. |
| High item non-response or ambiguous answers | Poorly worded questions; complex terminology; confusing format [93]. | Check for technical jargon, ambiguous terms, or double-barreled questions [92] [93]. | 1. Conduct a pilot to check question clarity [93]. 2. Use simple, straightforward language tailored to the audience [93]. 3. Revise or drop problematic items. |
| Inconsistent responses within the same survey | Question order effects where earlier answers influence later ones [93]. | Review the sequence of questions. | 1. Ensure questions flow logically from least to most sensitive [93]. 2. Separate potentially reactive questions [93]. 3. Randomize question blocks where possible. |

Expected Results: Implementation of these steps should lead to improved response rates and data quality, yielding more valid and reliable metrics on QRPs. If the issue persists: Consider using methodological triangulation (e.g., combining survey data with data audits) to validate findings [92].

Troubleshooting Guide: Validating a Novel QRP Identification Tool

Problem: An instrument developed to detect QRPs (e.g., in text or data) performs poorly during initial validation. Application: This guide assists researchers during the development and validation phase of a new QRP assessment scale or algorithmic tool [92].

| Problem Symptom | Likely Cause | Prerequisites to Check | Resolution Steps |
| --- | --- | --- | --- |
| Low inter-rater reliability (for manual tools) or low accuracy (for algorithmic tools) | Poorly defined operational criteria for QRPs; inadequate training for coders; flawed model training data. | Confirm that the conceptual framework for all QRPs is clear and comprehensive [92]. | 1. Gather content through literature review and expert consultation to refine criteria [92]. 2. Develop a detailed codebook with clear examples. 3. Re-train coders or re-train the algorithm with improved data. |
| Tool fails to generalize to new datasets | Overfitting during development; the tool is too specific to the original sample or context. | Check if the population/context for the original development is similar to the new study [92]. | 1. Collect a more diverse dataset for development. 2. Apply cross-validation techniques. 3. Recalibrate or adapt the tool for the new context [92]. |
| Poor construct validity | The tool does not adequately measure the theoretical construct of a QRP. | Verify that the tool's items/features map directly onto the theoretical construct. | 1. Conduct pilot focus groups to confirm themes and understanding [92]. 2. Perform statistical tests for validity (e.g., convergent, discriminant). 3. Revise the tool's items/features based on analysis. |

Expected Results: A more robust, reliable, and valid tool capable of consistently identifying QRPs across different contexts. If the issue persists: Re-evaluate the underlying definition of the QRP being targeted and consider a fundamental redesign of the detection methodology.

Frequently Asked Questions (FAQs)

Q1: What is the most effective way to structure a questionnaire to minimize bias when asking about sensitive QRPs?

A1: To minimize bias [93]:

  • Logical Flow: Order questions to progress from least sensitive to most sensitive [93].
  • Neutral Wording: Phrase questions neutrally to avoid leading respondents. For example, instead of "How often do you manipulate data?" use "How often, if ever, have you selected or omitted data points in a way that changed the study's conclusions?" [92].
  • Response Scales: Use balanced rating scales (e.g., Strongly Agree to Strongly Disagree) and include a "prefer not to say" option where appropriate [92] [93].
  • Pilot Testing: Always conduct a pilot study to identify and correct confusing, leading, or assumptive questions [93].

Q2: How can I improve the reliability of a protocol for manually coding published papers for QRPs?

A2: Improve reliability through rigorous coder training and calibration:

  • Develop a Detailed Codebook: Create a comprehensive manual with explicit, operationalized definitions for each QRP and clear examples.
  • Training Sessions: Hold structured training sessions for all coders using the codebook.
  • Practice Round: Have coders independently code the same set of papers not included in the main study.
  • Calculate and Discuss IRR: Calculate Inter-Rater Reliability (IRR) statistics (e.g., Cohen's Kappa). Discuss discrepancies in coding to align understanding and refine the codebook before beginning the full analysis.

Q3: Our research involves analyzing signaling pathways potentially impacted by QRPs. How can we visually represent these complex relationships clearly?

A3: Use standardized diagrams to map out pathways and workflows. Graphviz is an excellent tool for generating clear, reproducible diagrams from text-based code. See the "Mandatory Visualizations" section below for examples.

Experimental Protocols

Detailed Methodology: Validating a Research Integrity Questionnaire

This protocol outlines the steps for developing and validating a new questionnaire designed to assess researchers' awareness and engagement in QRPs [92].

1. Conceptualization and Item Generation

  • Define Constructs: Clearly define the specific QRPs the questionnaire will measure (e.g., p-hacking, data fabrication, selective reporting).
  • Gather Content: Conduct a scoping review of published literature and existing questionnaires to identify relevant themes and items [92]. Supplement with qualitative methods like focus groups with researchers to identify common themes and terminology [92].
  • Create Item Pool: Draft a comprehensive list of questions (items) that cover all aspects of the defined constructs. Include a mix of closed-ended questions (e.g., Likert scales) for quantification and open-ended questions for qualitative depth [93].

2. Questionnaire Design

  • Refine Wording: Ensure questions are clear, concise, and free from technical jargon, leading phrasing, double negatives, and ambiguity [92] [93]. Tailor the reading level to the target audience [92].
  • Structure and Format:
    • Place demographic questions first.
    • Order questions logically from least to most sensitive [93].
    • Provide a brief introduction with research aims and clear instructions for completion [92].
    • Ensure a clean, professional layout.

3. Pilot Testing

  • Run a Pilot Study: Administer the draft questionnaire to a small, representative sample (e.g., 10-15 researchers) [93].
  • Gather Feedback: Solicit feedback on question clarity, comprehension, length, and overall flow. Ask if any questions felt intrusive or confusing.
  • Check Reliability: Perform preliminary statistical analysis on pilot data to assess internal consistency (e.g., Cronbach's Alpha).
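Cronbach's alpha can be computed directly from the pilot item scores. A minimal pure-Python version (statistics packages provide equivalents; sample variances are used consistently in numerator and denominator):

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of per-item score vectors
    (one inner list per item, respondents in the same order)."""
    k = len(item_scores)
    sum_item_var = sum(variance(scores) for scores in item_scores)
    # Each respondent's total score across all items.
    totals = [sum(resp) for resp in zip(*item_scores)]
    return k / (k - 1) * (1 - sum_item_var / variance(totals))
```

Alpha around 0.7 or higher is the conventional threshold for acceptable internal consistency at the pilot stage.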

4. Finalization and Validation

  • Revise Questionnaire: Incorporate feedback from the pilot study to produce the final version.
  • Administer Final Survey: Distribute the finalized questionnaire to the full study sample.
  • Assess Psychometric Properties:
    • Reliability: Analyze internal consistency and test-retest reliability.
    • Validity: Evaluate construct validity (e.g., via factor analysis) and criterion validity if a "gold standard" measure exists.

Mandatory Visualizations

Diagram: QRP Identification Workflow

Start Analysis → Data Collection → Automated Screening → Flags Raised? If no, end the process; if yes → Expert Manual Review → QRP Confirmed? If yes, log the finding; if no, end the process.

Diagram: Research Integrity Factors

Research Integrity is shaped by four factors: Institutional Pressures (linked to P-Hacking), Mentorship & Lab Culture (linked to HARKing), Individual Practices (linked to P-Hacking, HARKing, and Data Fabrication), and Policy & Guidelines (which act through Individual Practices).

The Scientist's Toolkit: Research Reagent Solutions

This table details key resources and methodologies essential for conducting rigorous research into Questionable Research Practices.

Item / Solution | Function in QRP Research
Validated Questionnaires (e.g., based on established scales) | Provide a reliable and standardized instrument for measuring self-reported attitudes and engagement in QRPs across different populations, allowing for comparability between studies [92] [93].
Pre-Registration Protocol | A detailed plan for study hypotheses, methods, and analysis decisions filed before data collection begins. Serves as a benchmark to detect and prevent HARKing (Hypothesizing After the Results are Known) and selective reporting [94].
Data Auditing Scripts (e.g., in R or Python) | Automated scripts used to screen datasets for statistical anomalies, inconsistencies, or patterns indicative of p-hacking or data fabrication (e.g., digit preference, implausible p-value distributions).
Inter-Rater Reliability (IRR) Framework (e.g., Cohen's Kappa calculation) | A statistical method to ensure consistency and agreement between multiple researchers when manually coding qualitative data or published manuscripts for the presence of QRPs.
Open Science Framework (OSF) | A collaborative platform to share pre-registrations, data, materials, and code. Promotes transparency and allows for direct examination of the research process, mitigating several QRPs.
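The digit-preference screen mentioned under data auditing scripts can be sketched as a chi-square goodness-of-fit test on terminal digits. In many honestly measured datasets the final digit is roughly uniform, so large deviations warrant a closer look. The function and dataset below are illustrative inventions (the fabricated-looking data ends only in 0 or 5), and the statistic is compared informally against the df = 9, α = .05 critical value of 16.92 rather than computed as a p-value:

```python
# Terminal-digit preference screen: chi-square goodness-of-fit of the last
# digit of each value against a uniform distribution over digits 0-9.
from collections import Counter

def terminal_digit_chi2(values):
    digits = [int(str(v)[-1]) for v in values]
    counts = Counter(digits)
    expected = len(values) / 10  # uniform expectation per digit
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))

# Hypothetical dataset in which every value ends in 0 or 5, a classic
# rounding / fabrication signature.
data = [120, 135, 140, 155, 160, 175, 180, 195, 200, 215,
        125, 130, 145, 150, 165, 170, 185, 190, 205, 210]
stat = terminal_digit_chi2(data)
print(f"chi-square = {stat:.1f}")  # compare to 16.92 (df = 9, alpha = .05)
```

A flagged result is only a screening signal, not proof of a QRP; as the workflow above indicates, flags should go to expert manual review before any finding is logged.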

Conclusion

Addressing Questionable Research Practices requires a multifaceted approach that combines clear definitions, robust detection methodologies, proactive prevention strategies, and critical validation of scientific literature. The recent development of a comprehensive inventory of 40 QRPs provides a crucial foundation for standardized identification, while technological advances like AI screening tools offer promising detection capabilities. For biomedical and clinical research, implementing open science practices, preregistration, and transparent reporting represents the most effective path toward mitigating QRPs' damaging effects on scientific credibility and drug development. Future efforts must focus on cultural and institutional reforms that reduce perverse publication incentives while promoting research quality over quantity. By adopting these integrated strategies, the research community can strengthen the integrity of the scientific record, enhance the replicability of findings, and ultimately accelerate the development of reliable medical treatments and interventions.

References