STATISTICS

Critical Appraisal


What is Critical Appraisal?

  • Critical appraisal means carefully and systematically checking a research study to judge:
    • Trustworthiness: Is the study accurate and valid?
    • Value: Are the results useful?
    • Relevance: Are the results meaningful for the patient or situation you’re dealing with?
  • It is a key skill for evidence-based medicine (EBM) because it helps clinicians:
    • Find good-quality research.
    • Use evidence properly to make better clinical decisions.

Why is Reliable Research So Important?

  • Clinicians and patients need trustworthy information to make the best healthcare choices.
  • Research involves collecting and analysing data to create new knowledge.
  • Problem: Not all research is good — some studies are badly designed, biased, or misleading.
  • Risk: Poor research can lead to wrong decisions that may harm patients.

What is the Purpose of Critical Appraisal?

Critical appraisal helps you:

  • Check if the study was conducted properly (design and methods).
  • Understand what the study results actually mean.
  • Judge if the study is relevant for your particular patient or clinical question.

How to Recognise Reliable Studies

  • Claims like “Clinical tests have shown…” are everywhere — especially in advertisements and media.
  • Before believing these claims:
    • Check how the study was done.
    • Ask if bias might have influenced the results.

Simple Example: EverYoung Product Study

  • A company claims, “9 out of 10 women say EverYoung makes their skin firmer.”
  • But:
    • Study A: They surveyed women already buying EverYoung (who likely already believe it works) → Selection bias → Overestimates how well it works.
    • Study B: They asked a random sample of women to try EverYoung → Less bias, results are more reliable.
  • Main Lesson:
    • Good studies are designed to avoid bias.
    • Poorly designed studies may look convincing but can mislead us.

In Short

  • Critical appraisal protects against being misled by poor research.
  • It ensures that clinical decisions are based on truthful, high-quality evidence — not marketing hype or biased studies.


What is Bias?

  • Bias: Systematic deviation from the truth due to the way a study is conducted, analysed, or reported.
  • Bias can mislead interpretation of research findings.

Common sources of bias include:

1. Selection Bias

  • What it is:
    • Happens when the people chosen for a study are not truly representative of the general population.
    • Groups might already be different before the treatment even starts.
  • Example:
    • A study testing a new asthma drug only includes patients who regularly attend a specialist clinic (who are often healthier and more motivated than average).
    • Result: The treatment looks better than it really is, because the participants were already doing well.

2. Performance Bias

  • What it is:
    • Happens when people in different groups receive different care, aside from the treatment being tested.
  • Example:
    • In a trial of a new diabetes tablet, the treatment group gets more frequent nurse follow-ups than the control group.
    • Result: Any improvement might be partly due to better follow-up, not just the new drug.

3. Detection Bias

  • What it is:
    • Happens when the way outcomes are measured differs between groups.
    • Particularly a risk when outcome assessors are not blinded.
  • Example:
    • In a trial of a new back pain treatment, if physiotherapists know which patients received the new therapy, they may (even unconsciously) rate their pain scores lower.
    • Result: Biased outcome measurements favouring the new treatment.

4. Attrition Bias

  • What it is:
    • Happens when a lot of participants drop out of the study — especially if more drop out from one group than the other.
    • Loss to follow-up can distort the results.
  • Example:
    • In a weight-loss study, more participants drop out from the diet group than the control group because the diet was too hard.
    • Result: It might seem like the diet was more effective — but that’s because those who failed left the study and were not counted.

5. Reporting Bias

  • What it is:
    • Happens when only some outcomes are reported — usually the positive ones — while negative or neutral findings are hidden.
  • Example:
    • A trial on a new antidepressant reports that it improved mood — but quietly ignores that it also caused significant weight gain and sleep problems (which were measured but not reported).
    • Result: The study paints an overly positive picture of the drug.

Summary Tip

Bias Type | Key Danger
Selection Bias | Groups aren’t fairly comparable at the start.
Performance Bias | Extra care (other than the intervention) changes results.
Detection Bias | Outcome measurement unfairly favours one group.
Attrition Bias | Loss of participants distorts group comparisons.
Reporting Bias | Only “good news” outcomes are published.

(Refer to CONSORT guidelines for a full discussion.)


Internal Validity

    What is Internal Validity?

    • Internal validity means:
      How well a study was done, and
      Whether we can trust that the results are actually true for the people studied — not distorted by bias or errors.
    • If a study has minimal bias, it is said to have high internal validity.

    Important Points to Understand

    • No study is perfect:
      • Every study has some risk of bias.
      • The goal is not to find a “perfect” study but to decide:
        • Have the researchers done enough to reduce bias?
        • Are any remaining biases small enough that they wouldn’t change the main conclusion?
    • Critical appraisal asks:
      • Was the study designed carefully?
      • Were methods (like randomisation, blinding, and complete follow-up) done properly?
      • Is there any major bias that could explain the results instead of the intervention?

    Simple Clinical Example

    • Suppose a new blood pressure pill is tested.
    • If:
      • Patients were properly randomised,
      • Researchers were blinded,
      • Few patients dropped out,
    • ➔ Then we can be confident that any improvement in blood pressure is really due to the pill — not because of unfair differences between groups.

    This means the study has high internal validity.

    In Short

    Internal Validity Means… | Why It Matters
    Trusting that the study results are accurate for the group studied. | If internal validity is low, you can’t trust the findings — even if the study looks impressive.
    Making sure bias is minimised as much as possible. | High internal validity = high-quality evidence for making clinical decisions.

    Signs of High vs Low Internal Validity
    Feature | High Internal Validity (Good Study) | Low Internal Validity (Problematic Study)
    Randomisation | True randomisation (e.g., computer-generated) used; unpredictable allocation. | No true randomisation; predictable allocation (e.g., odd/even dates, names).
    Allocation Concealment | Allocation hidden from those enrolling participants. | Allocation known or easily guessed (e.g., open list, transparent envelopes).
    Baseline Comparability | Groups similar at the start (age, sex, disease severity). | Groups different at baseline (e.g., one group sicker or older).
    Blinding (Masking) | Patients, staff, and outcome assessors are blinded. | No blinding; participants or assessors know who received treatment.
    Loss to Follow-Up | Minimal dropouts; similar loss in both groups; reasons explained. | High or unequal dropout rates; missing data not explained.
    Intention-to-Treat (ITT) Analysis | Participants analysed in the groups to which they were randomised. | Participants analysed according to the treatment they actually received (“as-treated” analysis).
    Complete Outcome Reporting | All important outcomes reported as planned. | Some outcomes not reported or only favourable outcomes published.

    1-Minute Internal Validity Checklist

    (Answer Yes / No / Unsure as you appraise)

    Question | Yes / No / Unsure
    🔲 Was the study population randomly assigned to groups?
    🔲 Was allocation concealed from those enrolling participants?
    🔲 Were the groups similar at the start (baseline characteristics)?
    🔲 Were patients, healthcare providers, and outcome assessors blinded where possible?
    🔲 Were losses to follow-up minimal and similar between groups?
    🔲 Was an intention-to-treat (ITT) analysis performed?
    🔲 Were all pre-specified outcomes reported (no selective reporting)?

    🚨 Fast Interpretation Guide:

    • All or nearly all “Yes” ➔ Study likely has high internal validity — findings are trustworthy.
    • Several “No” or “Unsure” ➔ Be cautious — study may have serious biases that affect the results.
    • Early screening:
      • If randomisation, allocation concealment, or blinding are missing ➔ Major concerns even before considering results.

    Different Types of Research Questions and Study Designs

    • Types of Clinical Questions:
      • Aetiology: What caused the illness?
      • Diagnosis: What does this test result mean in this patient?
      • Prognosis: What is likely to happen to this patient?
      • Harm: Is exposure to a substance harmful?
      • Effectiveness: Does the treatment help?
      • Qualitative: What outcomes matter most to patients?
    • Appropriate Study Designs for Each Question:
      • Qualitative studies: Best for exploring patients’ experiences and values.
      • Randomised controlled trials (RCTs): Best for assessing the effectiveness of interventions.
      • Cross-sectional surveys: Useful for estimating prevalence of conditions.
      • Inception cohort studies: Necessary for prognosis questions (following newly diagnosed patients over time).
    • Hierarchy of Evidence:
      • Not all research designs are equally strong at finding the truth.
      • Some designs are better at avoiding bias and random error.
      • Randomised Controlled Trials (RCTs) generally have higher validity for determining treatment effectiveness compared to case series, qualitative studies, or anecdotes.
      • Subjective reports and qualitative findings are inappropriate for assessing effectiveness:
        • Subjective reports (e.g., personal testimonials) rely heavily on individual perception, which can be influenced by bias, placebo effects, or external factors.
        • Qualitative studies are valuable for understanding patient experiences, feelings, and priorities — but not suitable for objectively measuring whether a treatment actually works.
        • Effectiveness requires objective, quantifiable measurement — best done through controlled experimental designs like RCTs.
      • Historical Example – Radithor®:
        • Radithor® was a “health tonic” containing radium, marketed in the 1920s.
        • People believed (subjectively) that it improved vitality and health.
        • Lack of proper scientific evaluation led to widespread use.
        • Ultimately caused fatal radiation poisoning — most famously killing Eben Byers, whose jaw deteriorated (“The Radium Water Worked Fine Until His Jaw Came Off” — Wall Street Journal, 1932).
        • Lesson: Subjective enthusiasm without rigorous evaluation can cause serious harm.

    Hierarchy of Evidence for Effectiveness (Pyramid)

    Level | Type of Evidence | Clinical Strength
    1 (Top) | Systematic Reviews and Meta-Analyses of RCTs | Highest level of evidence. Combines results from multiple RCTs to give an overall estimate of effect.
    2 | Randomised Controlled Trials (RCTs) | Gold standard for assessing treatment effectiveness. Minimise bias via randomisation and blinding.
    3 | Cohort Studies (Prospective or Retrospective) | Can identify associations between exposures and outcomes. Weaker than RCTs due to potential confounding.
    4 | Case-Control Studies | Useful for rare diseases. Greater susceptibility to bias (especially recall and selection bias).
    5 | Cross-Sectional Studies | Measure prevalence, not causality. Limited for treatment effectiveness.
    6 | Case Series and Case Reports | Descriptive only. No control group. Useful for hypothesis generation but not for proving effectiveness.
    7 (Bottom) | Expert Opinion / Anecdote / Qualitative Studies | Highly subjective. Valuable for patient perspectives and identifying outcomes that matter, but NOT for determining if a treatment works.

    Study Design and Bias

    • Choosing the Correct Study Design:
      • Critical to match study design to the research question.
      • Different designs are prone to different biases.
    • Critical Appraisal Essentials:
      • Step 1: Check if the correct study design was used for the research question.
      • Step 2: Assess whether researchers have minimised biases associated with that design.

    Critical Appraisal Skills Programme (CASP)

      • What is CASP?
        • CASP (Critical Appraisal Skills Programme) is an evidence-based toolset.
        • It provides structured checklists to help systematically assess the quality of research papers.
      • Purpose of CASP Tools:
        • Help clinicians, researchers, and students critically appraise different types of studies (e.g., RCTs, cohort studies, qualitative studies).
        • Improve decision-making by focusing on whether a study is trustworthy, meaningful, and applicable.
      • CASP Focus Areas for Appraisal:
        1. Validity:
          • Was the study conducted in a way that minimised bias?
        2. Results:
          • Are the results statistically and clinically meaningful?
        3. Clinical Relevance:
          • Are the findings applicable to the specific patient population or clinical context?
      • Screening Questions (Key Initial Step):
        • The first two questions on any CASP checklist are designed to quickly detect major flaws.
        • Examples of screening questions:
          • Was the study question clearly focused?
          • Was the study methodologically sound (e.g., randomisation, blinding)?
        • If the answer to either is ‘No’, the study may be fatally flawed → not worth spending time reading further.

      Why CASP is Important Clinically:

      • Saves time by filtering out poor-quality studies early.
      • Ensures that only high-quality evidence influences patient care decisions.
      • Reduces the risk of applying biased, invalid, or non-generalisable research in practice.

      Summary:
      CASP = a structured, reliable way to judge:

      • Is this study valid?
      • Are the results reliable?
      • Is it relevant to my patient?

      Simplified CASP RCT Checklist

      1. Are the results valid? (Assess study quality)

      Question | Yes / No / Unsure | Notes
      Did the study address a clearly focused research question?
      Was the assignment of participants to groups truly random (randomisation)?
      Was allocation to groups concealed from those enrolling participants?
      Were the groups similar at baseline (e.g., age, disease severity)?
      Were participants, staff, and outcome assessors blinded?
      Was follow-up complete and were all patients analysed in the groups to which they were randomised (intention-to-treat analysis)?

      2. What are the results? (Assess findings)

      Question | Yes / No / Unsure | Notes
      How large is the treatment effect? (e.g., Relative Risk, Risk Difference, NNT)
      How precise is the estimate? (e.g., 95% Confidence Interval narrow, not crossing 1)

      3. Will the results help locally? (Assess clinical relevance)

      Question | Yes / No / Unsure | Notes
      Can the results be applied to my patient population?
      Were all important outcomes considered (including harms)?
      Are the benefits worth the potential harms and costs?

      How to Use It Quickly

      • First two screening questions: If either is “No” → consider abandoning the study.
      • Then work through systematically if valid.
      • Always finish by considering whether the results are clinically applicable to your patient or setting.

      🔥 Bonus Tip: “RED FLAGS” to stop appraising early

      • No proper randomisation or concealment ➔ major bias risk.
      • Major losses to follow-up ➔ unreliable results.
      • No blinding with subjective outcomes ➔ detection/performance bias risk.
      • Inappropriate control group ➔ unfair comparisons.


      Effectiveness Studies – Randomised Controlled Trials (RCTs)

      Why Evaluating Treatments is Challenging

      • Many illnesses can improve on their own over time (natural recovery).
      • If we simply observe that a patient improves after a treatment, we cannot be sure whether the improvement was due to the treatment or would have happened anyway.
      • Therefore, comparison groups are essential to properly test the effectiveness of any treatment.

      Key Features of a Valid RCT

      Controlled Design
      • An RCT compares outcomes between:
        • An intervention group (receives the treatment).
        • A control group (receives placebo, standard treatment, or no treatment).
      • Purpose: To isolate the true effect of the intervention.
      Randomisation
      • Patients are allocated randomly to the intervention or control group.
      • Purpose:
        • Prevents selection bias (i.e., clinicians unconsciously or consciously placing sicker or healthier patients into a group).
        • Ensures groups are comparable at the start of the study.
      Allocation Concealment
      • The method by which the assignment of patients to intervention or control is kept hidden from the people enrolling participants.
      • Purpose:
        • Prevents researchers from influencing which patients enter which group.
        • Protects the integrity of randomisation.
      Baseline Comparability
      • The groups must have similar characteristics at the start (e.g., age, sex, disease severity).
      • Purpose:
        • Ensures that differences in outcomes are due to the treatment — not differences in who was recruited.
      Blinding (Masking)
      • Keeping participants, healthcare staff, and outcome assessors unaware of group assignment.
      • Purpose:
        • Patients: Blinding reduces placebo effects.
        • Staff: Blinding prevents unequal treatment or care between groups.
        • Outcome assessors: Blinding reduces detection bias when measuring results.
      Attrition Monitoring
      • Participants lost to follow-up can introduce bias if more people are lost from one group.
      • Good practice:
        • Minimise dropouts.
        • Ensure losses are balanced between groups.
      Intention-to-Treat (ITT) Analysis
      • Patients are analysed in the group to which they were originally randomised, even if they did not complete the treatment.
      • Purpose:
        • Preserves the benefits of randomisation.
        • Reflects real-world treatment effects, including adherence issues.
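
      A minimal sketch of why this matters, using a hypothetical toy dataset (the patients and numbers below are made up purely for illustration): the same trial analysed by randomised group (ITT) and by treatment actually received (“as-treated”) can give quite different event rates once patients cross over or stop treatment.

```python
# Minimal sketch (hypothetical data): intention-to-treat (ITT) vs "as-treated" analysis.

patients = [
    # assigned = randomised arm, received = treatment actually taken,
    # event = 1 if the bad outcome occurred
    {"assigned": "drug",    "received": "drug",    "event": 0},
    {"assigned": "drug",    "received": "drug",    "event": 0},
    {"assigned": "drug",    "received": "placebo", "event": 1},  # stopped the drug
    {"assigned": "placebo", "received": "placebo", "event": 1},
    {"assigned": "placebo", "received": "placebo", "event": 0},
    {"assigned": "placebo", "received": "drug",    "event": 0},  # crossed over
]

def event_rate(group):
    return sum(p["event"] for p in group) / len(group)

# ITT: analyse patients in the arm they were randomised to
itt_drug    = [p for p in patients if p["assigned"] == "drug"]
itt_placebo = [p for p in patients if p["assigned"] == "placebo"]

# As-treated: analyse by what patients actually received (breaks randomisation)
at_drug    = [p for p in patients if p["received"] == "drug"]
at_placebo = [p for p in patients if p["received"] == "placebo"]

print(f"ITT:        drug {event_rate(itt_drug):.2f} vs placebo {event_rate(itt_placebo):.2f}")
print(f"As-treated: drug {event_rate(at_drug):.2f} vs placebo {event_rate(at_placebo):.2f}")
```

      In this toy example the as-treated comparison makes the drug look far better than the ITT comparison, because the patient who stopped the drug (and did badly) is counted in the placebo column — exactly the kind of distortion ITT analysis avoids.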
      Role of the CASP RCT Checklist
      • CASP (Critical Appraisal Skills Programme) provides a structured method to appraise RCTs.
      • Focus areas:
        • Was randomisation adequate?
        • Was allocation concealed?
        • Were groups comparable at baseline?
        • Was blinding implemented?
        • Was loss to follow-up minimised?
        • Was ITT analysis used?
      • Overall goal: Confirm that the RCT has high internal validity (i.e., results are trustworthy).


      Study Design, Associated Questions, and Common Biases

      Study Design | Used For | Common Biases to Watch For
      Randomised Controlled Trial (RCT) | Effectiveness (intervention studies) | Selection bias (improper randomisation); allocation bias (no concealment); performance bias (different care apart from the intervention); detection bias (unblinded outcome assessment); attrition bias (loss to follow-up); reporting bias (selective outcomes)
      Cohort Study (Prospective or Retrospective) | Aetiology (causation); harm (exposure-outcome); prognosis (if inception cohort) | Confounding bias (unmeasured variables); selection bias (especially in retrospective cohorts); attrition bias (losses over time)
      Case-Control Study | Aetiology (causation); harm (assessing rare exposures or outcomes) | Recall bias (differential memory of exposure); selection bias (controls not representative); confounding
      Cross-Sectional Study | Diagnosis (test accuracy studies); prevalence (disease frequency) | Spectrum bias (non-representative patient sample); verification bias (if only positive tests verified); observer bias
      Qualitative Study (e.g., interviews, focus groups) | Exploring patient experiences; identifying important outcomes | Researcher bias (subjectivity in interpretation); selection bias (non-representative participants); reflexivity bias (failure to address the researcher’s own influence on the findings)

      Quick Clinical Reminder

      If you see… | Think about…
      Small, single-centre trial with no blinding | Risk of selection bias, performance bias, and detection bias.
      Meta-analysis including only small positive studies | Risk of publication bias (positive studies more likely published).
      Case report suggesting a “miracle cure” | Anecdotal bias — strong need for controlled evidence (e.g., RCTs).
      Large RCT with proper blinding, low dropout, intention-to-treat (ITT) analysis | High internal validity — strong, reliable evidence.

      Extra Tip for Appraising by Study Design

      Study Design | Essential Validity Checks
      Randomised Controlled Trial (RCT) | Was randomisation adequate? Was allocation concealment maintained? Were groups comparable at baseline? Was blinding (participants, staff, outcome assessors) done? Was intention-to-treat (ITT) analysis used?
      Cohort Study | Was exposure clearly defined before the outcome? Were comparison groups similar? Was follow-up complete and sufficiently long?
      Case-Control Study | Were cases and controls drawn from the same population? Was exposure measured the same way for both groups? Was recall bias addressed?
      Cross-Sectional Study | Was the sample representative? Was the diagnostic test compared to a true gold standard? Was blinding used in outcome assessment?
      Qualitative Study | Was the sampling strategy appropriate? Was the data analysis transparent and rigorous? Was researcher influence (reflexivity) considered and addressed?


      Results Interpretation

      • Only interpret results after confirming study validity (good design and minimal bias).

      Common Result Measures:

      • Relative Risk (RR):
        • Ratio of event probability in intervention group vs control group.
      • Odds Ratio (OR):
        • Ratio of the odds of the event in the intervention group to the odds in the control group (odds = probability of the event happening divided by probability of it not happening).
      • RR or OR = 1:
        • No difference between intervention and control.
      • RR or OR > 1:
        • Event more frequent in intervention group.
        • Good if the outcome is desirable (e.g., smoking cessation).
        • Bad if the outcome is undesirable (e.g., death).
      • RR or OR < 1:
        • Event less frequent in intervention group.

      Absolute Measures:

      • Risk Difference (RD):
        • Difference in event rates between intervention and control groups.
      • Number Needed to Treat (NNT):
        • Number of patients needed to treat to achieve one extra good outcome.
        • Calculated as 1 / absolute risk reduction (ARR).
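
      A minimal worked sketch of these measures from a hypothetical 2×2 trial (30/200 events with the intervention vs 40/200 with control); the numbers are invented purely to show the arithmetic.

```python
# Minimal sketch (hypothetical 2x2 trial data): computing RR, OR, RD, ARR and NNT.

a, n_treat   = 30, 200   # events / total in intervention group
c, n_control = 40, 200   # events / total in control group

risk_treat   = a / n_treat            # 0.15
risk_control = c / n_control          # 0.20

rr  = risk_treat / risk_control       # relative risk = 0.75
rd  = risk_treat - risk_control       # risk difference = -0.05
arr = risk_control - risk_treat       # absolute risk reduction = 0.05
nnt = 1 / arr                         # number needed to treat = 20

odds_treat   = risk_treat / (1 - risk_treat)
odds_control = risk_control / (1 - risk_control)
odds_ratio   = odds_treat / odds_control   # ≈ 0.71

print(f"RR={rr:.2f}  RD={rd:.2f}  ARR={arr:.2f}  NNT={nnt:.0f}  OR={odds_ratio:.2f}")
```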

      Understanding Uncertainty: Confidence Intervals and P-Values

      • Confidence Interval (CI):
        • Range within which the true effect likely lies, usually with 95% certainty.
        • Narrow CIs suggest greater precision.
      • P-value:
        • Probability of observing a result at least as extreme as the one seen, purely by chance, if no true effect exists.
        • P < 0.05 often considered “statistically significant” but does not imply clinical importance.
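
      A minimal sketch of how a 95% confidence interval and p-value can be obtained for a relative risk, using the standard log-RR normal approximation and the same hypothetical 2×2 data as in the sketch above.

```python
# Minimal sketch (hypothetical 2x2 data): 95% CI for a relative risk via the
# log-RR method, plus the corresponding two-sided p-value.

import math

a, n1 = 30, 200   # events / total, intervention
c, n0 = 40, 200   # events / total, control

rr = (a / n1) / (c / n0)

# Standard error of ln(RR)
se = math.sqrt(1/a - 1/n1 + 1/c - 1/n0)

lo = math.exp(math.log(rr) - 1.96 * se)
hi = math.exp(math.log(rr) + 1.96 * se)

# Two-sided p-value: how often a log-RR this far from 0 (i.e. RR = 1) would
# arise by chance alone if there were truly no effect.
z = abs(math.log(rr)) / se
p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

print(f"RR={rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}, p={p:.2f}")
```

      With these made-up numbers the interval crosses 1 and p ≈ 0.19, so despite an apparent 25% relative risk reduction the result would not be statistically significant; a larger trial would be needed to narrow the interval.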

      Clinical Relevance of Results

      • Applicability to Patient or Population:
        • Consider differences between study participants and the clinical population.
      • Evaluation of All Outcomes:
        • Important outcomes must be measured (e.g., survival, not just symptom control).
        • Beware of selective outcome reporting.
      • Weighing Benefits vs Harms:
        • Consider net clinical benefit:
          • e.g., anticoagulation may prevent strokes but increase bleeding risk.
          • Decision depends on the balance of benefit vs harm.
      • Cost Considerations:
        • Trials often don’t report cost-effectiveness.
        • Simple estimation: Cost per benefit = Cost per patient × NNT.
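          • Worked example (hypothetical figures): if the treatment costs £300 per patient and the NNT is 20, the approximate cost of achieving one extra good outcome is £300 × 20 = £6,000.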

      Systematic Reviews

      • Preferred Evidence Source:
        • Summarise all available evidence systematically.
        • More reliable than a single study.
      • Use quality-assured, up-to-date systematic reviews when available.
      • CASP checklist is available for systematic review appraisal.

      Final Framework for Reading Research

      Always assess three core areas:

      Area | Key Questions
      Validity | Was the study conducted to minimise bias?
      Results | What did the study find (and is it statistically and clinically meaningful)?
      Relevance | Are the findings applicable to the patient or clinical situation?

      Clinical Research Statistical Terms Cheat Sheet

      Term | Meaning | How to Interpret | Clinical Importance
      Relative Risk (RR) | Ratio of the probability of the event in the intervention group vs the control group. | RR = 1: no difference; RR > 1: event more likely with intervention; RR < 1: event less likely with intervention | Measures how much more (or less) likely an outcome is with treatment.
      Odds Ratio (OR) | Ratio of the odds of the event occurring in the intervention vs control group. | OR = 1: no difference; OR > 1: higher odds with intervention; OR < 1: lower odds with intervention | Commonly used in case-control studies; approximates RR when the event is rare.
      Risk Difference (RD) (also called Absolute Risk Reduction [ARR]) | Difference in event rates between groups (intervention risk − control risk). | RD > 0: intervention increases risk; RD < 0: intervention reduces risk | Gives the absolute change in risk; important for real-world clinical impact.
      Number Needed to Treat (NNT) | Number of patients who need to be treated to prevent one additional bad outcome (or achieve one good outcome). | NNT = 1/ARR; lower NNT = more effective treatment | Helps clinicians assess how worthwhile a treatment is; smaller NNT = better.
      Confidence Interval (CI) | Range of values within which the true result likely lies, usually with 95% certainty. | Narrow CI: precise estimate; CI crossing 1 (for RR or OR): not statistically significant | Indicates the reliability and precision of the result.
      P-value | Probability that a result at least as extreme as the one observed would occur by chance if there were no real effect. | P < 0.05: commonly considered statistically significant; lower P = less likely due to chance | Statistical significance, but not automatically clinical significance.

      Quick Examples:

      • RR = 0.75 → Treatment reduces risk by 25%.
      • OR = 2.0 → Odds of event are twice as high in the intervention group.
      • RD = -0.10 → 10% absolute reduction in risk with treatment.
      • NNT = 10 → Need to treat 10 patients to prevent 1 adverse event.
      • 95% CI for RR = 0.60–0.90 → Suggests consistent risk reduction; statistically significant (doesn’t cross 1).
      • P = 0.03 → If there were no real effect, a result at least this extreme would occur only 3% of the time; considered statistically significant.
