Critical Appraisal
What is Critical Appraisal?
- Critical appraisal means carefully and systematically checking a research study to judge:
- Trustworthiness: Is the study accurate and valid?
- Value: Are the results useful?
- Relevance: Are the results meaningful for the patient or situation you’re dealing with?
- It is a key skill for evidence-based medicine (EBM) because it helps clinicians:
- Find good-quality research.
- Use evidence properly to make better clinical decisions.
Why is Reliable Research So Important?
- Clinicians and patients need trustworthy information to make the best healthcare choices.
- Research involves collecting and analysing data to create new knowledge.
- Problem: Not all research is good — some studies are badly designed, biased, or misleading.
- Risk: Poor research can lead to wrong decisions that may harm patients.
What is the Purpose of Critical Appraisal?
Critical appraisal helps you:
- Check if the study was conducted properly (design and methods).
- Understand what the study results actually mean.
- Judge if the study is relevant for your particular patient or clinical question.
How to Recognise Reliable Studies
- Claims like “Clinical tests have shown…” are everywhere — especially in advertisements and media.
- Before believing these claims:
- Check how the study was done.
- Ask if bias might have influenced the results.
Simple Example: EverYoung Product Study
- A company claims, "9 out of 10 women say EverYoung makes their skin firmer."
- But:
- Study A: They surveyed women already buying EverYoung (who likely already believe it works) → Selection bias → Overestimates how well it works.
- Study B: They asked a random sample of women to try EverYoung → Less bias, results are more reliable.
- Main Lesson:
- Good studies are designed to avoid bias.
- Poorly designed studies may look convincing but can mislead us (see the short simulation after this list).
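A small simulation makes the point concrete. All rates below are invented for illustration: we assume half of women in the general population would rate the cream favourably, but 90% of existing buyers would (they self-selected by buying it).

```python
import random

random.seed(42)

# Invented rates: 50% favourable in the general population,
# 90% favourable among existing buyers (who self-selected).
population = [random.random() < 0.50 for _ in range(100_000)]
buyers = [random.random() < 0.90 for _ in range(10_000)]

study_a = random.sample(buyers, 1_000)      # Study A: surveys existing buyers
study_b = random.sample(population, 1_000)  # Study B: random population sample

print(f"Study A (buyers only):   {sum(study_a) / len(study_a):.0%} say it works")
print(f"Study B (random sample): {sum(study_b) / len(study_b):.0%} say it works")
# Study A reports ~90% approval, Study B ~50% -- same product, very different
# headline, purely because of who was asked (selection bias).
```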
In Short
- Critical appraisal protects against being misled by poor research.
- It ensures that clinical decisions are based on truthful, high-quality evidence — not marketing hype or biased studies.
What is Bias?
- Bias: Systematic deviation from the truth due to the way a study is conducted, analysed, or reported.
- Bias can mislead interpretation of research findings.
Common sources of bias include:
1. Selection Bias
- What it is:
- Happens when the people chosen for a study are not truly representative of the general population.
- Groups might already be different before the treatment even starts.
- Example:
- A study testing a new asthma drug only includes patients who regularly attend a specialist clinic (who are often healthier and more motivated than average).
- Result: The treatment looks better than it really is, because the participants were already doing well.
2. Performance Bias
- What it is:
- Happens when people in different groups receive different care, aside from the treatment being tested.
- Example:
- In a trial of a new diabetes tablet, the treatment group gets more frequent nurse follow-ups than the control group.
- Result: Any improvement might be partly due to better follow-up, not just the new drug.
3. Detection Bias
- What it is:
- Happens when the way outcomes are measured differs between groups.
- Particularly a risk when outcome assessors are not blinded.
- Example:
- In a trial of a new back pain treatment, if physiotherapists know which patients received the new therapy, they may (even unconsciously) rate their pain scores lower.
- Result: Biased outcome measurements favouring the new treatment.
4. Attrition Bias
- What it is:
- Happens when a lot of participants drop out of the study — especially if more drop out from one group than the other.
- Loss to follow-up can distort the results.
- Example:
- In a weight-loss study, more participants drop out from the diet group than the control group because the diet was too hard.
- Result: It might seem like the diet was more effective — but that’s because those who failed left the study and were not counted.
5. Reporting Bias
- What it is:
- Happens when only some outcomes are reported — usually the positive ones — while negative or neutral findings are hidden.
- Example:
- A trial on a new antidepressant reports that it improved mood — but quietly ignores that it also caused significant weight gain and sleep problems (which were measured but not reported).
- Result: The study paints an overly positive picture of the drug.
Summary Tip
Bias Type | Key Danger |
---|---|
Selection Bias | Groups aren’t fairly comparable at start. |
Performance Bias | Extra care (other than intervention) changes results. |
Detection Bias | Outcome measurement unfairly favours one group. |
Attrition Bias | Loss of participants distorts group comparisons. |
Reporting Bias | Only “good news” outcomes are published. |
(Refer to CONSORT guidelines for a full discussion.)
Internal Validity
What is Internal Validity?
- Internal validity means:
➔ How well a study was done, and
➔ Whether we can trust that the results are actually true for the people studied — not distorted by bias or errors.
- If a study has minimal bias, it is said to have high internal validity.
Important Points to Understand
- No study is perfect:
- Every study has some risk of bias.
- The goal is not to find a “perfect” study but to decide:
- Have the researchers done enough to reduce bias?
- Are any remaining biases small enough that they wouldn’t change the main conclusion?
- Critical appraisal asks:
- Was the study designed carefully?
- Were methods (like randomisation, blinding, and complete follow-up) done properly?
- Is there any major bias that could explain the results instead of the intervention?
Simple Clinical Example
- Suppose a new blood pressure pill is tested.
- If:
- Patients were properly randomised,
- Researchers were blinded,
- Few patients dropped out,
- ➔ Then we can be confident that any improvement in blood pressure is really due to the pill — not because of unfair differences between groups.
This means the study has high internal validity.
In Short
Internal Validity Means… | Why It Matters |
---|---|
Trusting that the study results are accurate for the group studied. | If internal validity is low, you can’t trust the findings — even if the study looks impressive. |
Making sure bias is minimised as much as possible. | High internal validity = high-quality evidence for making clinical decisions. |
Signs of High vs Low Internal Validity
Feature | High Internal Validity (Good Study) | Low Internal Validity (Problematic Study) |
---|---|---|
Randomisation | True randomisation (e.g., computer-generated) used; unpredictable allocation. | No true randomisation; predictable allocation (e.g., odd/even dates, names). |
Allocation Concealment | Allocation hidden from those enrolling participants. | Allocation known or easily guessed (e.g., open list, transparent envelopes). |
Baseline Comparability | Groups similar at the start (age, sex, disease severity). | Groups different at baseline (e.g., one group sicker or older). |
Blinding (Masking) | Patients, staff, and outcome assessors are blinded. | No blinding; participants or assessors know who received treatment. |
Loss to Follow-Up | Minimal dropouts; similar loss in both groups; reasons explained. | High or unequal dropout rates; missing data not explained. |
Intention-to-Treat (ITT) Analysis | Participants analysed in the groups to which they were randomised. | Participants analysed according to the treatment they actually received (“as-treated” analysis). |
Complete Outcome Reporting | All important outcomes reported as planned. | Some outcomes not reported or only favourable outcomes published. |
1-Minute Internal Validity Checklist
(Answer Yes / No / Unsure as you appraise)
Question | Yes / No / Unsure |
---|---|
🔲 Was the study population randomly assigned to groups? | |
🔲 Was allocation concealed from those enrolling participants? | |
🔲 Were the groups similar at the start (baseline characteristics)? | |
🔲 Were patients, healthcare providers, and outcome assessors blinded where possible? | |
🔲 Were losses to follow-up minimal and similar between groups? | |
🔲 Was an intention-to-treat (ITT) analysis performed? | |
🔲 Were all pre-specified outcomes reported (no selective reporting)? | |
🚨 Fast Interpretation Guide:
- All or nearly all “Yes” ➔ Study likely has high internal validity — findings are trustworthy.
- Several “No” or “Unsure” ➔ Be cautious — study may have serious biases that affect the results.
- Early screening:
- If randomisation, allocation concealment, or blinding are missing ➔ Major concerns even before considering results (a toy helper after this list applies these rules).
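As a rough illustration only, the triage logic above can be sketched in code. The item names and thresholds below are invented for the example; they are not part of any official CASP or CONSORT tool.

```python
# Invented item names and thresholds -- an illustration, not an official tool.
CRITICAL = {"randomisation", "allocation_concealment", "blinding"}

def triage(answers: dict[str, str]) -> str:
    """answers maps each checklist item to 'yes', 'no', or 'unsure'."""
    missing = sorted(q for q in CRITICAL if answers.get(q) != "yes")
    if missing:
        return "Major concerns before reading results: " + ", ".join(missing)
    doubts = sum(1 for v in answers.values() if v != "yes")
    if doubts == 0:
        return "Likely high internal validity: findings trustworthy"
    if doubts >= 2:
        return "Be cautious: possible serious bias"
    return "Minor doubts: read the methods closely"

answers = {
    "randomisation": "yes", "allocation_concealment": "yes",
    "baseline_comparability": "yes", "blinding": "yes",
    "follow_up": "unsure", "itt_analysis": "yes", "complete_reporting": "yes",
}
print(triage(answers))  # -> Minor doubts: read the methods closely
```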
Different Types of Research Questions and Study Designs
- Types of Clinical Questions:
- Aetiology: What caused the illness?
- Diagnosis: What does this test result mean in this patient?
- Prognosis: What is likely to happen to this patient?
- Harm: Is exposure to a substance harmful?
- Effectiveness: Does the treatment help?
- Qualitative: What outcomes matter most to patients?
- Appropriate Study Designs for Each Question:
- Qualitative studies: Best for exploring patients’ experiences and values.
- Randomised controlled trials (RCTs): Best for assessing the effectiveness of interventions.
- Cross-sectional surveys: Useful for estimating prevalence of conditions.
- Inception cohort studies: Necessary for prognosis questions (following newly diagnosed patients over time).
- Hierarchy of Evidence:
- Not all research designs are equally strong at finding the truth.
- Some designs are better at avoiding bias and random error.
- Randomised Controlled Trials (RCTs) generally have higher validity for determining treatment effectiveness compared to case series, qualitative studies, or anecdotes.
- Subjective reports and qualitative findings are inappropriate for assessing effectiveness:
- Subjective reports (e.g., personal testimonials) rely heavily on individual perception, which can be influenced by bias, placebo effects, or external factors.
- Qualitative studies are valuable for understanding patient experiences, feelings, and priorities — but not suitable for objectively measuring whether a treatment actually works.
- Effectiveness requires objective, quantifiable measurement — best done through controlled experimental designs like RCTs.
- Historical Example – Radithor®:
- Radithor® was a “health tonic” containing radium, marketed in the 1920s.
- People believed (subjectively) that it improved vitality and health.
- Lack of proper scientific evaluation led to widespread use.
- Ultimately caused fatal radiation poisoning — most famously killing Eben Byers, whose jaw deteriorated (“The Radium Water Worked Fine Until His Jaw Came Off” — Wall Street Journal, 1932).
- Lesson: Subjective enthusiasm without rigorous evaluation can cause serious harm.
Hierarchy of Evidence for Effectiveness (Pyramid)
Level | Type of Evidence | Clinical Strength |
---|---|---|
1 (Top) | Systematic Reviews and Meta-Analyses of RCTs | Highest level of evidence. Combines results from multiple RCTs to give overall estimate of effect. |
2 | Randomised Controlled Trials (RCTs) | Gold standard for assessing treatment effectiveness. Minimise bias via randomisation and blinding. |
3 | Cohort Studies (Prospective or Retrospective) | Can identify associations between exposures and outcomes. Weaker than RCTs due to potential confounding. |
4 | Case-Control Studies | Useful for rare diseases. Greater susceptibility to bias (especially recall and selection bias). |
5 | Cross-Sectional Studies | Measure prevalence, not causality. Limited for treatment effectiveness. |
6 | Case Series and Case Reports | Descriptive only. No control group. Useful for hypothesis generation but not for proving effectiveness. |
7 (Bottom) | Expert Opinion / Anecdote / Qualitative Studies | Highly subjective. Valuable for patient perspectives and identifying outcomes that matter but NOT for determining if a treatment works. |
Study Design and Bias
- Choosing the Correct Study Design:
- Critical to match study design to the research question.
- Different designs are prone to different biases.
- Critical Appraisal Essentials:
- Step 1: Check if the correct study design was used for the research question.
- Step 2: Assess whether researchers have minimised biases associated with that design.
Critical Appraisal Skills Programme (CASP)
- What is CASP?
- CASP (Critical Appraisal Skills Programme) is an evidence-based toolset.
- It provides structured checklists to help systematically assess the quality of research papers.
- Purpose of CASP Tools:
- Help clinicians, researchers, and students critically appraise different types of studies (e.g., RCTs, cohort studies, qualitative studies).
- Improve decision-making by focusing on whether a study is trustworthy, meaningful, and applicable.
- CASP Focus Areas for Appraisal:
- Validity:
- Was the study conducted in a way that minimised bias?
- Results:
- Are the results statistically and clinically meaningful?
- Clinical Relevance:
- Are the findings applicable to the specific patient population or clinical context?
- Screening Questions (Key Initial Step):
- The first two questions on any CASP checklist are designed to quickly detect major flaws.
- Examples of screening questions:
- Was the study question clearly focused?
- Was the study methodologically sound (e.g., randomisation, blinding)?
- If the answer to either is ‘No’, the study may be fatally flawed → not worth spending time reading further.
Why CASP is Important Clinically:
- Saves time by filtering out poor-quality studies early.
- Ensures that only high-quality evidence influences patient care decisions.
- Reduces the risk of applying biased, invalid, or non-generalisable research in practice.
Summary:
CASP = a structured, reliable way to judge:
- Is this study valid?
- Are the results reliable?
- Is it relevant to my patient?
Simplified CASP RCT Checklist
1. Are the results valid? (Assess study quality)
Question | Yes / No / Unsure | Notes |
---|---|---|
Did the study address a clearly focused research question? | ||
Was the assignment of participants to groups truly random (randomisation)? | ||
Was allocation to groups concealed from those enrolling participants? | ||
Were the groups similar at baseline (e.g., age, disease severity)? | ||
Were participants, staff, and outcome assessors blinded? | ||
Was follow-up complete and were all patients analysed in the groups to which they were randomised (intention-to-treat analysis)? | |
2. What are the results? (Assess findings)
Question | Response | Notes |
---|---|---|
How large is the treatment effect? (e.g., Relative Risk, Risk Difference, NNT) | ||
How precise is the estimate? (e.g., 95% Confidence Interval narrow, not crossing 1) | |
3. Will the results help locally? (Assess clinical relevance)
Question | Yes / No / Unsure | Notes |
---|---|---|
Can the results be applied to my patient population? | ||
Were all important outcomes considered (including harms)? | ||
Are the benefits worth the potential harms and costs? | |
✅ How to Use It Quickly
- First two screening questions: If either is “No” → consider abandoning the study.
- Then work through systematically if valid.
- Always finish by considering whether the results are clinically applicable to your patient or setting.
🔥 Bonus Tip: “RED FLAGS” to stop appraising early
- No proper randomisation or concealment ➔ major bias risk.
- Major losses to follow-up ➔ unreliable results.
- No blinding with subjective outcomes ➔ detection/performance bias risk.
- Inappropriate control group ➔ unfair comparisons.
Effectiveness Studies – Randomised Controlled Trials (RCTs)
Why Evaluating Treatments is Challenging
- Many illnesses can improve on their own over time (natural recovery).
- If we simply observe that a patient improves after a treatment, we cannot be sure whether the improvement was due to the treatment or would have happened anyway.
- Therefore, comparison groups are essential to properly test the effectiveness of any treatment.
Key Features of a Valid RCT
Controlled Design
- An RCT compares outcomes between:
- An intervention group (receives the treatment).
- A control group (receives placebo, standard treatment, or no treatment).
- Purpose: To isolate the true effect of the intervention.
Randomisation
- Patients are allocated randomly to the intervention or control group.
- Purpose:
- Prevents selection bias (i.e., clinicians unconsciously or consciously placing sicker or healthier patients into a group).
- Ensures groups are comparable at the start of the study (see the sketch below).
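A minimal sketch of what computer-generated allocation can look like, here using block randomisation with a block size of 4 (an illustrative choice, not prescribed by any source above):

```python
import random

def block_randomise(n_participants: int, block_size: int = 4) -> list[str]:
    """Allocate participants to arms in balanced, randomly ordered blocks."""
    allocation = []
    while len(allocation) < n_participants:
        block = ["intervention"] * (block_size // 2) + ["control"] * (block_size // 2)
        random.shuffle(block)   # order within each block is unpredictable
        allocation.extend(block)
    return allocation[:n_participants]

random.seed(7)
print(block_randomise(8))
# e.g. a balanced, unpredictable sequence such as
# ['control', 'intervention', 'intervention', 'control', ...] -- every block of 4
# contains two of each arm, keeping groups comparable as recruitment proceeds.
```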
Allocation Concealment
- The method by which the assignment of patients to intervention or control is kept hidden from the people enrolling participants.
- Purpose:
- Prevents researchers from influencing which patients enter which group.
- Protects the integrity of randomisation.
Baseline Comparability
- The groups must have similar characteristics at the start (e.g., age, sex, disease severity).
- Purpose:
- Ensures that differences in outcomes are due to the treatment — not differences in who was recruited.
Blinding (Masking)
- Keeping participants, healthcare staff, and outcome assessors unaware of group assignment.
- Purpose:
- Patients: Blinding reduces placebo effects.
- Staff: Blinding prevents unequal treatment or care between groups.
- Outcome assessors: Blinding reduces detection bias when measuring results.
Attrition Monitoring
- Participants lost to follow-up can introduce bias if more people are lost from one group.
- Good practice:
- Minimise dropouts.
- Ensure losses are balanced between groups.
Intention-to-Treat (ITT) Analysis
- Patients are analysed in the group to which they were originally randomised, even if they did not complete the treatment.
- Purpose:
- Preserves the benefits of randomisation.
- Reflects real-world treatment effects, including adherence issues (illustrated in the sketch below).
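The toy example below (all numbers invented) shows why ITT matters: participants who stop a new drug are often sicker, so analysing only completers ("as-treated") flatters the drug.

```python
# Each record: (randomised_arm, completed_treatment, recovered) -- invented data
# in which non-adherers to the drug did badly.
patients = (
    [("drug", True, True)] * 60 + [("drug", True, False)] * 10
    + [("drug", False, False)] * 30
    + [("control", True, True)] * 55 + [("control", True, False)] * 45
)

def recovery_rate(rows):
    return sum(recovered for _, _, recovered in rows) / len(rows)

itt_drug = [p for p in patients if p[0] == "drug"]                   # as randomised
as_treated_drug = [p for p in patients if p[0] == "drug" and p[1]]   # completers only
control = [p for p in patients if p[0] == "control"]

print(f"ITT:        drug {recovery_rate(itt_drug):.0%} vs control {recovery_rate(control):.0%}")
print(f"As-treated: drug {recovery_rate(as_treated_drug):.0%} vs control {recovery_rate(control):.0%}")
# -> ITT: 60% vs 55% (modest benefit); as-treated: 86% vs 55% (inflated) --
# analysing only completers exaggerates the apparent effect.
```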
Role of the CASP RCT Checklist
- CASP (Critical Appraisal Skills Programme) provides a structured method to appraise RCTs.
- Focus areas:
- Was randomisation adequate?
- Was allocation concealed?
- Were groups comparable at baseline?
- Was blinding implemented?
- Was loss to follow-up minimised?
- Was ITT analysis used?
- Overall goal: Confirm that the RCT has high internal validity (i.e., results are trustworthy).
Study Design, Associated Questions, and Common Biases
Study Design | Used for | Common Biases to Watch For |
---|---|---|
Randomised Controlled Trial (RCT) | – Effectiveness (Intervention studies) | – Selection bias (improper randomisation) – Allocation bias (no concealment) – Performance bias (different care apart from intervention) – Detection bias (unblinded outcome assessment) – Attrition bias (loss to follow-up) – Reporting bias (selective outcomes) |
Cohort Study (Prospective or Retrospective) | – Aetiology (causation) – Harm (exposure-outcome) – Prognosis (if inception cohort) | – Confounding bias (unmeasured variables) – Selection bias (especially in retrospective cohorts) – Attrition bias (losses over time) |
Case-Control Study | – Aetiology (causation) – Harm (assessing rare exposures or outcomes) | – Recall bias (differential memory of exposure) – Selection bias (controls not representative) – Confounding |
Cross-Sectional Study | – Diagnosis (test accuracy studies) – Prevalence (disease frequency) | – Spectrum bias (non-representative patient sample) – Verification bias (if only positive tests verified) – Observer bias |
Qualitative Study (e.g., interviews, focus groups) | – Exploring patient experiences – Identifying important outcomes | – Researcher bias (subjectivity in interpretation) – Selection bias (non-representative participants) – Reflexivity bias (failure to address the researcher's own influence on data collection and interpretation) |
Quick Clinical Reminder
If you see… | Think about… |
---|---|
Small, single-centre trial with no blinding | Risk of selection bias, performance bias, and detection bias. |
Meta-analysis including only small positive studies | Risk of publication bias (positive studies more likely published). |
Case report suggesting a “miracle cure” | Anecdotal bias — strong need for controlled evidence (e.g., RCTs). |
Large RCT with proper blinding, low dropout, intention-to-treat (ITT) analysis | High internal validity — strong, reliable evidence. |
Extra Tip for Appraising by Study Design
Study Design | Essential Validity Checks |
---|---|
Randomised Controlled Trial (RCT) | Was randomisation adequate? Was allocation concealment maintained? Were groups comparable at baseline? Was blinding (participants, staff, outcome assessors) done? Was intention-to-treat (ITT) analysis used? |
Cohort Study | Was exposure clearly defined before outcome? Were comparison groups similar? Was follow-up complete and sufficiently long? |
Case-Control Study | Were cases and controls drawn from the same population? Was exposure measured the same way for both groups? Was recall bias addressed? |
Cross-Sectional Study | Was the sample representative? Was the diagnostic test compared to a true gold standard? Was blinding used in outcome assessment? |
Qualitative Study | Was the sampling strategy appropriate? Was the data analysis transparent and rigorous? Was researcher influence (reflexivity) considered and addressed? |
Results Interpretation
- Only interpret results after confirming study validity (good design and minimal bias).
Common Result Measures:
- Relative Risk (RR):
- Ratio of event probability in intervention group vs control group.
- Odds Ratio (OR):
- Ratio of the odds of the event (event vs no event) in the intervention group to the odds in the control group.
- RR or OR = 1:
- No difference between intervention and control.
- RR or OR > 1:
- Event more frequent in intervention group.
- Good if the outcome is desirable (e.g., smoking cessation).
- Bad if the outcome is undesirable (e.g., death).
- RR or OR < 1:
- Event less frequent in intervention group.
Absolute Measures:
- Risk Difference (RD):
- Difference in event rates between intervention and control groups.
- Number Needed to Treat (NNT):
- Number of patients needed to treat to achieve one extra good outcome.
- Calculated as 1 / absolute risk reduction (ARR); see the worked sketch below.
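To make the definitions concrete, here is a minimal sketch (counts invented, not from any real trial) computing RR, OR, RD, and NNT from a 2x2 table, following the formulas above:

```python
# Effect measures from a hypothetical 2x2 table; "events" are bad outcomes.
def effect_measures(events_tx, n_tx, events_ctrl, n_ctrl):
    risk_tx, risk_ctrl = events_tx / n_tx, events_ctrl / n_ctrl
    rr = risk_tx / risk_ctrl                          # Relative Risk
    odds_tx = events_tx / (n_tx - events_tx)
    odds_ctrl = events_ctrl / (n_ctrl - events_ctrl)
    odds_ratio = odds_tx / odds_ctrl                  # Odds Ratio
    rd = risk_tx - risk_ctrl                          # Risk Difference
    arr = -rd                                          # Absolute Risk Reduction
    nnt = 1 / arr if arr > 0 else float("inf")        # NNT = 1 / ARR
    return rr, odds_ratio, rd, nnt

# Hypothetical trial: 15/100 bad outcomes on treatment vs 30/100 on control.
rr, odds_ratio, rd, nnt = effect_measures(15, 100, 30, 100)
print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}, RD = {rd:+.2f}, NNT = {nnt:.1f}")
# -> RR = 0.50, OR = 0.41, RD = -0.15, NNT = 6.7 (round up: treat 7 patients)
```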
Understanding Uncertainty: Confidence Intervals and P-Values
- Confidence Interval (CI):
- Range within which the true effect likely lies, usually with 95% certainty.
- Narrow CIs suggest greater precision.
- P-value:
- Probability of observing a result at least as extreme as the one seen, if no true effect exists.
- P < 0.05 often considered “statistically significant” but does not imply clinical importance.
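The sketch below (assuming SciPy is installed; counts invented, reusing the hypothetical trial above) computes the usual large-sample 95% CI for a relative risk via the log method, together with a chi-square p-value:

```python
import math
from scipy.stats import chi2_contingency

a, n1 = 15, 100   # events / total in the treatment arm (hypothetical counts)
c, n2 = 30, 100   # events / total in the control arm

rr = (a / n1) / (c / n2)
se_log_rr = math.sqrt(1/a - 1/n1 + 1/c - 1/n2)      # standard error of ln(RR)
lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
hi = math.exp(math.log(rr) + 1.96 * se_log_rr)

table = [[a, n1 - a], [c, n2 - c]]                   # rows: arm; cols: event / no event
chi2, p, dof, expected = chi2_contingency(table)

print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}, p = {p:.3f}")
# -> RR = 0.50, 95% CI 0.29 to 0.87, p = 0.018
# The CI excludes 1 and p < 0.05 -- the two expressions of uncertainty agree here
# (approximately: the Wald CI and the chi-square test are related but not identical).
```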
Clinical Relevance of Results
- Applicability to Patient or Population:
- Consider differences between study participants and the clinical population.
- Evaluation of All Outcomes:
- Important outcomes must be measured (e.g., survival, not just symptom control).
- Beware of selective outcome reporting.
- Weighing Benefits vs Harms:
- Consider net clinical benefit:
- e.g., anticoagulation may prevent strokes but increase bleeding risk.
- Decision depends on the balance of benefit vs harm.
- Cost Considerations:
- Trials often don’t report cost-effectiveness.
- Simple estimation: Cost per benefit = Cost per patient × NNT.
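- For example (hypothetical figures): if a treatment course costs £150 per patient and the NNT is 20, achieving one additional good outcome costs roughly £150 × 20 = £3,000.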
Systematic Reviews
- Preferred Evidence Source:
- Summarise all available evidence systematically.
- More reliable than a single study.
- Use quality-assured, up-to-date systematic reviews when available.
- CASP checklist is available for systematic review appraisal.
Final Framework for Reading Research
Always assess three core areas:
Area | Key Questions |
---|---|
Validity | Was the study conducted to minimise bias? |
Results | What did the study find (and is it statistically and clinically meaningful)? |
Relevance | Are the findings applicable to the patient or clinical situation? |
Clinical Research Statistical Terms Cheat Sheet
Term | Meaning | How to Interpret | Clinical Importance |
---|---|---|---|
Relative Risk (RR) | Ratio of probability of event in intervention group vs control group. | RR = 1: no difference RR > 1: event more likely with intervention RR < 1: event less likely with intervention | Measures how much more (or less) likely an outcome is with treatment. |
Odds Ratio (OR) | Ratio of the odds of an event occurring in intervention vs control group. | OR = 1: no difference OR > 1: higher odds with intervention OR < 1: lower odds with intervention | Commonly used in case-control studies; approximates RR when event is rare. |
Risk Difference (RD) (also called Absolute Risk Reduction [ARR]) | Difference in event rates between groups (Intervention risk – Control risk). | RD > 0: intervention increases risk RD < 0: intervention reduces risk | Tells absolute change in risk; important for real-world clinical impact. |
Number Needed to Treat (NNT) | Number of patients who need to be treated to prevent one additional bad outcome (or achieve one good outcome). | NNT = 1/ARR Lower NNT = more effective treatment. | Helps clinicians assess how worthwhile a treatment is; smaller NNT = better. |
Confidence Interval (CI) | Range of values within which the true result likely lies, usually with 95% certainty. | Narrow CI: precise estimate CI crossing 1 (for RR or OR): not statistically significant | Indicates reliability and precision of the result. |
P-value | Probability of observing results at least as extreme as those seen, if there is no real effect. | P < 0.05: commonly considered statistically significant Lower P = less likely due to chance | Statistical significance, but not automatically clinical significance. |
Quick Examples:
- RR = 0.75 → Treatment reduces risk by 25%.
- OR = 2.0 → Odds of event are twice as high in the intervention group.
- RD = -0.10 → 10% absolute reduction in risk with treatment.
- NNT = 10 → Need to treat 10 patients to prevent 1 adverse event.
- 95% CI for RR = 0.60–0.90 → Suggests consistent risk reduction; statistically significant (doesn’t cross 1).
- P = 0.03 → if there were no real effect, results this extreme would occur only 3% of the time; considered statistically significant.
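For the curious, a few lines of Python (pure arithmetic, no assumptions beyond the numbers above) verify these examples:

```python
# Sanity checks for the quick examples above.
rr = 0.75
print(f"Relative risk reduction = {1 - rr:.0%}")          # -> 25%
rd = -0.10
print(f"NNT = 1/|RD| = {1 / abs(rd):.0f}")                # -> 10
ci_low, ci_high = 0.60, 0.90
print("CI excludes 1 (significant):", not (ci_low <= 1 <= ci_high))  # -> True
```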