Clinical Epidemiology and Biostatistics: A Primer for Orthopaedic Surgeons, Part I

Mininder S. Kocher, MD, MPH; David Zurakowski, PhD
Children's Hospital Orthopaedic Institute for Clinical Effectiveness, Harvard Medical School, Harvard School of Public Health, Boston, MA
Introduction

Clinical epidemiology and biostatistics are the basic sciences
									of clinical research. This series of articles will provide a
									basic primer of clinical epidemiology and biostatistics for the
									orthopaedic surgeon. The evidence-based medicine and patient-derived outcomes
									assessment movements burst onto the scene of clinical
									medicine in the 1980s and 1990s as a result of contemporaneous
									medical, societal, and economic influences. Work by
Wennberg and colleagues revealed striking small-area variations in
									clinical practice, with some patients thirty times more likely to
									undergo an operative procedure than other patients with identical
									symptoms merely because of their geographic location1-6.
									Further critical research suggested that up to 40% of some
									surgical procedures might be inappropriate and that up to 85%
									of common medical treatments were not rigorously validated7-9.
									Meanwhile, the costs of health care were rapidly rising to over
									two billion dollars per day, increasing from 5.2% of the gross
									domestic product in 1960 to 16.2% in 199710. Health maintenance
									organizations and managed care emerged. In addition,
increasing federal, state, and consumer oversight was brought to bear on the practice of clinical medicine.

These forces have led to an increased focus on the clinical
									effectiveness of care. Clinical epidemiology provides the methodology
									to assess the clinical effectiveness of care. Part I of this
									series, presented here, provides an overview of the concepts
									of study design, hypothesis testing, measures of treatment
									effect, and diagnostic performance. Evidence-based medicine,
									outcomes assessment, data, and statistical analysis will be
									covered in Part II, to be published in next year's edition of The
									Orthopaedic Journal at Harvard Medical School. Examples
									from the orthopaedic literature and a glossary of terminology
are provided.

Study Design

In observational studies, researchers observe patient groups
									without allocation of the intervention, whereas in experimental
									studies researchers allocate the treatment. Experimental studies
									involving humans are called trials. Research studies may be retrospective,
									meaning that the direction of inquiry is backwards
									from the cases and that the events of interest transpired before
									the onset of the study, or they may be prospective, meaning
									that the direction of inquiry is forward from the cohort inception
									and that the events of interest transpire after the onset of
									the study (Fig. 1). Cross-sectional studies are used to survey
									one point in time.
 All research studies are susceptible to invalid conclusions
									due to bias, confounding, and chance. Bias is the nonrandom
									systematic error in the design or conduct of a study. Bias usually
									is not intentional; however, it is pervasive and insidious.
									Forms of bias can corrupt a study at any phase, including
									patient selection (selection and membership bias), study performance
									(performance, information, and nonresponder bias),
									and outcome determination (detection, recall, acceptability,
									and interviewer bias). A confounder is a variable having independent
									associations with both the independent (predictor) and
									dependent (outcome) variables, thus potentially distorting their
									relationship. Frequent confounders in clinical research include
									gender, age, socioeconomic status, and comorbidities. As discussed
									below in the section on hypothesis testing, chance may
									lead to invalid conclusions based on the probability of type-I
and type-II errors, which are related to P values and power.

The adverse effects of bias, confounding, and chance
									can be minimized by study design and statistical analysis.
									Prospective studies minimize selection, information, recall,
									and nonresponder bias. Randomization minimizes selection
									bias and equally distributes confounders. Blinding can decrease
									bias, and matching can decrease confounding. Confounders
									can sometimes be controlled post hoc with use of stratified
									analysis or multivariable methods. The effects of chance can be
									minimized by an adequate sample size based on power calculations 
									and appropriate alpha levels. The ability of study design
									to optimize validity while minimizing bias, confounding, and
									chance is recognized by the hierarchical levels of evidence and
grades of recommendations established by the U.S. Preventive
									Services Task Force and the Oxford Centre for Evidence-Based
									Medicine on the basis of study design (Table I).
Table I. Levels of Evidence and Grades of Recommendation

| Level | Description |
| 1a | Systematic review (with homogeneity) of randomized clinical trials |
| 1b | Individual randomized clinical trial (with narrow confidence interval); individual inception prospective cohort study with ≥80% follow-up |
| 1c | All-or-none case series |
| 2a | Systematic review (with homogeneity) of cohort studies |
| 2b | Individual cohort study; low-quality randomized clinical trial |
| 2c | "Outcomes" research; ecological studies |
| 3a | Systematic review (with homogeneity) of case-control studies |
| 3b | Individual case-control study |
| 4 | Case series; low-quality cohort and case-control studies |
| 5 | Expert opinion |

| Grade | Description |
| A | Consistent level-1 studies |
| B | Consistent level-2 or level-3 studies; extrapolations from level-1 studies |
| C | Level-4 studies; extrapolations from level-2 or level-3 studies |
| D | Level-5 studies; troublingly inconsistent or inconclusive studies of any level |
Observational study designs include case series, case-control
									studies, cross-sectional surveys, and cohort studies.
									A case series is a retrospective, descriptive account of a
									group of patients with interesting characteristics or a series of
									patients who have undergone an intervention. A case series of
									one patient is a case report. Case series are easy to construct
									and can provide a forum for the presentation of interesting or
									unusual observations. However, case series are often anecdotal,
									are subject to many possible biases, lack a hypothesis, and are
									difficult to compare with other series. Thus, case series are
									usually viewed as a means of generating hypotheses for further
studies but are not viewed as conclusive. A case-control study is one in which the investigator identifies patients with an
									outcome of interest and controls without the outcome and
									then looks back retrospectively to identify possible causes or
									risk factors. The effects in a case-control study are frequently
									reported with use of the odds ratio. Case-control studies
									are efficient (particularly for the evaluation of unusual
									conditions or outcomes) and are relatively easy to perform.
									However, an appropriate control group may be difficult
									to identify, and preexisting high-quality medical records
									are essential. Moreover, case-control studies are very
									susceptible to multiple biases (particularly selection and
									detection bias). Cross-sectional surveys are often used
									to determine the prevalence of disease or to identify coexisting
									associations in patients with a particular condition at
									one particular point in time. Surveys are also frequently
									performed to determine preferences and treatment
patterns. Because cross-sectional studies represent a
									snapshot in time, they may be misleading if the research
									question involves the disease process over time. Surveys
									also present unique challenges in terms of adequate
									response rate, representative samples, and acceptability
									bias. A traditional cohort study is one in which a population
									of interest is identified and followed prospectively in order to
									determine outcomes and associations with risk factors. Cohort
									studies are optimal for studying the incidence, course, and risk
									factors of a disease because they are longitudinal, meaning that
									a group of subjects is followed over time. The effects in a cohort
									study are frequently reported in terms of relative risk. Because
									these studies are prospective, they can optimize follow-up and
									data quality and can minimize bias associated with selection,
									information, and measurement. In addition, they have the
									correct time-sequence to provide strong evidence regarding
									associations. However, these studies are costly, are logistically
									demanding, often require long time-periods for completion,
									and are inefficient for the assessment of unusual outcomes or
diseases.

Experimental trials may involve the use of concurrent
									controls, sequential controls (cross-over trials), or historical
									controls. The randomized clinical trial (RCT) with concurrent
									controls is the gold standard of clinical evidence as it provides
									the most valid conclusions (internal validity) by minimizing
									the effects of bias and confounding. A rigorous randomization 
									with enough patients is the best means of avoiding confounding. 
									The performance of an RCT involves the construction of a protocol 
									document that explicitly establishes eligibility criteria, sample 
									size, informed consent, randomization, stopping rules, blinding, 
									measurement, and data analysis. Because allocation is random, 
									selection bias is minimized and confounders (known and unknown) 
									are theoretically equally distributed between groups. Blinding 
									minimizes performance, detection, interviewer, and acceptability
									bias. Intention-to-treat analysis minimizes nonresponder
									and transfer bias, while sample-size determination
									ensures adequate power. The intention-to-treat principle states
									that all patients should be analyzed within the treatment group
									to which they were randomized in order to preserve the goals
									of randomization. Although the RCT is the epitome of clinical
									research designs, the disadvantages of RCTs include their
									expense, logistics, and time to completion. Accrual of patients
									and acceptance by clinicians may be problematic. With rapidly
evolving technology, a new technique may rapidly become well accepted, making an existing RCT obsolete or a potential RCT
									difficult to accept. Ethically, RCTs require clinical equipoise
									(equality of treatment options in the clinician's judgment) for
									enrollment, interim stopping rules to avoid harm and evaluate
									adverse events, and truly informed consent. Finally, while RCTs
									have excellent internal validity, some have questioned their
									generalizability (external validity) because the practice pattern
									and the population of patients enrolled in an RCT may be overly
constrained and nonrepresentative.

Ethical considerations are intrinsic to the design and conduct of clinical research studies. Informed consent is of paramount importance, and it is the focus of much of the activity
									of Institutional Review Boards. Investigators should be familiar
									with the Nuremberg Code and the Declaration of Helsinki as
									they pertain to ethical issues of risks and benefits, protection of
privacy, and respect for autonomy11,12.

Hypothesis Testing

The purpose of hypothesis testing is to permit generalizations
									from a sample to the population from which it came.
									Hypothesis testing confirms or refutes the assertion that the
									observed findings did not occur by chance alone but rather
									occurred because of a true association between variables. By
									default, the null hypothesis of a study asserts that there is no
									significant association between variables whereas the alternative
									hypothesis asserts that there is a significant association. If
									the findings of a study are not significant we cannot reject the
									null hypothesis, whereas if the findings are significant we can
									reject the null hypothesis and accept the alternative hypothesis.
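To make this concrete, here is a minimal Python sketch of a chi-square test of association for a 2 x 2 table; the counts are hypothetical, and 3.84 is the critical value corresponding to an alpha level of 0.05 with one degree of freedom:

```python
# Hypothetical 2 x 2 table: treatment group versus control group,
# failures versus successes.
observed = [[15, 85],   # treatment: 15 failures, 85 successes
            [30, 70]]   # control:   30 failures, 70 successes

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_sq += (obs - expected) ** 2 / expected

# 3.84 is the chi-square critical value for alpha = 0.05 with 1 df.
print(f"chi-square = {chi_sq:.2f}")  # 6.45 for these counts
print("reject the null hypothesis" if chi_sq > 3.84
      else "fail to reject the null hypothesis")
```

Here the statistic (6.45) exceeds 3.84, so the null hypothesis of no association would be rejected at the 0.05 level.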
Table II. Hypothesis Testing: The Four Possible Outcomes of a Study

|  | Truth: Not Significant | Truth: Significant |
| Experiment: Not Significant | Correct | Type-II (β) error |
| Experiment: Significant | Type-I (α) error | Correct |
Thus, all research studies that are based on a sample make
									an inference about the truth in the overall population. By constructing
									a 2 x 2 table of the possible outcomes of a study (Table
									II), we can see that the inference of a study is correct if a significant
									association is not found when there is no true association
									or if a significant association is found when there is a true
association. However, a study can have two types of errors. A type-I or alpha (α) error occurs when a significant association is found when there is no true association (resulting in a "false-positive" study that rejects a true null hypothesis). A type-II or beta (β) error occurs when no significant association is found when there is a true association (resulting in a "false-negative" study that rejects a true alternative hypothesis).

The P value refers to the probability of the type-I (α) error.
									By convention, the alpha level of significance is set at 0.05,
									which means we accept the finding of a significant association
									if there is less than a one in twenty chance that the observed
association was due to chance alone. Thus, the P value, which is calculated from a statistical test, is a measure of the strength of evidence from the data against the null hypothesis. If the
P value is less than the alpha level, then the evidence against
									the null hypothesis is strong enough to reject it and conclude
									that the result is statistically significant. P values frequently
									are used in clinical research and are given great importance by
									journals and readers; however, there is a strong movement in
biostatistics to de-emphasize P values because a significance
									level of P<0.05 is arbitrary, a strict cutoff point can be misleading
									(there is little difference between P=0.049 and P=0.051,
									yet only the former is considered "significant"), the P value
									gives no information about the strength of the association, and
									the P value may be statistically significant without being clinically
									important. Alternatives to the traditional reliance on P values
									include the use of variable alpha levels of significance based
									on the consequences of the type-I error and the reporting of P
									values without using the term "significant." Use of 95% confidence
									intervals in lieu of P values has gained acceptance as
									these intervals convey information regarding the significance
of findings (if two 95% confidence intervals do not overlap, the difference between them is statistically significant), the magnitude of differences, and the
									precision of measurement (indicated by the range of the 95%
confidence interval). Whereas the P value is often interpreted as being either statistically significant or not, the 95% CI provides a range of values that allows the reader to interpret the implications of the results. In addition, while P values have no units, confidence intervals are presented in the units of the variable of interest, which helps the reader to interpret the results.
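As an illustration, a minimal Python sketch (with hypothetical trial counts) of a 95% confidence interval for a difference in proportions, showing how the interval conveys significance, magnitude, and precision at once:

```python
import math

# Hypothetical trial: 24/100 failures with the control treatment versus
# 12/100 failures with the experimental treatment.
p1, n1 = 24 / 100, 100   # control failure rate
p2, n2 = 12 / 100, 100   # experimental failure rate

diff = p1 - p2   # observed risk difference
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # standard error
z = 1.96         # z-value for 95% confidence

lower, upper = diff - z * se, diff + z * se
print(f"Risk difference: {diff:.3f}")          # 0.120
print(f"95% CI: ({lower:.3f}, {upper:.3f})")   # (0.015, 0.225)
# An interval that excludes 0 indicates significance at the 0.05 level;
# its width (here about 0.21) conveys the precision of the estimate.
```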
Power is the probability of finding a significant association if one truly exists and is defined as 1 minus the probability of a type-II
(β) error. By convention, acceptable power is set at ≥80%, which means there is a ≤20% chance that the study will demonstrate no significant association when there is a true association. In practice,
									when a study demonstrates a significant association, the
potential error of concern is the type-I (α) error as expressed by the P value. However, when a study demonstrates no significant association, the potential error of concern is the type-II (β) error as expressed by power. That is, in a study that demonstrates
									no significant effect, there may truly be no significant
									effect or there may actually be a significant effect but the study
									was underpowered because the sample size may have been too
									small. Thus, in a study that demonstrates no significant effect,
the power of the study should be reported. The calculations for power analyses differ depending on the statistical methods utilized for analysis; however, four elements are always involved in a power analysis: α, β, effect size, and sample size (n). Effect size is the difference that you want to be able to detect with the given α and β. It is based on a clinical sense of how
large a difference would be clinically meaningful. Small sample sizes, small effect sizes, and large variance decrease the power
									of a study. An understanding of power issues is important in
									clinical research to minimize resources when planning a study
									and to ensure the validity of a study. Sample size calculations
									are performed when planning a study. Typically, power is set
									at 80%, alpha is set at 0.05, the effect size and variance are
									estimated from pilot data or the literature, and the equation
									is solved for the necessary sample size. Power analysis is performed
									after a study. Typically, alpha is set at 0.05, the sample
									size, effect size, and variance of the actual study are used, and
the study's power is determined.
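For example, a minimal Python sketch of a sample-size calculation for comparing two proportions, using the standard normal-approximation formula; the failure rates below are hypothetical planning estimates:

```python
import math

# z-values for the conventional alpha = 0.05 (two-sided) and
# power = 80% (beta = 0.20)
Z_ALPHA = 1.96
Z_BETA = 0.84

def sample_size_two_proportions(p1, p2):
    """Approximate sample size per group for detecting a difference
    between two proportions (normal approximation)."""
    p_bar = (p1 + p2) / 2
    numerator = (Z_ALPHA * math.sqrt(2 * p_bar * (1 - p_bar))
                 + Z_BETA * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Hypothetical planning example: detecting a reduction in failure rate
# from 20% (control) to 10% (experimental) with 80% power.
print(sample_size_two_proportions(0.20, 0.10), "patients per group")  # 199
```

Note how strongly the effect size drives the result: halving the detectable difference roughly quadruples the required sample size.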
Diagnostic Performance

A diagnostic test can result in four possible scenarios: (1) true positive if the test is positive and the disease is present, (2)
									false positive if the test is positive and the disease is absent, (3)
									true negative if the test is negative and the disease is absent,
									and (4) false negative if the test is negative and the disease is
									present (Table III). The sensitivity of a test is the percentage (or
									proportion) of patients who have the disease that are classified 
									positive (true positive rate). A test with 97% sensitivity implies 
									that of 100 patients with disease, ninety-seven will have a
positive test. Sensitive tests have a low false-negative rate. A
									negative result on a highly sensitive test rules disease out (SNout). 
									The specificity of a test is the percentage (or proportion)
									of patients without the disease who are classified negative (true 
									negative rate). A test with 91% specificity implies that of 100
									patients without the disease, ninety-one will have a negative test. 
									Specific tests have a low false-positive rate. A positive result on 
									a highly specific test rules disease in (SPin). Sensitivity and 
									specificity can be combined into a single parameter, the likelihood 
									ratio (LR), which is the probability of a true positive divided by
									the probability of a false positive. Sensitivity and specificity 
									can be established in studies in which the results of a diagnostic 
									test are compared with the gold standard of diagnosis in the same 
									patients—for example, by comparing the results of magnetic resonance
									imaging with arthroscopic findings13.
Table III. Four Possible Scenarios for a Diagnostic Test

|  | Disease Positive | Disease Negative |
| Test Positive | a (true positive) | b (false positive) |
| Test Negative | c (false negative) | d (true negative) |
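A minimal Python sketch, with hypothetical counts, showing how the cells of Table III yield sensitivity, specificity, and the likelihood ratio:

```python
# Hypothetical diagnostic-test counts arranged as in Table III.
a, b = 97, 9    # a = true positives,  b = false positives
c, d = 3, 91    # c = false negatives, d = true negatives

sensitivity = a / (a + c)   # true-positive rate: 97/100 = 0.97
specificity = d / (b + d)   # true-negative rate: 91/100 = 0.91

# Likelihood ratio: probability of a true positive divided by the
# probability of a false positive.
lr_positive = sensitivity / (1 - specificity)

print(f"Sensitivity: {sensitivity:.2f}")                 # 0.97
print(f"Specificity: {specificity:.2f}")                 # 0.91
print(f"Positive likelihood ratio: {lr_positive:.1f}")   # 10.8
```

The counts were chosen to match the 97% sensitivity and 91% specificity examples in the text.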
Sensitivity and specificity are technical parameters of diagnostic
									testing performance and have important implications for
									screening and clinical practice guidelines14,15; however, they are
									less relevant in the typical clinical setting because the clinician
									does not know whether or not the patient has the disease. The
									clinically relevant questions are the probability that a patient
									 has the disease given a positive result (positive predictive value)
									and the probability that a patient does not have the disease
given a negative result (negative predictive value). The positive and negative predictive values are probabilities that require an estimate
									of the prevalence of the disease in the population and can
be calculated using equations that utilize Bayes' theorem.
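A short Python sketch of this calculation, assuming an illustrative prevalence of 10%:

```python
def predictive_values(sens, spec, prevalence):
    """Positive and negative predictive values via Bayes' theorem."""
    ppv = (sens * prevalence) / (
        sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = (spec * (1 - prevalence)) / (
        spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv

# Hypothetical example: 97% sensitivity and 91% specificity applied to
# a population with 10% disease prevalence.
ppv, npv = predictive_values(0.97, 0.91, 0.10)
print(f"PPV: {ppv:.2f}, NPV: {npv:.2f}")   # PPV: 0.55, NPV: 1.00
```

Even with an accurate test, the low prevalence drags the positive predictive value down to about 55%, which is why predictive values cannot be interpreted without an estimate of prevalence.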
There is an inherent trade-off between sensitivity and specificity. Because there is typically some overlap between
									the diseased and nondiseased groups with respect to a test
									distribution, the investigator can select a positivity criterion
									with a low false-negative rate (to optimize sensitivity) or one
									with a low false-positive rate (to optimize specificity) (Fig. 2).
									In practice, positivity criteria are selected on the basis of the
									consequences of a false-positive or a false-negative diagnosis.
									If the consequences of a false-negative diagnosis outweigh
									the consequences of a false-positive diagnosis of a condition
									(such as septic arthritis of the hip in children16), a more sensitive
									criterion is chosen (Fig. 3). This relationship between the
									sensitivity and specificity of a diagnostic test can be portrayed
									with use of a receiver operating characteristic (ROC) curve.
An ROC graph shows the relationship between the true-positive rate (sensitivity) on the y-axis and the false-positive rate (100% - specificity) on the x-axis plotted at each possible cutoff (Fig. 4). If a test discriminates well, its ROC curve approaches a true-positive rate of 100% and a false-positive rate of 0%. On the
									other hand, a test that discriminates poorly has a diagonal ROC
									curve (45-degree line). Overall, the diagnostic performance can
									be evaluated by the area under the ROC curve. In the case of
									perfect discrimination the area under the curve will equal 1.0,
									while an area of 0.5 indicates random guessing17.
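A minimal Python sketch, using made-up test scores, that traces the ROC points across all possible cutoffs and estimates the area under the curve with the trapezoidal rule:

```python
# Hypothetical continuous test scores.
diseased = [6.2, 7.1, 7.8, 8.4, 9.0]
nondiseased = [4.1, 5.0, 5.6, 6.5, 7.3]

# Sweep every observed score as the positivity criterion and record
# (false-positive rate, true-positive rate) points.
cutoffs = sorted(set(diseased + nondiseased), reverse=True)
points = [(0.0, 0.0)]
for cut in cutoffs:
    tpr = sum(s >= cut for s in diseased) / len(diseased)
    fpr = sum(s >= cut for s in nondiseased) / len(nondiseased)
    points.append((fpr, tpr))
points.append((1.0, 1.0))

# Area under the ROC curve by the trapezoidal rule: 1.0 indicates
# perfect discrimination and 0.5 indicates random guessing.
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(f"AUC = {auc:.2f}")   # 0.88 for these scores
```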
Measures of Effect

Measures of likelihood include probability and odds. Probability
									is a number, between 0 and 1, that indicates how likely an event 
									is to occur based on the number of events per the number of trials. 
									The probability of heads on a coin toss is 0.5. Odds are the ratio 
									of the probability of an event occurring to the probability of the 
event not occurring. The odds of flipping heads on a coin toss are 1 (0.5/0.5). Because probability and odds are related, one can be converted to the other: odds = probability/(1 - probability).

Relative risk (RR) can be determined in a prospective cohort study, where RR equals
									the incidence of disease in the exposed cohort divided by the incidence 
									of disease in the nonexposed cohort (Table IV). A similar measurement
									in a retrospective case-control study (where incidence cannot
									be determined) is the odds ratio (OR), which is the ratio of the
									odds of having the disease in the study group compared with
the odds of having the disease in the control group (Table IV).

Factors that are likely to increase the incidence, prevalence,
									morbidity, or mortality of a disease are called risk factors.
									The effect of a factor that reduces the probability of an adverse
									outcome can be quantified by the relative risk reduction (RRR),
the absolute risk reduction (ARR), and the number needed to treat (NNT) (Table IV). The effect of a factor that increases the
									probability of an adverse outcome can be quantified by the relative
									risk increase (RRI), the absolute risk increase (ARI), and
									the number needed to harm (NNH) (Table IV).
Table IV. Calculation of Measures of Effect

|  | Adverse Event | No Adverse Event |
| Experimental Group | a | b |
| Control Group | c | d |
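A minimal Python sketch, with hypothetical trial counts, computing the measures of effect defined in Table IV:

```python
# Hypothetical trial counts arranged as in Table IV.
a, b = 10, 90   # experimental group: adverse events, no adverse events
c, d = 20, 80   # control group: adverse events, no adverse events

eer = a / (a + b)   # experimental event rate: 0.10
cer = c / (c + d)   # control event rate: 0.20

rr = eer / cer                   # relative risk (cohort study): 0.50
odds_ratio = (a * d) / (b * c)   # odds ratio (case-control study): 0.44
arr = cer - eer                  # absolute risk reduction: 0.10
rrr = arr / cer                  # relative risk reduction: 0.50
nnt = 1 / arr                    # number needed to treat: 10

print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}")
print(f"ARR = {arr:.2f}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")
```

For this hypothetical treatment, ten patients would need to be treated to prevent one additional adverse event.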
Glossary

Absolute Risk Reduction (ARR): Difference in risk of adverse outcomes between experimental and control participants in a trial.
Alpha (Type-I) Error: Error in hypothesis testing in which a significant association is found when there is no true significant association (rejecting a true null hypothesis). The alpha level is the threshold of statistical significance established by the researcher (P < 0.05 by convention).
Beta (Type-II) Error: Error in hypothesis testing in which no significant association is found when there is a true significant association (rejecting a true alternative hypothesis).
Bias: Systematic error in the design or conduct of a study. Threatens the validity of the study.
Blinding: Element of study design in which patients and/or investigators do not know who is in the treatment group and who is in the control group. The term masking is often used.
Case-Control Study: Retrospective observational study design that involves identifying cases with the outcome of interest and controls without the outcome, and then looking back to see if they had the exposure of interest.
Case Series: Retrospective observational study design that describes a series of patients with an outcome of interest or who have undergone a particular treatment. No control group.
Confidence Interval (CI): Quantifies the precision of measurement. Usually reported as the 95% CI, which is the range of values within which there is a 95% probability that the true value lies.
Confounding: A variable having independent associations with both the dependent and independent variables, thus potentially distorting their relationship.
Cohort Study: Prospective observational study design that involves the identification of group(s) with the exposure or condition of interest, which are then followed forward for the outcome of interest.
Controlling For: Term used to describe when confounding variables are adjusted for in the design or analysis of a study in order to minimize confounding.
Crossover Study: Prospective experimental study design that involves the allocation of two or more experimental treatments, one after the other, in a specified or random order to the same group of patients.
Cross-Sectional Study: Observational study design that assesses a defined population at a single point in time for both exposure and outcome (survey).
Dependent Variable: Outcome or response variable.
Distribution: Values and frequency of a variable (Gaussian, binomial, skewed).
Effect Size: The magnitude of a difference considered to be clinically meaningful. Used in power analysis to determine the required sample size.
Experimental Study: Study design in which treatment is allocated (trial).
Failure: Generic term used for an event.
Hypothesis: A statement that will be accepted or rejected based on the evidence in a study.
Incidence: Proportion of new cases of a specific condition in the population at risk during a specified time interval.
Independent Events: Events whose occurrence has no effect on the probability of each other.
Independent Variable: Variable associated with the outcome of interest that contributes information about the outcome in addition to that provided by other variables considered simultaneously.
Intention-to-Treat Analysis: Method of analysis in randomized clinical trials in which all patients randomly assigned to a treatment group are analyzed in that treatment group, whether or not they received that treatment or completed the study.
Interaction: Relationship between two independent variables such that they have a different effect on the dependent variable.
Likelihood Ratio (LR): Likelihood that a given test result would be expected in a patient with the condition compared with a patient without the condition. Ratio of the true-positive rate to the false-positive rate.
Matching: Process of making two groups homogeneous for possible confounding factors.
Meta-Analysis: An evidence-based systematic review that uses quantitative methods to combine the results of several independent studies to produce summary statistics.
Multiple Comparisons: Pairwise group comparisons involving more than one P value.
Negative Predictive Value (NPV): Probability of not having the disease given a negative diagnostic test. Requires an estimate of prevalence.
Null Hypothesis: Default testing hypothesis assuming no difference between groups.
Number Needed to Treat (NNT): Number of patients who need to be treated in order to achieve one additional favorable outcome.
Observational Study: Study design in which treatment is not allocated.
Odds: Probability that an event will occur divided by the probability that the event will not occur.
Odds Ratio (OR): Ratio of the odds of having the condition/outcome in the experimental group to the odds of having the condition/outcome in the control group (case-control study).
One-Tailed Test: Test in which the alternative hypothesis specifies a deviation from the null hypothesis in one direction only.
Placebo: Inactive substance used to reduce bias by simulating the treatment under investigation.
Positive Predictive Value (PPV): Probability of having the disease given a positive diagnostic test. Requires an estimate of prevalence.
Power: Probability of finding a significant association when one truly exists (1 - probability of a type-II (β) error). By convention, power of 80% or greater is considered sufficient.
Prevalence: Proportion of individuals with a disease or characteristic in the study population of interest.
Probability: A number, between 0 and 1, indicating how likely an event is to occur.
Prospective Study: Direction of inquiry is forward from the cohort. Events transpire after study onset.
P Value: Probability of the type-I (α) error. If the P value is small, then it is unlikely that the results observed are due to chance.
Randomized Clinical Trial (RCT): Prospective experimental study design that randomly allocates eligible patients to experimental versus control groups or to different treatment groups.
Random Sample: A sample of subjects from the population such that each has an equal chance of being selected.
Receiver Operating Characteristic (ROC) Curve: Graph showing a test's performance as the relationship between the true-positive rate and the false-positive rate.
Regression: Statistical technique for determining the relationship among a set of variables.
Relative Risk (RR): Ratio of the incidence of disease or outcome in the exposed cohort versus the incidence in the unexposed cohort (cohort study).
Relative Risk Reduction (RRR): Proportional reduction in adverse event rates between the experimental and control groups in a trial.
Retrospective Study: Direction of inquiry is backward from the cases. Events transpired before study onset.
Sample: Subset of the population.
Selection Bias: Systematic error in sampling the population.
Sensitivity: Proportion of patients who have the outcome who are classified as positive.
Sensitivity Analysis: Method in decision analysis used to determine how varying different components of a decision tree or model changes the conclusions.
Specificity: Proportion of patients without the outcome who are classified as negative.
Validity: Degree to which a questionnaire or instrument measures what it is intended to measure.

Notes

Dr. Kocher is an Instructor in Orthopaedic Surgery, Harvard Medical School, and Director of the Program in Clinical Effectiveness, Harvard School of Public Health and Department of Orthopaedics, Children's Hospital, Boston, MA. Dr. Zurakowski is the Principal Statistician in the Department of Orthopaedic Surgery, Children's Hospital, Boston, MA.

Address Correspondence To:
Mininder S. Kocher, M.D., M.P.H.
 Department of Orthopaedic Surgery
 Children's Hospital
 300 Longwood Avenue
 Boston, MA 02115
617.355.4849 / 617.739.3338 (fax)
 e-mail: mininder.kocher@tch.harvard.edu
References

1. Wennberg J, Gittelsohn A: Small area variations in health care delivery. Science, 182(117): 1102-8, 1973.
2. Wennberg J, Gittelsohn A: Variations in medical care among small areas. Sci Am, 246(4): 120-34, 1982.
3. Wennberg JE: Dealing with medical practice variations: a proposal for action. Health Aff (Millwood), 3(2): 6-32, 1984.
4. Wennberg JE: Outcomes research: the art of making the right decision. Internist, 31(7): 26, 28, 1990.
5. Wennberg JE: Practice variations: why all the fuss? Internist, 26(4): 6-8, 1985.
6. Wennberg JE, Bunker JP, Barnes B: The need for assessing the outcome of common medical practices. Annu Rev Public Health, 1: 277-95, 1980.
7. Chassin MR: Does inappropriate use explain geographic variations in the use of health care services? A study of three procedures. JAMA, 258(18): 2533-7, 1987.
8. Kahn KL, Kosecoff J, Chassin MR, Flynn MF, Fink A, Pattaphongse N, Solomon DH, Brook RH: Measuring the clinical appropriateness of the use of a procedure. Can we do it? Med Care, 26(4): 415-22, 1988.
9. Park RE, Fink A, Brook RH, Chassin MR, Kahn KL, Merrick NJ, Kosecoff J, Solomon DH: Physician ratings of appropriate indications for three procedures: theoretical indications vs indications used in practice. Am J Public Health, 79(4): 445-7, 1989.
10. Millenson ML: Demanding Medical Excellence. Chicago: University of Chicago Press, 1997.
11. Katz J: The Nuremberg Code and the Nuremberg Trial. JAMA, 276: 1662-6, 1996.
12. World Medical Organization: Declaration of Helsinki: Recommendations guiding physicians in biomedical research involving human subjects. JAMA, 277: 925-6, 1997.
13. Kocher MS, DiCanzio J, Zurakowski D, Micheli LJ: Diagnostic performance of clinical examination and selective magnetic resonance imaging in the evaluation of intra-articular knee disorders in children and adolescents. Am J Sports Med, 29(3): 292-6, 2001.
14. Kocher MS: Ultrasonographic screening for developmental dysplasia of the hip: an epidemiologic analysis. Part I. Am J Orthop, 29(12): 929-33, 2000.
15. Kocher MS: Ultrasonographic screening for developmental dysplasia of the hip: an epidemiologic analysis. Part II. Am J Orthop, 30(1): 19-24, 2001.
16. Kocher MS, Zurakowski D, Kasser JR: Differentiating between septic arthritis and transient synovitis of the hip in children: an evidence-based clinical prediction algorithm. J Bone Joint Surg Am, 81: 1662-70, 1999.
17. Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143: 29-36, 1982.