MATH 2441 | Probability and Statistics for Biological Sciences |
Relative Risk and the Odds Ratio
Review: The Purpose of Estimating the Difference of Two Population Proportions
Primary Outcomes and Risk Factors
Experimental vs. Observational Studies
Prospective vs. Retrospective Studies
Review: The Purpose of Estimating the Difference of Two Population Proportions
In the preceding document, we described the method for estimating the difference between two population proportions, which amounts to estimating the difference in relative frequencies of some characteristic in two populations. This is one way to compare the rates of occurrence of characteristics such as the prevalence of some disease, genetic effect, or reaction between two populations which differ in some respect thought to have a bearing on that rate of incidence. In this document we look at another approach to quantifying differences in frequency of occurrence of some characteristic of two populations.
To make the discussion in this document a little more concrete, we'll return to two examples used to illustrate the estimation of the difference between two population proportions.
Example 1: In a recent publication, R Corona and coauthors (Epidemiol. Infect. (1998), 121, 623-630) reported that of 641 male patients attending a sexually transmitted disease clinic who were under 30 years of age, 18 were HIV positive. On the other hand, 39 of 855 tested individuals who were 30 or more years old tested positive for HIV.
From the data produced by the study we estimated the difference in rate of HIV-positive tests among the population of patients who were under 30 years of age and the population of patients who were 30 years of age or older.
Example 2: In the same article, the authors also distinguished between patients who tested positive or negative for the hepatitis B core antigen (Anti-HBC). Of 1026 patients with negative Anti-HBC, 17 were HIV positive, whereas, of 468 patients who were Anti-HBC positive, 40 were HIV positive.
Patients displaying the hepatitis B core antigen form one population, and patients who did not display this antigen form a second population. From the data, we estimated the difference in the rates of HIV-positive tests between these two populations.
Primary Outcomes and Risk Factors
In situations such as these (we'll describe the criteria a bit more precisely below), it is quite common to use a somewhat different sort of statistic to compare rates of incidence of some characteristic between two populations: either the relative risk (RR) or the odds ratio (OR). A quick look through a variety of journals dealing with topics of interest in the biological sciences indicates that the RR or the OR appears in summary tables quite often, yet the theory behind these statistics, their calculation, and their interpretation is not addressed by many basic statistics textbooks. One of the few authors who do cover the topic is Wayne Daniel (Biostatistics, 6^{th} edition, pp 542-555). The approach and notation below are similar to Daniel's (and seem to be quite standard), to make it easier for you to pursue this topic further on your own if necessary.
Two other remarks are in order before getting down to details. First, it is more usual to study this topic in the general context of applications of the χ^{2}-distribution and tests for goodness-of-fit, homogeneity and dependence. We will look at those topics later in the course, but since you've already been introduced to the χ^{2}-distribution (in connection with computing confidence interval estimates of the population variance), and since this topic ties in so well (both in concept and in practical applications) with the estimation of the difference of two population proportions, this departure from convention seems justified.
Secondly, to understand what the RR and the OR are measuring, we need to look briefly at some basic issues of data collection and statistical experimental design. The last few weeks of this course have focussed on what you do with data that has already been collected. It is a good time to remind ourselves that issues of how the data was collected are very important as well -- and the way we analyze data and interpret results (and the correctness of those analyses and interpretations) is very strongly dependent on how the data was obtained. Putting poor data into sophisticated mathematical or statistical formulas, or using inappropriate statistical methods to analyze any data leads to poor or inappropriate or false conclusions.
In situations like this, there is always a primary outcome that is tallied. In the above examples, the researchers were counting the number of patients in their study who were HIV positive. (Of course, this means they were also counting the number of patients who were not HIV positive). The sample proportion or rate of occurrence of the primary outcome was then calculated by dividing the number of primary outcomes observed by the number of patients in the sample.
Secondly, in each of the examples above, the patients involved in the study were categorized according to a second characteristic, called a risk factor. In Example 1, the risk factor is whether the patient was under 30 years of age or was 30 years old or older. In Example 2, the risk factor is whether or not the patient displayed the hepatitis B core antigen.
The goal of the study is to determine whether the risk factor is related to the primary outcome or not. So, the goal of the study in Example 1 would be to determine if the age group of the patient (under 30 or 30 or over) results in a different rate or incidence of the primary outcome -- are patients under 30 more or less likely to be HIV positive than are patients 30 years old or older, and if so, by how much. The study described in Example 2 would have the goal of determining if patients giving a positive result in the Anti-HBC test are more or less likely to be HIV positive, and if so, by how much.
Experimental vs. Observational Studies
Statistical studies can be classified into two broad types. In the jargon being developed here, experimental studies involve assigning risk factors to subjects or members of the samples according to some designed plan. On the other hand, observational studies involve sampling the target population, and simply observing which risk factors are present in each subject or member of the samples so selected.
From the wording of the article cited, it appears that both examples described above were observational studies. In Example 1, patients who arrived at the clinic were assessed as either being under 30 years old or 30 or more years old. In Example 2, the patients who arrived at the clinic were determined by testing to either have the hepatitis B core antigen in their bodies or not. It wouldn't appear that either study could have been usefully turned into an experimental study. However, note that for Example 2 to become an experimental study, we would have to select individuals at random and then turn them into either Anti-HBC positives or negatives. Of course, turning a person into an Anti-HBC positive would require deliberately infecting them with a serious disease, and so would be an unthinkable approach to studying the factors that may pre-dispose a person to HIV infection.
Experimental studies are much more common on non-human or non-living systems. An example where an experimental study would be feasible might be a situation in which the effect of some potential fertilizer on crop yield is being studied. Then plants or groups of plants could be assigned at random to receive different amounts of the fertilizer. In a food sanitation study, randomly selected specimens of the food could be treated with specific concentrations of a solution under study. In such situations, the number of elements of the sample subjected to each treatment, or the characteristics of the treatment itself are carefully planned and selected by the researcher to achieve the clearest conclusions possible.
In observational studies, such manipulations of the sample elements are not possible for technical reasons or perhaps (particularly when human or animal subjects are involved) ethical reasons.
When experimental studies are feasible, they can be designed to give much more specific results than would usually be possible with observational studies, and so would be the method of choice.
Prospective vs. Retrospective Studies
There are two basic types of observational studies: prospective studies and retrospective studies. In both types, two random samples are selected for comparison. The difference between the two has to do with whether samples are selected on the basis of risk factor or on the basis of primary outcome.
In prospective studies, one of the random samples consists of subjects or elements which possess the risk factor and the other random sample consists of subjects which do not possess the risk factor. The elements of each sample are then tracked into the future (the word "prospective" means to look ahead into the future) to determine which will eventually exhibit the primary outcome, and which will not. For instance, if one wished to determine the effect of being under 30 or not on the incidence of contracting HIV, you would start by selecting a random sample of people under 30 years of age, and another random sample of people who were not under 30 years of age. Then, over the term of the study, everyone in these two samples would be monitored to determine how many of each sample do contract HIV and how many do not. Prospective studies can take a lot of time to complete (particularly if the primary outcome is quite rare, or quite slow to occur) and so can also be quite expensive.
With retrospective ("back-looking") studies, one of the random samples consists of subjects or elements which have exhibited the primary outcome (called cases), and the other random sample consists of subjects or elements which have not exhibited the primary outcome (called controls). Thus, for the issue of age as a risk factor in contracting HIV, a retrospective study would involve selecting a random sample of persons who are HIV positive, and another random sample of persons who are not HIV positive, and determining how many people in each sample fall into each of the two risk factor groups.
(The usefulness of results in retrospective studies can be influenced strongly by how one goes about selecting the random sample of subjects not exhibiting the primary outcome (the so-called noncases or controls). Often this process is not completely random, but an attempt is made to match controls with subjects in the first sample so that, as much as possible, the only difference between the two would be in the area of the risk factor under consideration. For example, if you wanted to get information about a risk factor in the development of some respiratory disease, you would probably want to at least roughly match subjects in the two samples according to age, gender, perhaps some other aspects of lifestyle, family history, geographical location, etc., so that if a difference in incidence is observed, it can be attributed to just the risk factor under consideration as unambiguously as possible.)
Obviously, this whole issue of designing experimental or observational studies becomes highly complex in practice. The rather superficial description above is sufficient for our present purposes.
Relative Risk (RR)
As does Daniel, we define the concept of "relative risk", abbreviated "RR", only in the context of a prospective study, and the notion of an "odds ratio", abbreviated "OR", only in the context of a retrospective study. This terminology is not followed strictly by all who use it, and you will, for example, find references that refer to relative risks defined for retrospective studies. In part, this results from the fact that the RR and OR are approximately equal when the primary outcome is relatively rare.
In prospective studies, the results of the study can be summarized in a 2 x 2 contingency table, illustrated by the following generic form:
                            Primary Outcome
Risk Factor        Present       Absent        Total
Present            a             b             a + b
Absent             c             d             c + d
Total              a + c         b + d         n = a + b + c + d
Note that the design of the study has fixed the totals in the right-hand column, with the observations made during the study providing the breakdown values in the middle two columns.
Now, from the table, the proportion of subjects with the risk factor who display the primary outcome, a/(a + b), can be viewed as a measure of the risk of developing the primary outcome when the risk factor is present. Similarly, the risk of developing the primary outcome when the risk factor is absent is measured by the proportion c/(c + d). Then, we define the relative risk, RR, as the ratio of these two risks:
RR = [a/(a + b)] / [c/(c + d)]    (RROR-1)
(In Daniel, the symbol RR is typeset with a carat, ^, overtop to indicate that this is a sample statistic being used as a point estimator of the population relative risk. We won't do that here -- for one thing, it's difficult to do with a pc-based word processor. But of course, (RROR-1) is based on numbers observed for a sample and so is a sample statistic. Our intention is to use the result as an estimate of the corresponding property of the populations which have been sampled.)
The meaning of the value of RR can be seen from the way the formula has been set up. None of the four numbers a, b, c, and d can be negative, so the value of RR is always greater than or equal to zero. Note that neither of the sums a + b or c + d can themselves be zero, since these are the sizes of the samples selected from the populations with and without the risk factor respectively (in statistics, the decision to work with a sample of size zero is probably an indication that further review of basic concepts is in order!). The neutral result, RR = 1, happens when the same rate of occurrence of the primary outcome has been observed for both those subjects with the risk factor and those without the risk factor. This would seem to indicate that the presence or absence of the risk factor has little to do with the rate of occurrence of the primary outcome.
Values of RR less than 1 indicate the primary outcome was observed with a lower relative frequency for subjects exhibiting the risk factor than for subjects not exhibiting the risk factor. RR will be its minimum possible value of zero only if a = 0 and c is nonzero. But, a = 0 means that none of the subjects with the risk factor exhibited the primary outcome while some without the risk factor actually did -- which is at least a strong hint that the risk factor does not promote the primary outcome (and may well inhibit it, since the primary outcome has been observed when the risk factor is not present).
Values of RR greater than 1 indicate that the primary outcome was observed with greater likelihood for subjects exhibiting the risk factor than for subjects not exhibiting the risk factor. This would be evidence in support of the risk factor promoting the occurrence of the primary outcome. At the extreme, RR becomes infinite (when one or more subjects with the risk factor exhibit the primary outcome but none without the risk factor do so.)
Although RR can be as small as zero and as large as infinity in principle, these extremes correspond to a = 0 or c = 0, respectively. The underlying theory is invalid in such situations and so such extreme values of RR should be considered meaningless. The theory underlying the use of both the RR and the OR involves the χ^{2}-distribution, and most authors recommend that valid results require all frequencies to be at least 5.
Note that if the primary outcome is quite rare, then 'a' and 'c' will be quite small numbers relative to 'b' and 'd' respectively, and so a + b ≈ b and c + d ≈ d, giving us the approximation

RR ≈ (a/b) / (c/d) = ad/(bc)    (RROR-2)

(Here, '≈' is used to emphasize the approximateness of this formula). You will see the ratio (RROR-2) show up again a bit later.
Daniel gives the following formula for the upper and lower limits of a 100(1 - α)% confidence interval estimate of the population RR, based on previous work by O. S. Miettinen (published in the American Journal of Epidemiology, 1974, 100, pp. 515-516, and 1976, 103, pp. 226-235)
100(1 - α)% CI:  RR^{1 ± z_{α/2}/√(X^{2})}    (RROR-3)
where z_{α/2 }is the indicated critical factor from the standard normal probability table, and
X^{2} = n(ad - bc)^{2} / [(a + b)(c + d)(a + c)(b + d)]    (RROR-4)
The symbol used here suggests that this is an estimate of a random variable with the χ^{2}-distribution (which is true, but needn't concern us here). Using the '+' sign in the exponent in (RROR-3) gives a larger number than when the '-' sign is used, so the '-' sign will give the lower limit and the '+' sign will give the upper limit of the confidence interval estimate of the population relative risk. Formula (RROR-3) above is not exactly as it appears in Daniel (see page 544), because he uses the symbol z _{α} to denote the same quantity that we would use z_{α/2} to represent.
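To make the recipe concrete, here is a short Python sketch (standard library only; the function name relative_risk_ci is our own invention) that computes the point estimate (RROR-1) together with the Miettinen-style limits (RROR-3), using the X^{2} statistic of (RROR-4):

```python
import math

def relative_risk_ci(a, b, c, d, z=1.96):
    """Relative risk from a prospective 2 x 2 table, with test-based CI limits.

    a, b = outcome present/absent counts for subjects WITH the risk factor;
    c, d = the same counts for subjects WITHOUT the risk factor.
    z defaults to 1.96, the critical value for a 95% interval.
    """
    rr = (a / (a + b)) / (c / (c + d))
    n = a + b + c + d
    # chi-square statistic for the 2 x 2 table; the denominator is the
    # product of the four row and column totals
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    half = z / math.sqrt(chi2)
    return rr, rr ** (1 - half), rr ** (1 + half)

# Example 1 data: a = 39, b = 816, c = 18, d = 623
rr, lo, hi = relative_risk_ci(39, 816, 18, 623)
print(round(rr, 4), round(lo, 4), round(hi, 4))  # 1.6244 0.9443 2.7942
```

The final line reproduces the Example 1 numbers worked out by hand below.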
Example 1: Considering the study described above as Example 1 to be a prospective study, compute the RR for the primary outcome of a positive HIV test with respect to the risk factor of age. Also compute a 95% confidence interval estimate of the population relative risk.
Solution:
The study described in these two examples is difficult in some ways to categorize as either a prospective study or a retrospective study. The primary outcome for the data mentioned is the categorizing of experimental subjects as either HIV positive or not HIV positive. The risk factor in Example 1 is whether the subject is under 30 years of age or is 30 or more years old. In Example 2, the risk factor is whether the subject tests positive to the hepatitis B core antigen or not. The subjects for the samples were chosen neither on the basis of this primary outcome (which would make it a retrospective study) nor on the basis of either of these two particular risk factors (which would make it a prospective study). Instead, subjects became part of the study because they attended a clinic dealing with sexually transmitted disease.
However, since the authors report values of "crude odds ratios", it would appear that they consider the study to be retrospective. (You could formulate a story which would make this seem plausible: once the individual patients in the samples were categorized as either HIV positive or not -- that is, were categorized into one of the two possible primary outcome groups -- the researchers could have gone back and determined which members of each outcome group fell into which risk factor group.)
One thing we could do is try to duplicate their calculations to see whether what they're calling an odds ratio corresponds to what this document is calling an odds ratio. In the process, we will be able to illustrate how the formulas are applied to raw data.
So, from the statement of Example 1, we get the following 2 x 2 contingency table:
                            Primary Outcome
Risk Factor        HIV Positive     HIV Negative     Total
30 or older        a = 39           b = 816          855
under 30           c = 18           d = 623          641
Total              57               1439             n = 1496
Thus, using (RROR-1),

RR = (39/855) / (18/641) = 0.04561 / 0.02808 = 1.6244

Thus, the relative rate of the primary outcome of being HIV positive for patients 30 or older compared to patients under 30 is 1.6244. In this case, the primary outcome is quite rare, occurring in about 4% of the patients, and so the approximation formula (RROR-2) should give a reasonably good result:

RR ≈ ad/(bc) = (39)(623) / [(816)(18)] = 1.6542

which is fairly close to the exact value of 1.6244. To get a 95% confidence interval estimate for the population relative rate, we need to calculate

X^{2} = 1496(39 × 623 - 816 × 18)^{2} / [(855)(641)(57)(1439)] = 3.0728

(Notice that the four numbers in the product in the denominator here are just the four row and column totals in the contingency table). Keeping in mind that z_{0.025} = 1.96, we get for the two limits of the confidence interval estimate:

1.6244^{1 - 1.96/√3.0728} = 1.6244^{-0.1181} = 0.9443  and  1.6244^{1 + 1.96/√3.0728} = 1.6244^{2.1181} = 2.7942

Thus, there is a 95% probability that the interval [0.9443, 2.7942] captures the true value of the relative risk for the populations corresponding to the risk factor of age here.
(Since the values tabulated as OR in the published paper are slightly different than the above, and agree more or less exactly with the numbers we'll get when we calculate the OR using this data, we conclude, as earlier surmised, that the authors are considering their work to be a retrospective study and so calculated OR's according to the formulas to be presented below. However, the calculations above show you how to use the RR formulas.)
Example 2: For reference, we duplicate the above calculations for the data described in example 2 above. In tabular form, we have
                            Primary Outcome
Risk Factor         HIV Positive     HIV Negative     Total
Anti-HBC positive   a = 40           b = 428          468
Anti-HBC negative   c = 17           d = 1009         1026
Total               57               1437             n = 1494
Now, using (RROR-1), we get

RR = (40/468) / (17/1026) = 0.08547 / 0.01657 = 5.158

The approximation formula (RROR-2) gives

RR ≈ ad/(bc) = (40)(1009) / [(428)(17)] = 5.547

The 95% confidence interval estimate of the population relative risk turns out to have the limits

5.158^{1 - 1.96/√41.58} = 3.13  and  5.158^{1 + 1.96/√41.58} = 8.49

since

X^{2} = 1494(40 × 1009 - 428 × 17)^{2} / [(468)(1026)(57)(1437)] = 41.58
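The Example 2 arithmetic can be checked with a few lines of Python (standard library only; the variable names are our own):

```python
import math

a, b, c, d = 40, 428, 17, 1009  # Example 2 cell counts
n = a + b + c + d               # 1494

rr = (a / (a + b)) / (c / (c + d))  # exact relative risk, formula (RROR-1)
rr_approx = (a * d) / (b * c)       # rare-outcome approximation, (RROR-2)

# chi-square statistic (RROR-4) and the test-based 95% limits (RROR-3)
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
half = 1.96 / math.sqrt(chi2)
lower, upper = rr ** (1 - half), rr ** (1 + half)

print(round(rr, 4))                      # 5.1584
print(round(rr_approx, 3))               # 5.547
print(round(chi2, 2))                    # 41.58
print(round(lower, 2), round(upper, 2))  # 3.13 8.49
```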
The Odds Ratio
The odds ratio is used to estimate the population relative risk when data is obtained in a retrospective study. In a retrospective study, the data can also be displayed in the form of a 2 x 2 contingency table:
                            Primary Outcome
Risk Factor        Cases         Controls      Total
Present            a             b             a + b
Absent             c             d             c + d
Total              a + c         b + d         n = a + b + c + d
In this situation, the design of the study has fixed the values in the bottom row of the table, with the breakdowns in the middle two rows being the result of experimental observation. Remember, the researcher here would start out with two independent samples comprising, respectively, the a + c cases and the b + d controls. The risks a/(a + b) and c/(c + d) no longer make sense, because they depend roughly on the relative numbers of cases and control subjects, which is a choice made by the researcher, not an intrinsic property of the study. Nevertheless, we would very much like to obtain an analogue of the relative risk from this data.
We can generate the formula for the so-called odds ratio using quite a simple argument (along the lines presented by Daniel, for example, in his textbook). In probability theory, "odds" are the ratio of the probability of one outcome to the probability of its opposite outcome. Thus, the odds associated with the outcomes {heads, tails} when a single fair coin is flipped are "1 to 1", sometimes written as 1:1, indicating that both probabilities are identical. The odds of rolling a six with a single fair die are 1:5, since the probability of rolling a six is 1/6 and the probability of not rolling a six is 5/6, and the ratio of these two numbers is 1 to 5.
Often, as is done below, the odds a:b of some event A are rescaled and written so that the second number is 1 -- that is a:b ≡ a/b:1. Then we say simply that the odds of event A are a/b.
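In this rescaled sense, the odds of an event with probability p are simply p/(1 - p), as a couple of lines of Python (the helper name is our own) illustrate:

```python
def odds(p):
    """Odds of an event with probability p, rescaled so the second number is 1."""
    return p / (1 - p)

print(odds(0.5))    # fair coin: 1.0, i.e. odds of 1:1
print(odds(1 / 6))  # rolling a six: 0.2, i.e. odds of 1:5
```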
Now, for the subjects with the risk factor, the probability of being a case is a/(a+b) and the probability of being a control is b/(a+b). Thus, for the subjects with the risk factor present, the odds of being a case are
[a/(a + b)] / [b/(a + b)],  or  a/b    (RROR-5a)
That is, those subjects with the risk factor are a/b times as likely to be cases as they are to be controls. Similarly, for those subjects without the risk factor, the odds of being a case are
[c/(c + d)] / [d/(c + d)],  or  c/d    (RROR-5b)
The ratio of these two "odds", defined as
OR = (a/b) / (c/d) = ad/(bc)    (RROR-6)
should be independent of the actual relative numbers of cases and controls involved in the experiment, since the frequencies a and c should scale together and the frequencies b and d should scale together. (We will return to this issue very briefly at the end of this document in the section titled "A Bit of Algebra.") Formula (RROR-6) is the sample odds ratio, and is used as an estimator of the population relative risk. In principle, the OR can have any value between zero and infinity (with the same cautions about values near zero or near infinity that were stated above in connection with the RR). An OR of value 1 indicates the odds of being a case are the same whether the risk factor is present or absent. OR's smaller than 1 indicate that the odds of being a case are lower for subjects with the risk factor than for those without. OR's which are larger than 1 indicate that the odds of being a case are higher for subjects with the risk factor than for those without the risk factor.
Notice that the right-hand side of formula (RROR-6) is identical to the right-hand side of formula (RROR-2) giving an approximation for the RR. Thus, when we used (RROR-2) in the above examples to compute an approximate value for RR, we were really calculating OR.
Left and right limits of 100(1 - α)% confidence intervals for the OR are given by the formula
OR^{1 ± z_{α/2}/√(X^{2})}    (RROR-7)
with z_{α/2} and X^{2} having the same meaning as in formula (RROR-3).
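As with the relative risk, the odds ratio and its interval estimate take only a few lines of Python (standard library only; odds_ratio_ci is our own name for the sketch):

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Sample odds ratio (RROR-6) with test-based CI limits (RROR-7).

    a, b, c, d are the cell counts of the retrospective 2 x 2 table:
    rows = risk factor present/absent, columns = cases/controls.
    """
    est = (a * d) / (b * c)
    n = a + b + c + d
    # the same chi-square statistic (RROR-4) used for the RR limits
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    half = z / math.sqrt(chi2)
    return est, est ** (1 - half), est ** (1 + half)

# Example 1 data
est, lo, hi = odds_ratio_ci(39, 816, 18, 623)
print(round(est, 3), round(lo, 3), round(hi, 3))  # 1.654 0.942 2.904
```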
Example 1: For the Example 1 data, we get

OR = ad/(bc) = (39)(623) / [(816)(18)] = 1.6542

with the limits of the 95% confidence interval estimate based on (RROR-7) being

1.6542^{1 - 1.96/√3.0728} = 0.9423  and  1.6542^{1 + 1.96/√3.0728} = 2.9040
These values agree well with the reported values in the published paper: OR = 1.65, and the 95% confidence interval being [0.94, 2.92]. It isn't obvious why the upper limit of our confidence interval formula is not the same as the value obtained by the authors of the paper. Since this happens with the Example 2 data as well, it might be that the authors of the published paper used some minor variation of the formula that we are unfamiliar with.
The interpretation of this last result could be stated in one of several ways:
- There is a probability of 95% that the ratio of the odds of being a case when a subject has the risk factor to the odds of being a case when the subject does not have the risk factor is a value between 0.9423 and 2.9040. In rounder numbers, there is a 95% likelihood that subjects with the risk factor have odds of being a case which are 1 to 2.9 times as great as the odds of a subject without the risk factor being a case.

- More loosely, but picking up on the intent to use the value of the OR as an estimate of the population relative risk, we might say that there is a 95% chance that the risk of being a case is between 1 and 2.9 times as great for individuals with the risk factor as it is for individuals without the risk factor.
Example 2: For the Example 2 data we get

OR = ad/(bc) = (40)(1009) / [(428)(17)] = 5.547

with the limits of the 95% confidence interval estimate based on (RROR-7) being

5.547^{1 - 1.96/√41.58} = 3.30  and  5.547^{1 + 1.96/√41.58} = 9.34
The authors report OR = 5.55 and the limits of the 95% confidence interval estimate as [3.01, 10.3]. Our values of the OR are in good agreement, but there is some small difference in the limits of the confidence interval estimate.
Notice that in both of these examples, the 95% confidence interval estimates are rather broad despite the samples involved in the study consisting of a total of nearly 1500 patients. This degree of imprecision seems to be a general feature of formula (RROR-7). We haven't had a chance to refer to the papers by Miettinen to see if he has anything to say about this. However, it is possible to do a bit of an experiment with this data for Example 2. With the actual data, and a = 40, the upper limit of the confidence interval estimate is about 2.8 times as large as the lower limit, and the width of the confidence interval is about 10% greater than the actual value of the OR. If you quadruple all of the numbers in the table for Example 2 (simulating a quadrupling of the sample sizes but keeping all tallies in the same proportion), the 95% confidence interval estimate of the OR becomes [4.27, 7.20]. Now the upper bound is only about 1.7 times as big as the lower bound, and the width of the interval is only about 50% of the value of the OR. So, it looks like the only way to improve these confidence interval estimates is to increase sample sizes rather dramatically.
Even though the 95% confidence interval estimate of the OR in Example 2 is a rather broad interval, it is still good enough to conclude that patients who are positive for the hepatitis B core antigen are more than three times as likely to be HIV positive compared to those who are not Anti-HBC positive.
A Bit of Algebra
Just before leaving this topic, we'll take a few lines to make the connection between the OR defined above and the population relative risk a little bit more explicit. This will also indicate the conditions under which a retrospective study is likely to give a good estimate of relative risk. Finally, with the formulas developed we can show quickly that while the RR makes no sense for retrospective studies, the OR does -- a claim made earlier but not supported. No doubt this section will try your patience somewhat, but hopefully it will also remind you that at its base, statistics is a branch of mathematics, and many of the formulas and principles that we tend simply to state as fact are really the result of relatively straightforward (though perhaps sometimes tedious) mathematical analysis.
To be specific, let
N = the population size
p = the overall relative frequency of the primary outcome in the population,
p_{p} = the relative frequency of the risk factor among those having the primary outcome
p_{a} = the relative frequency of the risk factor among those not having the primary outcome
Then we can write down the equivalent of the 2 x 2 contingency tables above for the entire population:
                            Primary Outcome
Risk Factor    Present               Absent                    Total
Present        p_{p}pN               p_{a}(1 - p)N             [p_{p}p + p_{a}(1 - p)]N
Absent         (1 - p_{p})pN         (1 - p_{a})(1 - p)N       [(1 - p_{p})p + (1 - p_{a})(1 - p)]N
Total          pN                    (1 - p)N                  N
Now, for the entire population, the rate of the primary outcome among those who have the risk factor is

p_{p}pN / [p_{p}pN + p_{a}(1 - p)N] = p_{p}p / [p_{p}p + p_{a}(1 - p)]

In the same way, the rate of the primary outcome among those who do not have the risk factor is

(1 - p_{p})pN / [(1 - p_{p})pN + (1 - p_{a})(1 - p)N] = (1 - p_{p})p / [(1 - p_{p})p + (1 - p_{a})(1 - p)]

Note that the population size has cancelled out of both expressions. The population relative risk is the ratio of these two expressions:

RR = {p_{p}p / [p_{p}p + p_{a}(1 - p)]} / {(1 - p_{p})p / [(1 - p_{p})p + (1 - p_{a})(1 - p)]}    (RROR-8)
On the other hand, the equivalent of the population OR is the ratio

OR = ad/(bc) = [p_{p}pN × (1 - p_{a})(1 - p)N] / [p_{a}(1 - p)N × (1 - p_{p})pN] = p_{p}(1 - p_{a}) / [p_{a}(1 - p_{p})]    (RROR-9)
For (RROR-8) and (RROR-9) to give approximately the same result, you can see that we would need to have

p_{p}p + p_{a}(1 - p) ≈ p_{a}(1 - p)

and

(1 - p_{p})p + (1 - p_{a})(1 - p) ≈ (1 - p_{a})(1 - p)
This can only happen if p_{p}p and (1 - p_{p})p are both small, and they can both be small only if p itself is small (if you adjust the value of p_{p} to reduce the value of one term, the value of the other term will increase). Thus, the odds ratio (from a retrospective study) is a reasonable way to estimate the population relative risk when the primary outcome is relatively rare. Further, in such cases where p is small, it is reasonable to approximate (1 - p) as 1, so that the expressions (RROR-8) and (RROR-9) become
RR ≈ OR ≈ p_{p}(1 - p_{a}) / [p_{a}(1 - p_{p})] ≈ ad/(bc)    (RROR-10)
where a, b, c, d, are the constants appearing in earlier contingency tables in this document. Here, the symbol '≈' is intended to indicate "is very approximately equal to" -- the four expressions are expected to have similar values in the context in which they have meaning.
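A small numerical sketch in Python shows this convergence; the particular values of p_p, p_a and p below are made up purely for illustration:

```python
def population_rr(p, p_p, p_a):
    """Population relative risk, formula (RROR-8)."""
    rate_with = p_p * p / (p_p * p + p_a * (1 - p))
    rate_without = (1 - p_p) * p / ((1 - p_p) * p + (1 - p_a) * (1 - p))
    return rate_with / rate_without

def population_or(p_p, p_a):
    """Population odds ratio, formula (RROR-9): independent of p."""
    return p_p * (1 - p_a) / (p_a * (1 - p_p))

# hypothetical relative frequencies of the risk factor among those with and
# without the primary outcome
p_p, p_a = 0.60, 0.30
print(round(population_or(p_p, p_a), 3))  # 3.5, whatever the value of p
for p in (0.5, 0.1, 0.01, 0.001):         # primary outcome becomes rarer
    # as p shrinks, the RR approaches the OR of 3.5
    print(p, round(population_rr(p, p_p, p_a), 3))
```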
Finally, consider the situation of a retrospective study in which a sample of N_{case} cases are selected, and another sample of N_{cont} controls are selected. The previous contingency table would now look like:
                            Primary Outcome
Risk Factor    Cases                   Controls                 Total
Present        p_{p}N_{case}           p_{a}N_{cont}            p_{p}N_{case} + p_{a}N_{cont}
Absent         (1 - p_{p})N_{case}     (1 - p_{a})N_{cont}      (1 - p_{p})N_{case} + (1 - p_{a})N_{cont}
Total          N_{case}                N_{cont}                 N = N_{case} + N_{cont}
Note that here we don't need to include the relative frequencies p and (1 - p) explicitly, because the sample sizes themselves include those factors.
Now,

OR = ad/(bc) = [p_{p}N_{case} × (1 - p_{a})N_{cont}] / [p_{a}N_{cont} × (1 - p_{p})N_{case}] = p_{p}(1 - p_{a}) / [p_{a}(1 - p_{p})]    (RROR-11)
You see that the individual sample sizes cancel out entirely, leaving a result which does not depend on the (arbitrarily chosen) values of N_{case} and N_{cont}.
On the other hand, if we set up the expression for RR from this table, we get

RR = [p_{p}N_{case} / (p_{p}N_{case} + p_{a}N_{cont})] / [(1 - p_{p})N_{case} / ((1 - p_{p})N_{case} + (1 - p_{a})N_{cont})]    (RROR-12)
This expression does depend on the individual values of both N_{case} and N_{cont}. Adding a few more cases or a few more controls to the study could change the value of RR calculated from (RROR-12) substantially. Thus, computing a sample RR in a retrospective study does not give you a meaningful estimate of a population characteristic. This is why we needed to define the OR, which is meaningful for data in a retrospective study, and then demonstrate that under certain conditions (a rare primary outcome) the sample OR estimates the population RR.
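The same point can be made numerically. In the Python sketch below (all values made up for illustration), the OR computed from the expected counts of a retrospective design stays fixed while the RR swings with the arbitrary choice of sample sizes:

```python
def retro_table(p_p, p_a, n_case, n_cont):
    """Expected 2 x 2 counts for a retrospective design (table above)."""
    a, c = p_p * n_case, (1 - p_p) * n_case  # cases split by risk factor
    b, d = p_a * n_cont, (1 - p_a) * n_cont  # controls split by risk factor
    return a, b, c, d

p_p, p_a = 0.60, 0.30
for n_case, n_cont in ((100, 100), (100, 1000), (50, 5000)):
    a, b, c, d = retro_table(p_p, p_a, n_case, n_cont)
    odds_ratio = a * d / (b * c)         # (RROR-11): sample sizes cancel out
    rr = (a / (a + b)) / (c / (c + d))   # (RROR-12): depends on the design
    print(n_case, n_cont, round(odds_ratio, 3), round(rr, 3))
```

The OR column is the same on every line; the RR column is not, which is exactly why the RR is not a meaningful statistic for retrospective data.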