MATH 2441

Probability and Statistics for Biological Sciences

BCIT

The t-Distribution

 

Calculation of t-probabilities Using the t-table

t-distribution Calculations with MS Excel

 

The so-called Student's t-distribution could well be the second most commonly used probability distribution in statistics. It was first described in a paper written by William Gosset in 1908. Gosset, an employee of the Guinness brewing company in Dublin, Ireland, became involved in the statistical analysis of data collected from studies of the brewing process. Gosset published reports of his work under the pseudonym "Student" to get around a Guinness company policy prohibiting employees from publishing reports on their work -- hence the name "Student t-distribution." Just as the standard normal random variable is conventionally denoted by the character 'z', the Student t-distributed random variable is conventionally denoted by the symbol 't'.

The t-distribution often arises in situations involving small sample sizes, and perhaps limited information in other respects. For example, when data from a large sample are available for problems involving the population mean (or the sample is small but comes from a normally distributed population whose standard deviation is known), the standard normal distribution applies. However, when the sample size is small and the population standard deviation is not known, it is necessary to use the t-distribution rather than the standard normal distribution. In a way, you can think of the standard normal distribution as a special case of the t-distribution appropriate when sample sizes are large.

One way to write the probability density function for the t-distribution is:

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}$$

Here, the symbols Γ( ) denote a mathematical function called the "Gamma Function", which is a generalization of the factorial function which you've seen earlier in this course. The two Gamma Functions just produce constants in this formula. These, along with the square root term, are present to ensure that Pr(-∞ < t < +∞) = 1, as is required of all such probability distributions.

 

You need to notice two things about this probability density function.

First, it depends on t only through t^2, which means that it is symmetric about t = 0. Thus, like the standard normal probability distribution, the t-distribution is symmetric about 0.
Second, the shape of the density curve depends on a parameter denoted ν here (the Greek letter nu, the counterpart of 'n', pronounced 'noo'). This quantity, ν, called the degrees of freedom, is a positive integer.

[Figure: graphs of f(t) for ν = 1 and ν = 3 (solid curves) together with the standard normal density f(z) (dotted curve).]
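As a rough numerical check of these two observations, the following Python sketch evaluates the density in its standard form, f(t) = Γ((ν+1)/2) / (√(νπ) Γ(ν/2)) · (1 + t²/ν)^(−(ν+1)/2), and confirms that the total area is very nearly 1. The function names `t_pdf` and `simpson`, the choice ν = 5, and the integration range are illustrative assumptions, not part of the course material.

```python
import math

def t_pdf(t, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    # Log-gamma keeps the normalizing constant stable for large nu.
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(t * t / nu))

def simpson(f, a, b, n=20000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

nu = 5  # assumed example value
# The range -200..200 stands in for -infinity..+infinity; the area
# outside it is negligible for nu = 5.
area = simpson(lambda t: t_pdf(t, nu), -200.0, 200.0)
print("total area:", round(area, 4))
print("symmetric:", t_pdf(1.3, nu) == t_pdf(-1.3, nu))
```

Because t enters only through t², the two density values compared in the last line are computed from identical inputs, which is the symmetry noted above.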

The solid curves in the figure are graphs of f(t) for ν = 1 (the smallest possible value) and ν = 3, in order of increasing height at t = 0. The dotted curve is a graph of f(z) for the standard normal random variable. From this you see that the t-distribution has a very bell-like shape, but for smaller values of ν the bell is lower and broader. As the value of ν is increased, the t-distribution looks more and more like the standard normal distribution. It can be shown mathematically that the t-distribution is identical to the standard normal distribution in the limit ν → ∞. However, at values of ν as small as 10 or 12, the graphs of f(t) are nearly indistinguishable from graphs of the standard normal probability density function. By the time ν is as large as 29 or 30, results using the t-distribution agree with results from the standard normal distribution to within a percentage point or two, so statisticians tend to use the standard normal probability tables in place of t-tables whenever the value of ν is larger than 29 or 30.
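The way the central peak rises toward the normal curve can be seen numerically. This sketch, assuming the standard form of the t density, prints the height f(0) for several values of ν next to the standard normal height 1/√(2π). The helper name `t_pdf` and the particular values of ν are illustrative choices.

```python
import math

def t_pdf(t, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(t * t / nu))

# Height of the standard normal density at z = 0.
normal_peak = 1 / math.sqrt(2 * math.pi)

for nu in (1, 3, 10, 30, 1000):
    print(f"nu = {nu:4d}   f(0) = {t_pdf(0.0, nu):.4f}")
print(f"normal      f(0) = {normal_peak:.4f}")
```

The printed heights increase with ν and stay below the normal value, consistent with the lower, broader bell described above.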

To summarize, the t-distributed random variable and its distribution have the following general properties:

the mean value is zero (like the standard normal random variable)
the distribution is bell-shaped, and symmetric about the value zero on the horizontal axis
the t random variable can have any value between -∞ and +∞, but most of the probability density is found in the near vicinity of t = 0 (though not in quite as narrow a region as for the standard normal distribution)
there is a family of distinct t-distributions, distinguished by the value of a single parameter ν, which can have any positive integer value. The larger the value of ν, the more that particular t-distribution will be like the standard normal distribution.
generally, for smaller values of ν, the t-distribution will have a lower central peak and higher tails than does the standard normal distribution. Whereas the variance of the standard normal distribution is exactly 1, the variance of the t-distribution (which is finite only when ν > 2) is

$$\sigma^2 = \frac{\nu}{\nu - 2}$$

a value which is always bigger than 1 (indicating that the t-distribution is more spread out than the standard normal distribution), but which approaches 1 as ν becomes large; for ν = 30, for example, it is 30/28 ≈ 1.07.
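To get a feel for how quickly the spread settles toward that of the standard normal distribution, this short sketch tabulates the variance using the standard result ν/(ν − 2), which holds for ν > 2. The function name `t_variance` and the values of ν shown are illustrative.

```python
def t_variance(nu):
    """Variance of the t-distribution; finite only when nu > 2."""
    if nu <= 2:
        raise ValueError("the variance is finite only for nu > 2")
    return nu / (nu - 2)

for nu in (3, 5, 10, 30, 100):
    print(f"nu = {nu:3d}   variance = {t_variance(nu):.4f}")
```

The values fall steadily toward 1, the variance of the standard normal distribution.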

The t-distribution can be used to calculate probabilities in much the same way that you would calculate probabilities for any other continuous distribution. Pr(a < t < b) would be just the area under the t-probability density function between t = a and t = b. In principle, computation of such an area would involve evaluation of an integral.
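The integral in question can be approximated numerically. The sketch below uses composite Simpson's rule on the standard form of the t density; the function names and the example interval are assumptions made for illustration. The interval from -1.833 to 1.833 with ν = 9 is chosen because 1.833 cuts off a right-hand tail of area 0.05, so the result should be close to 0.90.

```python
import math

def t_pdf(t, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(t * t / nu))

def t_prob_between(a, b, nu, n=2000):
    """Approximate Pr(a < t < b) by composite Simpson's rule (n even)."""
    h = (b - a) / n
    total = t_pdf(a, nu) + t_pdf(b, nu)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(a + i * h, nu)
    return total * h / 3

p = t_prob_between(-1.833, 1.833, 9)
print(f"Pr(-1.833 < t < 1.833) for nu = 9 is about {p:.3f}")
```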

Evaluating integrals is at best rather tedious, and is often impossible to do exactly by hand, so it is tempting to try to exploit the properties of the t-distribution (which seem to be so much like those of the standard normal distribution) to develop tables of values that can be used to calculate probabilities. Unfortunately, this approach would require a separate one-page table for each value of ν. However, statisticians also realized that most of their applications involving the t-distribution didn't require the computation of probabilities so much as the determination of a few commonly used percentiles of the t-distribution.

As a result, it has become conventional to organize t-tables quite differently from the way the standard normal probability table is organized. As you see in the table included at the end of this document, just one row of the table is reserved for each value of ν . Although most published t-tables cover the range ν = 1 to ν  = 30, we've given a bit more coverage in our table so you can see how little difference there is between values of the t-percentiles for values of ν larger than 29 or 30, and values of corresponding z-percentiles.

[Figure: t-density curve with the right-hand tail of area α, to the right of the value tα, shaded.]

The numbers in the body of the t-table are values of tα for the value of α given at the top of the column and the value of ν given at the left of the row. Often, to emphasize the fact that a t-percentile is distinguished by both the value of α (the area of the right-hand tail it cuts off -- see the diagram as a reminder of the meaning of this subscript α notation) and the value of ν, people use a combined notation: tα,ν, where appropriate numbers would be substituted for each symbol.


Example: Use the attached t-table to determine t0.05,9.

Answer:

To find this value, read the number in the row labeled ν = 9 and the column headed α = 0.05. We get

                t0.05,9 = 1.833

What this number means is that if we had an experiment which produced values of t according to the t-distribution with ν = 9, then

                Pr(observe a value of t which is greater than 1.833) = 0.05

 

Thus, finding percentiles of the t-distribution from the table is just a matter of reading the correct row and column for values of ν and α covered by the table. The table given here covers all of the values of ν that you will ever need it for. It only covers ten different values of α , but these are by far the most commonly used ones in constructing confidence interval estimates (or setting up rejection regions in hypothesis testing), so you should find the table quite adequate for most of the work you do that requires use of the t-distribution. If you must have values of tα , ν , for values of α not covered in the table, you could hunt for more extensive tables, or you might consider interpolating between values available in the table given here (not really recommended), or, nowadays, you can use readily available computer programs to produce the values you need (see instructions regarding the use of Excel just below).
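For values of α or ν not in the table, a percentile tα,ν can also be found numerically. The following is a minimal sketch, not a production routine: it computes the right-hand tail area by Simpson's rule and then searches for the cutoff by bisection. The names `upper_tail` and `t_critical`, the step size, and the tolerance are all assumptions chosen for illustration.

```python
import math

def t_pdf(t, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(t * t / nu))

def upper_tail(c, nu):
    """Pr(t > c) for c >= 0: 0.5 minus the Simpson's-rule area from 0 to c."""
    n = 2 * max(50, math.ceil(c / 0.02))  # even count, step at most 0.01
    h = c / n
    total = t_pdf(0.0, nu) + t_pdf(c, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

def t_critical(alpha, nu, tol=1e-6):
    """Find c with Pr(t > c) = alpha, for 0 < alpha < 0.5, by bisection."""
    lo, hi = 0.0, 1.0
    while upper_tail(hi, nu) > alpha:  # widen the bracket until it holds c
        lo, hi = hi, 2 * hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if upper_tail(mid, nu) > alpha:
            lo = mid   # tail still too large, so c lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, 9), 3))  # about 1.833, matching the table entry
```

Bisection works here because the tail area decreases steadily as the cutoff moves to the right.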

 


Calculation of t-probabilities Using the t-table

It is very uncommon to need to calculate probabilities for the t-distributed random variable except for regions which are either a single tail or made up of two identical tails, that is, regions of the form t > c or |t| > c.

[Figures: t-density curves with a single shaded tail, and with two identical shaded tails.]


We will illustrate how to at least estimate the areas of regions of this type from the standard t-tables. In the next section, we explain briefly how to get more accurate values using functions available in Excel. Calculating this sort of probability is required in the computation of so-called p-values for hypothesis tests.

Example: Estimate Pr(t > 1.85) for ν = 17 using the standard t-table.

Answer:

[Figure: t-density curve for ν = 17 with the area to the right of t = 1.85 shaded.]

The figure shows the situation. Looking at the row labeled ν = 17 in the standard t-table, we see that

                t0.05,17 = 1.740

indicating that

                Pr(t > 1.740) = 0.05

for this value of ν . Similarly, we see that

                t0.025,17 = 2.110

indicating that

                Pr(t > 2.110) = 0.025.

We selected these two entries because they correspond to values 1.740 and 2.110 which bracket the number 1.85 appearing in the original question. Thus, at the very least, we can say that for ν = 17,

                Pr(t > 1.85) is between 0.025 and 0.05.

This is not very precise, but as an estimate of a p-value for a hypothesis test, it is probably adequate. You might think of doing some linear interpolation to get a better value:

$$\Pr(t > 1.85) \approx 0.05 - \frac{1.85 - 1.740}{2.110 - 1.740}\,(0.05 - 0.025) \approx 0.0426$$

(The exact value to this precision is 0.0409, so the linear interpolation has tended to overestimate the probability, as you would expect from the shape of the graph of the density function.) If you must work from standard tables, and you must have better accuracy than simply bracketing the probability between two successive tabulated values, then some sort of interpolation scheme such as the above is necessary. If high accuracy is necessary and you have access to a computer application (such as MS Excel or software applications designed to facilitate statistical calculations), then use that tool to calculate the required probabilities directly.
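The interpolation arithmetic for this example can be written out as a small Python sketch. The function name `interpolate_tail` is an illustrative choice; the four table values are the two ν = 17 entries used above.

```python
def interpolate_tail(c, c_lo, p_lo, c_hi, p_hi):
    """Linearly interpolate a tail probability between two table entries.

    (c_lo, p_lo) and (c_hi, p_hi) are the tabulated cutoffs and tail
    areas that bracket the cutoff c.
    """
    frac = (c - c_lo) / (c_hi - c_lo)  # how far c lies between the cutoffs
    return p_lo + frac * (p_hi - p_lo)

# nu = 17 row: Pr(t > 1.740) = 0.05 and Pr(t > 2.110) = 0.025
p = interpolate_tail(1.85, 1.740, 0.05, 2.110, 0.025)
print(round(p, 4))  # 0.0426
```

The straight line joining two points of the tail-area curve lies above the curve itself in this region, which is why the interpolated value comes out slightly larger than the exact one.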


Example: Estimate Pr(|t| > 2.58) when ν = 7.

Answer:

From the earlier figure illustrating this situation, we see immediately that we can use the symmetry of the t-distribution about t = 0 to write

                Pr(|t| > 2.58) = Pr(t < -2.58) + Pr(t > 2.58) = 2 x Pr(t > 2.58)

Then, from the ν = 7 row of the standard t-table, we find that the two entries bracketing t = 2.58 give:

                Pr(t > 2.365) = 0.025

and

                Pr(t > 2.998) = 0.01

Thus, Pr(t > 2.58) is a value between 0.025 and 0.01. This means that 2 x Pr(t > 2.58) is a value between 2 x 0.025 = 0.05 and 2 x 0.01 = 0.02. Thus, we conclude that

                Pr(|t| > 2.58) is a number between 0.05 and 0.02.

(Linear interpolation gives 0.0398 and the exact value is 0.0365.)
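The bracketing and doubling steps of this example can be sketched in Python as a lookup on one row of the table. The row values below are copied from the ν = 7 row of the table in this document; the names `ALPHAS`, `ROW_NU_7` and `two_tail_estimate` are illustrative assumptions.

```python
# Column headings (right-hand tail areas) and the nu = 7 row of the t-table.
ALPHAS = [0.20, 0.15, 0.10, 0.05, 0.025, 0.01, 0.005, 0.0025, 0.001, 0.0005]
ROW_NU_7 = [0.896, 1.119, 1.415, 1.895, 2.365, 2.998, 3.499, 4.029, 4.785, 5.408]

def two_tail_estimate(c, alphas, row):
    """Return (lower bound, interpolated value, upper bound) for Pr(|t| > c)."""
    c = abs(c)  # by symmetry only the size of c matters
    for i in range(len(row) - 1):
        t_lo, t_hi = row[i], row[i + 1]
        if t_lo <= c <= t_hi:
            frac = (c - t_lo) / (t_hi - t_lo)
            one_tail = alphas[i] + frac * (alphas[i + 1] - alphas[i])
            # Double each one-tail value to cover both tails.
            return 2 * alphas[i + 1], 2 * one_tail, 2 * alphas[i]
    raise ValueError("c lies outside the range covered by this row")

lower, estimate, upper = two_tail_estimate(2.58, ALPHAS, ROW_NU_7)
print(lower, round(estimate, 4), upper)  # 0.02 0.0398 0.05
```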



t-distribution Calculations with MS Excel

Excel provides two functions related to the t-distribution, TDIST and TINV.

                TDIST(c, ν, 1) gives Pr(t > c) for ν degrees of freedom.

                TDIST(c, ν, 2) gives Pr(|t| > c) for ν degrees of freedom.

                TINV(x, ν) gives the value of c that satisfies the equation Pr(|t| > c) = x. That is, it gives tx/2,ν.

TINV is the function we used to prepare the table below.
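If Excel is not at hand, the behaviour of these two functions can be imitated in a few lines of Python. The sketch below is only an approximation built on numerical integration and bisection; the lowercase names `tdist` and `tinv` are hypothetical stand-ins that mirror the argument order described above, and they are not Excel's own implementation.

```python
import math

def t_pdf(t, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(t * t / nu))

def upper_tail(c, nu):
    """Pr(t > c) for c >= 0, using Simpson's rule on the area from 0 to c."""
    n = 2 * max(50, math.ceil(c / 0.02))  # even count, step at most 0.01
    h = c / n
    total = t_pdf(0.0, nu) + t_pdf(c, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

def tdist(c, nu, tails):
    """Stand-in for TDIST: Pr(t > c) if tails is 1, Pr(|t| > c) if 2."""
    if c < 0 or tails not in (1, 2):
        raise ValueError("expects c >= 0 and tails equal to 1 or 2")
    return tails * upper_tail(c, nu)

def tinv(x, nu, tol=1e-6):
    """Stand-in for TINV: the c with Pr(|t| > c) = x, for 0 < x < 1."""
    lo, hi = 0.0, 1.0
    while tdist(hi, nu, 2) > x:  # widen the bracket until it holds c
        lo, hi = hi, 2 * hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tdist(mid, nu, 2) > x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(tdist(1.85, 17, 1), 4))  # one-tail example from the text
print(round(tdist(2.58, 7, 2), 4))   # two-tail example from the text
print(round(tinv(0.05, 9), 3))       # t0.025,9 from the table
```

Note that `tinv(0.05, 9)` corresponds to the table column headed 0.025, not 0.05, because the probability x is split equally between the two tails.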

 

 

Right-Hand Tail Critical Values for the Student t-distribution

          -------------------------------------- alpha --------------------------------------
ν = n - 1     0.20     0.15     0.10     0.05    0.025     0.01    0.005   0.0025    0.001   0.0005
        1    1.376    1.963    3.078    6.314   12.706   31.821   63.656  127.321  318.289  636.578
        2    1.061    1.386    1.886    2.920    4.303    6.965    9.925   14.089   22.328   31.600
        3    0.978    1.250    1.638    2.353    3.182    4.541    5.841    7.453   10.214   12.924
        4    0.941    1.190    1.533    2.132    2.776    3.747    4.604    5.598    7.173    8.610
        5    0.920    1.156    1.476    2.015    2.571    3.365    4.032    4.773    5.894    6.869
        6    0.906    1.134    1.440    1.943    2.447    3.143    3.707    4.317    5.208    5.959
        7    0.896    1.119    1.415    1.895    2.365    2.998    3.499    4.029    4.785    5.408
        8    0.889    1.108    1.397    1.860    2.306    2.896    3.355    3.833    4.501    5.041
        9    0.883    1.100    1.383    1.833    2.262    2.821    3.250    3.690    4.297    4.781
       10    0.879    1.093    1.372    1.812    2.228    2.764    3.169    3.581    4.144    4.587
       11    0.876    1.088    1.363    1.796    2.201    2.718    3.106    3.497    4.025    4.437
       12    0.873    1.083    1.356    1.782    2.179    2.681    3.055    3.428    3.930    4.318
       13    0.870    1.079    1.350    1.771    2.160    2.650    3.012    3.372    3.852    4.221
       14    0.868    1.076    1.345    1.761    2.145    2.624    2.977    3.326    3.787    4.140
       15    0.866    1.074    1.341    1.753    2.131    2.602    2.947    3.286    3.733    4.073
       16    0.865    1.071    1.337    1.746    2.120    2.583    2.921    3.252    3.686    4.015
       17    0.863    1.069    1.333    1.740    2.110    2.567    2.898    3.222    3.646    3.965
       18    0.862    1.067    1.330    1.734    2.101    2.552    2.878    3.197    3.610    3.922
       19    0.861    1.066    1.328    1.729    2.093    2.539    2.861    3.174    3.579    3.883
       20    0.860    1.064    1.325    1.725    2.086    2.528    2.845    3.153    3.552    3.850
       21    0.859    1.063    1.323    1.721    2.080    2.518    2.831    3.135    3.527    3.819
       22    0.858    1.061    1.321    1.717    2.074    2.508    2.819    3.119    3.505    3.792
       23    0.858    1.060    1.319    1.714    2.069    2.500    2.807    3.104    3.485    3.768
       24    0.857    1.059    1.318    1.711    2.064    2.492    2.797    3.091    3.467    3.745
       25    0.856    1.058    1.316    1.708    2.060    2.485    2.787    3.078    3.450    3.725
       26    0.856    1.058    1.315    1.706    2.056    2.479    2.779    3.067    3.435    3.707
       27    0.855    1.057    1.314    1.703    2.052    2.473    2.771    3.057    3.421    3.689
       28    0.855    1.056    1.313    1.701    2.048    2.467    2.763    3.047    3.408    3.674
       29    0.854    1.055    1.311    1.699    2.045    2.462    2.756    3.038    3.396    3.660
       30    0.854    1.055    1.310    1.697    2.042    2.457    2.750    3.030    3.385    3.646
       31    0.853    1.054    1.309    1.696    2.040    2.453    2.744    3.022    3.375    3.633
       32    0.853    1.054    1.309    1.694    2.037    2.449    2.738    3.015    3.365    3.622
       33    0.853    1.053    1.308    1.692    2.035    2.445    2.733    3.008    3.356    3.611
       34    0.852    1.052    1.307    1.691    2.032    2.441    2.728    3.002    3.348    3.601
       35    0.852    1.052    1.306    1.690    2.030    2.438    2.724    2.996    3.340    3.591
       36    0.852    1.052    1.306    1.688    2.028    2.434    2.719    2.990    3.333    3.582
       37    0.851    1.051    1.305    1.687    2.026    2.431    2.715    2.985    3.326    3.574
       38    0.851    1.051    1.304    1.686    2.024    2.429    2.712    2.980    3.319    3.566
       39    0.851    1.050    1.304    1.685    2.023    2.426    2.708    2.976    3.313    3.558
       40    0.851    1.050    1.303    1.684    2.021    2.423    2.704    2.971    3.307    3.551
       50    0.849    1.047    1.299    1.676    2.009    2.403    2.678    2.937    3.261    3.496
       60    0.848    1.045    1.296    1.671    2.000    2.390    2.660    2.915    3.232    3.460
       80    0.846    1.043    1.292    1.664    1.990    2.374    2.639    2.887    3.195    3.416
      100    0.845    1.042    1.290    1.660    1.984    2.364    2.626    2.871    3.174    3.390
      150    0.844    1.040    1.287    1.655    1.976    2.351    2.609    2.849    3.145    3.357
 Infinity    0.842    1.036    1.282    1.645    1.960    2.326    2.576    2.807    3.090    3.290

 



Copyright 1999 [David W. Sabo]. All rights reserved.
Revised: April 1, 2003