A quantity computed from a sample and used to decide whether or not to reject the null hypothesis.
A hypothesis test is typically specified in terms of a test statistic, which is a function of the sample: a numerical summary that reduces the data to one or a small number of values that can be used to perform the test. Given a null hypothesis and a test statistic T, we can specify a "null value" T0 such that values of T close to T0 present the strongest evidence in favor of the null hypothesis, whereas values of T far from T0 present the strongest evidence against it. An important property of a test statistic is that we must be able to determine its sampling distribution under the null hypothesis, since that distribution is what allows us to calculate p-values.
For example, suppose we wish to test whether a coin is fair (i.e., it has equal probabilities of producing a head or a tail). If we flip the coin 100 times and record the results, the raw data can be represented as a sequence of 100 heads and tails. If our interest is in the marginal probability of obtaining a head, we only need to record the number T out of the 100 flips that produced a head, and use T0 = 50 as our null value. In this case, the exact sampling distribution of T under the null hypothesis is the binomial distribution with 100 trials and success probability 0.5, but for larger sample sizes the normal approximation can be used. Using one of these sampling distributions, it is possible to compute either a one-tailed or two-tailed p-value for the null hypothesis that the coin is fair. Note that the test statistic in this case reduces a sequence of 100 outcomes to a single numerical summary that can be used for testing.
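The coin example can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not a reference implementation: the function names and the observed count of 60 heads are hypothetical, and the two-tailed p-value is defined here as the probability of a count at least as far from T0 as the observed one, a convention that is natural for a fair coin because the null distribution is symmetric.

```python
from math import comb, erfc, sqrt


def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips with head probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_p_values(t_obs, n, p0=0.5):
    """Exact binomial p-values for an observed head count t_obs.

    Returns (upper one-tailed, two-tailed). The two-tailed value sums the
    probabilities of all counts at least as far from T0 = n * p0 as t_obs,
    which is an assumption suited to the symmetric case p0 = 0.5.
    """
    t0 = n * p0
    upper = sum(binomial_pmf(k, n, p0) for k in range(t_obs, n + 1))
    two_tailed = sum(
        binomial_pmf(k, n, p0)
        for k in range(n + 1)
        if abs(k - t0) >= abs(t_obs - t0)
    )
    return upper, two_tailed


def normal_two_tailed_p(t_obs, n, p0=0.5):
    """Two-tailed p-value from the normal approximation (no continuity correction)."""
    z = (t_obs - n * p0) / sqrt(n * p0 * (1 - p0))
    return erfc(abs(z) / sqrt(2))


if __name__ == "__main__":
    t_obs = 60  # hypothetical observed number of heads in 100 flips
    upper, two_tailed = exact_p_values(t_obs, 100)
    print("exact one-tailed p-value:", upper)
    print("exact two-tailed p-value:", two_tailed)
    print("normal-approximation two-tailed p-value:", normal_two_tailed_p(t_obs, 100))
```

For the hypothetical count of 60 heads, the exact two-tailed p-value is about 0.057 and the uncorrected normal approximation gives about 0.046, which illustrates that the approximation is only rough at this sample size.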
A test statistic shares some qualities with a descriptive statistic, and many statistics can be used as both test statistics and descriptive statistics. However, a test statistic is specifically intended for use in statistical testing, whereas the main quality of a descriptive statistic is that it is easily interpretable. Some informative descriptive statistics, such as the sample range, do not make good test statistics because their sampling distributions are difficult to determine.