On the Problem of Multiple Matching

by I.L. BATTIN

Introduction.

2. Matchings between two Decks of cards.

REFERENCES

Moments of the n_ij

Moments of the distribution of Chi Square.

Therefore

Drew University

The problem of determining the distribution of the number of "hits" or "matchings" under random matching of two decks of cards has received attention from a number of authors within the last few years. In 1984 Chapman [2] considered pairings between two series of t elements each, and later [3] generalized the problem to series of u and t (R u) elements respectively. In the same paper he also considered the distribution of the mean number of correct matchings resulting from n independent trials, and gave a method, and tables, for determining the significance of any obtained mean. In 1937 Bartlett [1] considered matchings of two decks of cards, using a number of interesting moment generating functions.

In 1937 Huntington [12, 13] gave tables of probabilities for matchings between decks with the compositions (5⁵), (4⁴), and (3³), where (s^t) denotes a deck consisting of s of each of t kinds of cards. More generally (s₁s₂…s_t) denotes s_i cards of the first kind, s₂ of the second, etc. Sterne [16] has given the first four moments of the frequency distribution for the (5⁵) case and has fitted a Pearson Type I distribution function to the distribution. Sterne obtained his results by considering the probabilities in a 5 X 5 contingency table. He also considered the 4 X 4 and 3 X 3 cases. In 1938 Greville [7] gave a table of the exact probabilities for matchings between two decks of compositions (5⁵). Greenwood [4] obtained the variance of the distribution of hits for matchings between two decks having the respective compositions (s^t) and (s₁s₂…s_t) with s_is₁ +s₂+^… s_t = st = n, and where it is not necessary that all the s's should be different from zero. Earlier Wilks [19] had considered the same problem for t = 5 and n = 25.

In a very interesting paper Olds [15] in 1938 used permanents to express a moment generating function suitable for the problem in question. He obtained factorial moments and the first four ordinary moments about the mean, first for two decks with composition (4²), and then for two decks of composition (s^t) . In 1938 Stevens [17] considered a contingency table in connection with matchings between two sets of n objects each, and gave the means, variances, and covariances of the single cell entries and various sub-totals of the cell entries. Stevens [18] also gave a treatment of the problem of matchings between two decks which was based on elementary considerations. In 1940 Greenwood [6] gave the first four moments of the distribution of hits between two decks of any composition whatever, generalizing the problem which had been treated earlier by Olds [15]. Finally in 1941, Greville [8] gave the exact distribution of hits for matchings between two decks of arbitrary composition. He also considered the problem from the standpoint of a contingency table, as had been done earlier by Stevens.

In 1939 Kullback [14] considered matchings between two sequences obtained by drawing at random a single element in turn from each of n urns U_i containing elements of r types E_j in the respective proportions p_ij. He showed that if the process of drawing were indefinitely repeated the distribution of hits would be that of a Poisson series.

The work which has been done thus far applies to the problem of matching two decks of cards. In the present paper a method is developed for obtaining the moments of the distribution of hits for matchings between three or more decks of cards of arbitrary composition.

In the present paper it will be convenient to take as the point of departure the method used by Wilks [19] in his treatment of the problem of hits occurring under random matching of two decks of 25 cards each, namely a target deck with composition (5⁵) and a matching deck with composition . He showed that

where,

is a suitable generating function for obtaining the moments of the distribution.
In fact, if we define an operator as

where , and if h denotes the number of hits, then for r = 1, 2, ... , 5,

P(h = r) = coefficient of in

And it is readily seen that

Wilks' function involves a particular order for the target deck. If we are to generalize and obtain moments for matchings between more than two decks, it is obvious that we must devise a procedure which will, in the case of two decks, be perfectly symmetrical and not require that one deck be given a preferred status. In the case of two decks this is readily accomplished by the use of Kronecker deltas, and in the case of three or more decks by the use of obvious generalizations of these deltas.

For two decks of 25 cards each with compositions (5⁵) we need only let

where . Then, if

it readily follows that

More generally, for two decks of n cards each, the cards being of k types, and the decks having compositions respectively, we let

The factors of 0 are in one-to-one correspondence with the n events of dealing a card from each of the two decks. The values which can be assumed by the subscripts i and j are in one-to-one correspondence with the k types of cards. The symbol x_i corresponds to the first deck, y_i to the second, the subscripts i and j corresponding to the different types of cards in each deck. The expansion of consists of all products which can be formed by choosing one and only one pair from each factor of as a factor of the product. In forming any term of , choosing . from any factor of corresponds to dealing a card of type from both decks, and introduces into the coefficient of the term. Choosing from any factor corresponds to dealing a card of type from the first deck, from the second. If , then, since is not introduced into the coefficient. Therefore in the coefficient of any term of , will be raised to a power, say s, which is equal to the number of factors of from which pairs . have been chosen.

The total number of ways in which the term

can arise is equal to the number of ways in which two decks of types respectively can be dealt, {where and similarly for }. But this is given by

The coefficient of in is the total number of ways in which the term

can be formed subject to the restriction that pairs are chosen from s of the factors of . But this is precisely the number of ways in which the two decks can be dealt so that there will be s hits. Hence if, as above, h is the number of hits, the probability that h = s, assuming all permutations in each deck to be equally likely, is given by

Since this is true for all values of s it follows that

Since

we have at once

It is an equally straightforward matter to show that

and that

Evidently any of the and may be zero, provided only that

The case of two decks with unequal numbers of cards m and n,(m < n), is readily handled by substituting for the smaller deck one obtained by adding n-m "blank" cards - that is, cards of any type not already appearing in either deck, as indicated by Greville [8], who however obtained his results by considering a preferred order for one of the decks.

EXAMPLE 1. In the case of the decks treated by Wilks [19], n = 25, k = 5, n_1i = n_2j = 5. Hence from (
and from

EXAMPLE 2. Suppose we have two decks as shown by the scheme

Here deck B has five fewer cards than deck A. Hence we must presume that there are six types of cards in all, and that the decks have the respective distributions (578000) and (034625). We then have at once

And

- 3. Matchings between three decks.

Let the three decks be of types respectively, with , and consider the function

where

and the other deltas are the usual Kronecker symbols.

Each factor of corresponds to one deal from each of the three decks. The symbols x, y, and z correspond respectively to cards in the first, second, and third decks. The subscripts i, j, k, = 1, 2, …q correspond to the types of cards - there being q distinct types.

Choosing from a factor of corresponds to a deal in which a card of type a is dealt from all three decks, and introduces into the coefficient of the corresponding term in the expansion of . Similarly, choosing , corresponds to a hit between the first and second decks, and introduces into the coefficient. Similarly choosing introduces ; introduces . Choosing corresponds to a deal with no hits, and introduces no powers of e into the coefficient, since all the d's are zero.
Let be defined by :

Then the coefficient of is the number of ways in which the cards can be dealt so as to yield precisely h₁₂₃ triples, or hits between all three decks. Similarly the coefficient of is the number of ways in which the cards can be dealt so as to yield precisely h₁₂ hits between the first and second decks, with corresponding results for the first and third (h₁₃) and second and third (h₂₃) decks.

By the same reasoning as before then, we have

with similar results for h₁₃ and h₂₃ . And it is a straightforward matter to show that

with corresponding results for other moments. It is understood each summation index takes values from 1 to q.

As before, if the decks do not all have the same total number of cards it is merely necessary to introduce one or more sets of "blank" cards. Thus we would replace decks with the compositions (57800), (03462), (00335) by hypothetical decks (5780000 ), (0346250), (0033509) and proceed as before.

EXAMPLE 3. For three decks of 25 cards, consisting of five of each of five kinds we have n = 25, .

Hence

with similar results for and

4. Generalization to any number of decks.

If the moments of the distribution of hits --doubles, triples, quadruples, . . .-- in matching any number of decks is desired, these can be obtained by using an obvious generalization of . Thus for four decks we would define not all equal, and use

However, it is evident that as the number of decks is increased the summations involved and the manipulation of the (generalized) K operators rapidly become complicated.

5. Application of our moment-generating technique to two-way contingency tables.

The moment-generating technique which we have discussed has wider applications than merely to matching problems. As an example of considerable interest we shall consider the contingency problem. Consider the array

and also the function

If i and j are particular values of a and b respectively, then to the i-th row of the array corresponds the product , consisting of identical factors , one such factor corresponding to each of the elements in the row. To the j-th column of the array corresponds the x_j which appears in each of the factors of . To the ij-th cell of the array corresponds which appears only in the factors , and in each of these only as the coefficient of xj

The expansion of consists of all products which can be formed by taking as factors one and only one element ,' (not summed) from each factor of . But taking from one of the factors of corresponds exactly to putting an element in the ij-th cell of a lattice such as . Hence every term in the expansion of corresponds to a particular distribution in such a lattice. Moreover, all terms of correspond to distributions in which the row totals are , for we must take n_i. elements from the product Further, those terms in which the appear in the particular product correspond to distributions in which the column totals are , since choosing elements corresponds to putting elements in the j-th column and some row of the lattice.

Expanding we obtain

where the summation is over all partitions

It is clear that since every set of values of the subject to the partition restrictions corresponds to a particular distribution of n elements in the lattice , every particular product corresponds to such a distribution, and represents the number of ways in which it can arise. Further, the total coefficient displayed , namely , represents the total number of ways in which distributions with row totals and column totals can arise. Setting all the we readily find

Hence the probability of any particular distribution with fixed row totals and fixed column totals is :

Consider now the result of differentiating with respect to a particular . We obtain

where denotes summation over indices such that . Now , but also n_ij can never be less than

. For . Since the maximum value of the maximum value of .

Hence

Accordingly, combining all the terms of in which n_ij has a particular value, g, we have

where S* denotes summation and P* multiplication with n_ij = g.

Since is precisely the number of distributions for which n_ij = g, it follows that

Similarly it follows that

where we may have (i = k) or (i ≠ k), and (j = l))or (j ≠ l).
By straightforward differentiation and reduction we find that for the array with given marginal totals

For the array :

Hence, using the above results we can, theoretically, find all the moments of the exact distribution Of . It is not difficult to show that

The value of and the variance of . were found by straightforward application of our methods and the results agreed with those given by Haldane [10].
The writer is indebted to Professor Wilks for helpful criticisms and suggestions.

Type of cards

Total of all types

N° in deck A

N° in deck B

[1] M. S. BARTLETT, "Properties of sufficiency and statistical tests," Proc. Roy. Soc., A, Vol. 160 (1937), pp. 268-282.

[2] DWIGHT CHAPMAN, "The statistics of the method of correct matchings," Amer. Jour. Psych., Vol. 46 (1934), pp. 287-298.

[3] DWIGHT CHAPMAN, "The generalized problem of correct matchings," Annals of Math. Stat., Vol. 6 (1935), pp. 85-95.

[4] J. A. GREENWOOD, "Variance of a general matching problem," Annals of Math. Stal., Vol. 9 (1938), pp. 56-59.

[5] J. A. GREENWOOD, "Variance of the ESP call series," Jour. of Parapsychology, Vol. 2 (1938), pp. 60-64.

[6] J. A. GREENWOOD, "The first four moments of a general matching problem," Annals of Eug., Vol. 10 (1940), pp. 290-292.

[7] T. N. E. GREVILLE, "Exact probabilities for the matching hypothesis," Jour. of Parapsychology, Vol. 2 (1938), pp. 55-59.

[8] T. N. E. GREVILLE, "The frequency distribution of a general matching problem," Annals of Math. Stat., Vol. 12 (1941), pp. 350-354.

[9] J. B. S. HALDANE, "The mean and variance of Chi square, when used as a test of homogeneity, when expectations are small," Biometrika, Vol . 31 (1940).

[10] J. B. S. HALDANE, "The first six moments of Chi-square for an n-fold table with n degrees of freedom when some expectations are small," Biometrika, Vol. 29 (1939), p. 389.

[11] J. B. S. HALDANE, "The exact value of the moments of the distribution of chi-square, used as a test of goodness of fit, when the expectations are small," Biometrika, Vol. 29 (1939), p. 133.

[12] E. V. HUNTINGTON, "A rating table for card matching experiments," Jour. of Parapsychology, Vol. 4 (1937), pp. 292-294.

[13] E. V. HUNTINGTON, "Exact probabilities in certain card-matching problems," Science, Vol. 86 (1937), pp. 499-500.

[14] SOLOMON KULLBACK, "Note on a matching problem," Annals of Math. Stat., Vol. 10 (1939), pp. 77-80.

[15] E. G. OLDS, "A moment-generating function which is useful in solving certain matching problems," Bull. Amer. Math. Soc., Vol. 44 (1938), pp. 407-413.

[16] T. E. STERNE,, "The solution of a problem in probability," Science, Vol. 86, (1937) pp. 500-501.

[17] W. L. STEVENS, "Distribution of entries in a contingency table," Annals of Eugenics, Vol. 8 (1938), pp. 238-244.

[18] W. L. STEVENS, "Tests of significance for extra sensory perception data," Psychological Review, Vol. 46 (1938), pp. 142-150.

[19] S. S. WILKS, "Statistical aspects of experiments in telepathy," a lecture delivered to the Galois Institute of Mathematics, Long Island University, December 4, 1937.