Shannon’s Diversity Index

Basic Concepts

For categorical data, there is no mean or median, and so we can’t use the measures of variation described in Measures of Variability. Instead, we use a measure of the distribution of observations among the categories. In particular, for a random sample, we can use Shannon’s index of diversity (also known as the Shannon-Wiener index), which is defined as

H' = \frac{n \log n - \sum_{i=1}^k n_i \log n_i}{n}

Here, n_i is the number of observations from the sample in the ith of k (non-empty) categories and so n = \sum_{i=1}^k n_i is the sample size. An equivalent formula is

H' = -\sum_{i=1}^k p_i \log p_i

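where p_i = n_i/n is the proportion of observations in the ith of k (non-empty) categories. We usually use e, 10, or 2 as the base of the logarithm. The value of H′ depends on the base chosen, so the same base must be used whenever two values of H′ are compared.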
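The equivalence of the two formulas, and the effect of the base, can be checked numerically. The following is a minimal Python sketch, not part of the original spreadsheet; the function names and the sample counts are hypothetical, and empty categories are simply dropped before taking logarithms.

```python
import math

def shannon_from_counts(counts, base=math.e):
    """H' = (n log n - sum of n_i log n_i) / n over the non-empty categories."""
    counts = [c for c in counts if c > 0]  # empty categories are excluded
    n = sum(counts)
    total = sum(c * math.log(c, base) for c in counts)
    return (n * math.log(n, base) - total) / n

def shannon_from_proportions(counts, base=math.e):
    """H' = -sum of p_i log p_i over the non-empty categories."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    return -sum((c / n) * math.log(c / n, base) for c in counts)

counts = [10, 6, 4, 3, 2]  # hypothetical sample, not taken from Figure 1

h_counts = shannon_from_counts(counts)                 # natural log
h_props = shannon_from_proportions(counts)             # natural log
h_base10 = shannon_from_proportions(counts, base=10)   # base 10

print(h_counts, h_props, h_base10)
```

In this sketch the two natural-log values agree up to rounding, while the base-10 value differs from them by the constant factor ln 10. Mixing bases between the two formulas is therefore one way to get results that do not match.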

The maximum value of H′ occurs when all the categories have the same number of observations, in which case all the p_i are equal. Since the sum of the p_i is 1, it follows that

p_i = \frac{1}{k}

Thus

H'_{max} = -\sum_{i=1}^k \frac{1}{k} \log \frac{1}{k} = -\log \frac{1}{k}

which proves that

H'_{max} = \log k
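As a quick numerical check of this maximum, here is a small Python sketch. The helper name `shannon` and the two sets of proportions are hypothetical, and natural logarithms are assumed.

```python
import math

def shannon(proportions):
    """H' = -sum of p_i ln p_i, skipping zero proportions."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

k = 4
h_even = shannon([1 / k] * k)              # all categories equally likely
h_uneven = shannon([0.4, 0.3, 0.2, 0.1])   # same k, unequal proportions

print(h_even, math.log(k), h_uneven)
```

The even split reproduces log k, and the uneven split with the same number of categories gives a smaller value.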

It is common to consider the following measure of relative diversity (aka evenness or homogeneity):

J' = \frac{H'}{H'_{max}} = \frac{H'}{\log k}

We can view 1 – J′ as a measure of heterogeneity (or dominance).
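The relative diversity can be sketched in Python as follows, taking J′ to be H′ divided by log k. The function name `evenness` and both sets of counts are hypothetical. Because numerator and denominator use the same logarithm, J′ does not depend on the base.

```python
import math

def evenness(counts):
    """J' = H' / log k, where k is the number of non-empty categories."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    k = len(counts)
    if k < 2:
        # log k = 0 when k = 1, so the ratio is undefined
        raise ValueError("J' needs at least two non-empty categories")
    h = -sum((c / n) * math.log(c / n) for c in counts)
    return h / math.log(k)

j_even = evenness([5, 5, 5, 5, 5])       # observations spread evenly
j_skewed = evenness([21, 1, 1, 1, 1])    # one dominant category
dominance = 1 - j_skewed                 # heterogeneity of the skewed sample

print(j_even, j_skewed, dominance)
```

The guard for fewer than two non-empty categories is a design choice of this sketch: with a single category, H′ and log k are both zero and the ratio has no value.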

Another measure of homogeneity is given by the formula

[Equation: alternative relative diversity]

Interpretation

As observed above, H′ is largest when all the categories have the same number of observations (homogeneity of categories), in which case all the p_i are equal. In this case, J′ = 1. J′ takes values between 0 and 1. When J′ = 0 the categories are most heterogeneous.

Examples

Example 1: Find Shannon’s index of diversity and index of relative diversity for a random sample of 25 observations distributed among five categories as shown in range B4:F4 of Figure 1.


Figure 1 – Sample Index of Diversity

The result is shown in Figure 1.

Here, cell G4 contains the formula =SUM(B4:F4). Cell B5 contains the formula =B4/G4. Cell B6 contains the worksheet formula =LOG10(B5). Cell B8 contains the formula =-SUMPRODUCT(B5:F5,B6:F6). Cell B9 contains the formula =LOG10(COUNTA(B3:F3)) and, finally, cell B10 contains the formula =B8/B9.

Alternatively, H′ in cell B8 can be calculated by the formula

=LOG10(SUM(B4:F4))-SUMPRODUCT(B4:F4,LOG10(B4:F4))/SUM(B4:F4)
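For readers working outside Excel, the worksheet steps can be mirrored in Python. This is only a sketch: the counts below are hypothetical stand-ins for range B4:F4 (chosen to sum to 25), not the values in Figure 1, and the variable names are ours. Each line is annotated with the cell formula it corresponds to.

```python
import math

counts = [9, 6, 5, 3, 2]  # hypothetical stand-in for B4:F4 (sums to 25)

n = sum(counts)                              # G4:  =SUM(B4:F4)
p = [c / n for c in counts]                  # row 5: =B4/G4 for each category
log_p = [math.log10(x) for x in p]           # row 6: =LOG10(B5) for each category
h = -sum(a * b for a, b in zip(p, log_p))    # B8:  =-SUMPRODUCT(B5:F5,B6:F6)
h_max = math.log10(len(counts))              # B9:  =LOG10(COUNTA(B3:F3))
j = h / h_max                                # B10: =B8/B9

# Alternative single-cell formula for B8:
# =LOG10(SUM(B4:F4))-SUMPRODUCT(B4:F4,LOG10(B4:F4))/SUM(B4:F4)
h_alt = math.log10(n) - sum(c * math.log10(c) for c in counts) / n

print(h, h_alt, h_max, j)
```

Both routes to H′ should agree up to floating-point rounding.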

The sample value of H′ tends to underestimate the corresponding population index of diversity H (i.e. it is a biased estimate). Note that some categories from the population may not be present in a sample (especially in a small sample). Thus, the sample value of J′ tends to overestimate the corresponding population relative diversity index J, i.e. it too is a biased estimate.
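The downward bias of H′ can be illustrated with a small simulation. The sketch below is not from the original article: the population proportions, sample size, number of repetitions, and seed are all hypothetical choices, and natural logarithms are assumed.

```python
import math
import random

def shannon(counts):
    """Sample H' (natural log), computed over the non-empty categories."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts)

population_p = [0.4, 0.3, 0.15, 0.1, 0.05]  # hypothetical population proportions
population_h = -sum(p * math.log(p) for p in population_p)

rng = random.Random(0)            # fixed seed so the run is repeatable
sample_size, repetitions = 25, 2000
categories = range(len(population_p))

estimates = []
for _ in range(repetitions):
    # Draw one random sample and tally the observations per category
    draws = rng.choices(categories, weights=population_p, k=sample_size)
    counts = [draws.count(c) for c in categories]
    estimates.append(shannon(counts))

mean_estimate = sum(estimates) / repetitions
print(population_h, mean_estimate)
```

Under these assumptions, the average of the sample values falls below the population value, and the gap shrinks as `sample_size` grows.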

Improved Version

The following is an improved version of Shannon’s index which takes into account that some population categories may not be present in the random sample.

[Equation: alternative Shannon's index]

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Offwell Woodland & Wildlife Trust (2016) Simpson’s diversity index. Ecological sampling methods
http://www.countrysideinfo.co.uk/simpsons.htm

Krebs, C. J. (2014) Species diversity measures
https://www.zoology.ubc.ca/~krebs/downloads/krebs_chapter_13_2017.pdf

Wikipedia (2016) Diversity index
https://en.wikipedia.org/wiki/

NIST (2016) Shannon’s diversity index 
https://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/shannon.htm

47 thoughts on “Shannon’s Diversity Index”

  1. Hi there,

    Can I use the Shannon diversity index if I replace species count with percentage coverage? If yes, how should I proceed in this case? And if not, what formula should I use to analyze the data? Thank you so much for the help.

    Regards
    Frances lam

    • Hi Frances,
      Does percentage coverage correspond to row 6 in Example 1? If so, then the example shows how to proceed. If not, what is the difference between percentage coverage and row 6?
      Charles

  2. Hi Charles:
    Thanks for this guide on this index, it really helps a lot!
    But I have encountered a little problem when practicing this process. How should I deal with the 0 values? As LOG can’t calculate this?

    • Hello Yi Xu,
      As you have noted, for this index none of the categories can be zero. You can either not include empty categories or use a different diversity index.
      Charles

  3. After calculating the H’ value, how do you interpret the data? Is there a range of values with a corresponding interpretation? Thank you.

  4. Good afternoon Charles.
    Thank you for the information on the Shannon-Wiener diversity index: it’s awesome.
    But please, how do I present the alternative formula in Excel?

    Thanks.

    Archibald

  5. Seems like there’s another typo. The text says “cell B5 contains the formula =A4/G4,” but I think the formula in B5 should be =B4/G4…and I’d use =B4/$G$4 so I could fill right with it.

  6. Hi Charles, how do you calculate standard deviation for H’? I am comparing H’ for two sites and am unsure which graph type to use and how to calculate SD for a single H’ value. Thanks, Alex

  7. Hi Charles,

    Thank you for the article, it was very helpful. But I have one question and I haven’t been able to find the answer, probably because it depends on the situation.
    My question is, what is the minimum sample size to use Shannon’s Diversity Index and obtain reliable results? I sampled 13 different sites/points and found 11 different species and a total of 110 individuals.
    Do you think that this sample size is enough?

    Thank you,
    Daniel Santos

  8. Good afternoon Charles,

    Hope all is well. I just have a simple question. If my data from a small sample has only one species, then my Shannon-Wiener index is one. How do I include this point on a bar graph?

    Best,
    Theresa

  9. I was given two formulas of Shannon’s diversity index.

    a. H’ = – ∑ pi ln pi
    b. H’ = [N*log(N) – ∑ni log (ni)]/N

    For the same data, I used both of the formulas given and the results are different. I wonder where I went wrong.

    • Hello Vivi,
      I don’t know what N* represents. In any case, on this webpage there are two formulas given which should yield the same result (one involving the n_i and the other involving the p_i).
      Charles

  10. Hello, dear Charles. I have population data for each of 3 cities, as well as the GDP and urban area of these cities. Can I use these for correlation or validation with Shannon’s diversity index?

  11. Hello, dear Charles. I want to ask: can I use Shannon’s diversity index with population data for each of 3 cities, together with GDP? I also have data on the urban area (ha).

  12. Hello, first of all
    Thank you for your post, it was very helpful, but I encountered a little problem.
    My Shannon’s diversity index is greater than my H’max, whether I use ln, log2 or log10.
    I was wondering if you had an idea.
    Best Regards

      • Hello, thank you for your quick answer.

        My issue is that my H’ value is 7.054 and my H’max is 4.392 for 21 species, so it’s impossible. But I applied the formula -Sum of (pi log2 pi), so I don’t understand where my mistake could be.

  13. Hello,

    I wanted to ask: when calculating Shannon’s index, do we use the number of species or the abundance numbers?

    Thank you

    • Hello Monika,
      You haven’t given me enough information to give a definitive answer, but you probably need to use the frequencies (abundance numbers?) of each species.
      Charles

  14. Hi Charles,

    What is the ‘p’ variable in the second definition of the Shannon index that you defined?
    Also, do you have a reference for this that I could read?

    Thanks

    • Hello Andrew,
      Perhaps I am mistaken, but I understood that log base 2, 10 or e could be used. I think I saw this in Zar’s textbook.
      Charles

  15. When calculating Shannon’s index, how do you standardize by the number of quadrats sampled when unequal sample unit size occurs among sites? Using cover data for vegetation.

    • Hi Mary,

      I guess I am doing a similar thing to you. I am trying to use land cover categories and compare H’ in different counties. Counties obviously have different areas and I wonder how this would (or maybe would not) change my calculations. From Charles’s answer to you it seems that differences in areas are irrelevant. Do you agree? Thanks!
