Sorting and Removing Duplicates

Excel provides standard capabilities for sorting and eliminating duplicates. These can be accessed by Data > Sort & Filter|Sort and Data > Data Tools|Remove Duplicates. See Sorting and Filtering for more details.

You can also use Excel formulas to sort a column of data, as described in the following example.

Sorting numeric data using Excel formulas

Example 1: Sort the numeric data in column B of Figure 1.

Sorting formulas Excel

Figure 1 – Sorting numeric data using standard Excel formulas

Create a column of indices in column A by putting 1 in cell A6, the formula =A6+1 in cell A7, and then filling down (by highlighting the range A7:A15 and pressing Ctrl-D). Next place the formula =SMALL(B$6:B$15,A4) in cell C6, highlight the range C6:C15, and press Ctrl-D (to fill down). Column C now contains the same data as in column B but in sorted order.

Sorting is achieved since the formula =SMALL(B$4:B$13, d) has the value of the dth smallest element in the range B6:B15). This approach only works for numeric data. If any of the cells is empty or contains non-numeric data it will not be sorted, but instead the error value #NUM! will appear at the end of the sorted list in column C.

The column of indices (column A) is not really necessary since the same effect can be achieved using the ROW function (see column F of Figure 1). This time we put the formula =SMALL(E$6:E$15,ROW(E6)-ROW(E$6)+1) in cell F6, highlight F6:F15 and press Ctrl-D.

Sorting numeric data w/o duplicates using Excel formulas

Example 2: Sort the data from Example 1 dropping any duplicate entries.

Sorting w/o duplicates Excel

Figure 2 – Sorting without duplicates

Here the data is contained in column I. To eliminate duplicates we need to count the number of times each data item in column I occurs (column J) by placing the formula =COUNTIF(I7:I$16,”=”&I6) in cell J6, highlighting range J6:J15 and pressing Ctrl-D. We now want to retain those elements whose count is 0 in column J. We do this by placing =IF(J6=0,I6,””) in cell K6 and filling down.

Column K now contains the same elements as in column I but without duplicates. We next sort the resulting elements as we did in Example 1. The result is shown in column L. Note that the extra cells are filled with the error code #NUM!. We can combine some of the steps as shown on the right side of Figure 2. We can also remove the #NUM! entries using formulas such as =IF(ISNUMBER(P6),P6,””).

Sorting alphanumeric data using Excel formulas

Examples 3: Sort the alphanumeric data in column B of Figure 3.

Sorting text data Excel

Figure 3 – Sorting alphanumeric data using Excel formulas

The approach used in Example 1 doesn’t work with non-numeric data. This time we need to use the COUNTIF, MATCH, and INDEX functions to accomplish our goal (see Built-in Excel Functions). We rank each of the elements in column B from lowest in alphabetic order to highest in column C: e.g. cell C21 contains the formula =COUNTIF(B$21:B$30,”<“&B21). Note that duplicate entries for ape have the same rank (namely 0) and similarly the duplicate entries for dog and lion have the same ranking (4 and 7 respectively).

We next count the number of occurrences of each word in the list (column D): e.g. cell D21 contains the formula =COUNTIF(B$21:B21,B21). We add these count values from column D to the rank values from column C to obtain a revised ranking in column E. Note that all the elements in the list now have a different ranking and the blank entry has a rank of 0.

For each data element (from column B) we need to determine its rank (from column E) and put the index of that element (from column A) in the row of E with that rank. E.g. since the index of the first occurrence of lion (namely 1) has rank 8 we put a 1 in the 8th position of column F (i.e. cell F28). This is accomplished using the MATCH function: e.g. cell F21 contains the formula =MATCH(A21,E$21:E$30,0).

Finally, we use the INDEX function to list the data elements in sorted order: e.g. cell G21 contains the formula =INDEX(B$21:B$30,F21). Note that the empty entry is replaced by the error code #N/A placed at the bottom of the list. As in Example 1 we can replace these error codes by a blank (or anything else).

Sorting alphanumeric data w/o duplicates using Excel formulas

Example 4: Sort the data from Example 3 dropping any duplicate entries.

Sorting text without duplicates

 Figure 4 – Sorting alphanumeric data without duplicates

We worked hard to sort repeated elements in Example 3. This time we want to drop these elements. The trick this time is to get the resulting blanks to move to the bottom of the list (as #N/A or blanks). The required formulas for the first row are shown in Figure 5.

Sort no duplicates formulas

Figure 5 – Sample formulas from Figure 4

Note that the revised ranking in column E is required to get the blank entries to occur at the bottom of the list.

Observation: The Real Statistics Resource Pack provides a number of worksheet functions and data analysis tools that provide sorting and other reformatting capabilities, as described below.

Worksheet Functions

Real Statistics Functions: The following array functions contained in the Real Statistics Resource Pack can be helpful in sorting data and removing duplicates.

QSORT(R1, ascend, nrows, ncols) – returns an nrows ⨯ ncols array with the data in range R1 in sorted order (arranged by columns) ); ascend is an optional parameter (default = TRUE); if ascend is TRUE then the sort is in ascending order and if ascend is FALSE (or 0) the sort is in descending order

SortUnique(R1, filler) – fills the highlighted range with the data in range R1 in ascending sorted order eliminating any duplicates (arranged by columns); the highlighted range should only have one column

ExtractUnique(R1, filler) – fills the highlighted range with the data in range R1 in the order found in R1 (arranged by columns) eliminating any duplicates; the highlighted range should only have one column

NODUPES(R1, filler, b) – fills the highlighted range with the data in range R1 eliminating any duplicates (arranged by columns); if b = TRUE then the data in R1 is sorted first, and if b = FALSE then it is assumed that range R1 is already in sorted order.

The string filler is used as a filler in case the output range has more cells than R1. This second argument is optional and defaults to the error value #N/A.

For QSORT, if nrows = 0 or is omitted then the output array is the highlighted range on the active worksheet. If nrows < 0 then the output is an array of the same size and shape as R1; in both these cases, ncols  is not used and can be omitted; otherwise ncols defaults to 1.

For the above functions, data in a range is assumed to be ordered first by column and then by row. E.g. for the range A1:B2, the presumed order is A1, A2, B1, B2.

Dynamic Real Statistics Functions

There are also the following functions that are useful with dynamic arrays:

SortsUnique(R1) – outputs a column array with the data in range R1 in ascending sorted order eliminating any duplicates

ExtractsUnique(R1) – outputs a column array with the data in range R1 in the order found in R1 eliminating any duplicates

12 thoughts on “Sorting and Removing Duplicates”

  1. Thanks, this was useful for my work. However…

    You’ve lost data! In your numeric example your sorted-without-duplicates data no longer contains a 2. This is because your method is not robust for the final row.

    The same thing happened in my work and because I’m in a hurry I did an inelegant work around but perhaps you could fix elegantly for others?

    Reply
    • Ed,
      Thank you very much for finding this mistake. I have just corrected the error. The instructions now say that you should put the formula =COUNTIF(I7:I$16,”=”&I6) in cell I6.
      I really appreciate your diligence and help in making the website better.
      Charles

      Reply
    • Mitchell,

      I used the following formulas:

      cell O6 =IF(COUNTIF(N7:N$15,”=”&N6)=0,N6,””)
      cell P6 =SMALL(O$6:O$15,ROW(N6)-ROW(N$6)+1)
      cell R6 =IF(ISNUMBER(P6),P6,””)

      An alternative form for cell R6 is

      cell R6 =IFERROR(SMALL(O$6:O$15,ROW(N6)-ROW(N$6)+1),””)

      This approach only requires the original data (in column N) and column O.

      Charles

      Reply
  2. Hi Sir, Can we sort out data when column B in your example 1 is filled with with formula,

    Like B1 = COUNTIF(X$1:X$100,X1) and B is Spread to B100

    It is showing #ref

    Reply
    • Leela,

      I start with the following data in column A (starting in cell A1). I then insert the formula =COUNTIF($A$1:$A$7,A1) in cell B1. Next I copy this formula down (i.e. I highlight range B1:B7 and then press Ctrl-D). Finally, I highlight the range C1:C7, insert the array formula =QSORT(B1:B7) and press Ctrl-Shft-Enter. The result is that the values in column B are sorted.

      A B C
      34 2 1
      12 3 1
      12 3 2
      25 1 2
      34 2 3
      15 1 3
      12 3 3

      If I can do this with 7 rows, you should be able to do it with 100 rows.

      Charles

      Reply
  3. Figure 5 sample formulas…
    D36 — return only 0’s
    E36–return only 0’s
    F36– error
    G36 — error

    Can you please reply with full set of formula from A36 to G36 for sorting alphanumeric values… or send me excel sample file.. I will grateful to you.

    thanks
    Muza

    Reply
  4. Sir

    It seems there are something wrong with functions: SortUnique(R1, s); ExtractUnique(R1, s); NODUPES(R1, s, b). When the last number of R1 is a duplicate, e.g. R1=(4,5,2,9,2,-1,4,7,0,2), these three functions treat “2” as a unique number.

    Colin

    Reply
    • Colin,
      I have tried using SortUnique and NODUPES with the data range you have listed and the functions do what they are supposed to do (i.e. 2 is not repeated in the output). Please clarify or send me a worksheet with the example.
      Charles

      Reply

Leave a Comment