Count Duplicate Number Sets With Excel Formula

Do you love a complicated formula problem in Excel? A while ago, there was a “Count the Duplicate Number Sets” challenge, here on the Contextures blog. I posted my solution, and other people shared their methods for solving that problem. You can still get the Excel workbook, and try it for yourself. And here’s another solution, from David Newell.

David Newell’s Solution

Thanks for sending your solution, and all the details, David!

I’ll let David take it from here, and explain his solution, and how he created it. I’ve just added a bit of formatting, and more headings, so it’s easier to read. Also, I added space characters in the long formulas, to help them wrap better, in this blog post.

NOTE: There’s a link at the end of the post, where you can go to download the workbook with David’s solution, and the original Challenge worksheet.

David’s Introduction

My name is David Newell, and I’ve been following Contextures for years; so I must start by thanking Debra for this incredible opportunity to author a guest post on her site.

I host a twice weekly Excel user group where I work and have long looked to Debra and others for the most current information on the world of Excel and for inspiration on topics that might be of interest to my group.

One of my favorite things are challenge problems that I can use to hone my skills, and a particular challenge from more than two years ago is the premise of this post.

Problem with Original Solution

The first thing I noticed was that the proposed formula using MMULT and COUNTIF could be fooled if any individual number was repeated in its own row.

For example, if two rows looked like this – both containing 1,1,2,4,5,6 with the 1 appearing twice – then that formula returned a count of zero.

Revised Formula – Long & Scary!

At the time, I wasn’t blessed with access to Office 365 or even 2016 and could only devise a scary looking formula that worked just for sufficiently small numeric values.

But at least it seemed to handle the repeated number nuance (remembering to execute with Ctrl+Shift+Enter before Excel 365).

=SUMPRODUCT(–(FREQUENCY(MMULT(POWER(2,tblNums[[N01]:[N06]]), TRANSPOSE(SIGN(COLUMN(tblNums[[#Headers],[N01]:[N06]])))), MMULT(POWER(2,tblNums[[N01]:[N06]]), TRANSPOSE(SIGN(COLUMN(tblNums[[#Headers], [N01]:[N06]])))))>1))

Without getting into the weeds of formula dissection, the point was to look for rows with identical binary sums.

Revisit the Challenge

More recently, I’ve been able to get my hands on Office 365 and eventually remembered that I’d hoped to revisit Debra’s challenge once I had more tools at my disposal.

So, let’s have a look at what I think is a better solution.

Count Duplicate Number Sets – New Tools

Ultimately, the idea is to sort each row independently, which is now a piece of cake with functions like TEXTJOIN and SORT and a helper column or two.

Therefore, I ramped things up a notch and sought additional methods to:

count the number of duplicates without a helper column,
return the sorted data for all rows as a single spill array, and
deliver the count of unique sets from a single formula.

Use Variables with LET Function

To make things easier to follow, consider this small dataset:

And the following “variables” will be used by the LET function:

Data: tblNum[[N01]:[N03]]
Vals: COUNTA(Data)
Cols: COLUMNS(Data)
Unform: INDEX(Data,QUOTIENT(SEQUENCE(Vals,,0),Cols)+1, MOD(SEQUENCE(Vals,,0),Cols)+1)
Trick: QUOTIENT(SEQUENCE(Vals,,Vals-1,-1), Cols)+1/Unform
Resort: SORTBY(Unform,Trick,-1)
Reform: INDEX(Resort,SEQUENCE(ROWS(Data),Cols))

Formula with Variables

Using these variables, it is possible to count how many rows contain the number set of any given row…

SUMPRODUCT(–(MMULT(–(Reform=INDEX(Reform, ROW(tblNum[@])-ROW(tblNum[#Headers]),0)), SEQUENCE(Cols,,,0))=Cols))

…or to count the unique number sets in the entire table…

ROWS(UNIQUE(Reform))

How It Works

To quote Debra in her original article, “it’s complicated” to explain, but I think we can do it.

The first three variables are easy enough.

Data represents the 3×3 grid of cells that comprise the rows of columns N01 through N03.
Then Vals tells us there are 9 total values in that range, and
Cols is 3 for the number of columns involved.

The Unform Variable

Unform is a well-established but uncommon technique for turning a 2-d array into a 1-d array.

It uses QUOTIENT and MOD to create a numeric pattern that allows INDEX to grab the values in a particular order.

In this case, that pattern would be…

INDEX({1;1;1;2;2;2;3;3;3},{1;2;3;1;2;3;1;2;3})

…or as viewed in a kind of row/column syntax by pairing the two lists…

INDEX(r1c1;r1c2;r1c3;r2c1;r2c2;r2c3;r3c1;r3c2;r3c3)

…for a result of…

1;2;3;4;6;5;3;1;2

The Trick Variable

Now we have a 1-d array that can easily be sorted, only we need a way to ensure each subset of three values is sorted in place.

That’s where the Trick variable comes in.

It starts much like Unform in giving us a set of repeating integers…

QUOTIENT(SEQUENCE(Vals,,Vals-1,-1),Cols) = 2;2;2;1;1;1;0;0;0

…but then modifies those values to create a useful scaled variation by adding…

1/Unform = 1;0.5;0.3333;0.25;0.1666;0.2

…where the result is a set of values that, when sorted in descending order, will keep each row/group together while still arranging each group as desired…

3;2.5;2.3333;1.25;1.1667;1.2;0.3333;1;0.5

The Resort Variable

The Resort variable then sorts Unform using Trick in descending order to get something like this…

The Reform Variable

The final step is to Reform the data into the same number of rows and columns as the original data.

This is accomplished using an appropriately arranged SEQUENCE array…

SEQUENCE(ROWS(Data),Cols) = 1,2,3;4,5,6;7,8,9

…or…

…which is partnered with INDEX and leads to the three original rows now sorted independently…

Wrapping Things Up

With the newly rearranged data, we can turn to the SUMPRODUCT-MMULT formula to get a duplicate count for each row as illustrated in the “Dup” column of the earlier image…

=LET(Data,tblNum[[N01]:[N03]],Vals,COUNTA(Data),Cols, COLUMNS(Data),Unform, INDEX(Data, QUOTIENT(SEQUENCE(Vals,,0),Cols)+1, MOD(SEQUENCE(Vals,,0),Cols)+1),Trick, QUOTIENT(SEQUENCE(Vals,,Vals-1,-1),Cols)+1/Unform,Resort,SORTBY(Unform,Trick,-1), Reform,INDEX(Resort, SEQUENCE(ROWS(Data),Cols)), SUMPRODUCT(–(MMULT(–(Reform=INDEX(Reform,ROW(tblNum[@])-ROW(tblNum[#Headers]),0)), SEQUENCE(Cols,,,0))=Cols)))

…or use a simple ROWS-UNIQUE combo to determine the count of unique number sets…

=LET(Data,tblNum[[N01]:[N03]],Vals, COUNTA(Data),Cols, COLUMNS(Data),Unform, INDEX(Data,QUOTIENT(SEQUENCE(Vals,,0),Cols)+1, MOD(SEQUENCE(Vals,,0),Cols)+1),Trick, QUOTIENT(SEQUENCE(Vals,,Vals-1,-1),Cols)+1/Unform,Resort, SORTBY(Unform,Trick,-1), Reform, INDEX(Resort, SEQUENCE(ROWS(Data),Cols)), ROWS(UNIQUE(Reform)))

What About Textual Data?

Textual data can be handled in almost the same way. In fact, the approach for textual data will work for numeric data as well.

It involves another well-established but uncommon technique that uses MATCH (or XMATCH since we’re in 365) to assign a unique numeric value to every original value.

Consider if an Unform list of text values looked like this along with the result of =XMATCH(Unform,Unform)…

Use the XMATCH Result

We see that:

alpha is the first value in the list, so XMATCH returns 1 for every occurrence of alpha.
Likewise, beta is the second value and always returns 2.
Then gamma is always 3
delta is always 4, and so on.

So this XMATCH result, let’s call it Locate, can take the place of Unform in the Trick calculation, and everything else continues as before.

The example file contains both numeric and textual examples to help you follow along.

Revised Formulas For Textual Data

Thus the variants of the two previous formulas for handling textual data are…

=LET(Data,tblTxt[[T01]:[T03]],Vals, COUNTA(Data),Cols, COLUMNS(Data),Unform, INDEX(Data, QUOTIENT(SEQUENCE(Vals,,0),Cols)+1, MOD(SEQUENCE(Vals,,0),Cols)+1), Locate, XMATCH(Unform,Unform),Trick, QUOTIENT(SEQUENCE(Vals,,Vals-1,-1),Cols)+1/Locate, Resort, SORTBY(Unform,Trick,-1),Reform, INDEX(Resort,SEQUENCE(ROWS(Data),Cols)), SUMPRODUCT(–(MMULT(–(Reform=INDEX(Reform, ROW(tblTxt[@])-ROW(tblTxt[#Headers]),0)), SEQUENCE(Cols,,,0))=Cols)))

…and…

=LET(Data,tblTxt[[T01]:[T03]],Vals,COUNTA(Data),Cols, COLUMNS(Data),Unform, INDEX(Data, QUOTIENT(SEQUENCE(Vals,,0), Cols)+1, MOD(SEQUENCE(Vals,,0),Cols)+1), Locate, XMATCH(Unform,Unform), Trick, QUOTIENT(SEQUENCE(Vals,,Vals-1,-1),Cols)+1/Locate, Resort, SORTBY(Unform,Trick,-1), Reform,INDEX(Resort, SEQUENCE(ROWS(Data), Cols)), ROWS(UNIQUE(Reform)))

What About Sorting Columns Instead of Rows?

The techniques are very much the same, just using ROWS(Data) instead of COLUMNS(Data) and swapping some assignments for QUOTIENT and MOD.

The example file contains an example of this as well.

David’s Conclusion

Thank you again to Debra for allowing me to author this post and for the work I know she had to do in “converting” the information from my document so that it would present well online. I hope some of you will find this useful or at least interesting.

Get David’s Sample Workbook

To get David’s workbook, and to see the original data set for the formula challenge, click here to download the zipped Excel file.

The workbook is in xlsx format, and there are no macros or VBA code in the file.

And thanks again to David, for sharing his formula challenge solution, and his detailed (and well-written) description of how it works.

__________________________