- Tue 14 April 2020
- Misc
- Gordon McDonald
- #Excel, #spreadsheets
If you're used to programming in R or Python and suddenly have to do everything in Excel due to client requirements or something, you may repeatedly find yourself wondering "How do I do this thing I always do? Shouldn't it be easy?". Well the answer likely is no, it isn't as easy as it should be, but yes, there is a way.
Checking a cell for the presence of one of multiple partial text string matches
Say the data has been entered really non-uniformly. Sometimes it says 'Yes', sometimes it's a 1 etc. like in the following table:
Yes value | No value | Missing data |
---|---|---|
Yes | No | n/a |
Yes (unneccesary details) | No (unneccesary details) | - |
1 | 0 | |
I think so | Not this time | NA |
Any cell can contain one of these options and the unnecesary details might change. So really you want to search for a list of items and if any appear, mark it as true.
Example code to search any of the 'Yes' values and recode them as TRUE
:
= OR( NOT( ISERROR( SEARCH({ "Yes" , "1" , "I think so" } , cell_to_search ) ) ) )
How does it work?
- This is feeding a cell array of possible yes values
{ "Yes" , "1" , "I think so" }
into the search function along with the cell to search. This will output a cell array which contains a 1 where it matched the search string, and a #VALUE error where it didn't, for example if the cell contents was "Yes blah", it will return{1, #VALUE, #VALUE}
- The
ISERROR()
function will turn this into true/false values e.g.{FALSE, TRUE, TRUE}
- The
NOT()
function negates it to{TRUE, FALSE, FALSE}
- The
OR()
function concatenates it appropriately to a singleTRUE
value output.
Finding the unique elements in an array
If you want to find the unique elements in a column and return them sorted, it's easy. Just use =SORT(UNIQUE(cells))
. However if you want to do this for values occuring across multiple columns in an array, it's annoying.
First, select your array and give it a name, e.g. MyArray
. Then, use this horrible formula:
=SORT(UNIQUE(INDEX(MyArray,1+INT((ROW(INDIRECT("1:"&COLUMNS(MyArray)*ROWS(MyArray)))-1)/COLUMNS(MyArray)),MOD(ROW(INDIRECT("1:"&COLUMNS(MyArray)*ROWS(MyArray)))-1+COLUMNS(MyArray),COLUMNS(MyArray))+1)))
How does it work? Lets break it into parts:
- Array element iterator
ROW(INDIRECT("1:"&COLUMNS(MyArray)*ROWS(MyArray)))
. Using the formula=ROW(INDIRECT("1:10"))
will give me a cell array of{1;2;3;4;5;6;7;8;9;10}
. So to get an iterator to go through all the elements in the array, I useROW(INDIRECT("1:"&COLUMNS(MyArray)*ROWS(MyArray)))
. Let's call this iteratori
now. - Indexing the array
INDEX(array, row, column)
will pull out an element of the array. so we do appropriate modulo arithmatic on the iteratori
to get the column number -MOD(i-1+COLUMNS(MyArray),COLUMNS(MyArray))+1
, and take the floor of a division (using theINT()
function) to get the row number as row =1+INT((i-1)/COLUMNS(MyArray))
. So we getINDEX(MyArray,1+INT((i-1)/COLUMNS(MyArray)),MOD(i-1+COLUMNS(MyArray),COLUMNS(MyArray))+1)
- Now we have a vertical array/column (let's call it
col
) containing all the data which was in the array. We just have to applySORT(UNIQUE(col))
to get out only the unique elements, and sorted in alphabetical order.
Excel gotchas
- Referencing the contents of a cell will return the contents of the cell, unless it is blank. A blank cell will turn into a zero. Thus missing data can magically turn into non-missing data in the middle of your analysis. Instead of
=cell_ref
use=if(isblank(cell_ref),"",cell_ref)
to keep blanks blank. - Checking if a cell is blank using
isblank(cell_ref)
will return false if the cell contains a formula which returns a blank string as a value. In this case usecell_ref=""
instead.
Examples
In the file excel_formulas_example.xlsx are examples of the things described above.