Data Cleansing Techniques in Excel: Fix Messy Spreadsheets Fast
Messy data is the silent killer of spreadsheet accuracy. A single trailing space in a vendor name breaks your VLOOKUP. One date stored as text ruins your ageing calculations. Duplicate rows inflate your totals without warning.
Data cleansing is not glamorous work, but it is the difference between reports you can trust and reports that quietly lie to you.
Why Data Gets Messy
Before fixing the problem, it helps to understand where it comes from:
- Manual entry — different people type "ABC Services", "abc services", and "ABC Services Ltd"
- Copy-paste from PDFs — invisible characters, line breaks, and encoding artifacts sneak in
- System exports — different software formats dates, currencies, and numbers differently
- Merged data sources — combining spreadsheets from multiple departments introduces inconsistencies
The good news: Excel has built-in tools for every one of these problems.
Removing Duplicates
Duplicates are the most common data quality issue. Excel makes them easy to find and remove.
The Built-In Method
- Select your data range (including headers)
- Go to Data → Remove Duplicates
- Choose which columns to check — selecting all columns finds exact row duplicates, while selecting fewer columns (like Invoice Number alone) finds logical duplicates
Excel removes the duplicates and tells you how many were deleted.
Finding Duplicates Without Deleting
Sometimes you want to see duplicates before removing them. Use conditional formatting:
- Select the column to check (e.g., Invoice Number)
- Go to Home → Conditional Formatting → Highlight Cells Rules → Duplicate Values
- Choose a highlight colour
Now duplicates are visually flagged. Review them manually — sometimes duplicates are legitimate (two invoices from the same vendor on the same date for different work).
COUNTIF for Duplicate Detection
For more control, add a helper column with COUNTIF:
=COUNTIF(A:A, A2)
Any value greater than 1 indicates a duplicate. Filter the column to show only values above 1 to isolate all duplicate rows at once.
Fixing Text Inconsistencies
Text problems are the hardest to spot because cells look correct at a glance.
Trailing and Leading Spaces
The most common invisible problem. "ABC Services" and "ABC Services " look identical but are not equal in formulas.
=TRIM(A2)
TRIM removes all leading spaces, trailing spaces, and reduces multiple internal spaces to single spaces. Apply it to an entire column using a helper column, then paste values over the original.
Inconsistent Capitalisation
"abc services", "ABC SERVICES", and "Abc Services" are three different values to Excel.
=PROPER(A2) → "Abc Services"
=UPPER(A2) → "ABC SERVICES"
=LOWER(A2) → "abc services"
PROPER is usually the right choice for vendor and client names. UPPER works for invoice reference codes where all-caps is the standard.
Non-Printable Characters
Data copied from PDFs or web pages often contains hidden characters that break comparisons. CLEAN removes non-printable characters:
=CLEAN(A2)
For stubborn cases, combine CLEAN with TRIM:
=TRIM(CLEAN(A2))
This handles both invisible characters and extra spaces in a single formula.
Find and Replace for Bulk Fixes
When you know the specific inconsistency, Find and Replace (Ctrl+H) is the fastest fix:
- Replace "Ltd." with "Ltd" across all cells
- Replace double spaces with single spaces
- Replace a misspelled vendor name globally
Use the Options button to match case or search within formulas versus values.
Standardising Date Formats
Mixed date formats cause silent calculation errors. "01/02/2026" could be January 2nd or February 1st depending on locale.
Identifying the Problem
Dates stored as text look like dates but do not work in calculations. Test with:
=ISNUMBER(A2)
If this returns FALSE for a cell that looks like a date, the date is stored as text.
Converting Text Dates
For consistent text date formats, use DATEVALUE:
=DATEVALUE(A2)
Then format the result cell as a date. For inconsistent formats, you may need to construct the date manually:
=DATE(RIGHT(A2,4), MID(A2,4,2), LEFT(A2,2))
This extracts year, month, and day from specific positions in the text string. Adjust the positions based on your actual format.
The Nuclear Option: Text to Columns
If date conversion formulas get too complex:
- Select the date column
- Go to Data → Text to Columns
- Choose Delimited, click Next, then Next again
- In step 3, set the column format to Date and choose the correct format (DMY, MDY, YMD)
- Click Finish
Excel re-parses the entire column with the format you specified.
Cleaning Number Formats
Numbers stored as text are another silent problem. Excel sometimes flags these with a small green triangle in the corner of the cell.
Detecting Text Numbers
=ISNUMBER(A2)
Returns FALSE if the number is stored as text. Also check for:
=TYPE(A2)
Returns 1 for numbers and 2 for text.
Converting Text to Numbers
Method 1: Multiply by 1
=A2*1
This forces Excel to treat the value as a number.
Method 2: VALUE function
=VALUE(A2)
Explicitly converts text to a number.
Method 3: Paste Special
- Type the number 1 in an empty cell
- Copy that cell
- Select your text-number range
- Paste Special → Multiply
This multiplies every cell by 1, converting text to numbers in place without helper columns.
Removing Currency Symbols and Commas
Imported data often includes formatting characters:
=VALUE(SUBSTITUTE(SUBSTITUTE(A2, "$", ""), ",", ""))
This strips the dollar sign and commas, then converts to a number. Add more SUBSTITUTE layers for other symbols (£, €, spaces used as thousand separators).
Splitting and Combining Data
Text to Columns for Splitting
Data crammed into single columns needs splitting. Common cases:
- Full names → First name + Last name
- Addresses → Street, City, Postcode
- Combined codes → "INV-2026-001" → Type, Year, Number
Use Data → Text to Columns with the appropriate delimiter (comma, space, hyphen).
Flash Fill for Pattern-Based Splitting
Excel's Flash Fill (Ctrl+E) detects patterns from examples:
- In column B, manually type the first name from the first full name in column A
- Start typing the second first name
- Press Ctrl+E — Excel fills the rest of the column based on the pattern
Flash Fill works for extraction, reformatting, and combining — any transformation where you can show it a pattern through examples.
CONCAT and TEXTJOIN for Combining
To merge data from multiple columns:
=TEXTJOIN(" - ", TRUE, A2, B2, C2)
TEXTJOIN is more flexible than CONCAT because it lets you specify a delimiter and skip empty cells.
Validating Cleaned Data
After cleansing, verify your work:
Check for Remaining Issues
=SUMPRODUCT((LEN(A2:A100)-LEN(TRIM(A2:A100))))
Returns 0 if no cells in the range have extra spaces. Any positive number means spaces remain.
Count Unique Values
=SUMPRODUCT(1/COUNTIF(A2:A100, A2:A100))
Compare this to your expected number of unique vendors or categories. A mismatch suggests remaining inconsistencies.
Data Validation for Prevention
Once your data is clean, add validation rules to keep it that way:
- Select the column
- Go to Data → Validation
- Set rules: dropdown lists for categories, number ranges for amounts, date ranges for dates
Prevention is always cheaper than cleanup.
The Fastest Path to Clean Data
Manual cleansing works for one-off spreadsheets. But if you regularly import data from invoices, reports, or external systems, the pattern repeats endlessly.
The most effective approach is to clean data at the point of entry. If invoice data arrives already structured — vendor names normalised, dates formatted consistently, amounts as proper numbers — you skip the cleansing step entirely.
Automated extraction tools that convert invoices to structured Excel data solve this at the source. The data arrives clean because a machine read it consistently, not because three different people typed it three different ways.
Summary
Data cleansing in Excel comes down to six core techniques: removing duplicates with COUNTIF and Remove Duplicates, fixing text with TRIM, CLEAN, and PROPER, standardising dates with DATEVALUE and Text to Columns, converting text-numbers with VALUE and Paste Special, splitting data with Text to Columns and Flash Fill, and validating with SUMPRODUCT checks. Master these and your spreadsheets become reliable. Better yet, get clean data from the start and skip the cleanup entirely.