Back to Blog
Excel Tips

Data Cleansing Techniques in Excel: Fix Messy Spreadsheets Fast

12 February 2026·Zepely Team·9 min read

Messy data is the silent killer of spreadsheet accuracy. A single trailing space in a vendor name breaks your VLOOKUP. One date stored as text ruins your ageing calculations. Duplicate rows inflate your totals without warning.

Data cleansing is not glamorous work, but it is the difference between reports you can trust and reports that quietly lie to you.

Why Data Gets Messy

Before fixing the problem, it helps to understand where it comes from:

  • Manual entry — different people type "ABC Services", "abc services", and "ABC Services Ltd"
  • Copy-paste from PDFs — invisible characters, line breaks, and encoding artifacts sneak in
  • System exports — different software formats dates, currencies, and numbers differently
  • Merged data sources — combining spreadsheets from multiple departments introduces inconsistencies

The good news: Excel has built-in tools for every one of these problems.

Removing Duplicates

Duplicates are the most common data quality issue. Excel makes them easy to find and remove.

The Built-In Method

  1. Select your data range (including headers)
  2. Go to Data → Remove Duplicates
  3. Choose which columns to check — selecting all columns finds exact row duplicates, while selecting fewer columns (like Invoice Number alone) finds logical duplicates

Excel removes the duplicates and tells you how many were deleted.

Finding Duplicates Without Deleting

Sometimes you want to see duplicates before removing them. Use conditional formatting:

  1. Select the column to check (e.g., Invoice Number)
  2. Go to Home → Conditional Formatting → Highlight Cells Rules → Duplicate Values
  3. Choose a highlight colour

Now duplicates are visually flagged. Review them manually — sometimes duplicates are legitimate (two invoices from the same vendor on the same date for different work).

COUNTIF for Duplicate Detection

For more control, add a helper column with COUNTIF:

=COUNTIF(A:A, A2)

Any value greater than 1 indicates a duplicate. Filter the column to show only values above 1 to isolate all duplicate rows at once.

Fixing Text Inconsistencies

Text problems are the hardest to spot because cells look correct at a glance.

Trailing and Leading Spaces

The most common invisible problem. "ABC Services" and "ABC Services " look identical but are not equal in formulas.

=TRIM(A2)

TRIM removes all leading spaces, trailing spaces, and reduces multiple internal spaces to single spaces. Apply it to an entire column using a helper column, then paste values over the original.

Inconsistent Capitalisation

"abc services", "ABC SERVICES", and "Abc Services" are three different values to Excel.

=PROPER(A2)    → "Abc Services"
=UPPER(A2)     → "ABC SERVICES"
=LOWER(A2)     → "abc services"

PROPER is usually the right choice for vendor and client names. UPPER works for invoice reference codes where all-caps is the standard.

Non-Printable Characters

Data copied from PDFs or web pages often contains hidden characters that break comparisons. CLEAN removes non-printable characters:

=CLEAN(A2)

For stubborn cases, combine CLEAN with TRIM:

=TRIM(CLEAN(A2))

This handles both invisible characters and extra spaces in a single formula.

Find and Replace for Bulk Fixes

When you know the specific inconsistency, Find and Replace (Ctrl+H) is the fastest fix:

  • Replace "Ltd." with "Ltd" across all cells
  • Replace double spaces with single spaces
  • Replace a misspelled vendor name globally

Use the Options button to match case or search within formulas versus values.

Standardising Date Formats

Mixed date formats cause silent calculation errors. "01/02/2026" could be January 2nd or February 1st depending on locale.

Identifying the Problem

Dates stored as text look like dates but do not work in calculations. Test with:

=ISNUMBER(A2)

If this returns FALSE for a cell that looks like a date, the date is stored as text.

Converting Text Dates

For consistent text date formats, use DATEVALUE:

=DATEVALUE(A2)

Then format the result cell as a date. For inconsistent formats, you may need to construct the date manually:

=DATE(RIGHT(A2,4), MID(A2,4,2), LEFT(A2,2))

This extracts year, month, and day from specific positions in the text string. Adjust the positions based on your actual format.

The Nuclear Option: Text to Columns

If date conversion formulas get too complex:

  1. Select the date column
  2. Go to Data → Text to Columns
  3. Choose Delimited, click Next, then Next again
  4. In step 3, set the column format to Date and choose the correct format (DMY, MDY, YMD)
  5. Click Finish

Excel re-parses the entire column with the format you specified.

Cleaning Number Formats

Numbers stored as text are another silent problem. Excel sometimes flags these with a small green triangle in the corner of the cell.

Detecting Text Numbers

=ISNUMBER(A2)

Returns FALSE if the number is stored as text. Also check for:

=TYPE(A2)

Returns 1 for numbers and 2 for text.

Converting Text to Numbers

Method 1: Multiply by 1

=A2*1

This forces Excel to treat the value as a number.

Method 2: VALUE function

=VALUE(A2)

Explicitly converts text to a number.

Method 3: Paste Special

  1. Type the number 1 in an empty cell
  2. Copy that cell
  3. Select your text-number range
  4. Paste Special → Multiply

This multiplies every cell by 1, converting text to numbers in place without helper columns.

Removing Currency Symbols and Commas

Imported data often includes formatting characters:

=VALUE(SUBSTITUTE(SUBSTITUTE(A2, "$", ""), ",", ""))

This strips the dollar sign and commas, then converts to a number. Add more SUBSTITUTE layers for other symbols (£, €, spaces used as thousand separators).

Splitting and Combining Data

Text to Columns for Splitting

Data crammed into single columns needs splitting. Common cases:

  • Full names → First name + Last name
  • Addresses → Street, City, Postcode
  • Combined codes → "INV-2026-001" → Type, Year, Number

Use Data → Text to Columns with the appropriate delimiter (comma, space, hyphen).

Flash Fill for Pattern-Based Splitting

Excel's Flash Fill (Ctrl+E) detects patterns from examples:

  1. In column B, manually type the first name from the first full name in column A
  2. Start typing the second first name
  3. Press Ctrl+E — Excel fills the rest of the column based on the pattern

Flash Fill works for extraction, reformatting, and combining — any transformation where you can show it a pattern through examples.

CONCAT and TEXTJOIN for Combining

To merge data from multiple columns:

=TEXTJOIN(" - ", TRUE, A2, B2, C2)

TEXTJOIN is more flexible than CONCAT because it lets you specify a delimiter and skip empty cells.

Validating Cleaned Data

After cleansing, verify your work:

Check for Remaining Issues

=SUMPRODUCT((LEN(A2:A100)-LEN(TRIM(A2:A100))))

Returns 0 if no cells in the range have extra spaces. Any positive number means spaces remain.

Count Unique Values

=SUMPRODUCT(1/COUNTIF(A2:A100, A2:A100))

Compare this to your expected number of unique vendors or categories. A mismatch suggests remaining inconsistencies.

Data Validation for Prevention

Once your data is clean, add validation rules to keep it that way:

  1. Select the column
  2. Go to Data → Validation
  3. Set rules: dropdown lists for categories, number ranges for amounts, date ranges for dates

Prevention is always cheaper than cleanup.

The Fastest Path to Clean Data

Manual cleansing works for one-off spreadsheets. But if you regularly import data from invoices, reports, or external systems, the pattern repeats endlessly.

The most effective approach is to clean data at the point of entry. If invoice data arrives already structured — vendor names normalised, dates formatted consistently, amounts as proper numbers — you skip the cleansing step entirely.

Automated extraction tools that convert invoices to structured Excel data solve this at the source. The data arrives clean because a machine read it consistently, not because three different people typed it three different ways.

Summary

Data cleansing in Excel comes down to six core techniques: removing duplicates with COUNTIF and Remove Duplicates, fixing text with TRIM, CLEAN, and PROPER, standardising dates with DATEVALUE and Text to Columns, converting text-numbers with VALUE and Paste Special, splitting data with Text to Columns and Flash Fill, and validating with SUMPRODUCT checks. Master these and your spreadsheets become reliable. Better yet, get clean data from the start and skip the cleanup entirely.

Ready to automate your invoices?

Stop typing numbers into cells. Upload your invoices to Zepely and get a clean Excel spreadsheet in seconds.

Upload Invoices
No account needed. Pay only for what you use.