statistical functions in Excel — XplorExcel tutorial
Lesson 37 Advanced 12 min read

Statistical Functions in Excel: The Advanced Guide

Master STDEV, PERCENTILE, FREQUENCY, RANK, CORREL — and actually know when to use each one.

What You’ll Learn

  • The difference between STDEV.S and STDEV.P — and when each one applies
  • How to correctly enter the FREQUENCY function as an array formula
  • RANK.EQ vs RANK.AVG — and why the difference matters in real reports
  • How PERCENTILE.INC actually works (it is not the same as a percentage)
  • How to combine statistical functions into complete analytical workflows

Statistical functions in Excel saved a financial analyst three hours last week — and nearly got a different one fired the week before, because she used the wrong version of STDEV on a population dataset and presented the results to a board meeting. That second story is more common than anyone admits. Excel does not warn you when you pick the slightly-wrong function. It just quietly gives you a slightly-wrong answer, and you carry on with your day.

Here is the thing: the problem with statistical functions is almost never syntax. You can look up syntax in thirty seconds. The problem is decision-making. STDEV.S or STDEV.P? RANK.EQ or RANK.AVG? PERCENTILE.INC or PERCENTILE.EXC? These are not interchangeable, and most tutorials treat them like they are.

This is Lesson 37 at XplorExcel.com, and it is built around one goal: giving you the decision logic, not just the formula. By the end, you will know which function to reach for, why, and how to combine them into real analytical workflows that hold up in a professional setting.

The Naming Mess Excel Never Bothered to Clean Up

Before anything else, there is something you need to understand about how Excel handles statistical functions — because if you skip this, you will probably use the wrong one without realising it.

Ads loading…

When Microsoft updated its statistical functions, they did not remove the originals. They just added new versions with dotted names alongside the old ones. So now you have STDEV and STDEV.S living next to each other, doing almost the same thing but not quite. Same with RANK and RANK.EQ. PERCENTILE and PERCENTILE.INC. The old versions still work, which means Excel gives you zero indication that you might be using a legacy function.

Think of it like having two GPS apps on your phone. Both get you somewhere. One is three years out of date. Your phone does not flag which is which.

From this point on in this lesson, we use only the modern dotted versions: STDEV.S, STDEV.P, RANK.EQ, RANK.AVG, PERCENTILE.INC, PERCENTILE.EXC. If you are working from old files that use the legacy names, the results will likely be the same, but it is worth updating them for precision.

⚠️ COMMON MISTAKE

Using STDEV without knowing it defaults to sample-based logic. If your data represents a full population rather than a subset, switch to STDEV.P. The difference seems small, but in formal analysis it matters — and Excel will never warn you that you are using the wrong one.

Average, Median, and Mode: They Are Not Interchangeable

Why MEDIAN Often Beats AVERAGE in Skewed Data

AVERAGE adds everything up and divides by the count. Simple. Also unreliable the moment you have outliers. One executive salary in a dataset of 150 employees can drag the average up by thousands and make the typical salary look much higher than it actually is.

MEDIAN returns the middle value. It does not care about extremes. For most real-world business data, MEDIAN tells the more honest story. Use AVERAGE on clean, symmetrical data with no significant outliers. Use MEDIAN when you know or suspect skew.

MODE.SNGL vs MODE.MULT — Handling Multi-Modal Datasets

MODE.SNGL returns the single most frequently occurring value. MODE.MULT returns all values that tie for most frequent — but it must be entered as an array formula, and if there are no repeating values at all, MODE.SNGL throws a NUM error. Worth checking before you build anything that depends on it.

Use MODE when frequency of occurrence is the actual insight you need, not the mathematical average.

Measuring Spread: The Excel STDEV Function and When Each Version Applies

STDEV.S vs STDEV.P — The Sample vs Population Decision

STDEV.S is for samples. A sample is a subset of a larger group. If you surveyed 200 customers out of a total customer base of 50,000, your 200 responses are a sample. STDEV.S applies Bessel’s correction, which slightly increases the result to compensate for a sample underestimating how spread out the full population really is.

STDEV.P is for populations. A population means every single member of the group you are measuring. If you have every sales figure for every day of the year and want to describe the spread of that specific year, that is a population. No correction needed.

Syntax

=STDEV.S(number1, [number2], …)

=STDEV.P(number1, [number2], …)

Ads loading…

Advertisement-X

=VAR.S(number1, [number2], …)

=VAR.P(number1, [number2], …)

VAR.S and VAR.P — When You Need Variance Instead

VAR.S and VAR.P follow the same sample versus population logic but return variance instead of standard deviation. Variance is standard deviation squared. Standard deviation is almost always easier to report because it lives in the same unit as your original data. Variance is more useful when you are feeding results into further statistical calculations.

💡 PRO TIP

Want to flag outliers automatically? Add a helper column with this formula: =(A2-AVERAGE($A$2:$A$31))/STDEV.S($A$2:$A$31). Any result above 2 or below -2 is more than two standard deviations from the mean. That is your outlier threshold. Colour-code it with conditional formatting and you have a live anomaly detector.

Real-world example: An HR analyst working with salary data from 150 employees — a sample from a larger multinational — uses STDEV.S to calculate spread and flags anyone earning more than two standard deviations above the mean. That could indicate a retention risk, or a data entry error. Either way, it is worth investigating.

Excel PERCENTILE Formula: What Percentile Actually Means

PERCENTILE.INC vs PERCENTILE.EXC — Which to Use and When

The 75th percentile does not mean 75% of the maximum value. It means 75% of values in the dataset fall below that point. That distinction matters when you start using this in reports.

PERCENTILE.INC(array, k) returns the value at the k-th percentile, where k is a decimal between 0 and 1. The INC means the range is inclusive: 0 and 1 are valid inputs. For most business use cases, this is the right choice.

PERCENTILE.EXC(array, k) excludes the endpoints. Valid k values run strictly between 0 and 1. This is the statistically stricter version, preferred in academic and research contexts.

Syntax

=PERCENTILE.INC(array, k)

=PERCENTILE.EXC(array, k)

=QUARTILE.INC(array, quart)  → quart: 1=25th, 2=50th, 3=75th

QUARTILE.INC — Splitting Data Into Quarters

QUARTILE.INC is a convenience shortcut that applies the same logic for the four standard quartile points. QUARTILE.INC(array, 1) gives the 25th percentile. QUARTILE.INC(array, 2) gives the median. QUARTILE.INC(array, 3) gives the 75th percentile.

⚠️ COMMON MISTAKE

Treating percentile and percentage as the same thing. A student who scores 62 out of 100 on a difficult exam might still be in the 85th percentile if most others scored below 62. The percentile is about relative position within the group — not the absolute value.

Real-world example: A sales manager uses PERCENTILE.INC to divide the team into four performance tiers. Reps below the 25th percentile go into a coaching programme. Reps above the 75th percentile qualify for the accelerated bonus structure. This approach adjusts naturally as the team grows or changes, unlike fixed revenue targets that go stale.

Excel FREQUENCY Function: Building Distributions the Right Way

Setting Up Bins Correctly (A Step Most Tutorials Skip)

Trust me on this one: FREQUENCY is the most incorrectly used statistical function in Excel. Not because it is complicated, but because most tutorials show you the syntax and skip the part about how to actually enter it. Then you copy the formula, get a column of zeros, and assume you did something wrong.

FREQUENCY(data_array, bins_array) counts how many values from your data fall into each bin. The bins array contains the upper boundary of each interval. The result is always one row longer than your bins array, because the final cell captures all values that exceed the highest bin.

Entering FREQUENCY as an Array Formula

How to Enter the FREQUENCY Function Correctly

  1. Put your data in a column — for example, exam scores in A2:A101.
  2. Define your bins in a separate column. For exam scores, enter 50, 60, 70, 80, 90 in D2:D6.
  3. Select the output range. Five bins means six output cells — select E2:E7.
  4. Type the formula: =FREQUENCY(A2:A101, D2:D6)
  5. Press Ctrl + Shift + Enter to enter it as an array formula. Curly braces will appear around the formula in the formula bar.
  6. Read the output: values at or below 50, between 51–60, 61–70, 71–80, 81–90, and everything above 90.

💡 PRO TIP

On Excel 365 or Excel 2019+, FREQUENCY spills automatically without the Ctrl + Shift + Enter step. But if you are sharing files with colleagues on older versions, use the array entry method anyway to avoid broken formulas on their end.

Connecting FREQUENCY Output to a Histogram Chart

Once your FREQUENCY output is ready, select it alongside your bin labels and insert a clustered bar chart. Set the gap width to zero in the format options and you have a histogram. For the full chart setup process, see Lesson 38: Dashboards in Excel for how to take this output further into a complete analytical dashboard.

Real-world example: A university administrator uses FREQUENCY to count how many students fell in each 10-point score band across 300 exam results. The histogram reveals a bimodal distribution — two clear peaks rather than one. That is a sign the exam may have tested two different skill levels in one paper, and it prompts a curriculum review.

Excel RANK Formula: RANK.EQ vs RANK.AVG and Why the Difference Matters

What Happens When Two Values Tie — And Why It Matters

RANK.EQ(value, ref, order): when two values tie, they both get the same rank, and the next rank is skipped. Two people tied at 3rd both get rank 3. Nobody gets rank 4. For the order argument: 0 means descending (highest = rank 1).

RANK.AVG(value, ref, order): when two values tie, they both get the average of the ranks they would have occupied. Two people tied for 3rd and 4th place both get 3.5.

SituationUse RANK.EQUse RANK.AVG
Leaderboard or report✓ Whole-number ranks, ties share lowest
Competition or scoring✓ Clear tied position
Feeding into further stats✓ Preserves statistical properties
Academic ranking analysis✓ No artificial rank gaps

Building a Dynamic Leaderboard with RANK.EQ + LARGE

RANK.EQ updates automatically when your source data changes. Combine it with LARGE(array, 1), LARGE(array, 2), LARGE(array, 3) to pull the top three values into a summary section, then use INDEX MATCH to retrieve the corresponding names. For conditional ranking workflows — ranking within a group rather than across the whole dataset — see Lesson 36: Financial Functions in Excel for the conditional aggregation logic that pairs naturally with this approach.

CORREL and COVARIANCE.S: Does One Thing Actually Drive Another?

Reading a Correlation Coefficient Correctly

CORREL(array1, array2) returns the Pearson correlation coefficient between two datasets. The result always sits between -1 and 1. Close to 1 means the two variables tend to rise and fall together. Close to -1 means when one rises, the other tends to fall. Close to 0 means no clear linear relationship.

Here is the thing everyone gets wrong about correlation: a high CORREL result does not mean one thing causes the other. Ice cream sales and drowning rates are positively correlated. The cause is summer, not ice cream. Always pair correlation findings with actual domain knowledge before making decisions.

Syntax

=CORREL(array1, array2)

=COVARIANCE.S(array1, array2)

Practical Example: Does Marketing Spend Correlate With Revenue?

A marketing analyst uses CORREL to compare monthly ad spend against monthly revenue across 24 months. A strong positive correlation supports the argument for increasing the budget. A weak or negative correlation suggests the spend is not the driver — and warrants a harder look at where revenue is actually coming from. COVARIANCE.S measures the same direction without standardising the result, making it harder to interpret standalone. CORREL is more practical for most analytical work.

Combining Statistical Functions in Excel for Real Analysis

RANK Inside IF for Conditional Ranking

Rather than ranking all sales reps across the whole company, you can rank within regions by counting how many values in the same region exceed the current value, then adding 1. No helper columns needed. This is where COUNTIFS becomes your best friend — and it is worth revisiting the array formula mechanics from Lesson 36 if nested logic is causing headaches.

FREQUENCY + Chart Pipeline for Grade Distribution

Define your bins, enter FREQUENCY as an array formula, select the output alongside your labels, insert a bar chart, set gap width to zero. That is a complete distributional view of your data from a single function. The full Excel statistical analysis workflow, from raw numbers to chart, takes under five minutes once you have the bin setup right.

Common Errors and How to Avoid Them

Blanks and Text Values Breaking Statistical Functions

Blanks are handled silently. STDEV.S, AVERAGE, and CORREL skip blank cells automatically — but they also skip numbers stored as text. That means your sample size might be smaller than you think. Quick check: COUNT returns only numeric cells; COUNTA returns all non-empty cells. If those two numbers do not match, something in your range is formatted as text.

The #NUM! and #DIV/0! Errors Explained

⚠️ COMMON MISTAKE

NUM errors in PERCENTILE mean your k value is outside the valid range — check that it sits between 0 and 1 inclusive for PERCENTILE.INC.

DIV/0 errors in AVERAGE appear when there are no numeric values in the range at all. Wrap with IFERROR if the range might be empty.

FREQUENCY returning all zeros almost always means one of two things: the data range and bins are on completely different scales, or the formula was not entered as an array. Re-enter using Ctrl + Shift + Enter.

Quick Reference: Statistical Functions Cheat Sheet

FunctionWhat It DoesUse When
AVERAGEMean of numeric valuesClean, symmetrical data
MEDIANMiddle valueSkewed data, outliers present
MODE.SNGLMost frequent valueFrequency is the insight
STDEV.SStandard deviation (sample)Subset of a larger population
STDEV.PStandard deviation (population)Full dataset, no sampling
PERCENTILE.INCValue at k-th percentileBusiness benchmarking
FREQUENCYCount values in each binDistributions, histograms
RANK.EQRank, ties share lowestLeaderboards, competitions
RANK.AVGRank, ties get averageStatistical calculations
CORRELPearson correlation (-1 to 1)Relationship analysis
LARGE / SMALLk-th largest / smallest valueDynamic top-N extracts

🧪 Try It Yourself

Build a dataset of 30 fictional employee salaries ranging from around 28,000 to 120,000. Then work through each step below in order.

  1. Calculate AVERAGE and MEDIAN. Write down both and note the gap — it tells you something about the shape of your data.
  2. Run STDEV.S across all 30 salaries. Add a helper column: =(A2-AVERAGE($A$2:$A$31))/STDEV.S($A$2:$A$31). Flag any result above 2 or below -2.
  3. Use PERCENTILE.INC to find the 25th, 50th, and 75th percentile values. Compare the 50th result to your MEDIAN — they should match.
  4. Set up bins at 30000, 50000, 70000, 90000, 110000 in a separate column. Select six output cells, enter FREQUENCY with Ctrl + Shift + Enter. Verify the counts add up to 30.
  5. Use RANK.EQ to rank all 30 salaries from highest to lowest.
  6. Add a second column of fictional years of experience. Run CORREL between salary and experience. Is the relationship positive? How strong?

The point is not just to get a result — it is to notice what each function reveals about the same dataset that the others miss entirely.

📚 External Resources

Microsoft Docs — STDEV.S Function Reference

Official parameter definitions and edge-case behaviour for all STDEV variants.

Exceljet — FREQUENCY Function Guide

Visual examples and alternative approaches for array-based formulas. Worth bookmarking.

Ads loading…

Advertisement-X