
Master STDEV, PERCENTILE, FREQUENCY, RANK, CORREL — and actually know when to use each one.
What You’ll Learn
Statistical functions in Excel saved a financial analyst three hours last week — and nearly got a different one fired the week before, because she used the wrong version of STDEV on a population dataset and presented the results to a board meeting. That second story is more common than anyone admits. Excel does not warn you when you pick the slightly-wrong function. It just quietly gives you a slightly-wrong answer, and you carry on with your day.
Here is the thing: the problem with statistical functions is almost never syntax. You can look up syntax in thirty seconds. The problem is decision-making. STDEV.S or STDEV.P? RANK.EQ or RANK.AVG? PERCENTILE.INC or PERCENTILE.EXC? These are not interchangeable, and most tutorials treat them like they are.
This is Lesson 37 at XplorExcel.com, and it is built around one goal: giving you the decision logic, not just the formula. By the end, you will know which function to reach for, why, and how to combine them into real analytical workflows that hold up in a professional setting.
Before anything else, there is something you need to understand about how Excel handles statistical functions — because if you skip this, you will probably use the wrong one without realising it.
When Microsoft updated its statistical functions, they did not remove the originals. They just added new versions with dotted names alongside the old ones. So now you have STDEV and STDEV.S living next to each other, doing almost the same thing but not quite. Same with RANK and RANK.EQ. PERCENTILE and PERCENTILE.INC. The old versions still work, which means Excel gives you zero indication that you might be using a legacy function.
Think of it like having two GPS apps on your phone. Both get you somewhere. One is three years out of date. Your phone does not flag which is which.
From this point on in this lesson, we use only the modern dotted versions: STDEV.S, STDEV.P, RANK.EQ, RANK.AVG, PERCENTILE.INC, PERCENTILE.EXC. If you are working from old files that use the legacy names, the results will likely be the same, but it is worth updating them for precision.
⚠️ COMMON MISTAKE
Using STDEV without knowing it defaults to sample-based logic. If your data represents a full population rather than a subset, switch to STDEV.P. The difference seems small, but in formal analysis it matters — and Excel will never warn you that you are using the wrong one.
AVERAGE adds everything up and divides by the count. Simple. Also unreliable the moment you have outliers. One executive salary in a dataset of 150 employees can drag the average up by thousands and make the typical salary look much higher than it actually is.
MEDIAN returns the middle value. It does not care about extremes. For most real-world business data, MEDIAN tells the more honest story. Use AVERAGE on clean, symmetrical data with no significant outliers. Use MEDIAN when you know or suspect skew.
MODE.SNGL returns the single most frequently occurring value. MODE.MULT returns all values that tie for most frequent — but it must be entered as an array formula, and if there are no repeating values at all, MODE.SNGL throws a NUM error. Worth checking before you build anything that depends on it.
Use MODE when frequency of occurrence is the actual insight you need, not the mathematical average.
STDEV.S is for samples. A sample is a subset of a larger group. If you surveyed 200 customers out of a total customer base of 50,000, your 200 responses are a sample. STDEV.S applies Bessel’s correction, which slightly increases the result to compensate for a sample underestimating how spread out the full population really is.
STDEV.P is for populations. A population means every single member of the group you are measuring. If you have every sales figure for every day of the year and want to describe the spread of that specific year, that is a population. No correction needed.
Syntax
=STDEV.S(number1, [number2], …)
=STDEV.P(number1, [number2], …)
Advertisement-X
=VAR.S(number1, [number2], …)
=VAR.P(number1, [number2], …)
VAR.S and VAR.P follow the same sample versus population logic but return variance instead of standard deviation. Variance is standard deviation squared. Standard deviation is almost always easier to report because it lives in the same unit as your original data. Variance is more useful when you are feeding results into further statistical calculations.
💡 PRO TIP
Want to flag outliers automatically? Add a helper column with this formula: =(A2-AVERAGE($A$2:$A$31))/STDEV.S($A$2:$A$31). Any result above 2 or below -2 is more than two standard deviations from the mean. That is your outlier threshold. Colour-code it with conditional formatting and you have a live anomaly detector.
Real-world example: An HR analyst working with salary data from 150 employees — a sample from a larger multinational — uses STDEV.S to calculate spread and flags anyone earning more than two standard deviations above the mean. That could indicate a retention risk, or a data entry error. Either way, it is worth investigating.
The 75th percentile does not mean 75% of the maximum value. It means 75% of values in the dataset fall below that point. That distinction matters when you start using this in reports.
PERCENTILE.INC(array, k) returns the value at the k-th percentile, where k is a decimal between 0 and 1. The INC means the range is inclusive: 0 and 1 are valid inputs. For most business use cases, this is the right choice.
PERCENTILE.EXC(array, k) excludes the endpoints. Valid k values run strictly between 0 and 1. This is the statistically stricter version, preferred in academic and research contexts.
Syntax
=PERCENTILE.INC(array, k)
=PERCENTILE.EXC(array, k)
=QUARTILE.INC(array, quart) → quart: 1=25th, 2=50th, 3=75th
QUARTILE.INC is a convenience shortcut that applies the same logic for the four standard quartile points. QUARTILE.INC(array, 1) gives the 25th percentile. QUARTILE.INC(array, 2) gives the median. QUARTILE.INC(array, 3) gives the 75th percentile.
⚠️ COMMON MISTAKE
Treating percentile and percentage as the same thing. A student who scores 62 out of 100 on a difficult exam might still be in the 85th percentile if most others scored below 62. The percentile is about relative position within the group — not the absolute value.
Real-world example: A sales manager uses PERCENTILE.INC to divide the team into four performance tiers. Reps below the 25th percentile go into a coaching programme. Reps above the 75th percentile qualify for the accelerated bonus structure. This approach adjusts naturally as the team grows or changes, unlike fixed revenue targets that go stale.
Trust me on this one: FREQUENCY is the most incorrectly used statistical function in Excel. Not because it is complicated, but because most tutorials show you the syntax and skip the part about how to actually enter it. Then you copy the formula, get a column of zeros, and assume you did something wrong.
FREQUENCY(data_array, bins_array) counts how many values from your data fall into each bin. The bins array contains the upper boundary of each interval. The result is always one row longer than your bins array, because the final cell captures all values that exceed the highest bin.
How to Enter the FREQUENCY Function Correctly
💡 PRO TIP
On Excel 365 or Excel 2019+, FREQUENCY spills automatically without the Ctrl + Shift + Enter step. But if you are sharing files with colleagues on older versions, use the array entry method anyway to avoid broken formulas on their end.
Once your FREQUENCY output is ready, select it alongside your bin labels and insert a clustered bar chart. Set the gap width to zero in the format options and you have a histogram. For the full chart setup process, see Lesson 38: Dashboards in Excel for how to take this output further into a complete analytical dashboard.
Real-world example: A university administrator uses FREQUENCY to count how many students fell in each 10-point score band across 300 exam results. The histogram reveals a bimodal distribution — two clear peaks rather than one. That is a sign the exam may have tested two different skill levels in one paper, and it prompts a curriculum review.
RANK.EQ(value, ref, order): when two values tie, they both get the same rank, and the next rank is skipped. Two people tied at 3rd both get rank 3. Nobody gets rank 4. For the order argument: 0 means descending (highest = rank 1).
RANK.AVG(value, ref, order): when two values tie, they both get the average of the ranks they would have occupied. Two people tied for 3rd and 4th place both get 3.5.
| Situation | Use RANK.EQ | Use RANK.AVG |
|---|---|---|
| Leaderboard or report | ✓ Whole-number ranks, ties share lowest | — |
| Competition or scoring | ✓ Clear tied position | — |
| Feeding into further stats | — | ✓ Preserves statistical properties |
| Academic ranking analysis | — | ✓ No artificial rank gaps |
RANK.EQ updates automatically when your source data changes. Combine it with LARGE(array, 1), LARGE(array, 2), LARGE(array, 3) to pull the top three values into a summary section, then use INDEX MATCH to retrieve the corresponding names. For conditional ranking workflows — ranking within a group rather than across the whole dataset — see Lesson 36: Financial Functions in Excel for the conditional aggregation logic that pairs naturally with this approach.
CORREL(array1, array2) returns the Pearson correlation coefficient between two datasets. The result always sits between -1 and 1. Close to 1 means the two variables tend to rise and fall together. Close to -1 means when one rises, the other tends to fall. Close to 0 means no clear linear relationship.
Here is the thing everyone gets wrong about correlation: a high CORREL result does not mean one thing causes the other. Ice cream sales and drowning rates are positively correlated. The cause is summer, not ice cream. Always pair correlation findings with actual domain knowledge before making decisions.
Syntax
=CORREL(array1, array2)
=COVARIANCE.S(array1, array2)
A marketing analyst uses CORREL to compare monthly ad spend against monthly revenue across 24 months. A strong positive correlation supports the argument for increasing the budget. A weak or negative correlation suggests the spend is not the driver — and warrants a harder look at where revenue is actually coming from. COVARIANCE.S measures the same direction without standardising the result, making it harder to interpret standalone. CORREL is more practical for most analytical work.
Rather than ranking all sales reps across the whole company, you can rank within regions by counting how many values in the same region exceed the current value, then adding 1. No helper columns needed. This is where COUNTIFS becomes your best friend — and it is worth revisiting the array formula mechanics from Lesson 36 if nested logic is causing headaches.
Define your bins, enter FREQUENCY as an array formula, select the output alongside your labels, insert a bar chart, set gap width to zero. That is a complete distributional view of your data from a single function. The full Excel statistical analysis workflow, from raw numbers to chart, takes under five minutes once you have the bin setup right.
Blanks are handled silently. STDEV.S, AVERAGE, and CORREL skip blank cells automatically — but they also skip numbers stored as text. That means your sample size might be smaller than you think. Quick check: COUNT returns only numeric cells; COUNTA returns all non-empty cells. If those two numbers do not match, something in your range is formatted as text.
⚠️ COMMON MISTAKE
NUM errors in PERCENTILE mean your k value is outside the valid range — check that it sits between 0 and 1 inclusive for PERCENTILE.INC.
DIV/0 errors in AVERAGE appear when there are no numeric values in the range at all. Wrap with IFERROR if the range might be empty.
FREQUENCY returning all zeros almost always means one of two things: the data range and bins are on completely different scales, or the formula was not entered as an array. Re-enter using Ctrl + Shift + Enter.
| Function | What It Does | Use When |
|---|---|---|
| AVERAGE | Mean of numeric values | Clean, symmetrical data |
| MEDIAN | Middle value | Skewed data, outliers present |
| MODE.SNGL | Most frequent value | Frequency is the insight |
| STDEV.S | Standard deviation (sample) | Subset of a larger population |
| STDEV.P | Standard deviation (population) | Full dataset, no sampling |
| PERCENTILE.INC | Value at k-th percentile | Business benchmarking |
| FREQUENCY | Count values in each bin | Distributions, histograms |
| RANK.EQ | Rank, ties share lowest | Leaderboards, competitions |
| RANK.AVG | Rank, ties get average | Statistical calculations |
| CORREL | Pearson correlation (-1 to 1) | Relationship analysis |
| LARGE / SMALL | k-th largest / smallest value | Dynamic top-N extracts |
🧪 Try It Yourself
Build a dataset of 30 fictional employee salaries ranging from around 28,000 to 120,000. Then work through each step below in order.
The point is not just to get a result — it is to notice what each function reveals about the same dataset that the others miss entirely.
📚 External Resources
Official parameter definitions and edge-case behaviour for all STDEV variants.
Visual examples and alternative approaches for array-based formulas. Worth bookmarking.
Previous Lesson
← Financial Functions in ExcelNext Lesson
Dashboards in Excel →Advertisement-X