Descriptive Statistics Tool In Excel For Mac

Excel for Office 365 Excel for Office 365 for Mac Excel 2019 Excel 2016 Excel 2019 for Mac Excel 2013 Excel 2010 Excel 2007 Excel 2016 for Mac Excel for Mac 2011 More. Less If you need to develop complex statistical or engineering analyses, you. To perform data analysis on the remainder of the worksheets, recalculate the analysis tool for each worksheet. The Analysis ToolPak includes the tools described in the following sections. To access these tools, click Data Analysis in the Analysis group on the Data tab. The Descriptive Statistics feature of data analysis tools is part of the 'Analysis ToolPak' add-in provided with Windows Excel, and it is not available for Excel 2011 for Mac. Instead, Microsoft recommends a third-party alternative.

Most of the time when you run statistics, you want to use statistical software. These tools are built to do calculations like t-tests, chi-square tests, correlations, and so on. Excel isn’t meant for data analysis. But that doesn’t mean you can’t do it.

Unfortunately, Excel’s statistical functions aren’t always intuitive. And they usually give you esoteric results. So instead of using stats functions, we’re going to use the go-to Excel statistics add-in: the Data Analysis Toolpak.

The Toolpak, despite its rather unfortunate spelling, includes a wide range of useful statistics functionality. Let’s see what we can do with Excel statistics.

Adding the Excel Data Analysis Toolpak

While you can do stats without the Data Analysis Toolpak, it’s much easier with it. To install the Toolpak in Excel 2016, go to File > Options > Add-ins.

Click Go next to “Manage: Excel Add-ins.”

In the resulting window, check the box next to Analysis Toolpak and then click OK.

If you correctly added the Data Analysis Toolpak to Excel, you’ll see a Data Analysis button in the Data tab, grouped into the Analysis section:

If you want even more power, be sure to check out Excel’s other add-insPower Up Excel with 10 Add-Ins to Process, Analyze & Visualize Data Like a ProPower Up Excel with 10 Add-Ins to Process, Analyze & Visualize Data Like a ProVanilla Excel is amazing, but you can make it even more powerful with add-ins. Whatever data you need to process, chances are someone created an Excel app for it. Here's a selection.Read More.

Descriptive Statistics in Excel

No matter what statistical test you’re running, you probably want to get Excel’s descriptive statistics first. This will give you information on means, medians, variance, standard deviation and error, kurtosis, skewness, and a variety of other figures.

Running descriptive statistics in Excel is easy. Click Data Analysis in the Data tab, select Descriptive Statistics, and select your input range. Click the arrow next to the input range field, click-and-drag to select your data, and hit Enter (or click the corresponding down arrow), as in the GIF below.

After that, make sure to tell Excel whether your data has labels, if you want the output in a new sheet or on the same one, and if you want summary statistics and other options.

After that, hit OK, and you’ll get your descriptive statistics:

Student’s t-Test in Excel

The t-test is one of the most basic statistical tests, and it’s easy to compute in Excel with the Toolpak. Click the Data Analysis button and scroll down until you see the t-test options.

You have three choices:

  • t-Test: Paired Two Sample for Means should be used when your measurements or observations were paired. Use this when you took two measurements of the same subjects, such as measuring blood pressure before and after an intervention.
  • t-Test: Two-Sample Assuming Equal Variances should be used when your measurements are independent (which usually means they were done on two different subject groups). We’ll discuss the “equal variances” part in a moment.
  • t-Test: Two-Sample Assuming Unequal Variances is also for independent measurements, but is used when your variances are unequal.

To test whether the variances of your two samples are equal, you’ll need to run an F-test. Find F-Test Two-Sample for Variances in the Analysis Tools list, select it, and click OK.

Enter your two datasets in the input range boxes. Leave the alpha value at 0.05 unless you have reason to change it — if you don’t know what that means, just leave. Finally, click OK.

Excel will give you the results in a new sheet (unless you selected Output Range and a cell in your current sheet):

You’re looking at the P-value here. If it’s less than 0.05, you have unequal variances. So to run the t-test, you should use the unequal variances option.

To run a t-test, select the appropriate test from the Analysis Tools window and select both sets of your data in the same manner as you did for the F-test. Leave the alpha value at 0.05, and hit OK.

The results include everything you need to report for a t-test: the means, degrees of freedom (df), t statistic, and the P-values for both one- and two-tailed tests. If the P-value is less than 0.05, the two samples are significantly different.

If you’re not sure whether to use a one- or two-tailed t-test, check out this explainer from UCLA.

ANOVA in Excel

The Excel Data Analysis Toolpak offers three types of analysis of variance (ANOVA). Unfortunately, it doesn’t give you the ability to run the necessary follow-up tests like Tukey or Bonferroni. But you can see if there’s a relationship between a few different variables.

Here are the three ANOVA tests in Excel:

  • ANOVA: Single Factor analyzes variance with one dependent variable and one independent variable. It’s preferable to using multiple t-tests when you have more than two groups.
  • ANOVA: Two-Factor with Replication is similar to the paired t-test; it involves multiple measurements on single subjects. The “two-factor” part of this test indicates that there are two independent variables.
  • ANOVA: Two-Factor without Replication involves two independent variables, but no replication in measurement.

We’ll be going over the single-factor analysis here. In our example, we’ll be looking at three sets of numbers, labeled “Intervention 1,” “Intervention 2,” and “Intervention 3.” To run an ANOVA, click Data Analysis, then select ANOVA: Single Factor.

Select the input range and make sure to tell Excel whether your groups are in columns or rows. I’ve also selected “Labels in first row” here so that the group names are displayed in the results.

After hitting OK, we get the following results:

Note that the P-value is less than 0.05, so we have a significant result. That means there’s a significant difference between at least two of the groups in the test. But because Excel doesn’t provide tests to determine which groups differ, the best you can do is look at the averages displayed in the summary. In our example, Intervention 3 looks like it’s probably the one that differs.

3m sponge brand

This isn’t statistically sound. But if you just want to see if there’s a difference, and see which group is probably causing it, it’ll work.

Two-factor ANOVA is more complicated. If you want to learn more about when to use the two-factor method, see this video from Sophia.org and the “without replication” and “with replication” examples from Real Statistics.

Correlation in Excel

Calculating correlation in Excel is much simpler than the t-test or an ANOVA. Use the Data Analysis button to open the Analysis Tools window and select Correlation.

Select your input range, identify your groups as columns or rows, and tell Excel whether you have labels. After that, hit OK.

You won’t get any measures of significance, but you can see how each group is correlated with the others. A value of one is an absolute correlation, indicating that the values are exactly the same. The closer to one the correlation value, the stronger the correlation.

Regression in Excel

Regression is one of the most commonly used statistical tests in industry, and Excel packs a surprising amount of power for this calculation. We’ll run a quick multiple regression in Excel here. If you’re not familiar with regression, check out HBR’s guide to using regression for business.

Let’s say our dependent variable is blood pressure, and our two independent variables are weight and salt intake. We want to see which is a better predictor of blood pressure (or if they’re both good).

Click Data Analysis and select Regression. You need to be careful when filling out the input range boxes this time. The Input Y Range box should contain your single dependent variable. The Input X Range box can include multiple independent variables. For a simple regression, don’t worry about the rest (though remember to tell Excel if you selected labels).

Here’s what our calculation looks like:

After hitting OK, you’ll get a big list of results. I’ve highlighted the P-value here for both weight and salt intake:

As you can see, the P-value for weight is greater than 0.05, so there’s no significant relationship there. The P-value for salt, however, is below 0.05, indicating that it’s a good predictor of blood pressure.

If you’re planning on presenting your regression data, remember that you can add a regression line to a scatterplot in Excel. It’s a great visual aidHow to Visualize Your Data Analysis with Excel's Power ToolsHow to Visualize Your Data Analysis with Excel's Power ToolsExcel is killing it with its advanced data management features. Once you have used one of the new tools, you will want them all. Become a master of your data analysis with power tools!Read More for this analysis.

Excel Statistics: Surprisingly Capable

While Excel isn’t known for its statistical power, it actually packs some really useful functionality. Especially once you download the Data Analysis Toolpak statistics add-in. I hope you’ve learned how to use the Toolpak, and that you can now play around on your own to figure out how to use more of its functions.

With this now under your belt, take your Excel skills to the next level with our articles on using Excel’s Goal Seek feature for more data crunching, mastering IF statements in Excel, searching for values with vlookup4 Excel Lookup Functions to Search Spreadsheets Efficiently4 Excel Lookup Functions to Search Spreadsheets EfficientlySearching a large Excel spreadsheet isn't always easy. Use lookup formulas to save time and search spreadsheets efficiently.Read More, and adding drop-down lists as cells in Excel.

I’ve also linked to other sites that have good statistics tutorials where we had to skip over confusing concepts. Be sure to check out our guide to free statistics resourcesLearn Statistics for Free with These 6 ResourcesLearn Statistics for Free with These 6 ResourcesStatistics has a reputation of a subject that's difficult to understand. But learning from the right resource will help you understand survey results, election reports, and your stats class assignments in no time.Read More, too.

Explore more about: Microsoft Excel, Spreadsheet.

  1. Thanks for the article. I think there is a little mistake (probably an oversight!) in doing F-test.
    When performing an F-test in excel, you HAVE to make sure that the variance of variable 1 is greater than the variance of variable 2 due to the fact that F-value = var(X1)/var(X2).
    In your case, you have to switch the columns!!

  2. Excellent statistical topics, and article, in Excel usage. I seem to find nothing more interesting than reading about, and practicing, the vast potentials of Excel and various programming languages. There is also the fact of making one's self more marketable, for employment, by honing these technological skills. Once considered an Excel guru, I realized that what I did not know, or what I fell out of touch with, reduced my expertise (VBA). Now I treat every reading of Excel as if to be a new beginning. Along the way I have stumbled into some pretty neat clarifications and tricks that have triggered an broad array of ideas. The quest for what I do not know is everlasting. Thank you!

Home > Articles

  1. Descriptive Statistics in Excel
Page 1 of 5Next >

Drawing on his immense experience helping organizations gain value from statistical methods, Conrad Carlberg shows when and how to use Excel, when and how to use R instead, and how to use them together to get the best from both. Here he discusses how descriptive statistics tools in Excel and R can help you understand the distribution of the variables in your data set.

This chapter is from the book
R for Microsoft® Excel Users: Making the Transition for Statistical Analysis

This chapter is from the book

This chapter is from the book

R for Microsoft® Excel Users: Making the Transition for Statistical Analysis

Regardless of the sort of analysis you have in mind for a particular data set, you want to understand the distribution of the variables in that set. The reasons vary from the mundane (someone entered an impossible value for a variable) to the technical (different sample sizes accompanying different variances).

Any of those events could happen, whether the source of the data is a sales ledger, a beautifully designed medical experiment or a study of political preferences. No matter what the cause, if your data set contains any unexpected values you want to know about it. Then you can take steps to correct data entered in error, or to adjust your decision rules if necessary, or even to replicate the experiment if it looks like something might have gone wrong with the methodology.

The point is that sophisticated multivariate analyses such as factor analysis with Varimax rotation or Cox Proportional Hazards Regression do not alert you when someone entered a patient’s body temperature on Wednesday morning as 986 degrees instead of 98.6 degrees. In that case, the results of your sophisticated procedure might turn out cockeyed, but you would have no special reason to suspect a missing decimal point as the cause of your findings.

You can save yourself a lot of subsequent grief if you just look over some preliminary descriptive statistics based on your data set. If a mean value, the range of the observed values, or their standard deviation looks unusual, you probably should verify and validate the way the data is collected, entered and stored before too much time passes.

Descriptive Statistics in Excel

If you’re using Excel to analyze the data, either as a preliminary check or as your principal numeric application, one way to carry out this sort of work is to point Excel’s various worksheet functions at the data set. MIN( ) gets you the minimum in a set of values, MAX( ) gets you the maximum, and MAX( ) − MIN( ) gets you the range. COUNT( ) counts the numeric values, AVERAGE( ) returns the mean, and either STDEV.S( ) or STDEV.P( ) gets you the standard deviation.

Excel comes equipped with a Descriptive Statistics tool in the Data Analysis add-in (which was at one time termed the Analysis ToolPak or ATP). The Descriptive Statistics tool is good news and bad news.

The good news is that using the tool on an existing data set gets you up to 16 different descriptive statistics, and you don’t have to enter a single function on the worksheet. See Figure 2.1.

Figure 2.1 You can analyze multiple variables in one pass.

Figure 2.1 shows 11 values of two different variables in the range A1:B12. A univariate analysis of the variable named MPG appears in the range E3:F18, and a similar analysis of IPS appears in G3:H18. Notice that each statistic reported pertains to one variable only: None of the statistics correlates, for example, MPG with IPS, or reports the means of IPS according to specific values of MPG. The reported statistics are exclusively univariate.

You get them without having to know that Excel has a STDEV.S( ) function that reports the standard deviation of a sample of records, or that the standard errors reported in the fourth row are the standard error of the mean of each variable. The reported statistics are largely useful when you’re getting acquainted with a data set. You’ll normally want to know a lot more than that, of course, but it’s a good place to start.

The bad news is that the reported statistics represent static values. For example, in Figure 2.1, cell F3 contains the value 26 rather than this formula:

=AVERAGE(A2:A12)

Therefore, if you want to change the input values in A1:A12 by adding or deleting a record, or editing an existing value, none of the statistics would change in response. A worksheet formula recalculates when the values it points to change, but the results calculated by the Descriptive Statistics tool do not: They don’t point to anything because they’re just numbers.

Using the Descriptive Statistics Tool

Using the tool is easy enough. You’ll want your active worksheet to have one or more lists or tables, such as the two shown in columns A and B of Figure 2.1. Go to the Ribbon’s Data tab and locate the Data Analysis link in the Analyze group. When you click it, you see the list box shown in Figure 2.2.

Figure 2.2 Descriptive Statistics is the list box entry you want for these analyses.

Click Descriptive Statistics and click OK. You’ll get the dialog box shown in Figure 2.3.

Figure 2.3 It’s usually best to keep the output on the active worksheet.

Then take these steps:

  1. Click in the Input Range box and drag through the range your data occupies. In Figure 2.1, that’s A1:B12. If you have one variable, the range might be A1:A25, or if you have three variables, the range might be B1:D85. Don’t worry if your variables have different numbers of records, but leave unoccupied cells empty and don’t substitute #N/A for an empty cell.

  2. Accept the default of Grouped By Columns.

  3. If the first row in the input range has labels such as variable names, fill the Labels in First Row checkbox. This instructs Excel not to treat a label as a legitimate data value, which would usually result in an error message that the input range contains non-numeric data.

  4. Click the Output Range option button, unless you want the results to be written to a new workbook or a new worksheet. (Worksheet Ply is just an old term for worksheet.) Careful! If you click the Output Range option button, the Input Range box is re-activated and gets filled with any cell or range that you click next. First, click the Output Range edit box and only then indicate where you want the output to begin.

  5. Fill the Summary Statistics checkbox if you want the statistics shown in row 3 through row 15 in Figure 2.1.

  6. Fill the Confidence Level for Mean checkbox if you want to put a confidence interval around the mean. Enter the confidence level you want in the edit box (often that will be 90, 95, or 99, taken as a percent).

  7. Fill the Kth Largest and the Kth smallest checkboxes if you want that information. Also supply a value for K. That is, if you want the 5th largest value, fill the checkbox and enter 5 in the edit box. (I can’t recall the last time I needed either of these two statistics.)

  8. Click OK. Within a few seconds you should see results such as those shown in columns E through H of Figure 2.1.

Understanding the Results

Here’s a closer look at some of the statistics shown in Figure 2.1. Most of them are precisely what you would expect (mean, median, mode, range, minimum, maximum, sum, count, kth largest, and smallest values). The following may require a little additional information.

Standard Error of the Mean

Descriptive Statistics Tool In Excel For Mac

Suppose that the 11 values in A2:A12 of Figure 2.1 are a sample from a population. Now suppose that you took hundreds or even thousands of similar 11-record samples from the same population. Each of those samples would have its own mean value, such as the one shown in cell F3 of Figure 2.1. If you calculated the standard deviation of all those mean values, you would have a statistic called the standard error of the mean. The value in cell F4 of Figure 2.1 estimates that value, so that you don’t have to actually take hundreds or thousands of additional samples. You can calculate that estimate using this formula:

where:

  • is the standard error of the mean.

  • S is the sample standard deviation.

  • n is the number of records in the sample (the count).

The standard error of the mean is often useful when you want to test the difference between an obtained sample mean and a hypothesized value. It is also an integral part of a confidence interval placed around a sample mean.

Standard Deviation

Excel (and the general field of statistics) offers two types of standard deviation:

  • Your data constitutes a population. For example, you might have 100 items made in a special production run, after which the mold was broken.

  • Your data constitutes a sample. You might have 100 items sampled randomly from an ongoing process that yields millions of products per year.

In the first case, where you have an entire population of values, the formula for the standard deviation uses the actual count of values, n, in its denominator. In the second case you are estimating the population’s standard deviation on the basis of your sample, and you use n1 instead of n in the denominator.

In Excel, you use the worksheet function STDEV.P( ) when your data is the population. You use STDEV.S( ) when you have a sample. Possession of a sample occurs much more frequently than possession of a population, so the Descriptive Statistics tool returns the value that you would get if you were using the STDEV.S( ) function.

Sample Variance

The variance is the square of the standard deviation, and it’s subject to the same conditions as the standard deviation: That is, the variance has a population form (n) and a sample form (n − 1). The Descriptive Statistics tool returns the sample form of the variance.

Skewness

Mining target

Skewness measures the symmetry in a distribution of values. An asymmetric distribution is said to be skewed. One popular way of calculating skewness is the average of the cubed standard scores (also termed z-scores):

The formula that Excel uses approaches the value of the average cubed z-score as n increases:

A symmetric distribution is expected to have a skewness of 0.

Kurtosis

Kurtosis measures the degree to which a distribution of scores is taller or flatter with respect to its width. Here’s one textbook definition of kurtosis:

It’s very similar to one definition of skewness, except here the z-scores are raised to the fourth instead of the third power. A distribution such as the normal curve would have an average z-score, raised to the fourth power, of 3. Therefore, 3 is subtracted in the formula so that a normal curve would have kurtosis of 0.

Again, Excel’s formula for kurtosis is slightly different and attempts to remove bias from the calculation of kurtosis based on a sample:

I’m inflicting these formulas on you so that if you need to contrast the reason that R returns different values for skewness and kurtosis than Excel, you’ll be better placed to understand the differences.

Confidence Level

The Descriptive Statistics tool’s results refer to the confidence interval value as the Confidence Level (see cell E18 in Figure 2.1). That’s a misnomer. The confidence level is the percentage of sample confidence intervals that you expect to capture the population mean: typically, 90%, 95%, or 99%.

In contrast, the Descriptive Statistics tool reports the quantity that you add to and subtract from the calculated mean so as to arrive at the confidence interval. That quantity is calculated, using Excel function syntax, as

=T.INV.2T(0.05,10)*F4

where cell F4 contains the standard error of the mean. Excel’s T.INV.2T( ) worksheet function returns the positive t value that cuts off some percentage of the area under the t-distribution, such that the remaining percentage is divided evenly between the two tails. So, this use of the function

=T.INV.2T(.05,10)

returns 2.23. That means

  • Take a t-distribution, which is very similar to a normal curve but is a little flatter in its center and a little thicker in its tails. Its shape depends partly on the number of observations in the samples used to build the distribution. In this example, the number of observations is 11 and therefore the degrees of freedom is 10.

  • The mean of that t-distribution is 0. If you cut the distribution at two points, −2.23 and +2.23, you’ll find that 95% of the area under the curve is between those two cuts.

  • And of course, 5% is outside the cuts, with 2.5% in the distribution’s right tail and 2.5% in its left tail.

Tool In Excel Crossword

But we’re not working in a scale of t values, which has a standard deviation that’s slightly larger than 1.

Instead, we’re working with a scale that in Figure 2.1 describes whatever MPG is, with a standard error of 2.63. So, to cut off 2.5% of the distribution at each tail, we multiply ±2.23 by 2.63 or ±5.86. Adding ±5.86 to the mean of 26 shown in cell F4 of Figure 2.1 gives a 95% confidence interval of from 20.14 to 31.86. Another way of saying this is that we want to go up from the mean by 2.23 standard deviations. In this scale, a standard deviation is 2.63 units, so we go up from the mean by 2.23 × 2.63 units. We go down from the mean by the same amount. The resulting range of values is the 95% confidence interval for this data.

You interpret the confidence interval as follows: If you were to take 100 random samples, each from the same population, and put a 95% confidence interval around each of 100 sample means, 95 of those confidence intervals would capture the mean of the population. It is more rational to suppose that the one confidence interval that you did construct is one of the 95 that captures the population mean than one of the 5 that do not. That’s what’s meant by saying that a confidence interval makes you 95% confident that it captures the true population mean.

Using the Excel Descriptive Statistics Tool on R’s Pizza File

Let’s take a look at what Excel’s Descriptive Statistics tool has to say about the numeric variables in R’s pizza delivery database. The first step is to export that database so it’s available to Excel. To arrange that, take these steps:

  1. Start R.

  2. Load the DescTools package using this R command:

  3. library(DescTools)

  4. Export the data frame named d.pizza by entering this command:

  5. write.csv(d.pizza, file=“C:/Users/Jean/Desktop/pizza.csv”)

You can use any legitimate destination name and location for the file. It will be formatted as a comma-separated values file, or csv file, so you might as well use csv as the filename extension. Also notice the use of forward slash (/) rather than backslashes () in the file path: R interprets a backslash as an escape character. You can instead use the forward slash (/) or two consecutive backslashes ().

When control returns to the R interface—it only takes a second or two—quit R and start Excel. Open the csv file from the location where you stored it. It looks something like the worksheet shown in Figure 2.4.

Figure 2.4 A csv file that’s opened in Excel has none of the frills such as currency formats that we’re used to seeing in Excel workbooks.

In Figure 2.4, notice the NA values sprinkled among the legitimate values. The NA values mean not available and, because they’re text, the Descriptive Statistics tool has difficulty dealing with them when they’re supposed to be numbers.

If you want to use Excel’s worksheet functions, you’re in the clear. When a function such as SUM( ) or AVERAGE( ) or STDEV.S( ) encounters a text value such as NA, it simply ignores it. See Figure 2.5 for some examples.

Figure 2.5 Excel ignores records on a pairwise basis for correlation functions such as CORREL( ).

The first ten records in the Pizza database from DescTools appear in cells A2:A11 of Figure 2.5, where the delivery temperature is recorded. This formula is entered in cell A13:

=AVERAGE(A2:A11)

Notice that the formula’s argument includes the text value NA in cell A5. The formula returns the result 46.00.

This formula is entered in cell A16:

=AVERAGE(A2,A3,A4,A6,A7,A8,A9,A10,A11)

The second AVERAGE formula includes every cell in A2:A11 except A5. Both AVERAGE( ) formulas return the same result, 46.00, and you can depend on Excel worksheet functions such as AVERAGE( ) and SUM( ) to ignore text values such as NA when they expect numeric values.

Still in Figure 2.5, the range D2:E11 contains the same records as A2:A11, but both the temperature and the minutes to delivery appear. If you want to calculate the correlation between those two variables on the basis of those records, Excel manages the missing data by ignoring the entire fourth record because it has a missing value for one of the variables. Notice the correlation between the values in D2:D11 and E2:E11, and −0.7449. That value is returned by this formula:

=CORREL(D2:D11,E2:E11)

In the range G2:H10 I have omitted the record with one missing value. The formula that calculates the correlation for that range of data is

=CORREL(G2:G10,H2:H10)

Both formulas return the same result, so the CORREL( ) function ignores the full record when either of the values is text.

Therefore, you’re in good shape if you decide to get the summary statistics from a data set such as the Pizza database using worksheet functions. Things aren’t as straightforward if you decide you want to use the Descriptive Statistics tool in Excel’s Data Analysis add-in.

If you point the Descriptive Statistics tool at a worksheet range that contains text data, it returns an error message to the effect that the range contains text data. See Figure 2.6.

Figure 2.6 Although all the worksheet functions operate smoothly with text values, the Descriptive Statistics tool doesn’t.

To get the Descriptive Statistics tool to process the data, you have to replace the text data with something else. Obviously, you can’t just make up numbers and enter them in place of the NA values. As it turns out, though, the Descriptive Statistics tool can manage if you replace the text values with empty cells.

So, one possibility is to do a global replace of the NA values with nothing—that is, in the Find and Replace dialog box, enter NA in the Find What box and enter nothing in the Replace With edit box. Excel treats a truly empty cell as missing data, and functions such as AVERAGE( ) and SUM( ) return the correct summary values for those cells that contain numeric values.

Unfortunately, with a data set as full of NA values as R’s Pizza data set, those replacement empty cells cause another problem. With as many as 1209 records in the worksheet, you’re apt to want to select a variable for analysis by clicking its label (for example, cell M1 in Figure 2.4), then holding down the Ctrl key and pressing the down arrow. This sequence selects all the cells from the active one to the final contiguous non-empty cell below the one that’s active.

But that probably just takes you partway through the data set—perhaps even just 10 records. As soon as you replace an NA value, making (say) M5 an empty cell, your search for the actual final value in the column ends at M4.

An acceptable workaround is to select the entire column by clicking its column header to enter, for example, M:M in the Input Range box (see Figure 2.3). This slows processing down just a bit, but it’s better than hitting the down arrow once for every blank cell in the data’s column.

Another possibility is to use a worksheet function that is designed to handle text values in an otherwise numeric field. AVERAGEA( ) treats text values as legitimate zero values, however, and in averages there’s a difference between a missing value and a zero value. That is, this formula:

=AVERAGEA(1,2,“NA,“4,5)

returns 2.4. Treating NA as a zero value means that the sum is 12 and the count is 5, so the result is 2.4. It’s likely that the result you’re after completely ignores the NA value, so the sum would still be 12 but the count would be 4, and the result would be 3.

There’s no entirely satisfactory solution to these problems in Excel. You could define names based on the top row of the selection, and use a defined name instead of a worksheet address in the Input Range edit box. But Excel does not include the column headers as part of a range that’s named using that method, and so Descriptive Statistics cannot show the variable’s label as part of its results. If only from the standpoint of convenience, then, R is probably the better bet than Excel in this sort of situation.

Descriptive Statistics Tool In Excel

Related Resources

  • Book $43.99

Rogue Crossword

  • Book $31.99

Add Descriptive Statistics To Excel

  • Book $55.99