This quick guide is intended as a reference for using Microsoft Excel in BIOL212 - Biostatistics at Hobart & William Smith Colleges. It is not intended to be a complete guide.

1. Spreadsheet basics

1.1 Overview

  • The basic layout of a spreadsheet in Excel is in Fig. 1 below and includes the following elements.
    • File menu: For basic commands like creating a new spreadsheet, opening a file, and saving a file.
    • Quick access toolbar: Icons for Save, Undo, and Redu functions.
    • Ribbon: Contains icons commonly used tools organized by groups (e.g., Home, Insert, etc.).
    • Active cell: A selected cell in bold outline where data can be entered.
    • Formula bar: Data and formulas in a cell are viewed here.
    • Expand formula bar: Makes the formula bar bigger.
    • Worksheet navigation: Shows the names of worksheets in the spreadsheet file. A worksheet is activated by selecting it.
    • Insert worksheet: Inserts a new worksheet.
    • Zoom: Controls zoom of the cells.
Figure 1. Basic overview of Excel.

Figure 1. Basic overview of Excel.

1.2 Spreadsheet layout

  • Spreadsheets consist of a matrix of cells organized in columns (A, B, C, etc.) and rows (1, 2, 3, etc.).
  • Each cell is designated by an address with its column letter and row number. For example, cell B4 (column B, row 4) is highlighted in the figure below.
Figure 2. Note the cursor at cell B4

Figure 2. Note the cursor at cell B4

  • Each spreadsheet file in Excel is designated with an .xlsx extension.
  • Each spreadsheet by default has a single worksheet labeled “Sheet1”. Additional worksheets can be added by clicking the “+” icon, and you can label each worksheet by right-clicking the worksheet name, clicking “Rename”, and entering a new name.
Figure 3. Each spreadsheet file can have multiple worksheets.

Figure 3. Each spreadsheet file can have multiple worksheets.

2. Data

2.1 Data types and entry

  • Three types of data can be entered in a spreadsheet: text, numbers, and formulas.
  • Data is entered into a cell by clicking on it once to make it active, typing the data, and pressing enter. Double-clicking on a cell will allow you to see a cursor while you type (Fig. 4).
  • Data can be deleted from a cell by activating it and pressing the delete (mac) or backspace (pc) key.
Figure 4. Biomass (tons per hectare) in replicate old growth foersts and regrown forests. There are two variables. The data is text for the categorical variable (Forest Type) and numbers for the quantitative variable (Biomass).Cell B11 is activated where biomass data was just entered. This is generally how we will organize data in this course. Each column in the spreadsheet should represent a differnet variable, and each row represents a replicate (forests in this case).

Figure 4. Biomass (tons per hectare) in replicate old growth foersts and regrown forests. There are two variables. The data is text for the categorical variable (Forest Type) and numbers for the quantitative variable (Biomass).Cell B11 is activated where biomass data was just entered. This is generally how we will organize data in this course. Each column in the spreadsheet should represent a differnet variable, and each row represents a replicate (forests in this case).

2.2 Data selections

  • Data can be selected in a single cell by clicking on the cell of interest. You can also select multiple cells:
    • Select an entire row or column by clicking on the row/column label.
    • Select multiple cells directly by dragging your cursor across the cells.
    • Select nonadjacent cells by holding the command (mac) or control (pc) key and clicking each cell.
    • Select all cells in a worksheet by clicking the gray area where the column aand row labels intersect.
  • When a single cell is selected, you can use the keyboard to change the active cell:
    • “Enter” moves the active cell one cell down.
    • “Tab” moves the active cell one cell to the right.
    • Arrow keys move the active cell one cell in the direction of the arrow selected. You can select multiple cells with the arrow keys by holding the “Shift” key at the same time.

2.3 Copying and pasting data

  • Copying data in a cell can be accomplished by activating the cell, and either clicking the Edit menu and selecting “Copy”, or by using the shortcut command-c (mac) or control-c (pc).
  • Pasting data can be accomplished by activating the cell, and either clicking the Edit meu and selecting “Paste”, or by using the shortcut command-v (mac) or control-v (pc).

2.3 Finding data

  • Use the Find option under the Edit menu to find a specific data point.
    • You can search the entire worksheet when a single cell is selected.
    • To restrict your search within specific cells, select the range of cells first.

2.4 Filling data

  • Sometimes you will want to create a variable with a series of data. For example, perhaps we want to label each of the 10 forests in our forest biomass dataset as forest 1, 2, 3, etc. In Excel you can autofill a series. To do so, start by entering the first few data points (e.g., 1,2,3). Then highlight the data you entered, select the handle at the bottom right (solid square), and drag down or across to fill the cells (Fig. 5).
Figure 5. To create labels for each forest, I first inserted a new column to the left of Forest Type by selecting 'Column' under the Insert menu. I then entered the first few values (1,2,3). When those values are highlighted, a solid square appears at the bottom of the selection.

Figure 5. To create labels for each forest, I first inserted a new column to the left of Forest Type by selecting ‘Column’ under the Insert menu. I then entered the first few values (1,2,3). When those values are highlighted, a solid square appears at the bottom of the selection.

  • You can apply series of nubmers like above. You can also use the autofill function to copy a single value. For example if I just highlighted “3” and then dragged the handle, all the cells below would be filled with “3”.
  • You can create a series of text as well. Chronological text is recognized by Excel (“January”, “February”, etc.).

2.5 Sorting data

  • Often it is helpful to sort data, for example from lowest to highest or highest to lowest. Let’s say you have five numbers in a column that you want to sort. To sort the you would do the following (Fig. 6):
    • Highlight the five cells.
    • Select “Sort” under the Data menu
    • Choose the order of the sort. For numeric data, this will be “smallest to largest” or “largest to smallest”.
Figure 6. Sort menu for five numeric values selected in column A.

Figure 6. Sort menu for five numeric values selected in column A.

  • When we have spreadsheets with multiple variables, you typically would not want to sort only one variable at a time. Instead, you should highlight all the cells with data before clicking sort. You can then indicate the variable you’d like to sort the dataset by. For example, see the figure below for how to sort the forest biomass data by biomass (Fig. 7).
Figure 7. Sorting by biomass, largest to smallest.

Figure 7. Sorting by biomass, largest to smallest.

3. Formulas

  • Spreadsheet programs like Excel are powerful because they allow you to write mathematical formulae, much like calculators.
  • A formula begins with the equal sign (=), followed by the operation you wish to complete. For example, if you want to add 2 + 2, in a cell, do the following:
    • Click on the cell in which you’d like to enter the formula
    • Enter an equal sign (=) to start the formula
    • Enter the rest of the formula (2+2)
    • The complete function should look like this: =2+2. Don’t add any spaces.
    • Press enter when you’re done, and Excel computes the answer immediately.
    • Note that the formula always appears in the formula bar, whereas the result of the formula is returned in the selected cell.
  • The following are standard mathematical operators used in Excel:
Figure 8. Mathematical operators in Excel.

Figure 8. Mathematical operators in Excel.

  • Mutiple operators can be used in a single formula, but be careful about order of operations. Multiplcation and division will be completed before addition and subtraction.
    • For example, suppose you want to calculate 5 plus the quantity of 20 divided by 2. If you enter =5+20/2, Excel’s result will be 15 .The 20/2 operation is calculated first.
    • Use parentheses to clearly specify your order of operations. If you type =5+(20/2), you’ll also get 15. But if you type =(5+20)/2, you’ll get 12.5. *It’s good practice to use parentheses any time you have multiple operations.
  • Formulas can be specified with cells instead of numbers. If you specify cells, Excel will use the value in the cells you specify to compute an answer. For example, let’s go back to our forest biomass data. Suppose you wanted to calculate the sum of biomass across the 5 regrown forests and the sum of biomass across the 5 old growth forests. You can type a formula with the addition operator that adds the cell address of each value to be added (Fig. 9).
    • There are two ways to write formulas with cell addresses. You can type out each cell address manually, or you can click on the cell you’d like to include in the formula. To use the point-and-click approach, you need to start your formula with an equals sign (=), then click on cells as you’d like to add them.
Figure 9. Using formulas with cell addresses. Note that Excel highlights the cells included with different colors.

Figure 9. Using formulas with cell addresses. Note that Excel highlights the cells included with different colors.

  • Now suppose you wanted to calculate the average biomass for each group, which is the sum of the biomasses divided by the sample size (5). I’ve already calculated the sums for each group, so now I just need to write a formula that specifies the sum and divided by 5 (Fig. 10:
Figure 10. Using formulas with cell addresses.

Figure 10. Using formulas with cell addresses.

*Excel is dynamic, so if you change the value of a cell that’s referenced in a formula, the formula is recomputed. For example, suppose I had “2” in cell A1 and “2” in cell A2. If I write =A1+A2 in cell A3, I’d get 4. But if I change the value in A1 to 10, the result in A3 would change to 12.

4. Functions

4.1 Background

  • Functions in Excel are pre-defined formulas. They range from simple mathematical operations, such as summing a series of numbers, to more complex statistical calculations. The general steps to using a function include:
    • Enter an equals sign (=) to begin the function.
    • Specify the name of the function. Each function in Excel has a different name. For example, the function to sum a series of numbers is called SUM.
    • Specify arguments for the function within parentheses. Each function has a set of arguments that are needed to compute an answer. Some functions have multiple arguments, in which case they are separated by commas. *Let’s look at an example. Going back to our forest biomass data, let’s try to use a function to compute the sum of biomass values for regrown forests:
    • Activate the cell you’d like to use for the function by clicking on it.
    • Enter an equal sign (=) to start the function.
    • Type the name of the sum function, followed by a foreward parenthesis to begin entering arguments: =sum(
    • The arguments for the sum function are simply the cells to sum. The biomass values for regrown forests are in cells B2 through B6. Because they’re organized consecutively in column B, we can refer to the set of cells as B2:B6. Close the parentheses to end the function and press enter.
    • The complete function is =sum(B2:B6). When you press enter, the result is computed.
    Figure 11. Using functions in Excel.

    Figure 11. Using functions in Excel.

  • Note that you can apply a function like this across any set of values.
    • Functions are especially helpful when you need to compute things involving a lot of cell values. Suppose you needed to sum a set of 2000 values from B1 through B2000. You could do this by writing =B1+B2+B3+B4+B5+… etc. It’s significantly simpler and faster to use the sum function: =sum(B1:B2000)
    • You can refer to cells in a row too. If you want to sum cells A2 through A50, you’d write =sum(A2:A50).
    • If you want to include non-consecutive cells, enter the additional cells as another argument separated by a comma. For example, to sum cells A2 through A50 and cell A100, you’d write =sum(A2:A50, A100).
    • You can also add multiple sets of cells. For example, to sum A2 to A50 and C2 to C50, write =sum(A2:A50, C2:C50). *There are many functions in Excel. For example, you can use the AVERAGE function to compute the mean biomass in each forest type:
    Figure 12. Using functions to calculate mean forest biomass.

    Figure 12. Using functions to calculate mean forest biomass.

4.2 Commonly used functions

  • Mean: AVERAGE
  • Median: MEDIAN
  • Maximum: MAX
  • Minimum: MIN
  • Variance: VAR.S
  • Standard deviation: STDEV.S
  • Quartiles: QUARTILE.INC

5. Figures & tables

  • In this section I’ll use video tutorials to show you how to make different types if figures.

5.1 Histograms

IMAGE ALT TEXT HERE

5.2 Box plots

IMAGE ALT TEXT HERE

5.3 Scatterplots

IMAGE ALT TEXT HERE

5.4 Bar graphs (frequency data)

IMAGE ALT TEXT HERE

5.5 Grouped bar plots (frequency data)

IMAGE ALT TEXT HERE

5.6 Tables

IMAGE ALT TEXT HERE