Data Types, Variables, and Arithmetic Operators
Why Learn Data Types and Variables in Data Analytics?
Imagine you are analyzing sales performance for a retail brand. You receive data from three store locations:
- Store A: 1250.50 dollars (a decimal value)
- Store B: 980 dollars (an integer value)
- Store C: "Closed" (a text message indicating that the store was closed for a holiday)
To report the average sales across active stores, you need to add the numbers and divide by two. However, if you try to add 1250.50 + 980 + "Closed" in a calculator or program, it will fail because you cannot add words to numbers!
To solve this data problem, you need to store these data points in variables, identify their data types (numbers vs text), filter out invalid values, and use arithmetic operators to compute the results. Let's learn how R handles these fundamental components.
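As a preview, the store scenario above could be handled roughly as follows. This is a minimal sketch rather than the chapter's official solution: the names raw_sales, sales, and active_sales are made up for illustration, and it relies on vectors and on as.numeric(), which go beyond what this chapter covers.

```r
# Reports from the three stores, kept as text because one entry is a message
# (raw_sales, sales and active_sales are illustrative names)
raw_sales <- c("1250.50", "980", "Closed")

# as.numeric() turns text that is not a number into NA;
# the resulting warning is silenced here on purpose
sales <- suppressWarnings(as.numeric(raw_sales))

# Keep only the stores that reported a number
active_sales <- sales[!is.na(sales)]

# Average across active stores: (1250.50 + 980) / 2
avg_sales <- sum(active_sales) / length(active_sales)
print(avg_sales)  # 1115.25
```

The remainder of the chapter covers the building blocks used here: variables, data types, and arithmetic operators.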
Interactive Learning
This eBook is designed to be fully interactive. The code blocks you see throughout the chapters run entirely within your web browser using WebR (WebAssembly R). This means there is no round trip to a remote server—your code executes locally and instantly on your machine.
How to use code blocks:
- View: Read the code provided in the editor.
- Edit: You can click directly inside the code block to modify the values or logic. Your input is saved in your browser. You can reset it to the original value at any time.
- Run: When ready, click on the Run Code button to execute your code snippet.
- Observe: The results will appear immediately below in the Output block.
- Persistent Context: All R code cells on the same page share a single active R session. If you define a variable x <- 5 in the first editor and run it, that variable x will be available in any subsequent R code blocks you execute on that page. Remember that execution order is what matters, not the visual top-to-bottom layout of cells.
- Refreshing the Page: Refreshing your browser will wipe the active R session's memory (removing all defined variables). However, the code changes you wrote in the boxes are preserved.
- Reset Code Button: If you want to undo your edits and restore a code block to its original textbook state, click Reset Code.
- Clear Cache Button: R packages (like ggplot2 and dplyr) are cached locally in your browser's IndexedDB storage for fast loading. If a package download becomes corrupted or fails, click Clear Cache in any R editor. This will securely delete the R engine cache database and reload the page.
1. Assignment and Variables
In R, we assign values to variables using the left-assignment operator <- (formed by a less-than symbol < and a hyphen -).
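A minimal sketch of left-assignment, using the two numeric sales figures from the store scenario (the variable names are illustrative):

```r
# Left-assignment: the value on the right is stored in the name on the left
sales_a <- 1250.50
sales_b <- 980.00

# Variables can then be used in calculations
total_sales <- sales_a + sales_b
avg_sales <- total_sales / 2

print(total_sales)  # 2230.5
print(avg_sales)    # 1115.25
```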
Reversibility of Assignment: The Right-Assignment Operator ->
R also allows you to reverse the direction of assignment using the right-assignment operator ->. This pushes the value on the left into the variable name on the right:
```r
# Right-assignment example
1250.50 -> sales_a
980.00 -> sales_b

# Perform arithmetic calculation
sales_a + sales_b -> total_sales
total_sales / 2 -> avg_sales

# Output the results
print(total_sales)
print(avg_sales)
```
While both <- and -> are syntactically valid, left-assignment <- is the conventional choice in R code and is strongly recommended for readability.
Assignment vs. Function Arguments: <- vs. =
A common point of confusion is when to use the equals sign (=) instead of an arrow.
- Use <- (or ->) for assigning values to variables in your active environment.
- Use = only for passing named arguments into functions (e.g., mean(data, na.rm = TRUE)).
[!WARNING] The Function Argument Assignment Trap: If you write an arrow instead of an equals sign inside a function call (e.g., mean(data, na.rm <- TRUE)), R will perform a variable assignment in your active workspace, creating a variable named na.rm set to TRUE. The value is then handed to the function by position rather than as the na.rm argument, so the function does not receive the option you intended. Always use = inside function call parentheses!
Try it out: Run the code block above in your browser. Feel free to edit the sales values or add a sales_c variable to see how the output changes.
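The trap described in the warning can be seen with a small experiment. The sketch below uses sum() rather than mean() because its behavior with an extra unnamed value is easy to verify by hand. The variable names are illustrative.

```r
scores <- c(10, 20, 30)

# Correct: na.rm is passed by name, so it only controls NA handling
correct_total <- sum(scores, na.rm = TRUE)

# Trap: the arrow assigns TRUE to a workspace variable called na.rm,
# then hands TRUE to sum() as one more value to add (TRUE counts as 1)
trap_total <- sum(scores, na.rm <- TRUE)

print(correct_total)  # 60
print(trap_total)     # 61
print(na.rm)          # TRUE, a stray variable now sits in the workspace

# Remove the accidental variable again
rm(na.rm)
```

The call with the arrow runs without any error or warning, which is what makes this mistake easy to miss.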
2. Common R Data Types
Every value in R belongs to a class (data type). R automatically determines the type of a variable when you assign a value to it (dynamic typing).
| Data Type (Class) | Description | Example |
|---|---|---|
| numeric | Real numbers / decimals (double-precision) | sales <- 1250.50 |
| integer | Whole numbers (denoted by an L suffix) | count <- 10L |
| character | Text strings (wrapped in double or single quotes) | status <- "Closed" |
| logical | Boolean flags (TRUE or FALSE, abbreviated as T or F) | is_active <- TRUE |
| complex | Complex numbers with imaginary parts | val <- 3 + 2i |
| NULL | Represents the absence of a value | missing_val <- NULL |
Identifying Data Types
To check the data type of a variable, use the class() function:
```r
status <- "Closed"
print(class(status))   # "character"

sales_a <- 1250.50
print(class(sales_a))  # "numeric"
```
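The same check works for every type in the table. The following sketch also illustrates dynamic typing: the name value (chosen here for illustration) changes class each time a different kind of value is assigned to it.

```r
# Dynamic typing: the same name can hold different types over time
value <- 10
print(class(value))   # "numeric" (a plain 10 is stored as a double)

value <- 10L
print(class(value))   # "integer" (the L suffix requests an integer)

value <- "ten"
print(class(value))   # "character"

# The remaining types from the table
print(class(TRUE))    # "logical"
print(class(3 + 2i))  # "complex"
print(class(NULL))    # "NULL"
```

Note that a whole number typed without the L suffix is still numeric, not integer.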
3. Rules for Variable Names
- Must start with a letter or a dot that is not followed by a digit (names beginning with a dot are hidden by default and are usually left to internal objects).
- Can contain letters, digits, underscores (_), and periods (.).
- Cannot start with a number or an underscore.
- Cannot be a reserved keyword (e.g., if, else, while, function, TRUE, FALSE).
R code traditionally uses dots (sales.average) or underscores (sales_average) to separate words. Today, snake_case (sales_average) is the more widely recommended style, for example in the tidyverse style guide, because it is easy to read and matches common database column naming.
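One way to test the naming rules above without triggering syntax errors is base R's make.names(), which rewrites a string into a syntactically valid name. A string that comes back unchanged was already valid. The candidate names below are made up for illustration.

```r
# Candidate names to check (illustrative examples)
candidates <- c("sales_average", "sales.average", ".hidden",
                "2sales", "_sales", "if")

# make.names() repairs invalid names; unchanged means already valid
is_valid <- make.names(candidates) == candidates

print(data.frame(name = candidates, valid = is_valid))
# sales_average, sales.average and .hidden are valid;
# 2sales (starts with a digit), _sales (starts with an underscore)
# and if (reserved keyword) are not
```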
4. Arithmetic Operators
R supports all standard mathematical operators:
| Operation | Operator | Example | Result |
|---|---|---|---|
| Addition | + | 10 + 5 | 15 |
| Subtraction | - | 10 - 5 | 5 |
| Multiplication | * | 10 * 5 | 50 |
| Division | / | 10 / 4 | 2.5 |
| Integer Division | %/% | 10 %/% 4 | 2 (removes remainder) |
| Modulo | %% | 10 %% 4 | 2 (returns remainder) |
| Exponentiation | ^ (or **) | 2 ^ 3 | 8 |
```r
# Exploring integer division and modulo
division <- 11 / 3
integer_division <- 11 %/% 3
remainder <- 11 %% 3

print(division)          # 3.666667
print(integer_division)  # 3
print(remainder)         # 2
```
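Integer division and modulo are often used together to split a total into whole units and a leftover. The sketch below converts a duration in minutes into hours and minutes. The value 135 and the variable names are assumptions chosen for illustration.

```r
total_minutes <- 135

hours <- total_minutes %/% 60    # whole hours: 2
minutes <- total_minutes %% 60   # leftover minutes: 15

print(paste(hours, "h", minutes, "min"))  # "2 h 15 min"

# The two results always recombine into the original value
print(hours * 60 + minutes == total_minutes)  # TRUE

# ^ and ** are two spellings of the same exponentiation operator
print(2 ^ 3 == 2 ** 3)  # TRUE
```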
Hands-on Exercises
Exercise 1: Calculating Data Analytics Metrics
A marketing campaign had a budget of $5,000. It generated 12,500 clicks and 450 actual sales (conversions). Write R code to:
- Store budget, clicks, and conversions in variables.
- Calculate the Cost Per Click (CPC) (Budget divided by Clicks).
- Calculate the Conversion Rate (Conversions divided by Clicks, multiplied by 100 to get a percentage).
- Print both metrics.
```r
# Write your code below and click Run Code
```
Click to view Answer
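```r
budget <- 5000
clicks <- 12500
conversions <- 450

# Cost per click: total spend divided by number of clicks
cpc <- budget / clicks
# Conversion rate as a percentage of clicks
conversion_rate <- (conversions / clicks) * 100

# paste0() joins without spaces, so "$" and "%" sit next to the numbers
print(paste0("Cost Per Click (CPC): $", cpc))
print(paste0("Conversion Rate: ", conversion_rate, "%"))
```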
Exercise 2: Odd or Even Records
In data preprocessing, you often want to split a dataset based on row numbers (e.g., extracting odd rows or even rows). Use the modulo operator (%%) to check if a specific row ID is even or odd.
- Assign row_id <- 27
- Divide the row ID by 2 using the modulo operator to check the remainder (if it is 1, the row is odd; if it is 0, the row is even).
- Print the remainder.
```r
# Write your code below and click Run Code
```
Click to view Answer
```r
row_id <- 27
remainder <- row_id %% 2
print(paste("Remainder is:", remainder))
# Since remainder is 1, the row is odd!
```
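The same check extends from a single row ID to a whole set of rows. The following sketch assumes ten row IDs stored in a vector. Vectors and the comparison operator == go slightly beyond this chapter, and the names are illustrative.

```r
# Row IDs 1 through 10 (illustrative data)
row_ids <- 1:10

# %% works on every element at once; keep the IDs with the wanted remainder
odd_rows <- row_ids[row_ids %% 2 == 1]
even_rows <- row_ids[row_ids %% 2 == 0]

print(odd_rows)   # 1 3 5 7 9
print(even_rows)  # 2 4 6 8 10
```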