Vectorization (Loops vs. Vectors)
Why Learn Vectorization in Data Analytics?
Imagine you have a list of prices for items in an inventory: c(10.00, 25.50, 8.00, 120.00). You need to calculate the price after applying an 8% sales tax to every single item.
In many programming languages, you would write a loop that visits each price in turn, multiplies it by 1.08, and stores the result in a new list.
In R, however, you can do this:
prices <- c(10.00, 25.50, 8.00, 120.00)
taxed_prices <- prices * 1.08
print(taxed_prices)
R multiplies every element in the vector by 1.08 automatically, without you writing a single loop! This concept is called vectorization. It is one of R's core strengths: vectorized code is shorter and easier to read, and it usually runs faster than an equivalent R loop because the iteration happens in compiled code.
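To make the contrast concrete, here is a minimal sketch of the loop that the vectorized expression replaces. The names taxed_loop and taxed_vec are illustrative and not part of the original example.

```r
prices <- c(10.00, 25.50, 8.00, 120.00)

# Loop version: pre-allocate the result, then fill it one element at a time
taxed_loop <- numeric(length(prices))
for (i in seq_along(prices)) {
  taxed_loop[i] <- prices[i] * 1.08
}

# Vectorized version: one expression over the whole vector
taxed_vec <- prices * 1.08

# Both approaches produce the same values
print(all.equal(taxed_loop, taxed_vec)) # TRUE
```

Both versions compute the same result; the vectorized one simply leaves the iteration to R.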
1. Element-wise Arithmetic
In R, arithmetic operations on vectors are performed element-by-element.
Operations between a Vector and a Single Number
numbers <- c(1, 2, 3, 4)
print(numbers + 10) # 11 12 13 14
print(numbers * 2) # 2 4 6 8
print(numbers ^ 2) # 1 4 9 16
Operations between Two Vectors of Equal Length
If you perform arithmetic on two vectors of the same length, R pairs them up index-by-index:
qty <- c(2, 5, 10)
price <- c(10, 20, 5)
total_costs <- qty * price
print(total_costs) # 20 100 50
2. R's Vector Recycling Rule
What happens if you perform arithmetic on vectors of different lengths? R will recycle (repeat) the elements of the shorter vector to match the length of the longer vector.
long_vec <- c(1, 2, 3, 4)
short_vec <- c(10, 20)
# short_vec gets recycled to c(10, 20, 10, 20)
result <- long_vec + short_vec
print(result) # 11 22 13 24
Warning: If the length of the longer vector is not a multiple of the shorter vector's length, R still recycles the shorter vector but issues a warning:
longer object length is not a multiple of shorter object length.
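The following sketch triggers that warning with a 5-element and a 2-element vector. It uses withCallingHandlers() to capture the message so it can be inspected; the variable names are illustrative, and the exact wording of the message may differ if R runs in a non-English locale.

```r
long_vec <- c(1, 2, 3, 4, 5)
short_vec <- c(10, 20)

# Capture the warning text and suppress the console warning
warning_text <- NULL
result <- withCallingHandlers(
  long_vec + short_vec,
  warning = function(w) {
    warning_text <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

# short_vec was recycled to c(10, 20, 10, 20, 10)
print(result)       # 11 22 13 24 15
print(warning_text)
```

The calculation still completes, which is why this warning is easy to miss. It is often a sign that two vectors were expected to have the same length but do not.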
3. Logical Vector Filtering (Boolean Masking)
Vectorization also applies to logical comparisons, returning a logical vector. We can feed this logical vector back into the square brackets [ ] of the original vector to filter it:
temps <- c(18, 25, 32, 15, 29)
# Step 1: Create a logical vector
hot_days <- temps > 25
print(hot_days) # FALSE FALSE TRUE FALSE TRUE
# Step 2: Use the logical vector to filter
hot_temps <- temps[hot_days]
print(hot_temps) # 32 29
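Logical vectors can also be combined and summarized. The sketch below reuses the temps vector; the 20 to 30 range for mild_days is an arbitrary threshold chosen for illustration.

```r
temps <- c(18, 25, 32, 15, 29)

# Combine comparisons with & (and) or | (or); both work element-wise
mild_days <- temps >= 20 & temps < 30
print(temps[mild_days])  # 25 29

# Filter in one step without storing the logical vector
print(temps[temps > 25]) # 32 29

# TRUE counts as 1, so sum() counts matches and mean() gives their share
print(sum(temps > 25))   # 2
print(mean(temps > 25))  # 0.4

# which() returns the positions of the TRUE values
print(which(temps > 25)) # 3 5
```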
4. Functional Iteration: purrr::map()
While vectorization handles simple arithmetic and logical operations, you often need to apply complex operations or custom functions across lists, data frame columns, or nested collections.
The tidyverse includes the purrr package, which provides the map() family of functions as a concise alternative to writing for loops, with variants that guarantee the type of the output.
Basic Mapping: map()
map(.x, .f) takes a list or vector (.x) and applies the function .f to each element. It always returns a list:
library(purrr)
samples <- list(c(1, 5, 2), c(10, 4), c(-3, 8, 9))
# Find the maximum value in each sub-vector
max_vals <- map(samples, max)
print(max_vals) # Returns a list of maximums
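For comparison, this sketch shows the for loop that map() stands in for, assuming the purrr package is installed. The name max_loop is illustrative.

```r
library(purrr)

samples <- list(c(1, 5, 2), c(10, 4), c(-3, 8, 9))

# Loop version: pre-allocate a list and fill it element by element
max_loop <- vector("list", length(samples))
for (i in seq_along(samples)) {
  max_loop[[i]] <- max(samples[[i]])
}

# map() version: the same work in a single call
max_vals <- map(samples, max)

print(all.equal(max_loop, max_vals)) # TRUE
print(max_vals[[2]])                 # 10
print(unlist(max_vals))              # 5 10 9
```

Because the result is a list, single values are extracted with double brackets [[ ]], and unlist() flattens it into an ordinary vector.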
Type-Safe Vector Outputs: map_*()
If you know that all output values will be of a specific type, you can use a type-safe variant to return a standard atomic vector instead of a list:
- map_dbl(): Returns a double-precision (decimal) numeric vector.
- map_int(): Returns an integer vector.
- map_lgl(): Returns a logical (boolean) vector.
- map_chr(): Returns a character (string) vector.
# Return a standard numeric vector instead of a list
max_vector <- map_dbl(samples, max)
print(max_vector) # 5 10 9
print(class(max_vector)) # "numeric"
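The other variants work the same way. This sketch applies one of each to the samples list, assuming purrr is installed; the helper names sizes, has_negative, and labels are illustrative.

```r
library(purrr)

samples <- list(c(1, 5, 2), c(10, 4), c(-3, 8, 9))

# length() returns an integer, so map_int() fits
sizes <- map_int(samples, length)
print(sizes)        # 3 2 3

# Does each sub-vector contain a negative value?
has_negative <- map_lgl(samples, function(x) any(x < 0))
print(has_negative) # FALSE FALSE TRUE

# Build one text label per sub-vector
labels <- map_chr(samples, function(x) paste(x, collapse = ", "))
print(labels)       # "1, 5, 2" "10, 4" "-3, 8, 9"
```

Each variant expects the function to return exactly one value of the matching type per element. If it does not, the call fails with an error rather than returning a result of an unexpected shape.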
Formula Syntax: Anonymous Functions
Instead of defining a separate custom function, you can write concise, one-line calculations on the fly using purrr's formula shorthand, which starts with ~. Inside the formula, the symbol .x represents the individual item being processed:
numbers <- c(2, 4, 5, 7)
# Square each number
squared <- map_dbl(numbers, ~ .x^2)
print(squared) # 4 16 25 49
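The formula form is one of several equivalent ways to pass a function to map_dbl(). The sketch below shows them side by side, assuming purrr is installed; square is a hypothetical helper name used only for illustration.

```r
library(purrr)

numbers <- c(2, 4, 5, 7)

# 1. A named helper function (square is a hypothetical name)
square <- function(x) x^2
by_name <- map_dbl(numbers, square)

# 2. An inline anonymous function
by_anon <- map_dbl(numbers, function(x) x^2)

# 3. The formula shorthand with .x
by_formula <- map_dbl(numbers, ~ .x^2)

# For plain arithmetic, the vectorized expression is the simplest option
by_vector <- numbers^2

print(by_formula) # 4 16 25 49
```

All four produce the same vector. For simple arithmetic like this, the vectorized form is usually preferable; map_dbl() becomes useful when the function cannot operate on the whole vector at once.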
Mapping over Two Lists: map2()
If you need to iterate over two lists or vectors in parallel (passing two inputs into your function at once), use map2() or its type-safe variants like map2_dbl(). The two inputs should have the same length, or one of them should have length 1. The symbols .x and .y represent items from the first and second sequences, respectively:
list_a <- c(10, 20, 30)
list_b <- c(1, 2, 3)
# Add elements in parallel: .x + .y
sums <- map2_dbl(list_a, list_b, ~ .x + .y)
print(sums) # 11 22 33
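Simple addition like this can also be written as list_a + list_b. map2() is more useful when the function needs a whole pair of inputs per call. The sketch below, assuming purrr is installed, computes a weighted mean for each pair of sub-vectors; the scores and weights data are made up for illustration.

```r
library(purrr)

# Hypothetical data: two groups of scores with matching weights
scores  <- list(c(1, 2, 3), c(10, 20))
weights <- list(c(1, 1, 2), c(3, 1))

# weighted.mean() takes one vector of values and one vector of weights,
# so each pair of sub-vectors is handled in its own call
weighted <- map2_dbl(scores, weights, ~ weighted.mean(.x, .y))
print(weighted) # 2.25 12.5

# For plain addition, the vectorized form matches map2_dbl()
list_a <- c(10, 20, 30)
list_b <- c(1, 2, 3)
print(list_a + list_b) # 11 22 33
```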
Hands-on Exercises
Exercise 1: Fahrenheit to Celsius
You have a vector of temperatures in Fahrenheit: c(32, 50, 77, 104).
The conversion formula to Celsius is: (Fahrenheit - 32) * 5 / 9.
Write R code to:
- Store the Fahrenheit temperatures in a vector.
- Apply the conversion formula in a single vectorized expression (no loops!).
- Print the converted Celsius temperatures.
Answer:
fahrenheit <- c(32, 50, 77, 104)
celsius <- (fahrenheit - 32) * 5 / 9
print(celsius) # Output: 0 10 25 40
Exercise 2: Vector Filtering
You are analyzing a vector of monthly conversion numbers: c(150, 80, 220, 95, 300, 110).
Write R code to:
- Filter the vector to find all monthly conversions that are greater than or equal to 100 using a vectorized comparison.
- Calculate the average number of conversions across only these high-performing months.
- Print the final average.
Answer:
conversions <- c(150, 80, 220, 95, 300, 110)
# Filter conversions >= 100
high_months <- conversions[conversions >= 100]
print(high_months) # 150 220 300 110
# Calculate mean
avg_high <- mean(high_months)
print(paste("Average of high performing months:", avg_high)) # "Average of high performing months: 195"