Vectors (Homogeneous Ordered Collections)
Why Learn Vectors in Data Analytics?
Imagine you are tracking daily revenue for a small local business over a single week:
- Monday: $120
- Tuesday: $150
- Wednesday: $90
- Thursday: $200
- Friday: $250
- Saturday: $300
- Sunday: $180
Instead of creating seven separate variables (revenue_mon, revenue_tue, etc.), which makes calculations a nightmare, you want a single, ordered container.
In R, this structure is a Vector. With a vector, you can:
- Find the total weekly revenue in one command.
- Retrieve the revenue of specific days (e.g., Wednesday).
- Omit specific days (e.g., removing the weekends).
- Filter for days that exceeded a certain sales target.
Let's learn how to create and manipulate vectors, which form the bedrock of all data operations in R.
1. Creating Vectors: The c() Function
In R, we create vectors using the combine/concatenate function c(). All elements in a vector must be of the same data type (homogeneous).
# Creating a numeric vector of weekly revenues
revenues <- c(120, 150, 90, 200, 250, 300, 180)
print(revenues)
The Implicit Coercion Rule
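If you try to combine different data types in a vector, R will silently convert (coerce) them all to the most flexible type present. The order is logical, then numeric, then character, so a single piece of text turns every element into a character string: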
mixed_vector <- c(10, "Closed", TRUE)
print(mixed_vector)
# Output: "10" "Closed" "TRUE" (All converted to character strings!)
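To make the coercion rule concrete, here is a minimal sketch that inspects the resulting type with class(). The variable names num_and_lgl and with_text are only illustrative.

```r
# Logical + numeric: TRUE is coerced to 1, the result stays numeric
num_and_lgl <- c(10, TRUE)
print(num_and_lgl)                   # 10 1
print(class(num_and_lgl))            # "numeric"

# Any text in the mix: every element becomes a character string
with_text <- c(10, "Closed", TRUE)
print(class(with_text))              # "character"

# A coerced character vector can no longer be used for arithmetic,
# so it is worth checking the type before calling sum() or mean()
print(is.numeric(with_text))         # FALSE
```

Checking class() right after building a vector is a cheap way to catch an accidental text value before it breaks later calculations.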
2. R Indexing (1-Based Indexing)
R uses 1-based indexing. The first element in a vector is at index 1, not 0!
revenues <- c(120, 150, 90, 200, 250, 300, 180)
# Get the first element (Monday)
print(revenues[1]) # 120
# Get the third element (Wednesday)
print(revenues[3]) # 90
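The square brackets also accept a vector of positions, which is one way to pull out several days at once. The sketch below additionally shows what happens when a position past the end is requested. The names mon_wed and beyond_end are illustrative.

```r
revenues <- c(120, 150, 90, 200, 250, 300, 180)

# Several positions at once: pass a vector of indices
mon_wed <- revenues[c(1, 3)]
print(mon_wed)            # 120 90

# A position past the end returns NA instead of raising an error
beyond_end <- revenues[8]
print(beyond_end)         # NA
```

Because an out-of-range index yields NA rather than an error, a typo in an index can go unnoticed until a later calculation returns NA.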
3. Negative Indexing (Omitting Elements)
In Python, a negative index like [-1] returns the last element.
In R, a negative index excludes/omits that element from the output!
revenues <- c(120, 150, 90, 200, 250, 300, 180)
# Return all revenues EXCEPT the first one (Monday)
print(revenues[-1]) # 150 90 200 250 300 180
# Return all revenues EXCEPT the weekends (positions 6 and 7)
print(revenues[-c(6, 7)]) # 120 150 90 200 250
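Since a negative index does not count from the end in R, fetching the last element needs a different approach. A minimal sketch, using length() and tail() from base R (the names last_day and without_sunday are illustrative):

```r
revenues <- c(120, 150, 90, 200, 250, 300, 180)

# Last element: the R counterpart of Python's [-1]
last_day <- revenues[length(revenues)]
print(last_day)                      # 180

# tail() returns the same value without computing the index by hand
print(tail(revenues, 1))             # 180

# Drop the last element (Sunday)
without_sunday <- revenues[-length(revenues)]
print(without_sunday)                # 120 150 90 200 250 300

# Positive and negative indices cannot be mixed; this line would raise an error:
# revenues[c(-1, 2)]
```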
4. Slicing and Ranges
To extract a sub-vector, use the colon : operator to generate a range of indices:
revenues <- c(120, 150, 90, 200, 250, 300, 180)
# Slice from Wednesday (3) to Friday (5)
print(revenues[3:5]) # 90 200 250
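It can help to see that the colon operator simply builds an index vector, with both ends included. The sketch below also shows a negated range and seq() for stepped ranges. The name every_other is illustrative.

```r
revenues <- c(120, 150, 90, 200, 250, 300, 180)

# 3:5 is shorthand for the index vector 3, 4, 5 (both ends included)
print(3:5)                       # 3 4 5

# Negate a range to drop a block of days (here Monday to Wednesday).
# The parentheses matter: -1:3 would mean -1, 0, 1, 2, 3 and fail as an index.
print(revenues[-(1:3)])          # 200 250 300 180

# seq() builds a range with a step: every second day starting from Monday
every_other <- revenues[seq(1, 7, by = 2)]
print(every_other)               # 120 90 250 180
```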
5. Useful Built-in Vector Functions
R ships with built-in functions that summarize an entire vector in a single call:
- length(x): Returns the count of elements.
- sum(x): Adds all elements together.
- mean(x): Returns the arithmetic average.
- sort(x): Sorts the vector (defaults to ascending; use decreasing = TRUE for descending).
- min(x)/max(x): Returns the lowest/highest value.
revenues <- c(120, 150, 90, 200, 250, 300, 180)
total_rev <- sum(revenues)
avg_rev <- mean(revenues)
max_rev <- max(revenues)
# Round the average so the output does not show a long decimal tail
print(paste("Total:", total_rev, "| Average:", round(avg_rev, 2), "| Max:", max_rev))
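The introduction listed filtering for days that exceeded a sales target as one use of a vector. A minimal sketch of that idea, combining a logical comparison with sum() and sort(); the target of 190 is a hypothetical value chosen for illustration:

```r
revenues <- c(120, 150, 90, 200, 250, 300, 180)
target <- 190   # hypothetical sales target, not from the original data

# A comparison returns one TRUE/FALSE per element
above_target <- revenues > target
print(above_target)             # FALSE FALSE FALSE TRUE TRUE TRUE FALSE

# Using the logical vector as an index keeps only the TRUE positions
print(revenues[above_target])   # 200 250 300

# sum() counts the TRUE values because TRUE is treated as 1
print(sum(above_target))        # 3

# Rank the week from best to worst day
print(sort(revenues, decreasing = TRUE))   # 300 250 200 180 150 120 90
```

Filtering by condition avoids hard-coding positions, so the same line keeps working if the vector grows or the order of days changes.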
Hands-on Exercises
Exercise 1: Weekly Revenue Audit
Given the daily revenue vector: c(120, 150, 90, 200, 250, 300, 180)
Write R code to:
- Calculate and print the average weekday revenue (first 5 elements).
- Calculate and print the total weekend revenue (last 2 elements).
- Determine the difference between the average weekday revenue and the average weekend revenue.
# Write your code below
Answer
revenues <- c(120, 150, 90, 200, 250, 300, 180)
weekday_rev <- revenues[1:5]
weekend_rev <- revenues[6:7]
avg_weekday <- mean(weekday_rev)
total_weekend <- sum(weekend_rev)
# paste0() joins without a separator, so no space appears after the $ sign
print(paste0("Average Weekday Revenue: $", avg_weekday))
print(paste0("Total Weekend Revenue: $", total_weekend))
difference <- mean(weekend_rev) - avg_weekday
print(paste0("Weekend average exceeds weekday average by: $", difference))
Exercise 2: Removing Anomalies
You have a vector of sensor readings: c(22.1, 23.5, 999.0, 21.8, 22.4). The value 999.0 is an obvious sensor error at index position 3.
Write R code to:
- Store the readings in a vector.
- Use negative indexing to remove the faulty third reading.
- Compute and print the average temperature of the remaining correct readings.
# Write your code below
Answer
readings <- c(22.1, 23.5, 999.0, 21.8, 22.4)
clean_readings <- readings[-3]
avg_temp <- mean(clean_readings)
print(paste("Cleaned Average Temperature:", avg_temp))
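Negative indexing works here because the position of the faulty reading is known. When it is not, one alternative is to filter by value instead. A minimal sketch, assuming any reading of 100 or more is implausible; the cutoff max_plausible is hypothetical:

```r
readings <- c(22.1, 23.5, 999.0, 21.8, 22.4)
max_plausible <- 100   # hypothetical cutoff, chosen for illustration

# Keep only the readings below the cutoff, wherever they sit in the vector
clean_by_value <- readings[readings < max_plausible]
print(clean_by_value)        # 22.1 23.5 21.8 22.4
print(mean(clean_by_value))  # 22.45
```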