MOBI BOOT CAMP CORP. logoLearning Buddy
  • SIGN IN
  • Introduction
  • Setup
  • 1A: Fundamental Building Blocks
  • 1B: Compound Statements
  • 2: Ordered Collection
  • 3: Key-Value Map and Structures
  • 4: More Data types
    • Strings and Factors
    • Time and Space Complexity in R
  • 5: Iteration Constructs
  • 6: Other constructs
  • 7. Regex
  • 8. Date and Time
  • Revision
  • Practice Exercise

Time and Space Complexity in R

Why Learn Time and Space Complexity in Data Analytics?

Imagine you are analyzing server access logs for a large streaming platform. Your dataset contains 10 million records. You want to write a script to search for an anomaly pattern:

  • If you write an inefficient search algorithm that runs in quadratic time (O(N2)O(N^2)O(N2)), it will require 100 trillion operations—your computer will run for days or crash!
  • If you design a linear search (O(N)O(N)O(N)), it takes only 10 million operations (a fraction of a second).
  • If you use a constant-time lookup (O(1)O(1)O(1)) using a key-value mapping, it takes 1 operation (instantaneous).

In data science, datasets are often massive. Code that runs quickly on a sample of 10 rows might become completely unusable when run on 10 million rows. Understanding Complexity (Big O) allows you to predict how your R code will scale before you write it.


1. Time vs. Space Complexity

  • Time Complexity: Measures how the CPU execution time scales as the input size (NNN) grows.
  • Space Complexity: Measures how much RAM/memory your algorithm allocates as the input size (NNN) grows.

Complexity is measured using Big O notation, which focuses on the worst-case scenario (the upper limit of resources needed).


2. Linear Time Complexity: O(N)

A search through a vector is a classic example of O(N)O(N)O(N) complexity. R must inspect each element one by one from the beginning until it finds the target.

# Linear search function in R
is_present <- function(target, vec) {
  for (item in vec) {
    if (item == target) {
      return(TRUE)
    }
  }
  return(FALSE)
}

readings <- c(15, 23, 89, 45, 12, 90)
print(is_present(45, readings)) # TRUE

If the vector has NNN elements:

  • Best-case: The target is at index 1 (1 step).
  • Worst-case: The target is at the very end or not present at all (NNN steps).
  • Big O: O(N)O(N)O(N) (Linear scaling).

3. Constant Time Complexity: O(1)

Accessing a specific element in an R vector by its index (like vec[5]) or retrieving a value from a named list using a key (like config$port) occurs in constant time.

config <- list(host = "localhost", port = 5432)

# Constant time lookup
print(config$port) # 1 step, regardless of list size!

Because R can calculate the exact memory address of the item immediately, the time taken is independent of how large the list or vector is.

  • Big O: O(1)O(1)O(1) (Constant scaling).

4. Quadratic Time Complexity: O(N^2)

If you write a nested loop (a loop inside another loop), the execution time scales quadratically. This is common when comparing every item in a dataset against every other item.

# Inefficient duplicate checker
has_duplicates <- function(vec) {
  n <- length(vec)
  for (i in 1:n) {
    for (j in 1:n) {
      if (i != j && vec[i] == vec[j]) {
        return(TRUE)
      }
    }
  }
  return(FALSE)
}

If the vector size is NNN:

  • The outer loop runs NNN times.
  • For every step of the outer loop, the inner loop also runs NNN times.
  • Total operations: N×N=N2N \times N = N^2N×N=N2.
  • Big O: O(N2)O(N^2)O(N2) (Quadratic scaling). Doubling the data size makes the code run four times slower!

Hands-on Exercises

Exercise 1: Identifying Complexity Scaling

If an O(N2)O(N^2)O(N2) algorithm takes 3 seconds to process a data frame of 1,000 rows:

  1. How long will it take to process 2,000 rows? (Hint: The data size doubled).
  2. How long will an O(N)O(N)O(N) linear algorithm take if it processes 1,000 rows in 3 seconds?
# Write your calculations / code in the R editor to verify math if needed
Click to view Answer
# 1. For O(N^2) complexity:
# If N doubles (2 * N), the steps increase by 2^2 = 4 times.
# 3 seconds * 4 = 12 seconds!

# 2. For O(N) complexity:
# If N doubles (2 * N), the steps increase by 2 times.
# 3 seconds * 2 = 6 seconds!

Exercise 2: Nested Loops Check

Look at the R code below:

matrix_sum <- function(n) {
  total <- 0
  for (i in 1:n) {
    total <- total + i
  }
  return(total)
}

Is the time complexity of matrix_sum O(1)O(1)O(1), O(N)O(N)O(N), or O(N2)O(N^2)O(N2)? Test running it with different values of n in the editor to see how it performs.

# Run test calls in the editor
Click to view Answer
# Answer: The complexity is O(N) (Linear Time).
# There is only a single loop running from 1 to n, so the number of additions 
# scales linearly with the input size n.
Privacy Policy | Terms & Conditions