MOBI BOOT CAMP CORP. Learning Buddy

Custom Functions

Why Learn Custom Functions in Data Analytics?

Imagine you are cleaning monthly user growth rates across several departments (e.g., Marketing, Sales, Engineering). The data cleaning pipeline requires:

  1. Converting negative rates to absolute values (keeping only the magnitude of the change).
  2. Multiplying by 100 to convert to a percentage.
  3. Rounding the result to 2 decimal places.

If you copy and paste these three math steps for every department, your code becomes bloated and hard to maintain. If you decide to change the rounding to 3 decimal places later, you must find and update every single copy!

Instead, you want to package this calculation into a reusable box:

clean_growth <- function(rate) {
  rounded <- round(abs(rate) * 100, digits = 2)
  return(rounded)
}

In R, this is called a user-defined function. It lets you modularize your analysis, reduce duplication, and fix a bug in one place instead of in every copy.
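
As a rough illustration of the payoff, the sketch below applies clean_growth to a whole vector of rates in one call. The department figures are invented for this example. Because abs() and round() work element-wise, no loop is needed.

```r
clean_growth <- function(rate) {
  rounded <- round(abs(rate) * 100, digits = 2)
  return(rounded)
}

# Hypothetical monthly growth rates (illustrative values only)
growth_rates <- c(Marketing = -0.0345, Sales = 0.1289, Engineering = 0.05)

# abs(), * and round() are vectorized, so one call cleans every department
cleaned <- clean_growth(growth_rates)
print(cleaned) # Marketing 3.45, Sales 12.89, Engineering 5
```

If the rounding rule changes later, only the digits argument inside clean_growth needs to be edited.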


1. Defining a Custom Function

In R, you create a function with the function keyword, followed by a parenthesized argument list and a body in braces { }. You then assign it to a name with the standard <- operator, just like any other value.

Syntax

function_name <- function(param1, param2) {
  # statements
  return(value)
}
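
A concrete instance of this template might look like the sketch below. The name percent_change and its two parameters are hypothetical, chosen only to fill in the placeholders.

```r
# percent_change is a hypothetical function used only for illustration
percent_change <- function(old_value, new_value) {
  # statements: compute the relative change as a percentage
  change <- (new_value - old_value) / old_value * 100
  # hand the computed value back to the caller
  return(change)
}

print(percent_change(200, 250)) # 25
```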

2. Implicit vs. Explicit Returns

In R, you can return values in two ways:

  • Explicit: Using the return() function.
  • Implicit: R automatically returns the value of the last expression evaluated inside the function body.

# Explicit Return
multiply_explicit <- function(x, y) {
  return(x * y)
}

# Implicit Return (Recommended R idiom for simple functions)
multiply_implicit <- function(x, y) {
  x * y # This is the last line, so it gets returned automatically!
}

print(multiply_implicit(5, 4)) # 20
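
The two styles can also be combined. One common pattern, sketched here with a hypothetical helper named safe_ratio, uses return() to exit early from a guard clause and lets the last expression serve as the normal result.

```r
# safe_ratio is a hypothetical helper used only for illustration
safe_ratio <- function(numerator, denominator) {
  if (denominator == 0) {
    return(NA_real_) # explicit return: leave the function immediately
  }
  numerator / denominator # implicit return of the last expression
}

print(safe_ratio(10, 4)) # 2.5
print(safe_ratio(10, 0)) # NA
```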

3. Parameter Defaults

You can set default values for arguments. If a default is set, you don't have to provide that argument when calling the function:

calculate_target <- function(base, growth_rate = 0.05) {
  base * (1 + growth_rate)
}

# Uses default growth_rate of 0.05
print(calculate_target(100))      # 105

# Overrides default with 0.10
print(calculate_target(100, 0.10)) # 110
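
Arguments can also be passed by name, in which case their order does not matter. The short sketch below repeats calculate_target so it runs on its own, and uses the base R function formals() to inspect the default.

```r
calculate_target <- function(base, growth_rate = 0.05) {
  base * (1 + growth_rate)
}

# Named arguments may appear in any order
named_result <- calculate_target(growth_rate = 0.20, base = 50)
print(named_result) # 60

# formals() lists a function's parameters together with their defaults
print(formals(calculate_target)$growth_rate) # 0.05
```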

4. Modifying Outer Scope: The Super-Assignment Operator <<-

By default, variables created inside a function are local to that function and disappear when the function ends.

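To modify a variable defined in an enclosing environment (roughly comparable to Python's global and nonlocal keywords), R provides the super-assignment operator <<-. It searches the parent environments for an existing variable with that name and reassigns it. If no such variable is found, it creates one in the global environment: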

total_runs <- 0

log_run <- function() {
  # Modify variable in the outer parent environment
  total_runs <<- total_runs + 1 
}

log_run()
log_run()
print(total_runs) # 2
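
Because <<- can silently change global state, a counter is often kept inside a function's own environment instead. The sketch below is one way to do that; make_counter and n_calls are hypothetical names used only for illustration.

```r
# make_counter is a hypothetical helper used only for illustration
make_counter <- function() {
  n_calls <- 0 # lives in make_counter's environment, not the global one
  function() {
    n_calls <<- n_calls + 1 # updates the enclosing n_calls
    n_calls
  }
}

counter <- make_counter()
first <- counter()
second <- counter()
print(second) # 2

# n_calls was never created at the top level
print(exists("n_calls")) # FALSE
```

Each call to make_counter() produces an independent counter, so two pipelines can track their runs without sharing a global variable.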

5. Variable Arguments: The Ellipsis ...

In R, a function can accept an arbitrary number of arguments (roughly analogous to Python's *args and **kwargs) by including three dots ... (the ellipsis) in its argument list. This is commonly used to pass extra arguments through to a function called inside your own:

# A custom mean calculator that passes any extra settings (like na.rm) to mean()
custom_mean <- function(data, ...) {
  print("Calculating average...")
  mean(data, ...)
}

dataset <- c(10, 20, NA, 30)

# Pass na.rm = TRUE through the ellipsis to the inner mean() function
print(custom_mean(dataset, na.rm = TRUE)) # 20
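
The ellipsis can also be inspected directly rather than forwarded. A minimal sketch, using a hypothetical helper named describe_inputs, collects the arguments with list(...):

```r
# describe_inputs is a hypothetical helper used only for illustration
describe_inputs <- function(...) {
  inputs <- list(...) # collect everything passed through ... into a list
  list(
    count = length(inputs),        # how many arguments were supplied
    labels = names(inputs),        # the names given to them, if any
    total = sum(unlist(inputs))    # their sum, assuming numeric inputs
  )
}

summary_info <- describe_inputs(q1 = 100, q2 = 250, q3 = 175)
print(summary_info$count)  # 3
print(summary_info$labels) # "q1" "q2" "q3"
print(summary_info$total)  # 525
```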

6. Tidy Evaluation in Functions: Embracing {{ }}

If you write custom wrapper functions that interact with the tidyverse (especially packages like dplyr and ggplot2), you will run into a challenge known as indirection.

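Tidyverse functions use data masking, which lets you refer to columns in a table without quotes (e.g. typing Age instead of "Age"). However, if you pass an unquoted column name into your own function and use the argument as-is, dplyr looks for a column literally named after the argument (group_var), finds none, and falls back to evaluating the argument as an ordinary variable. Since no variable with the column's name exists outside the data frame, the call fails: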

library(dplyr)
library(ggplot2) # provides the mpg dataset used in the calls below
# This function will FAIL!
grouped_mean_fail <- function(df, group_var, mean_var) {
  df |>
    group_by(group_var) |>
    summarize(mean(mean_var))
}

# Throws error: object 'model' not found
# grouped_mean_fail(mpg, model, hwy)

The Fix: Embracing with curly-curly {{ }}

To pass unquoted column names as arguments to your function, embrace them with double curly braces {{ }}. This tells dplyr to use the expression the caller supplied (for example, model) and evaluate it inside the data frame, rather than looking for a column named after the argument:

# This function WORKS!
grouped_mean_success <- function(df, group_var, mean_var) {
  df |>
    group_by({{ group_var }}) |>
    summarize(mean({{ mean_var }}))
}

# Successfully groups by model and calculates average highway mileage!
# (mpg is a dataset that ships with the ggplot2 package)
result <- grouped_mean_success(ggplot2::mpg, model, hwy)
print(head(result))
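
Embracing covers unquoted column names. When the names arrive as strings instead (for example, read from a configuration file), one alternative is the .data pronoun. The sketch below assumes dplyr is installed; grouped_mean_by_name is a hypothetical name, and mtcars is a dataset that ships with R.

```r
library(dplyr)

# grouped_mean_by_name is a hypothetical helper used only for illustration
grouped_mean_by_name <- function(df, group_col, mean_col) {
  df |>
    group_by(.data[[group_col]]) |>            # look up the grouping column by string
    summarize(avg = mean(.data[[mean_col]]))   # average the requested column
}

# Column names are passed as ordinary character strings
result <- grouped_mean_by_name(mtcars, "cyl", "mpg")
print(result)
```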

Hands-on Exercises

Exercise 1: Clean and Standardize Metric

Write a function called normalize_metric that takes a numeric vector, finds the difference of each element from the mean of the vector, and divides it by the standard deviation (sd()).

  1. Define normalize_metric <- function(vec) { ... }
  2. The formula to return is (vec - mean(vec)) / sd(vec). Use implicit return.
  3. Test your function by calling it with c(10, 20, 30) and print the result.

Answer:

normalize_metric <- function(vec) {
  (vec - mean(vec)) / sd(vec)
}

test_vec <- c(10, 20, 30)
print(normalize_metric(test_vec))
# Output: -1  0  1

Exercise 2: Discount Calculator with Defaults

Write a function calculate_price that takes a raw price and a discount rate expressed as a fraction (defaulting to 0.10, i.e. 10%). Write R code to:

  1. Define the function. The formula is price * (1 - discount).
  2. Call it with a price of 100 and no discount argument, so the default applies (verify it returns 90).
  3. Call it with a price of 150 and a discount of 0.20 (verify it returns 120).

Answer:

calculate_price <- function(price, discount = 0.10) {
  price * (1 - discount)
}

# Test 1
print(calculate_price(100)) # 90

# Test 2
print(calculate_price(150, 0.20)) # 120