PyTorch Basics: Tensors and Automatic Differentiation

PyTorch is a powerful, open-source machine learning library originally developed by Facebook's AI Research lab (FAIR, part of what is now Meta) and governed since 2022 by the PyTorch Foundation. It's known for its flexibility, Python-friendly interface, and strong GPU acceleration, making it a favorite in the research community.

At its core, PyTorch provides two fundamental features:

An n-dimensional Tensor object (similar to NumPy's ndarray) that can be accelerated on GPUs.
A system for automatic differentiation, which is essential for training neural networks.

Tensors: The Building Blocks of PyTorch

A Tensor is the primary data structure in PyTorch. It's a multi-dimensional array, very similar to a NumPy array.

import torch

# Create a 2x3 tensor (2 rows, 3 columns)
x = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(x)

# Create a tensor with random numbers
rand_tensor = torch.rand(3, 4)  # 3 rows, 4 columns
print(rand_tensor)
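Every tensor also carries metadata describing its shape, element type, and the device it lives on, and CPU tensors can be converted to and from NumPy arrays. A minimal sketch, assuming NumPy is installed alongside PyTorch:

```python
import numpy as np
import torch

x = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(x.shape, x.dtype, x.device)  # torch.Size([2, 3]) torch.int64 cpu

a = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(a)  # shares memory with the NumPy array (CPU only)
a[0] = 10.0              # changing the array...
print(t)                 # ...is visible through the tensor: first element is now 10.0

back = t.numpy()         # converting back also shares the same memory
print(back)
```

Because the memory is shared, no data is copied in either direction, which keeps these conversions cheap.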

Reshaping Tensors with .view()

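You can change the shape of a tensor without copying its data using the .view() method, which returns a new tensor that shares the same underlying memory. This is a common step when preparing data for different layers of a neural network, for example flattening an image into a vector before a fully connected layer.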

x = torch.randn(4, 4)  # A 4x4 tensor
y = x.view(16)  # Reshaped into a 1D tensor of size 16
z = x.view(-1, 8)  # Reshaped into a 2x8 tensor.
# The -1 infers the dimension from the others.
print(x.size(), y.size(), z.size())
# Output: torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
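Because .view() shares memory with the original tensor, writing through the view changes the original too. .view() also needs a memory layout compatible with the new shape: after a transpose, for example, it raises an error, and .reshape() (which copies only when it must) is the usual fallback. A short sketch:

```python
import torch

base = torch.arange(6)    # tensor([0, 1, 2, 3, 4, 5])
v = base.view(2, 3)       # same storage, new shape
v[0, 0] = 100             # write through the view...
print(base[0].item())     # 100: ...and the original changes too

t = v.t()                 # transpose: shape (3, 2), no longer contiguous in memory
print(t.is_contiguous())  # False
try:
    t.view(6)             # incompatible with t's memory layout
except RuntimeError as err:
    print("view failed:", err)

r = t.reshape(6)          # reshape copies the data when it has to
print(r.tolist())         # [100, 3, 1, 4, 2, 5]
```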

GPU Acceleration

One of PyTorch's biggest advantages is how easily computations can be moved from the CPU to a GPU. GPUs execute thousands of arithmetic operations in parallel, so they perform large matrix operations much faster than CPUs, which dramatically speeds up the training of deep learning models.

You can move a tensor to the GPU using the .cuda() or .to(device) method.

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor on the CPU
x_cpu = torch.randn(1000, 1000)

# Move it to the selected device
x_dev = x_cpu.to(device)

# Perform a computation on that device
y_dev = x_dev @ x_dev  # Matrix multiplication

print(f"Computation was done on: {y_dev.device}")
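A few related device patterns, sketched so they also run on a machine without a GPU: allocate tensors directly on the target device instead of copying them there, call .cuda() only when CUDA is actually available, and copy results back with .cpu() before handing them to NumPy. The operands of a single operation generally need to live on the same device.

```python
import torch

# Pick a device; falls back to the CPU when no CUDA GPU is present
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocating directly on the target device avoids a CPU-to-GPU copy
a = torch.randn(256, 256, device=device)
b = torch.randn(256, 256, device=device)
c = a @ b  # both operands are on the same device

# .cuda() is shorthand for .to("cuda") and only works when a GPU exists
if torch.cuda.is_available():
    d = torch.ones(3).cuda()

# NumPy only understands CPU memory, so copy the result back first
c_np = c.cpu().numpy()
print(c.device, c_np.shape)
```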

Hardware Acceleration: CPU vs. GPU vs. TPU

While PyTorch makes it easy to switch between hardware, choosing the right processor depends on the size of your model and your budget.

Feature	CPU	GPU	TPU
Full Name	Central Processing Unit	Graphics Processing Unit	Tensor Processing Unit
Best For	Prototyping, small models, and data preprocessing.	Training most deep learning models; 64-bit scientific computing on data-center GPUs.	Very large-scale training (e.g., LLMs) that runs for weeks or months.
Advantage	Versatile; no special setup.	Large speedups for parallel math; data-center models offer strong 64-bit (FP64) throughput.	High throughput and energy efficiency for long-running, large matrix operations.
Disadvantage	Slow for training large deep learning models.	Higher cost; limited by on-board memory (VRAM).	Limited 64-bit support; less flexible; PyTorch uses TPUs through the separate PyTorch/XLA (torch_xla) package.

Why is a TPU "More Efficient" than a GPU?

It comes down to how they handle data flow at the hardware level:

GPU: The "Parallel Worker" (General Matrix Math)
A GPU has thousands of small cores designed for parallel processing. In a simplified view, each step of a matrix multiplication follows a repetitive Fetch-Compute-Store cycle:
- Fetch: Load two numbers from memory into registers.
- Compute: Multiply them in a core and add the product to a running sum.
- Store: Write the result back to registers or memory.
While this is much faster than a CPU, and caches and on-chip shared memory let a GPU reuse some data, it still spends a significant amount of energy and time moving operands between memory and its compute units.
TPU: The "Assembly Line" (Systolic Array)
The TPU's matrix unit uses a specialized architecture called a systolic array. Instead of a fetch-compute-store cycle for each operation, it works like a hardware assembly line:
- Data is read from memory once and enters the array at its edges.
- It flows through a grid of thousands of connected multiply-accumulate units without returning to memory until the final results leave the array.
- Each unit passes its values directly to its neighbor, so the output of one calculation becomes the input for the next.
Because intermediate values never make a round trip to memory, this greatly reduces the memory bottleneck and lets the TPU reach very high throughput and efficiency on large tensor operations.
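To make the assembly-line idea concrete, here is a small pure-Python simulation of a systolic array. It is a teaching sketch, not a model of real TPU hardware: it assumes an output-stationary dataflow, where each processing element (PE) keeps its own running sum, whereas Google's published TPU design keeps the weights stationary in the array. The key property is the same either way: elements of A and B are read only at the array's edges and then passed from neighbor to neighbor. The function name systolic_matmul and its edge-read counter are illustrative, not part of any library.

```python
def systolic_matmul(A, B):
    """Simulate an n x m grid of PEs computing A @ B (output-stationary sketch).

    Row i of A enters PE(i, 0) from the left and column j of B enters PE(0, j)
    from the top, each delayed by one cycle per row/column. Every cycle, each PE
    multiplies the two values it receives, adds the product to its accumulator,
    and passes the values to its right and lower neighbors.
    """
    n, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions must match")
    m = len(B[0])

    acc = [[0] * m for _ in range(n)]    # one accumulator per PE
    a_out = [[0] * m for _ in range(n)]  # value each PE passed right last cycle
    b_out = [[0] * m for _ in range(n)]  # value each PE passed down last cycle
    edge_reads = 0                       # reads of A or B at the array's edges

    for t in range(n + m + k - 2):       # cycles until the last product finishes
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:               # left edge: read A[i][t - i] (skewed)
                    s = t - i
                    a_in = A[i][s] if 0 <= s < k else 0
                    if 0 <= s < k:
                        edge_reads += 1
                else:                    # interior: take the left neighbor's value
                    a_in = a_out[i][j - 1]
                if i == 0:               # top edge: read B[t - j][j] (skewed)
                    s = t - j
                    b_in = B[s][j] if 0 <= s < k else 0
                    if 0 <= s < k:
                        edge_reads += 1
                else:                    # interior: take the upper neighbor's value
                    b_in = b_out[i - 1][j]
                acc[i][j] += a_in * b_in  # multiply-accumulate inside the PE
                new_a[i][j], new_b[i][j] = a_in, b_in
        a_out, b_out = new_a, new_b
    return acc, edge_reads


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C, reads = systolic_matmul(A, B)
print(C)      # [[19, 22], [43, 50]]
print(reads)  # 8: each element of A and B is read from the edge exactly once
```

A naive loop would fetch two operands for each of the 8 multiplications (16 reads) here; in the simulated array, interior PEs never read A or B at all, which is the memory saving the bullet points describe.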

Note on Precision

GPUs are designed for versatility. Data-center GPUs contain dedicated hardware for high-precision (64-bit, FP64) math, making them a standard choice for scientific simulations where every decimal place matters; consumer GPUs typically run FP64 at a small fraction of their 32-bit speed.
TPUs trade high precision for raw speed. Neural networks are "robust" to rounding: they train well without the roughly 16 significant decimal digits of 64-bit floats, so TPUs rely on lower-precision formats such as bfloat16 (about 2 to 3 significant decimal digits) to cram more calculation units onto a single chip.
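You can inspect these precision limits from PyTorch itself with torch.finfo, whose eps field is the gap between 1.0 and the next representable number. The sketch compares 64-bit, 32-bit, and bfloat16 floats and shows a value that bfloat16 cannot distinguish from 1.0:

```python
import torch

# Machine epsilon: the gap between 1.0 and the next representable value
for dtype in (torch.float64, torch.float32, torch.bfloat16):
    info = torch.finfo(dtype)
    print(dtype, "bits:", info.bits, "eps:", info.eps)

# 1.001 falls between 1.0 and 1.0078125 (the next bfloat16 value) and rounds to 1.0
x = torch.tensor(1.001, dtype=torch.bfloat16)
print(x.item())  # 1.0
```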

Automatic Differentiation with autograd

This is where the magic happens for training neural networks. PyTorch's autograd engine can automatically calculate the gradients (derivatives) of any computation built from differentiable tensor operations.

To enable this, set the requires_grad attribute of a tensor to True. PyTorch then records every operation involving that tensor (and any tensor computed from it) in a computation graph, which it builds on the fly as your code runs.

# Create a tensor and tell PyTorch to track its operations
x = torch.ones(2, 2, requires_grad=True)
print(x)

# Perform an operation
y = x + 2

# y was created from an operation, so it has a grad_fn
print(y)
print(y.grad_fn)

# Perform more operations
z = y * y * 3
out = z.mean()

print(z, out)
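To look at the graph autograd recorded, you can inspect a few attributes. The sketch below rebuilds the same computation, checks which tensors are leaves (created directly by you rather than by an operation), steps one node back through grad_fn.next_functions, and shows that operations inside torch.no_grad() are not tracked:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()

# Leaves are tensors you created directly; results of operations are not
print(x.is_leaf, y.is_leaf)        # True False
print(y.requires_grad)             # True: inherited from x

# Each grad_fn node links back to the nodes that produced its inputs
print(out.grad_fn)                 # node for .mean()
print(out.grad_fn.next_functions)  # points back to the node for the multiplication

# Operations inside torch.no_grad() are not recorded in the graph
with torch.no_grad():
    w = x * 2
print(w.requires_grad)             # False
```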

Backpropagation: Calculating the Gradients

Once you have a final scalar output (like a loss value), you can call .backward() on it. PyTorch will then traverse the computation graph backward and compute the gradients of that output with respect to the original tensors that had requires_grad=True.

# Let's backpropagate from 'out'
out.backward()

# The gradients are now stored in the .grad attribute of the original tensor
print(x.grad)
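You can check this result by hand. Here out = (1/4) Σ 3(x_i + 2)^2 over the four entries, so ∂out/∂x_i = (3/2)(x_i + 2), which equals 4.5 when every x_i = 1. The sketch below compares autograd's answer to that formula and also shows that .grad accumulates across backward passes until you reset it:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
out.backward()

# Hand-derived gradient: d(out)/dx_i = (3/2) * (x_i + 2)
expected = 1.5 * (x.detach() + 2)
print(torch.allclose(x.grad, expected))  # True
print(x.grad)                            # every entry is 4.5

# A second backward pass on a freshly built graph adds to the stored gradient
out2 = ((x + 2) * (x + 2) * 3).mean()
out2.backward()
print(x.grad[0, 0].item())               # 9.0

x.grad.zero_()                           # reset before the next computation
```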

This process of automatically computing gradients is the foundation of backpropagation in neural networks. It allows the model to figure out how to adjust its weights to reduce the error, without us having to manually derive the complex calculus.
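As a sketch of how those gradients drive learning, the loop below fits a single weight w to hypothetical toy data that follows y = 2x, using plain gradient descent with a learning rate of 0.5 (both the data and the learning rate are chosen for illustration). Because .grad accumulates across backward() calls, the loop resets it after each update; in real projects an optimizer such as torch.optim.SGD usually handles the update and the reset.

```python
import torch

# Hypothetical toy data: the targets follow y = 2x exactly
x = torch.linspace(-1, 1, 20)
y_true = 2 * x

w = torch.zeros(1, requires_grad=True)  # single weight, starting at 0
lr = 0.5                                # illustrative learning rate

for step in range(50):
    y_pred = w * x
    loss = ((y_pred - y_true) ** 2).mean()  # mean squared error
    loss.backward()                          # fills w.grad with d(loss)/dw
    with torch.no_grad():
        w -= lr * w.grad                     # step against the gradient
    w.grad.zero_()                           # clear the accumulated gradient

print(w.item())  # close to 2.0
```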