Simulation in R

Tutorial Name: Codes With Pankaj
Website: www.codeswithpankaj.com


Table of Contents

  1. Introduction to Simulation

  2. Generating Random Numbers

    • Generating Random Numbers from Uniform Distribution

    • Generating Random Numbers from Normal Distribution

    • Generating Random Numbers from Other Distributions

  3. Setting the Random Number Seed

  4. Simulating a Linear Model

    • Generating Predictors

    • Generating Response Variable

    • Fitting the Simulated Data to a Linear Model

  5. Random Sampling

    • Simple Random Sampling

    • Sampling with Replacement

    • Stratified Sampling

  6. Best Practices for Simulation in R


1. Introduction to Simulation

Simulation is a powerful tool in data science and statistics, allowing you to generate data under controlled conditions and test models or hypotheses. In R, you can simulate data for various distributions, set random seeds for reproducibility, create linear models, and perform random sampling. This tutorial will guide you through the essential simulation techniques.


2. Generating Random Numbers

R provides a suite of functions for generating random numbers from various distributions. These functions are essential for simulations, allowing you to create datasets that mimic real-world data.

2.1 Generating Random Numbers from Uniform Distribution

The runif() function generates random numbers from a uniform distribution. By default, the values fall between 0 and 1.

Example:

# Generating 10 random numbers from a uniform distribution
random_numbers <- runif(10)
print(random_numbers)

You can also specify a different range by providing the min and max arguments.

Example:

# Generating 10 random numbers between 1 and 100
random_numbers <- runif(10, min = 1, max = 100)
print(random_numbers)
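
As a quick sanity check on the range arguments, the sketch below draws a larger sample and confirms that every value stays inside the requested interval and that the sample mean lands near the midpoint (min + max) / 2. The seed value, the sample size of 10000, and the variable name draws are arbitrary choices made for illustration.

```r
set.seed(123)  # arbitrary seed, used only so the sketch is repeatable

# Draw a larger sample so the summary statistics are stable
draws <- runif(10000, min = 1, max = 100)

# Smallest and largest values: both should lie between 1 and 100
print(range(draws))

# Sample mean: should be close to the theoretical mean (1 + 100) / 2 = 50.5
print(mean(draws))
```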

2.2 Generating Random Numbers from Normal Distribution

The rnorm() function generates random numbers from a normal (Gaussian) distribution with a specified mean and standard deviation. If you omit these arguments, rnorm() uses mean = 0 and sd = 1.

Example:

# Generating 10 random numbers from a normal distribution with mean 0 and SD 1
random_numbers <- rnorm(10, mean = 0, sd = 1)
print(random_numbers)
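
To see the mean and sd arguments at work, the following sketch draws a large sample with non-default parameters and compares the sample statistics with the values requested. The parameters (mean 170, SD 8), the seed, and the variable name heights are hypothetical and chosen only for illustration.

```r
set.seed(123)  # arbitrary seed for repeatability

# Hypothetical example: 10000 draws with mean 170 and standard deviation 8
heights <- rnorm(10000, mean = 170, sd = 8)

# With this many draws, both statistics should be close to the requested values
print(mean(heights))
print(sd(heights))
```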

2.3 Generating Random Numbers from Other Distributions

R provides functions for generating random numbers from various other distributions, such as:

  • rbinom() for binomial distribution.

  • rexp() for exponential distribution.

  • rgamma() for gamma distribution.

Example:

# Generating 10 random numbers from a binomial distribution
# (each draw counts successes in size = 10 trials with success probability 0.5)
random_numbers <- rbinom(10, size = 10, prob = 0.5)
print(random_numbers)
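
The other two functions listed above follow the same pattern: the first argument is the number of draws, and the remaining arguments are the distribution parameters. The sketch below is only an illustration; the parameter values (rate = 2, shape = 2, rate = 1) and the variable names are arbitrary assumptions.

```r
set.seed(123)  # arbitrary seed for repeatability

# 10 draws from an exponential distribution with rate 2 (theoretical mean 1 / 2)
waiting_times <- rexp(10, rate = 2)
print(waiting_times)

# 10 draws from a gamma distribution with shape 2 and rate 1 (theoretical mean 2 / 1)
gamma_draws <- rgamma(10, shape = 2, rate = 1)
print(gamma_draws)
```

Both distributions produce only positive values, which makes them common choices for simulating durations or waiting times.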

3. Setting the Random Number Seed

Setting the random number seed ensures that your results are reproducible. This is crucial when sharing your code or when you need consistent results across multiple runs.

Example:

# Setting the seed for reproducibility
set.seed(42)

# Generating random numbers with the seed set
random_numbers <- runif(10)
print(random_numbers)

By setting the seed, you can ensure that the same random numbers are generated every time you run the code.
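
One way to convince yourself of this is to reset the seed and draw again, as in the sketch below. The variable names are illustrative. Note that the seed fixes the starting point of the random number stream, so a third draw made without resetting the seed continues the stream and produces different values.

```r
# Two draws that each start from the same seed
set.seed(42)
first_run <- runif(5)

set.seed(42)
second_run <- runif(5)

# The two vectors match exactly
print(identical(first_run, second_run))

# Without resetting the seed, the next draw continues the stream
third_run <- runif(5)
print(identical(first_run, third_run))
```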


4. Simulating a Linear Model

Simulating data for a linear model involves generating predictor variables (independent variables) and a response variable (dependent variable) based on a linear relationship with some added noise.

4.1 Generating Predictors

You can simulate predictor variables from any distribution. For simplicity, let's generate them from a uniform distribution.

Example:

# Generating 100 observations of a single predictor variable
x <- runif(100, min = 0, max = 10)

4.2 Generating Response Variable

The response variable y is generated based on a linear relationship with x, along with some normally distributed random noise.

Example:

# Generating the response variable with noise
y <- 5 + 2 * x + rnorm(100, mean = 0, sd = 1)

In this example, the true model is y = 5 + 2 * x (intercept 5, slope 2), and normally distributed noise with standard deviation 1 is added to simulate real-world data.

4.3 Fitting the Simulated Data to a Linear Model

Once you have simulated the data, you can fit a linear model using the lm() function.

Example:

# Fitting a linear model to the simulated data
model <- lm(y ~ x)
summary(model)

The summary() function provides details about the fitted model, including coefficients, R-squared, and statistical significance.
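
Because the data were simulated, the true parameters are known, so the fit can be checked directly. The sketch below repeats the simulation steps with an arbitrary seed and compares the estimates with the true intercept (5), slope (2), and noise standard deviation (1). It uses coef(), confint(), and sigma() from base R's stats package; sigma() assumes R 3.3.0 or later.

```r
set.seed(42)  # arbitrary seed for repeatability

# Re-create the simulated data from sections 4.1 and 4.2
x <- runif(100, min = 0, max = 10)
y <- 5 + 2 * x + rnorm(100, mean = 0, sd = 1)
model <- lm(y ~ x)

# Estimated intercept and slope: should be close to 5 and 2
print(coef(model))

# 95% confidence intervals for both coefficients
print(confint(model, level = 0.95))

# Residual standard error: should be close to the noise SD of 1
print(sigma(model))
```

If the estimates sit far from the true values, that usually points to a mistake in the simulation code rather than in lm().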


5. Random Sampling

Random sampling is essential in simulation and data analysis, allowing you to select a subset of data for analysis or testing. R provides several functions for performing random sampling.

5.1 Simple Random Sampling

The sample() function allows you to randomly sample from a vector or dataset. By default it samples without replacement, so no element is selected more than once.

Example:

# Simple random sampling of 5 elements from a vector
sampled_data <- sample(1:100, 5)
print(sampled_data)
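
To sample from a dataset rather than a vector, a common approach is to sample row indices and then subset the data frame. The sketch below assumes a small, made-up data frame called scores; the column names and values are hypothetical.

```r
set.seed(42)  # arbitrary seed for repeatability

# Hypothetical dataset with 20 rows
scores <- data.frame(id = 1:20, score = round(runif(20, min = 0, max = 100)))

# Sample 5 distinct row indices, then keep only those rows
row_idx <- sample(nrow(scores), 5)
sampled_rows <- scores[row_idx, ]
print(sampled_rows)
```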

5.2 Sampling with Replacement

You can sample with replacement by setting the replace argument to TRUE.

Example:

# Sampling with replacement
sampled_data <- sample(1:10, 15, replace = TRUE)
print(sampled_data)
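
The difference between the two modes shows up when the requested sample is larger than the population. In the sketch below, drawing 15 values from 10 with replacement must produce repeats, while the same request without replacement raises an error, which is caught here with tryCatch(). The variable names are illustrative.

```r
set.seed(42)  # arbitrary seed for repeatability

# 15 draws from only 10 distinct values: repeats are unavoidable
with_replacement <- sample(1:10, 15, replace = TRUE)
print(any(duplicated(with_replacement)))

# The same request without replacement fails, so capture the error message
result <- tryCatch(
  sample(1:10, 15),
  error = function(e) conditionMessage(e)
)
print(result)
```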

5.3 Stratified Sampling

Stratified sampling involves dividing the population into strata and then sampling from each stratum. You can use the dplyr package to perform stratified sampling.

Example:

# Stratified sampling using dplyr
library(dplyr)

# Creating a dataset with a group variable
data <- data.frame(group = rep(1:3, each = 10), value = runif(30))

# Stratified sampling: selecting 5 rows from each group
# (slice_sample() requires dplyr >= 1.0.0 and supersedes sample_n())
sampled_data <- data %>%
  group_by(group) %>%
  slice_sample(n = 5)

print(sampled_data)
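
If dplyr is not available, the same idea can be sketched in base R by splitting the data by stratum, sampling rows within each piece, and binding the pieces back together. This is one possible alternative, not the tutorial's method; the names df, strata, and sampled_base are illustrative.

```r
set.seed(42)  # arbitrary seed for repeatability

# Same structure as the dataset above: 3 groups of 10 rows each
df <- data.frame(group = rep(1:3, each = 10), value = runif(30))

# Split into one data frame per stratum
strata <- split(df, df$group)

# Sample 5 rows without replacement inside each stratum
sampled_list <- lapply(strata, function(s) s[sample(nrow(s), 5), ])

# Recombine into a single data frame
sampled_base <- do.call(rbind, sampled_list)

# Each group should contribute exactly 5 rows
print(table(sampled_base$group))
```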

6. Best Practices for Simulation in R

  • Set Random Seed: Always set a random seed for reproducibility.

  • Use Appropriate Distributions: Choose the distribution that matches the process you are simulating, for example binomial for counts of successes or exponential for waiting times.

  • Simulate Realistic Data: Add noise and variability to mimic real-world data.

  • Check Results: Validate your simulation results by comparing them with theoretical expectations.
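
As an illustration of the last point, the sketch below compares simulated binomial draws with their theoretical mean (size * prob) and variance (size * prob * (1 - prob)). The seed, the number of draws, and the variable names are arbitrary choices for this example.

```r
set.seed(42)  # arbitrary seed for repeatability

n_trials <- 10
p_success <- 0.5

# A large number of draws keeps the simulation error small
draws <- rbinom(100000, size = n_trials, prob = p_success)

# Theoretical values for a binomial distribution
theoretical_mean <- n_trials * p_success                   # 5
theoretical_var <- n_trials * p_success * (1 - p_success)  # 2.5

# Simulated and theoretical values should be close to each other
print(c(simulated = mean(draws), theoretical = theoretical_mean))
print(c(simulated = var(draws), theoretical = theoretical_var))
```

A large gap between the simulated and theoretical values usually means the wrong parameters or the wrong distribution function were used.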


Conclusion

Simulation is a powerful tool in R, enabling you to generate data, test models, and perform random sampling with ease. By mastering techniques like generating random numbers, setting the random seed, simulating linear models, and random sampling, you can effectively conduct simulations for a wide range of applications.

