A common source of confusion is the difference between functions such as dxxx() and rxxx(), where xxx is the name of the distribution of interest. For instance, xxx could be norm, binom, or multinom for the normal, binomial, and multinomial distributions, respectively. In this section, we will focus on the normal distribution as an example, but keep in mind that the same ideas apply to other distributions as well.

The function dnorm() evaluates the normal density \(N(x|\mu,\sigma^2)=\frac{1}{\sqrt{2\pi \sigma^2}} \exp\left(-\frac{1}{2 \sigma^2}(x-\mu)^2\right)\) as long as you supply it with the required inputs: \(x\), \(\mu\), and \(\sigma\). Note that dnorm() is parameterized by the standard deviation \(\sigma\) (its sd argument), not the variance \(\sigma^2\). This function is useful because it would be annoying to have to explicitly type this equation into R all the time. I typically use dnorm() to show the theoretical shape of the distribution. For instance, say \(\mu=1\) and \(\sigma^2=4\) (so sd = 2) and let’s pick a range of x values. This is what we get:
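A minimal sketch of how such a curve could be produced is given here. The grid of x values and the variable names (mu, sigma2, dens, dens_manual) are assumptions made for illustration; the original does not state which range was used. The sketch also types the density formula out by hand to check that it agrees with dnorm().

```r
# Parameters from the text: mean 1 and variance 4.
mu <- 1
sigma2 <- 4

# dnorm() expects the standard deviation, so convert the variance first.
sigma <- sqrt(sigma2)

# A grid of x values (the range is an arbitrary choice for illustration).
x <- seq(-7, 9, length.out = 200)

# Density evaluated with the built-in function.
dens <- dnorm(x, mean = mu, sd = sigma)

# The same density typed out by hand, for comparison.
dens_manual <- 1 / sqrt(2 * pi * sigma2) * exp(-(x - mu)^2 / (2 * sigma2))

# Check that the two agree up to floating-point error.
all.equal(dens, dens_manual)

# Plot the theoretical curve.
plot(x, dens, type = "l", xlab = "x", ylab = "density")
```

Passing sd = 4 instead of sd = 2 is an easy mistake here, and it would silently plot a much wider curve than the one intended.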

On the other hand, the function rnorm() generates random samples from this distribution. As a result, we are required to provide this function with the number of samples to be generated (n), along with \(\mu\) and the standard deviation \(\sigma\). Same as above, let’s assume that \(\mu=1\) and \(\sigma^2=4\), so \(\sigma=2\). Here are 5 random draws generated by this function:

## [1] -0.2529076  1.3672866 -0.6712572  4.1905616  1.6590155
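Output of this kind could be produced with a call along the following lines. The seed is an assumption added for reproducibility, since the original does not say which one (if any) was used, so the numbers you see may differ from those printed above.

```r
# Fix the random number generator so the draws are reproducible.
# The seed value is an assumption; the original does not state one.
set.seed(1)

# Draw 5 values from a normal distribution with mean 1 and variance 4.
# As with dnorm(), rnorm() takes the standard deviation, hence sqrt(4).
draws <- rnorm(n = 5, mean = 1, sd = sqrt(4))
draws
```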

In general, we expect most of these values to fall near 1 (the mean), with only a few in the tails, but any particular draw need not look like that. In this draw, we got a 4.19, which is about 1.6 standard deviations above the mean. It turns out that if we generate a very large number of draws from this distribution and create a histogram on the density scale, we can approximate the theoretical shape we drew previously using dnorm(). This should not be surprising because the theoretical density is essentially the histogram you would expect in the limit of an infinite number of samples (and ever narrower bins).
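The comparison between the histogram and the theoretical curve can be sketched as follows. The sample size, the number of bins, and the seed are arbitrary choices made for illustration. The key detail is freq = FALSE, which puts the histogram on the density scale so that it is directly comparable to dnorm().

```r
# Seed, sample size, and bin count are illustrative assumptions.
set.seed(1)
mu <- 1
sigma <- sqrt(4)  # standard deviation for variance 4

# Generate a large number of draws.
samples <- rnorm(n = 100000, mean = mu, sd = sigma)

# Histogram on the density scale (bar areas sum to 1).
hist(samples, breaks = 100, freq = FALSE, main = "", xlab = "x")

# Overlay the theoretical density from dnorm().
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, lwd = 2)

# The sample mean and variance should land near 1 and 4.
c(mean = mean(samples), var = var(samples))
```

With only a handful of draws the histogram looks ragged; increasing n (and the number of bins with it) makes the bars hug the curve more closely.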


