Probability and Statistics — Formula Book
GTU: BE03000251 — Semester 3
Unit 1: Basic Probability
Basic concepts
- Experiment, Outcome, Sample space \(S\).
- Event: subset \(A\subseteq S\).
- Probability measure \(P\) satisfies: \(0\le P(A)\le1\), \(P(S)=1\), countable additivity.
Axioms and simple results
\(P(\varnothing)=0,\qquad P(S)=1.\)
\(P(A^c)=1-P(A).\)
Addition rule
\(P(A\cup B)=P(A)+P(B)-P(A\cap B).\)
For mutually exclusive events (\(A\cap B=\varnothing\)): \(P(A\cup B)=P(A)+P(B)\).
Conditional probability and independence
\(P(A\mid B)=\dfrac{P(A\cap B)}{P(B)},\qquad P(B)>0.\)
Multiplication rule: \(P(A\cap B)=P(A\mid B)P(B)=P(B\mid A)P(A).\)
Events \(A\) and \(B\) are independent if \(P(A\cap B)=P(A)P(B)\). For \(n\) events, mutual independence requires that for every finite subcollection the probability of the intersection equals the product of the individual probabilities.
Total probability & Bayes' theorem
If \(\{A_i\}\) partition \(S\) with \(P(A_i)>0\), then \(P(B)=\sum_i P(B\mid A_i)P(A_i).\)
Bayes' theorem: \(\;P(A_k\mid B)=\dfrac{P(B\mid A_k)P(A_k)}{\sum_i P(B\mid A_i)P(A_i)}.\)
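A minimal Python sketch of the total-probability and Bayes formulas above, assuming a three-event partition; the priors and likelihoods are made-up illustrative numbers.

```python
# Sketch: total probability and Bayes' theorem for a 3-part partition.
# The priors and conditional probabilities below are illustrative values only.
priors = [0.5, 0.3, 0.2]            # P(A_i), must sum to 1
likelihoods = [0.02, 0.05, 0.10]    # P(B | A_i)

p_b = sum(p * l for p, l in zip(priors, likelihoods))            # total probability
posteriors = [p * l / p_b for p, l in zip(priors, likelihoods)]  # Bayes' theorem

print(f"P(B) = {p_b:.4f}")
for i, post in enumerate(posteriors, start=1):
    print(f"P(A_{i} | B) = {post:.4f}")
```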
Bernoulli trials and binomial coefficient
For \(n\) independent trials with success probability \(p\),
\[
P(\text{exactly } r \text{ successes})=\binom{n}{r}p^r(1-p)^{\,n-r},\qquad r=0,1,\dots,n.
\]
\(\binom{n}{r}=\dfrac{n!}{r!(n-r)!}.\)
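A short sketch of the Bernoulli-trials formula, with example values \(n=10\), \(p=0.3\), \(r=4\) chosen only for illustration.

```python
from math import comb

# Sketch: P(exactly r successes) in n independent Bernoulli trials.
n, p, r = 10, 0.3, 4                         # example values
prob = comb(n, r) * p**r * (1 - p)**(n - r)  # binomial probability
print(f"P(X = {r}) = {prob:.4f}")            # about 0.2001
```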
Random variables, PMF, PDF, CDF
A random variable \(X\) is a real-valued function on \(S\).
Discrete: PMF \(p_X(x)=P(X=x),\ \sum_x p_X(x)=1.\)
Continuous: PDF \(f_X\) with \(P(a\le X\le b)=\int_a^b f_X(x)\,dx,\ \int_{-\infty}^{\infty} f_X(x)\,dx=1.\)
CDF: \(F_X(x)=P(X\le x)=\begin{cases}
\int_{-\infty}^x f_X(t)\,dt & \text{(continuous)}\\[4pt]
\sum_{t\le x} p_X(t) & \text{(discrete)}
\end{cases}\)
Expectation, variance and standard deviation
Expectation:
\(E[X]=\begin{cases}
\sum_x x p_X(x), & \text{discrete}\\[4pt]
\int_{-\infty}^\infty x f_X(x)\,dx, & \text{continuous}
\end{cases}\)
Variance:
\(\operatorname{Var}(X)=E[(X-\mu)^2]=E[X^2]-\big(E[X]\big)^2.\)
Standard deviation: \(\sigma=\sqrt{\operatorname{Var}(X)}\).
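A small sketch of these definitions for a discrete random variable; the PMF below is a made-up example.

```python
from math import sqrt

# Sketch: E[X], Var(X) = E[X^2] - (E[X])^2, and sigma for a small discrete PMF.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}   # example PMF, probabilities sum to 1

mean = sum(x * p for x, p in pmf.items())
e_x2 = sum(x**2 * p for x, p in pmf.items())
variance = e_x2 - mean**2
sigma = sqrt(variance)

print(f"E[X] = {mean}, Var(X) = {variance:.4f}, sigma = {sigma:.4f}")
```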
Unit 2: Special Probability Distributions
Binomial distribution
If \(X\sim \operatorname{Bin}(n,p)\),
\[P(X=r)=\binom{n}{r}p^r(1-p)^{n-r},\quad r=0,1,\dots,n.\]
\(E[X]=np,\qquad \operatorname{Var}(X)=np(1-p).\)
Poisson distribution
If \(X\sim \operatorname{Pois}(\lambda)\),
\[P(X=r)=\frac{e^{-\lambda}\lambda^r}{r!},\quad r=0,1,2,\dots\]
\(E[X]=\lambda,\qquad \operatorname{Var}(X)=\lambda.\)
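A sketch of the Poisson PMF evaluated directly from the formula, assuming an example rate \(\lambda = 2.5\).

```python
from math import exp, factorial

# Sketch: Poisson probabilities P(X = r) = e^{-lambda} lambda^r / r! for small r.
lam = 2.5                         # example rate
for r in range(5):
    p = exp(-lam) * lam**r / factorial(r)
    print(f"P(X = {r}) = {p:.4f}")
```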
Normal distribution
If \(X\sim N(\mu,\sigma^2)\),
\[
f_X(x)=\frac{1}{\sigma\sqrt{2\pi}} \exp\!\bigg(-\frac{(x-\mu)^2}{2\sigma^2}\bigg),\quad x\in\mathbb{R}.
\]
Standard normal \(Z\sim N(0,1)\), \(Z=(X-\mu)/\sigma\).
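A sketch of computing a normal probability by standardizing and using \(\Phi(z)=\tfrac12\big(1+\operatorname{erf}(z/\sqrt{2})\big)\); the parameters \(\mu=50\), \(\sigma=10\) and the interval are example values.

```python
from math import erf, sqrt

# Sketch: P(a <= X <= b) for X ~ N(mu, sigma^2) via the standard normal CDF.
def phi(z):
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 50.0, 10.0            # example parameters
a, b = 45.0, 60.0                 # example interval
prob = phi((b - mu) / sigma) - phi((a - mu) / sigma)
print(f"P({a} <= X <= {b}) = {prob:.4f}")   # about 0.5328
```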
Exponential distribution
If \(X\sim \operatorname{Exp}(\lambda)\),
\[
f_X(x)=\begin{cases}\lambda e^{-\lambda x},& x\ge0,\\ 0,&x<0.\end{cases}
\]
Mean \(=1/\lambda\), variance \(=1/\lambda^2\), median \(= (\ln 2)/\lambda\).
Gamma distribution
If \(X\sim \Gamma(k,\lambda)\) (shape \(k\), rate \(\lambda\)):
\[
f_X(x)=\frac{\lambda^k}{\Gamma(k)}x^{k-1}e^{-\lambda x},\quad x>0.
\]
Mean \(=k/\lambda\), variance \(=k/\lambda^2\).
Unit 3: Basic Statistics (Grouped and Ungrouped Data)
This section includes formulae for ungrouped and grouped data.
Ungrouped data (individual observations)
Suppose observations \(x_1,x_2,\dots,x_n\).
Arithmetic mean: \(\bar{x}=\dfrac{1}{n}\sum_{i=1}^n x_i.\)
Sample variance (unbiased): \(s^2=\dfrac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2.\)
Raw moment: \(m'_r=\dfrac{1}{n}\sum_{i=1}^n x_i^r.\)
Central moment: \(m_r=\dfrac{1}{n}\sum_{i=1}^n (x_i-\bar{x})^r.\)
Coefficient of variation: \(\mathrm{CV}=\dfrac{s}{\bar{x}}\times 100\%\) (for positive mean).
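A sketch applying the ungrouped-data formulas above; the observations are made-up example values.

```python
from math import sqrt

# Sketch: mean, unbiased sample variance, and coefficient of variation
# for a small ungrouped data set (example values).
x = [12, 15, 11, 14, 18, 16, 13]
n = len(x)

mean = sum(x) / n
s2 = sum((xi - mean) ** 2 for xi in x) / (n - 1)   # divide by n - 1 (unbiased)
cv = sqrt(s2) / mean * 100

print(f"mean = {mean:.3f}, s^2 = {s2:.3f}, CV = {cv:.2f}%")
```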
Grouped Data (Frequency Distribution)
Let class midpoints \(x_i\), class width \(h\) (equal classes), frequencies \(f_i\), total \(N=\sum f_i\).
Mean (grouped): \(\bar{x} = \dfrac{1}{N} \sum_i f_i x_i.\)
Median (grouped, continuous approximation):
\[
\text{Median} = L + \frac{\frac{N}{2} - F}{f_m} \times h,
\]
where \(L\) = lower boundary of median class, \(F\) = cumulative frequency before median class, \(f_m\) = frequency of median class.
Mode (grouped):
\[
\text{Mode} = L + \frac{f_m - f_{m-1}}{2f_m - f_{m-1} - f_{m+1}} \times h,
\]
where \(L\) = lower boundary of the modal class, \(f_m\) = its frequency, and \(f_{m-1}\), \(f_{m+1}\) = frequencies of the classes immediately before and after it.
Variance (grouped, population form):
\[
\sigma^2 = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^2.
\]
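A sketch computing the grouped-data mean, median, mode, and variance from a small frequency table; the class limits and frequencies are made-up example values.

```python
# Sketch: grouped-data statistics for classes 0-10, 10-20, ... with example frequencies.
lower = [0, 10, 20, 30, 40]       # lower class boundaries
freq  = [5, 8, 15, 10, 2]         # example frequencies
h = 10                            # class width
mid = [l + h / 2 for l in lower]  # class midpoints
N = sum(freq)

mean = sum(f * m for f, m in zip(freq, mid)) / N

# Median class: first class whose cumulative frequency reaches N/2.
cum, m_idx = 0, 0
for i, f in enumerate(freq):
    if cum + f >= N / 2:
        m_idx = i
        break
    cum += f
L, F, fm = lower[m_idx], cum, freq[m_idx]
median = L + (N / 2 - F) / fm * h

# Modal class: class with the largest frequency
# (assumes the modal class is neither the first nor the last class).
k = freq.index(max(freq))
mode = lower[k] + (freq[k] - freq[k - 1]) / (2 * freq[k] - freq[k - 1] - freq[k + 1]) * h

variance = sum(f * (m - mean) ** 2 for f, m in zip(freq, mid)) / N   # population form

print(f"mean = {mean:.2f}, median = {median:.2f}, mode = {mode:.2f}, var = {variance:.2f}")
```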
Moments for Grouped Data
Raw moments: \(m_r' = \dfrac{1}{N}\sum_i f_i x_i^r.\)
Central moments: \(m_r = \dfrac{1}{N}\sum_i f_i (x_i - \bar{x})^r.\)
Relation between moments about an assumed mean \(a\), \(m_r'' = \dfrac{1}{N}\sum_i f_i (x_i - a)^r\), and the moments about the true mean:
\(\begin{aligned}
m_1' &= a + m_1'',\\[6pt]
m_2 &= m_2'' - (m_1'')^2,\\[6pt]
m_3 &= m_3'' - 3m_2''m_1'' + 2(m_1'')^3,\\[6pt]
m_4 &= m_4'' - 4m_3''m_1'' + 6m_2''(m_1'')^2 - 3(m_1'')^4.
\end{aligned}\)
Percentiles / Quartiles (grouped)
\(P_k = L + \dfrac{\frac{k}{100}N - F}{f_c}\times h,\) where \(L\) = lower boundary of the class containing \(P_k\), \(F\) = cumulative frequency before that class, \(f_c\) = frequency of that class. Quartiles follow the same pattern with \(\frac{j}{4}N\) in place of \(\frac{k}{100}N\) (\(j=1,2,3\)).
Measures of Skewness and Kurtosis
Let \(\mu_k\) be the \(k\)-th central moment.
Moment coefficient of skewness: \(\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}\) (equivalently \(\gamma_1 = \dfrac{\mu_3}{\mu_2^{3/2}}\), which carries the sign of the skew).
Coefficient of kurtosis: \(\beta_2 = \dfrac{\mu_4}{\mu_2^2}\); excess kurtosis \(\gamma_2 = \beta_2 - 3\) (\(\beta_2 = 3\) for the normal distribution).
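A sketch computing \(\beta_1\) and \(\beta_2\) from central moments of an example ungrouped data set (the observations are illustrative values).

```python
# Sketch: moment coefficients of skewness and kurtosis from central moments.
x = [2, 4, 4, 4, 5, 5, 7, 9]      # example data
n = len(x)
mean = sum(x) / n

def central_moment(r):
    # r-th central moment m_r = (1/n) * sum (x_i - mean)^r
    return sum((xi - mean) ** r for xi in x) / n

mu2, mu3, mu4 = central_moment(2), central_moment(3), central_moment(4)
beta1 = mu3**2 / mu2**3
beta2 = mu4 / mu2**2
print(f"beta_1 = {beta1:.4f}, beta_2 = {beta2:.4f}")
```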
Grouped vs. Ungrouped quick formulas (summary)
| Measure | Ungrouped | Grouped |
|---|---|---|
| Mean | \(\bar{x}=\dfrac{1}{n}\sum x_i\) | \(\bar{x}=\dfrac{1}{N}\sum f_i x_i\) |
| Sample variance | \(s^2=\dfrac{1}{n-1}\sum (x_i-\bar{x})^2\) | \(s^2=\dfrac{1}{N-1}\sum f_i (x_i-\bar{x})^2\) |
Correlation and Regression
Correlation
Corrected sums of squares and products: \(S_{xx}=\sum_{i=1}^n (x_i-\bar{x})^2,\qquad S_{xy}=\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})=\sum x_i y_i - n\bar{x}\bar{y}.\) The sample covariance is \(S_{xy}/(n-1)\).
Pearson correlation coefficient:
\[
r=\frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2\,\sum (y_i-\bar{y})^2}}.
\]
Regression lines
Regression of \(y\) on \(x\): \(\hat{y}=a + b_{yx} x,\quad b_{yx}=\dfrac{S_{xy}}{S_{xx}},\quad a=\bar{y}-b_{yx}\bar{x}.\)
Relation with \(r\): \(b_{yx}=r\frac{s_y}{s_x},\qquad b_{xy}=r\frac{s_x}{s_y}.\)
Product of slopes: \(b_{yx}b_{xy}=r^2.\)
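A sketch of Pearson's \(r\) and the regression line of \(y\) on \(x\) from the sums of squares and products; the paired data are made-up example values.

```python
from math import sqrt

# Sketch: Pearson correlation and regression line y-hat = a + b_{yx} x.
x = [1, 2, 3, 4, 5]               # example data
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
Sxx = sum((xi - xbar) ** 2 for xi in x)
Syy = sum((yi - ybar) ** 2 for yi in y)

r = Sxy / sqrt(Sxx * Syy)
b = Sxy / Sxx                     # slope b_{yx}
a = ybar - b * xbar
print(f"r = {r:.4f}, y-hat = {a:.3f} + {b:.3f} x")
```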
Spearman rank correlation
If \(d_i\) are rank differences,
\[
R_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}.
\]
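A sketch of the Spearman formula for two rankings with no ties; the ranks are illustrative values.

```python
# Sketch: Spearman rank correlation R_s = 1 - 6*sum(d^2) / (n(n^2 - 1)), no ties.
rank_x = [1, 2, 3, 4, 5, 6]       # example rankings
rank_y = [2, 1, 4, 3, 6, 5]
n = len(rank_x)

d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
Rs = 1 - 6 * d2 / (n * (n**2 - 1))
print(f"R_s = {Rs:.4f}")
```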
Unit 4: Applied Statistics — Hypothesis Testing
General procedure
- State \(H_0\) and \(H_1\).
- Choose significance level \(\alpha\).
- Determine test statistic under \(H_0\).
- Compute observed value / p-value.
- Decision: reject or fail to reject \(H_0\).
Large-sample tests (Z-tests)
One-sample mean (known \(\sigma\)):
\(Z=\dfrac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}\sim N(0,1)\).
Small-sample tests (Student's t)
One-sample t:
\(t=\dfrac{\bar{X}-\mu_0}{s/\sqrt{n}} \sim t_{n-1}.\)
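A sketch computing the one-sample t statistic; the sample and \(\mu_0\) are made-up example values, and the result would be compared with a \(t_{n-1}\) table value at the chosen \(\alpha\).

```python
from math import sqrt

# Sketch: one-sample t statistic t = (xbar - mu0) / (s / sqrt(n)).
x = [49.2, 50.1, 48.7, 50.5, 49.9, 51.0, 48.8, 50.3]   # example data
mu0 = 50.0                                              # hypothesized mean
n = len(x)
xbar = sum(x) / n
s = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))   # sample SD

t = (xbar - mu0) / (s / sqrt(n))
print(f"t = {t:.4f} with {n - 1} degrees of freedom")
```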
t-test for correlation
Testing \(H_0: \rho=0\):
\(t=\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\sim t_{n-2}.\)
F-test and Chi-square tests
F-test (equality of two variances): \(F=\dfrac{s_1^2}{s_2^2}\sim F_{n_1-1,\,n_2-1}\), conventionally with the larger sample variance in the numerator.
Chi-square goodness-of-fit:
\(\chi^2=\sum_{i=1}^k \dfrac{(O_i-E_i)^2}{E_i}\sim \chi^2_{k-c-1},\) where \(O_i, E_i\) are observed and expected frequencies and \(c\) is the number of parameters estimated from the data.
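A sketch of the goodness-of-fit statistic for observed vs. expected counts; the counts are made-up example values, and the statistic would be compared with a \(\chi^2\) table value at \(k-c-1\) degrees of freedom.

```python
# Sketch: chi-square goodness-of-fit statistic sum (O - E)^2 / E.
observed = [18, 22, 27, 19, 14]   # example observed counts
expected = [20, 20, 20, 20, 20]   # expected counts under H0

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi-square = {chi2:.4f}")
```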
Unit 5: Curve Fitting by Least Squares
Fitting straight line \(y=a+bx\)
Normal equations:
\[
\sum y = na + b\sum x,\qquad \sum xy = a\sum x + b\sum x^2.
\]
Solution:
\[
b=\frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2},\qquad a=\bar{y}-b\bar{x}.
\]
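A sketch of the straight-line fit computed directly from these sums; the data points are made-up example values.

```python
# Sketch: least-squares straight line y = a + b x from the normal-equation solution.
x = [1, 2, 3, 4, 5]               # example data
y = [2.1, 4.3, 6.2, 7.9, 10.1]
n = len(x)

Sx, Sy = sum(x), sum(y)
Sxy = sum(xi * yi for xi, yi in zip(x, y))
Sxx = sum(xi ** 2 for xi in x)

b = (n * Sxy - Sx * Sy) / (n * Sxx - Sx ** 2)
a = Sy / n - b * Sx / n
print(f"y = {a:.3f} + {b:.3f} x")
```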
Fitting second degree parabola \(y=a+bx+cx^2\)
Normal equations:
\[
\begin{cases}
\sum y = n a + b\sum x + c\sum x^2,\\[4pt]
\sum xy = a\sum x + b\sum x^2 + c\sum x^3,\\[4pt]
\sum x^2y = a\sum x^2 + b\sum x^3 + c\sum x^4.
\end{cases}
\]
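A sketch that builds the three normal equations as a linear system and solves it with NumPy; the data points are made-up example values.

```python
import numpy as np

# Sketch: fit y = a + b x + c x^2 by solving the three normal equations.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])      # example data
y = np.array([1.1, 2.9, 7.2, 13.1, 21.0])
n = len(x)

A = np.array([
    [n,            x.sum(),       (x**2).sum()],
    [x.sum(),      (x**2).sum(),  (x**3).sum()],
    [(x**2).sum(), (x**3).sum(),  (x**4).sum()],
])
rhs = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])

a, b, c = np.linalg.solve(A, rhs)
print(f"y = {a:.3f} + {b:.3f} x + {c:.3f} x^2")
```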
Fitting exponential curve \(y = ae^{bx}\)
Take \(\ln\): \(Y=\ln y, A=\ln a \Rightarrow Y = A + bx.\)
Normal equations for \(Y\):
\(\sum Y = nA + b\sum x,\qquad \sum xY = A\sum x + b\sum x^2.\)
Then \(a=e^A\).
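A sketch of the exponential fit via the log transform described above; the data points are made-up example values.

```python
from math import log, exp

# Sketch: fit y = a * e^(b x) by a straight-line fit to Y = ln y.
x = [0, 1, 2, 3, 4]               # example data
y = [1.5, 2.7, 5.1, 9.3, 17.2]
n = len(x)
Y = [log(yi) for yi in y]

Sx, SY = sum(x), sum(Y)
SxY = sum(xi * Yi for xi, Yi in zip(x, Y))
Sxx = sum(xi ** 2 for xi in x)

b = (n * SxY - Sx * SY) / (n * Sxx - Sx ** 2)
A = SY / n - b * Sx / n
a = exp(A)                        # back-transform the intercept
print(f"y = {a:.3f} * exp({b:.3f} x)")
```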
Fitting power curve \(y = ax^b\)
Take logs: \(Y=\ln y,\ X=\ln x,\ A=\ln a \Rightarrow Y=A+bX.\)
Normal equations:
\(\sum Y = nA + b\sum X,\qquad \sum XY = A\sum X + b\sum X^2.\)
Fitting geometric (exponential-type) curve \(y = ab^x\)
Take logs: \(\ln y = \ln a + x \ln b\). Let \(Y=\ln y,\ A=\ln a,\ B=\ln b\); fit \(Y = A + Bx\) with the straight-line normal equations, then \(a=e^A,\ b=e^B\).
Additional Important Formulae
\(E[aX+b]=aE[X]+b,\qquad \operatorname{Var}(aX+b)=a^2\operatorname{Var}(X).\)
If \(X,Y\) independent, \(E[XY]=E[X]E[Y],\ \operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y).\)
Sampling distributions (brief)
For \(X_i\) i.i.d. with mean \(\mu\), variance \(\sigma^2\),
\(\operatorname{Var}(\bar{X})=\dfrac{\sigma^2}{n}\). By CLT, \(\bar{X}\approx N(\mu,\sigma^2/n)\) for large \(n\).
Probability inequalities
Markov: \(P(X\ge a)\le \dfrac{E[X]}{a}\) for \(a>0\).
Chebyshev: \(P(|X-\mu|\ge k\sigma)\le \dfrac{1}{k^2}.\)
Appendices
Appendix A: Table of common distributions summary
| Distribution | PMF / PDF | Mean, Variance |
|---|---|---|
| Binomial \((n,p)\) | \(\binom{n}{r}p^r(1-p)^{n-r}\) | \(np,\ np(1-p)\) |
| Poisson \((\lambda)\) | \(e^{-\lambda}\lambda^r/r!\) | \(\lambda,\ \lambda\) |
| Normal \((\mu,\sigma^2)\) | \(\dfrac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}\) | \(\mu,\ \sigma^2\) |
| Exponential \((\lambda)\) | \(\lambda e^{-\lambda x},\ x\ge0\) | \(1/\lambda,\ 1/\lambda^2\) |
| Gamma \((k,\lambda)\) | \(\dfrac{\lambda^k}{\Gamma(k)}x^{k-1}e^{-\lambda x}\) | \(k/\lambda,\ k/\lambda^2\) |
Appendix B: Quick reference for grouped data formulas
- Mean: \(\bar{x}=\dfrac{1}{N}\sum f_i x_i\).
- Median: \(L + \dfrac{N/2 - F}{f_m}\,h\).
- Mode: \(L + \dfrac{f_m-f_{m-1}}{2f_m-f_{m-1}-f_{m+1}}\,h\).
- Variance: \(\dfrac{1}{N}\sum f_i (x_i-\bar{x})^2\) (population).