
Probability and Statistics — Formula Book

GTU: BE03000251 — Semester 3

Unit 1: Basic Probability

Basic concepts

  • Experiment, Outcome, Sample space \(S\).
  • Event: subset \(A\subseteq S\).
  • Probability measure \(P\) satisfies: \(0\le P(A)\le1\), \(P(S)=1\), countable additivity.

Axioms and simple results

\(P(\varnothing)=0,\qquad P(S)=1.\)
\(P(A^c)=1-P(A).\)

Addition rule

\(P(A\cup B)=P(A)+P(B)-P(A\cap B).\)

For mutually exclusive events (\(A\cap B=\varnothing\)): \(P(A\cup B)=P(A)+P(B)\).

Conditional probability and independence

\(P(A\mid B)=\dfrac{P(A\cap B)}{P(B)},\qquad P(B)>0.\)
Multiplication rule: \(P(A\cap B)=P(A\mid B)P(B)=P(B\mid A)P(A).\)

Events \(A\) and \(B\) are independent if \(P(A\cap B)=P(A)P(B)\). For \(n\) events, mutual independence requires \(P\big(\bigcap_{i\in I}A_i\big)=\prod_{i\in I}P(A_i)\) for every finite index set \(I\); pairwise independence alone is not sufficient.

Total probability & Bayes' theorem

If \(\{A_i\}\) partition \(S\) with \(P(A_i)>0\), then \(P(B)=\sum_i P(B\mid A_i)P(A_i).\)
Bayes' theorem: \(\;P(A_k\mid B)=\dfrac{P(B\mid A_k)P(A_k)}{\sum_i P(B\mid A_i)P(A_i)}.\)
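As a numerical sketch, the two formulas can be evaluated directly; the prior and likelihood values below are a hypothetical diagnostic-test example, not data from the syllabus:

```python
# Hypothetical example: partition A1 = "condition present", A2 = "condition absent";
# B = "test positive".
priors = [0.01, 0.99]        # P(A_i); must sum to 1 over the partition
likelihoods = [0.95, 0.05]   # P(B | A_i)

# Total probability: P(B) = sum_i P(B | A_i) P(A_i)
p_B = sum(l * p for l, p in zip(likelihoods, priors))

# Bayes' theorem: P(A_1 | B) = P(B | A_1) P(A_1) / P(B)
posterior = likelihoods[0] * priors[0] / p_B
print(p_B, posterior)  # P(B) ≈ 0.059, posterior ≈ 0.161
```

The small posterior despite the high likelihood \(P(B\mid A_1)=0.95\) is the classic base-rate effect: the prior \(P(A_1)=0.01\) dominates.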

Bernoulli trials and binomial coefficient

For \(n\) independent trials with success probability \(p\),

\[ P(\text{exactly } r \text{ successes})=\binom{n}{r}p^r(1-p)^{\,n-r},\qquad r=0,1,\dots,n. \]
\(\binom{n}{r}=\dfrac{n!}{r!(n-r)!}.\)
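A minimal check of the binomial formula using Python's `math.comb`; the fair-coin numbers are purely illustrative:

```python
from math import comb

def binom_prob(n: int, r: int, p: float) -> float:
    """P(exactly r successes in n independent Bernoulli(p) trials)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Exactly 3 heads in 5 fair tosses: C(5,3) * 0.5^5 = 10/32.
print(binom_prob(5, 3, 0.5))  # 0.3125
```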

Random variables, PMF, PDF, CDF

A random variable \(X\) is a real-valued function on \(S\).

Discrete: PMF \(p_X(x)=P(X=x),\ \sum_x p_X(x)=1.\)

Continuous: PDF \(f_X\) with \(P(a\le X\le b)=\int_a^b f_X(x)\,dx,\quad \int_{-\infty}^{\infty} f_X(x)\,dx=1.\)

CDF: \(F_X(x)=P(X\le x)=\begin{cases} \int_{-\infty}^x f_X(t)\,dt & \text{(continuous)}\\[4pt] \sum_{t\le x} p_X(t) & \text{(discrete)} \end{cases}\)

Expectation, variance and standard deviation

Expectation: \(E[X]=\begin{cases} \sum_x x p_X(x), & \text{discrete}\\[4pt] \int_{-\infty}^\infty x f_X(x)\,dx, & \text{continuous} \end{cases}\)
Variance: \(\operatorname{Var}(X)=E[(X-\mu)^2]=E[X^2]-\big(E[X]\big)^2.\)

Standard deviation: \(\sigma=\sqrt{\operatorname{Var}(X)}\).
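The discrete-case formulas can be verified on a small PMF; the fair six-sided die below is an illustrative choice:

```python
# PMF of a fair six-sided die as {value: probability}.
pmf = {x: 1/6 for x in range(1, 7)}
assert abs(sum(pmf.values()) - 1.0) < 1e-12     # valid PMF

mean = sum(x * p for x, p in pmf.items())               # E[X]
second_moment = sum(x**2 * p for x, p in pmf.items())   # E[X^2]
variance = second_moment - mean**2                      # E[X^2] - (E[X])^2
std_dev = variance ** 0.5
print(mean, variance)  # 3.5 and 35/12 ≈ 2.9167
```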


Unit 2: Special Probability Distributions

Binomial distribution

If \(X\sim \operatorname{Bin}(n,p)\), \[P(X=r)=\binom{n}{r}p^r(1-p)^{n-r},\quad r=0,1,\dots,n.\]
\(E[X]=np,\qquad \operatorname{Var}(X)=np(1-p).\)

Poisson distribution

If \(X\sim \operatorname{Pois}(\lambda)\), \[P(X=r)=\frac{e^{-\lambda}\lambda^r}{r!},\quad r=0,1,2,\dots\]
\(E[X]=\lambda,\qquad \operatorname{Var}(X)=\lambda.\)
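A quick sanity check of the Poisson formulas, truncating the infinite sum (which converges very fast); \(\lambda=2\) is arbitrary:

```python
from math import exp, factorial

def pois_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Pois(lam)."""
    return exp(-lam) * lam**r / factorial(r)

lam = 2.0
total = sum(pois_pmf(r, lam) for r in range(100))      # should be ≈ 1
mean = sum(r * pois_pmf(r, lam) for r in range(100))   # should be ≈ lambda
print(round(total, 6), round(mean, 6))
```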

Normal distribution

If \(X\sim N(\mu,\sigma^2)\), \[ f_X(x)=\frac{1}{\sigma\sqrt{2\pi}} \exp\!\bigg(-\frac{(x-\mu)^2}{2\sigma^2}\bigg),\quad x\in\mathbb{R}. \]

Standard normal \(Z\sim N(0,1)\), \(Z=(X-\mu)/\sigma\).

Exponential distribution

If \(X\sim \operatorname{Exp}(\lambda)\), \[ f_X(x)=\begin{cases}\lambda e^{-\lambda x},& x\ge0,\\ 0,&x<0.\end{cases} \]
Mean \(=1/\lambda\), variance \(=1/\lambda^2\), median \(= (\ln 2)/\lambda\).
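The median formula follows from solving \(F(x)=\tfrac12\), where \(F(x)=1-e^{-\lambda x}\) is the exponential CDF; a numerical confirmation with an arbitrary rate:

```python
from math import log, exp

lam = 0.5                        # arbitrary rate for illustration
median = log(2) / lam            # (ln 2) / lambda

# Exponential CDF F(x) = 1 - e^{-lambda x}, evaluated at the median:
cdf_at_median = 1 - exp(-lam * median)
print(cdf_at_median)  # ≈ 0.5
```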

Gamma distribution

If \(X\sim \Gamma(k,\lambda)\) (shape \(k\), rate \(\lambda\)): \[ f_X(x)=\frac{\lambda^k}{\Gamma(k)}x^{k-1}e^{-\lambda x},\quad x>0. \]
Mean \(=k/\lambda\), variance \(=k/\lambda^2\).

Unit 3: Basic Statistics (Grouped and Ungrouped Data)

This section includes formulae for ungrouped and grouped data.

Ungrouped data (individual observations)

Suppose observations \(x_1,x_2,\dots,x_n\).

Arithmetic mean: \(\bar{x}=\dfrac{1}{n}\sum_{i=1}^n x_i.\)
Sample variance (unbiased): \(s^2=\dfrac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2.\)
Raw moment: \(m'_r=\dfrac{1}{n}\sum_{i=1}^n x_i^r.\)
Central moment: \(m_r=\dfrac{1}{n}\sum_{i=1}^n (x_i-\bar{x})^r.\)
Coefficient of variation: \(\mathrm{CV}=\dfrac{s}{\bar{x}}\times 100\%\) (for positive mean).
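These ungrouped-data formulas map directly onto Python's `statistics` module; the observations below are made up for illustration:

```python
from statistics import mean, variance   # variance() uses the unbiased n-1 divisor

data = [12, 15, 11, 14, 13, 15, 12]     # hypothetical observations
xbar = mean(data)
s2 = variance(data)                     # sample variance s^2
cv = (s2 ** 0.5 / xbar) * 100           # coefficient of variation, in %
print(xbar, s2, round(cv, 2))
```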

Grouped Data (Frequency Distribution)

Let class midpoints \(x_i\), class width \(h\) (equal classes), frequencies \(f_i\), total \(N=\sum f_i\).

Mean (grouped): \(\bar{x} = \dfrac{1}{N} \sum_i f_i x_i.\)
Median (grouped, continuous approximation): \[ \text{Median} = L + \frac{\frac{N}{2} - F}{f_m} \times h, \] where \(L\) = lower boundary of median class, \(F\) = cumulative frequency before median class, \(f_m\) = frequency of median class.
Mode (grouped): \[ \text{Mode} = L + \frac{f_m - f_{m-1}}{2f_m - f_{m-1} - f_{m+1}} \times h, \] where \(L\) = lower boundary of the modal class, \(f_m\) = its frequency, and \(f_{m-1}, f_{m+1}\) = frequencies of the preceding and following classes.
Variance (grouped, population form): \[ \sigma^2 = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^2. \]
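A worked sketch combining the grouped mean, median, and variance formulas above; the frequency table is hypothetical, with classes 0–10, 10–20, 20–30, 30–40:

```python
# Hypothetical frequency table: classes 0-10, 10-20, 20-30, 30-40 (width h = 10).
mids  = [5, 15, 25, 35]      # class midpoints x_i
freqs = [4, 10, 8, 3]        # frequencies f_i
h, N = 10, sum(freqs)        # N = 25

mean = sum(f * x for f, x in zip(freqs, mids)) / N
var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / N  # population form

# Median class = first class whose cumulative frequency reaches N/2.
cum = 0
for idx, f in enumerate(freqs):
    if cum + f >= N / 2:
        break
    cum += f
L = idx * h                                   # lower boundary (classes start at 0)
median = L + (N / 2 - cum) / freqs[idx] * h
print(mean, median, var)  # 19.0, 18.5, 80.0
```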

Moments for Grouped Data

Raw moments: \(m_r' = \dfrac{1}{N}\sum_i f_i x_i^r.\)
Central moments: \(m_r = \dfrac{1}{N}\sum_i f_i (x_i - \bar{x})^r.\)

Relation between moments about an assumed mean \(a\), \(m_r'' = \dfrac{1}{N}\sum_i f_i (x_i-a)^r\), and moments about the true mean:

\(\begin{aligned} m_1' &= a + m_1'',\\[6pt] m_2 &= m_2'' - (m_1'')^2,\\[6pt] m_3 &= m_3'' - 3m_2''m_1'' + 2(m_1'')^3,\\[6pt] m_4 &= m_4'' - 4m_3''m_1'' + 6m_2''(m_1'')^2 - 3(m_1'')^4. \end{aligned}\)

Percentiles / Quartiles (grouped)

\(P_k = L + \dfrac{\frac{k}{100}N - F}{f_c}\times h,\) where \(L\) = lower boundary of the class containing \(P_k\), \(F\) = cumulative frequency before that class, and \(f_c\) = its frequency. Quartiles are the special cases \(k=25,50,75\).

Measures of Skewness and Kurtosis

Let \(\mu_k\) be the \(k\)-th central moment.

Coefficient of skewness (Karl Pearson's moment coefficient): \(\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}.\)
Coefficient of kurtosis: \(\beta_2 = \dfrac{\mu_4}{\mu_2^2}\) (\(\beta_2 = 3\) for the normal distribution; excess kurtosis \(=\beta_2-3\)).

Grouped vs. Ungrouped quick formulas (summary)

  • Mean: ungrouped \(\bar{x}=\dfrac{1}{n}\sum x_i\); grouped \(\bar{x}=\dfrac{1}{N}\sum f_i x_i\).
  • Sample variance: ungrouped \(s^2=\dfrac{1}{n-1}\sum (x_i-\bar{x})^2\); grouped \(s^2=\dfrac{1}{N-1}\sum f_i (x_i-\bar{x})^2\).

Correlation and Regression

Correlation

Sample covariance: \(S_{xy}=\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})=\sum x_i y_i - n\bar{x}\bar{y}.\)
Pearson correlation coefficient: \[ r=\frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2\,\sum (y_i-\bar{y})^2}}. \]

Regression lines

Regression of \(y\) on \(x\): \(\hat{y}=a + b x,\quad b_{yx}=\dfrac{S_{xy}}{S_{xx}},\quad a=\bar{y}-b\bar{x}.\)
Relation with \(r\): \(b_{yx}=r\frac{s_y}{s_x},\qquad b_{xy}=r\frac{s_x}{s_y}.\)

Product of slopes: \(b_{yx}b_{xy}=r^2.\)
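The correlation and regression formulas, including the product-of-slopes identity, on a small hypothetical data set:

```python
from statistics import mean

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
xb, yb = mean(x), mean(y)

Sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
Sxx = sum((xi - xb) ** 2 for xi in x)
Syy = sum((yi - yb) ** 2 for yi in y)

r = Sxy / (Sxx * Syy) ** 0.5               # Pearson correlation
b_yx = Sxy / Sxx                           # slope of y on x
b_xy = Sxy / Syy                           # slope of x on y
assert abs(b_yx * b_xy - r ** 2) < 1e-12   # product of slopes = r^2
print(round(r, 4), b_yx, b_xy)  # 0.7746 0.6 1.0
```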

Spearman rank correlation

If \(d_i\) are rank differences, \[ R_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}. \]
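A sketch of the rank-correlation computation; the basic formula assumes no tied ranks, and the data below are hypothetical:

```python
def ranks(values):
    """Ranks 1..n of the values (assumes no ties, as the basic formula requires)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

x = [35, 23, 47, 17, 10, 43, 9, 6, 28]
y = [30, 33, 45, 23, 8, 49, 12, 4, 31]
n = len(x)
d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))  # sum of d_i^2
Rs = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(Rs)  # 0.9
```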

Unit 4: Applied Statistics — Hypothesis Testing

General procedure

  1. State \(H_0\) and \(H_1\).
  2. Choose significance level \(\alpha\).
  3. Determine test statistic under \(H_0\).
  4. Compute observed value / p-value.
  5. Decision: reject or fail to reject \(H_0\).

Large-sample tests (Z-tests)

One-sample mean (known \(\sigma\)): \(Z=\dfrac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}\sim N(0,1)\).

Small-sample tests (Student's t)

One-sample t: \(t=\dfrac{\bar{X}-\mu_0}{s/\sqrt{n}} \sim t_{n-1}.\)
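Computing the one-sample t statistic on hypothetical data; the critical value still comes from a t table, since the standard library has no t quantile function:

```python
from statistics import mean, stdev

sample = [51.2, 49.8, 50.6, 52.1, 48.9, 51.5, 50.3]   # hypothetical observations
mu0 = 50.0                                            # H0: mu = 50
n = len(sample)
t = (mean(sample) - mu0) / (stdev(sample) / n ** 0.5)
df = n - 1
print(round(t, 3), "with", df, "degrees of freedom")
```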

t-test for correlation

Testing \(H_0: \rho=0\): \(t=\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\sim t_{n-2}.\)

F-test and Chi-square tests

F-test: \(F=\dfrac{s_1^2}{s_2^2}\sim F_{n_1-1,\,n_2-1}\), conventionally with the larger sample variance in the numerator so that \(F\ge1\).
Chi-square goodness-of-fit: \(\chi^2=\sum_{i=1}^k \dfrac{(O_i-E_i)^2}{E_i}\sim \chi^2_{k-c-1},\) where \(c\) = number of parameters estimated from the data.
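The goodness-of-fit statistic on a small hypothetical example: a die rolled 60 times, testing fairness, so no parameters are estimated and \(c=0\):

```python
observed = [8, 12, 9, 11, 13, 7]     # hypothetical counts per face
expected = [60 / 6] * 6              # fair die: 10 expected per face
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # ≈ 2.8; compare with the chi^2 table at k - c - 1 = 5 df
```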

Unit 5: Curve Fitting by Least Squares

Fitting straight line \(y=a+bx\)

Normal equations: \[ \sum y = na + b\sum x,\qquad \sum xy = a\sum x + b\sum x^2. \]
Solution: \[ b=\frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2},\qquad a=\bar{y}-b\bar{x}. \]
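The closed-form solution coded directly from the normal equations; the data are hypothetical, chosen to lie near \(y=2x\):

```python
# Fit y = a + b x by the closed-form least-squares solution.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]       # hypothetical observations
n = len(x)

Sx, Sy = sum(x), sum(y)
Sxy = sum(xi * yi for xi, yi in zip(x, y))
Sxx = sum(xi * xi for xi in x)

b = (n * Sxy - Sx * Sy) / (n * Sxx - Sx ** 2)
a = Sy / n - b * Sx / n              # a = ybar - b * xbar
print(round(a, 4), round(b, 4))  # ≈ 0.05, 1.99
```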

Fitting second degree parabola \(y=a+bx+cx^2\)

Normal equations: \[ \begin{cases} \sum y = n a + b\sum x + c\sum x^2,\\[4pt] \sum xy = a\sum x + b\sum x^2 + c\sum x^3,\\[4pt] \sum x^2y = a\sum x^2 + b\sum x^3 + c\sum x^4. \end{cases} \]

Fitting exponential curve \(y = ae^{bx}\)

Take \(\ln\): \(Y=\ln y, A=\ln a \Rightarrow Y = A + bx.\)

Normal equations for \(Y\): \(\sum Y = nA + b\sum x,\qquad \sum xY = A\sum x + b\sum x^2.\)

Then \(a=e^A\).
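The log-linearization sketched above, run on synthetic data generated from \(y=2e^{0.5x}\), so the fit should recover \(a\approx2,\ b\approx0.5\):

```python
from math import log, exp

# Synthetic data from y = 2 e^{0.5 x} (so the fit should recover a = 2, b = 0.5).
x = [0, 1, 2, 3, 4]
y = [2 * exp(0.5 * xi) for xi in x]

Y = [log(yi) for yi in y]            # linearize: Y = A + b x with A = ln a
n = len(x)
Sx, SY = sum(x), sum(Y)
SxY = sum(xi * Yi for xi, Yi in zip(x, Y))
Sxx = sum(xi * xi for xi in x)

b = (n * SxY - Sx * SY) / (n * Sxx - Sx ** 2)
A = SY / n - b * Sx / n
a = exp(A)                           # back-transform the intercept
print(round(a, 6), round(b, 6))  # ≈ 2.0, 0.5
```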

Fitting power curve \(y = ax^b\)

Take logs: \(Y=\ln y,\ X=\ln x,\ A=\ln a \Rightarrow Y=A+bX.\)

Normal equations: \(\sum Y = nA + b\sum X,\qquad \sum XY = A\sum X + b\sum X^2.\)

Fitting geometric (exponential-type) curve \(y = ab^x\)

Take logs: \(\ln y = \ln a + x\ln b\). With \(Y=\ln y,\ A=\ln a,\ B=\ln b\) this is the straight line \(Y=A+Bx\); fit \(A,B\) by the straight-line normal equations, then recover \(a=e^A,\ b=e^B\).


Additional Important Formulae

\(E[aX+b]=aE[X]+b,\qquad \operatorname{Var}(aX+b)=a^2\operatorname{Var}(X).\)
If \(X,Y\) independent, \(E[XY]=E[X]E[Y],\ \operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y).\)

Sampling distributions (brief)

For \(X_i\) i.i.d. with mean \(\mu\), variance \(\sigma^2\), \(\operatorname{Var}(\bar{X})=\dfrac{\sigma^2}{n}\). By CLT, \(\bar{X}\approx N(\mu,\sigma^2/n)\) for large \(n\).

Probability inequalities

Markov (for nonnegative \(X\)): \(P(X\ge a)\le \dfrac{E[X]}{a}\) for \(a>0\).
Chebyshev: \(P(|X-\mu|\ge k\sigma)\le \dfrac{1}{k^2}.\)
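An empirical check of Chebyshev's bound by simulation, using uniform(0, 1) draws; the bound is loose but always holds:

```python
import random

random.seed(0)
mu, sigma = 0.5, (1 / 12) ** 0.5      # mean and sd of U(0, 1)
k = 1.5
samples = [random.random() for _ in range(100_000)]

# Empirical P(|X - mu| >= k sigma) vs the Chebyshev bound 1/k^2.
frac = sum(abs(s - mu) >= k * sigma for s in samples) / len(samples)
print(round(frac, 4), "<=", round(1 / k ** 2, 4))
```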

Appendices

Appendix A: Table of common distributions summary

| Distribution | PMF / PDF | Mean, Variance |
| --- | --- | --- |
| Binomial \((n,p)\) | \(\binom{n}{r}p^r(1-p)^{n-r}\) | \(np,\ np(1-p)\) |
| Poisson \((\lambda)\) | \(e^{-\lambda}\lambda^r/r!\) | \(\lambda,\ \lambda\) |
| Normal \((\mu,\sigma^2)\) | \(\dfrac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}\) | \(\mu,\ \sigma^2\) |
| Exponential \((\lambda)\) | \(\lambda e^{-\lambda x},\ x\ge0\) | \(1/\lambda,\ 1/\lambda^2\) |
| Gamma \((k,\lambda)\) | \(\dfrac{\lambda^k}{\Gamma(k)}x^{k-1}e^{-\lambda x}\) | \(k/\lambda,\ k/\lambda^2\) |

Appendix B: Quick reference for grouped data formulas

  • Mean: \(\bar{x}=\dfrac{1}{N}\sum f_i x_i\).
  • Median: \(L + \dfrac{N/2 - F}{f_m}\,h\).
  • Mode: \(L + \dfrac{f_m-f_{m-1}}{2f_m-f_{m-1}-f_{m+1}}\,h\).
  • Variance: \(\dfrac{1}{N}\sum f_i (x_i-\bar{x})^2\) (population).
Prepared for GTU: BE03000251 — Probability & Statistics
