{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lecture 3: Basic algorithms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Random generation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Random numbers can be generated with the `random` module." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import random\n", "number = random.randint(1, 100)\n", "print(number)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generate 10 numbers between 1 and 100:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "numbers = []\n", "for i in range(10):\n", " numbers.append(random.randint(1, 100))\n", "print(numbers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Same with using Python's list generator expressions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "numbers = [random.randint(1, 100) for _ in range(10)]\n", "print(numbers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Help:*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "numbers = [10, 20, 30, 40, 50]\n", "print(numbers)\n", "\n", "# iteration\n", "halves = []\n", "for x in numbers:\n", " halves.append(x / 2)\n", "print(halves)\n", "\n", "# list generation\n", "halves = [x / 2 for x in numbers]\n", "print(halves)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pseudo-random numbers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is no real randomicity in computer science (unless you build a device which measures e.g. cosmic radiation). These numbers generated are so called pseudo-random numbers. They are generated by a deterministic mathematical algorithm along a uniform distribution.\n", "\n", "The following algorithm will always generate the same number, because we seeded the algorithm a fix initial value:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "random.seed(42) # To reproduce the same results\n", "data = [random.randint(1, 100) for _ in range(10)]\n", "print(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default the seed value is related to the milliseconds passed since the start of the computer, hence it will look like real \"random\" numbers." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let $f: [m..n] \\rightarrow H$ be a function. Let the addition operator $+$ be defined over the elements of $H$, which is an associative operation with a left identity element $0$. Our task is to summarize the values of $f$ function over the interval.\n", "Formally:\n", "$$s = \\sum_{i=m}^{n} f(i)$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Counting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given the $[m..n]$ interval, count the number of items inside it.\n", "Formally:\n", "$$s = \\sum_{\\substack{i=m}}^{n} 1$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Maximum search" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let $f: [m..n] \\rightarrow H$ be a function, $m \\le n$. Over the elements of $H$ let a total ordering relation be defined (reflexivity, antisymmetry, tranzitivity and connexcity), with a symbol $\\le$, for the strict version $<$.\n", "Our task is to determine the greatest value in the interval. Also determine an element of the interval, where function $f$ evalutes to this greatest value.\n", "Formally:\n", "$$max = f(ind) \\wedge \\forall i \\in [m..n]: f(i) \\le f(ind)$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear search" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let $\\beta: [m..n] \\rightarrow \\mathbb{L}$ condition be defined. Determine the first element of the interval which fulfills the condition (if any).\n", "Formally:\n", "\\begin{equation*}\n", "\t\\begin{aligned}\n", "\t\tl = (\\exists i \\in [m..n]:\\beta(i)) \\\\\n", "\t\tl \\rightarrow (ind \\in [m..n] \\wedge \\beta(ind) \\wedge \\forall i \\in [m..ind-1]: \\neg\\beta(i))\n", "\t\\end{aligned}\n", "\\end{equation*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Beta condition" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Introduce an `is_even` function which determine whether a number is even or not:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def is_even(number):\n", " return number % 2 == 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Solution" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conditional summation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The algorithm of summation can be further generalized when a $\\beta: [m..n] \\rightarrow \\mathbb{L}$ condition is defined to restrict the set of elements.\n", "\n", "Let $f: [m..n] \\rightarrow H$ be a function and $\\beta: [m..n] \\rightarrow \\mathbb{L}$ a condition. Let the addition operator $+$ be defined over the elements of $H$, which is an associative operation with a left identity element $0$. Our task is to summarize the values of $f$ function over the interval where the $\\beta$ condition is fulfilled.\n", "Formally:\n", "$$s = \\sum_{\\substack{i=m\\\\ \\beta(i)}}^{n} f(i)$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conditional counting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let $\\beta: [m..n] \\rightarrow \\mathbb{L}$ be a condition. Count how many items of the interval fulfills the condition!\n", "Formally:\n", "$$s = \\sum_{\\substack{i=m\\\\ \\beta(i)}}^{n} 1$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conditional maximum search" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The algorithm of maximum search can be further generalized with combining the $\\beta: [m..n] \\rightarrow \\mathbb{L}$ condition used in *linear search* as a restriction. Note that now the existence of a maximum value is not guaranteed.\n", "\n", "Let $f: [m..n] \\rightarrow H$ be a function and $\\beta: [m..n] \\rightarrow \\mathbb{L}$ a condition. Over the elements of $H$ let a total ordering relation be defined (reflexivity, antisymmetry, tranzitivity and connexcity), with a symbol $\\le$, for the strict version $<$.\n", "Our task is to determine the greatest value in the interval which fulfills the $\\beta$ condition. Also determine an element of the interval, where function $f$ evalutes to this greatest value.\n", "Formally:\n", "\\begin{equation*}\n", "\t\\begin{aligned}\n", "\t\tl = (\\exists i \\in [m..n]:\\beta(i)) \\\\\n", "\t\tl \\rightarrow (\\beta(ind) \\wedge max = f(ind) \\wedge \\forall i \\in [m..n]: \\beta(i) \\rightarrow f(i) \\le f(ind))\n", "\t\\end{aligned}\n", "\\end{equation*}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task:** read the area and population data from `02_neighbours.csv`. Calculate the population density for each neighbouring country. Determine which country has the highest population density." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Approach 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By manually iterating through all the row with the `iterrows()` function. Example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "neighbours_df = pd.read_csv(\"02_neighbours.csv\")\n", "for index, row in neighbours_df.iterrows():\n", " print(\"Index: %d, Name: %s\" % (index, row[\"Country\"]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now calculate the maximum density:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Approach 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By adding a new column to the *Pandas DataFrames*:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "def population_density(row):\n", " return row['Population'] / row['Area']\n", "\n", "neighbours_df = pd.read_csv(\"02_neighbours.csv\")\n", "neighbours_df['Density'] = neighbours_df.apply(population_density, axis = 1)\n", "\n", "display(neighbours_df) # display is a special Jupyter Notebook function to provide a pretty display of complex data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then use the `max()` and `idxmax()` function of the *Pandas DataFrames*:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Logarithmic search (optional)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Also called binary search.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let $f: [m..n] \\rightarrow H$ be a *monotonically increasing* function. Over the elements of $H$ let a total ordering relation be defined (reflexivity, antisymmetry, tranzitivity and connexcity), with a symbol $\\le$, for the strict version $<$.\n", "Determine whether function $f$ evaluates to a given value $h \\in H$ at any location. If yes, specify such a location.\n", "Formally:\n", "$$l = (\\exists i \\in [m..n]: f(i) = h) \\wedge l \\rightarrow f(ind) = h$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }