{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lecture 3: Basic algorithms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Random generation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Random numbers can be generated with the `random` module." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import random\n", "number = random.randint(1, 100)\n", "print(number)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generate 10 numbers between 1 and 100:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "numbers = []\n", "for i in range(10):\n", " numbers.append(random.randint(1, 100))\n", "print(numbers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Same with using Python's list generator expressions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "numbers = [random.randint(1, 100) for _ in range(10)]\n", "print(numbers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Help:*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "numbers = [10, 20, 30, 40, 50]\n", "print(numbers)\n", "\n", "# iteration\n", "halves = []\n", "for x in numbers:\n", " halves.append(x / 2)\n", "print(halves)\n", "\n", "# list generation\n", "halves = [x / 2 for x in numbers]\n", "print(halves)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pseudo-random numbers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is no real randomicity in computer science (unless you build a device which measures e.g. cosmic radiation). These numbers generated are so called pseudo-random numbers. They are generated by a deterministic mathematical algorithm along a uniform distribution.\n", "\n", "The following algorithm will always generate the same number, because we seeded the algorithm a fix initial value:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "random.seed(42) # To reproduce the same results\n", "data = [random.randint(1, 100) for _ in range(10)]\n", "print(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default the seed value is related to the milliseconds passed since the start of the computer, hence it will look like real \"random\" numbers." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let $f: [m..n] \\rightarrow H$ be a function. Let the addition operator $+$ be defined over the elements of $H$, which is an associative operation with a left identity element $0$. Our task is to summarize the values of $f$ function over the interval.\n", "Formally:\n", "$$s = \\sum_{i=m}^{n} f(i)$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Theoretical way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = 0\n", "for i in range(0, len(data)):\n", " result += data[i]\n", "print(\"Sum: %d\" % result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pythonic way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = 0\n", "for value in data:\n", " result += value\n", "print(\"Sum: %d\" % result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Built-in function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = sum(data)\n", "print(\"Sum: %d\" % result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Counting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given the $[m..n]$ interval, count the number of items inside it.\n", "Formally:\n", "$$s = \\sum_{\\substack{i=m}}^{n} 1$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Theoretical way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = 0\n", "for i in range(0, len(data)):\n", " result += 1\n", "print(\"Count: %d\" % result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pythonic way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = sum([1 for _ in data])\n", "print(\"Count: %d\" % result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Built-in function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = len(data)\n", "print(\"Count: %d\" % result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Maximum search" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let $f: [m..n] \\rightarrow H$ be a function, $m \\le n$. Over the elements of $H$ let a total ordering relation be defined (reflexivity, antisymmetry, tranzitivity and connexcity), with a symbol $\\le$, for the strict version $<$.\n", "Our task is to determine the greatest value in the interval. Also determine an element of the interval, where function $f$ evalutes to this greatest value.\n", "Formally:\n", "$$max = f(ind) \\wedge \\forall i \\in [m..n]: f(i) \\le f(ind)$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Theoretical way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = data[0]\n", "index = 0\n", "for i in range(1, len(data)):\n", " if data[i] > result:\n", " result = data[i]\n", " index = i\n", "print(\"Max: %d, Index: %d\" % (result, index))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pythonic way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = data[0]\n", "index = 0\n", "for idx, value in enumerate(data[1:], start = 1):\n", " if value > result:\n", " result = value\n", " index = idx\n", "print(\"Max: %d, Index: %d\" % (result, index))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Built-in function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = max(data)\n", "print(\"Max: %d\" % result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear search" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let $\\beta: [m..n] \\rightarrow \\mathbb{L}$ condition be defined. Determine the first element of the interval which fulfills the condition (if any).\n", "Formally:\n", "\\begin{equation*}\n", "\t\\begin{aligned}\n", "\t\tl = (\\exists i \\in [m..n]:\\beta(i)) \\\\\n", "\t\tl \\rightarrow (ind \\in [m..n] \\wedge \\beta(ind) \\wedge \\forall i \\in [m..ind-1]: \\neg\\beta(i))\n", "\t\\end{aligned}\n", "\\end{equation*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Beta condition" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Introduce an `is_even` function which determine whether a number is even or not:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def is_even(number):\n", " return number % 2 == 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Theoretical way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = 0\n", "index = 0\n", "found = False\n", "i = 0\n", "while not found and i < len(data):\n", " if is_even(data[i]):\n", " result = data[i]\n", " index = i\n", " found = True\n", " i += 1\n", "if found:\n", " print(\"Linear search: %d, Index: %d\" % (result, index))\n", "else:\n", " print(\"Linear search did not found an appropriate item\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pythonic way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = [x for x in data if is_even(x)]\n", "if len(result) > 0:\n", " print(\"Linear search: %d\" % result)\n", "else:\n", " print(\"Linear search did not found an appropriate item\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Built-in function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = filter(is_even, data)\n", "print(\"Linear search: %d\" % next(result, None))\n", "#print(\"Linear search: %s\" % list(result))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conditional summation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The algorithm of summation can be further generalized when a $\\beta: [m..n] \\rightarrow \\mathbb{L}$ condition is defined to restrict the set of elements.\n", "\n", "Let $f: [m..n] \\rightarrow H$ be a function and $\\beta: [m..n] \\rightarrow \\mathbb{L}$ a condition. Let the addition operator $+$ be defined over the elements of $H$, which is an associative operation with a left identity element $0$. Our task is to summarize the values of $f$ function over the interval where the $\\beta$ condition is fulfilled.\n", "Formally:\n", "$$s = \\sum_{\\substack{i=m\\\\ \\beta(i)}}^{n} f(i)$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Theoretical way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = 0\n", "for i in range(0, len(data)):\n", " if is_even(data[i]):\n", " result += data[i]\n", "print(\"Sum: %d\" % result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pythonic way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = 0\n", "for value in data:\n", " if is_even(value):\n", " result += value\n", "print(\"Sum: %d\" % result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Built-in function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = sum(filter(is_even, data))\n", "print(\"Sum: %d\" % result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conditional counting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let $\\beta: [m..n] \\rightarrow \\mathbb{L}$ be a condition. Count how many items of the interval fulfills the condition!\n", "Formally:\n", "$$s = \\sum_{\\substack{i=m\\\\ \\beta(i)}}^{n} 1$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Theoretical way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = 0\n", "for i in range(0, len(data)):\n", " if is_even(data[i]):\n", " result += 1\n", "print(\"Count: %d\" % result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pythonic way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = sum([1 for x in data if is_even(x)])\n", "print(\"Count: %d\" % result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Built-in function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = len(list(filter(is_even, data)))\n", "print(\"Count: %d\" % result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conditional maximum search" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The algorithm of maximum search can be further generalized with combining the $\\beta: [m..n] \\rightarrow \\mathbb{L}$ condition used in *linear search* as a restriction. Note that now the existence of a maximum value is not guaranteed.\n", "\n", "Let $f: [m..n] \\rightarrow H$ be a function and $\\beta: [m..n] \\rightarrow \\mathbb{L}$ a condition. Over the elements of $H$ let a total ordering relation be defined (reflexivity, antisymmetry, tranzitivity and connexcity), with a symbol $\\le$, for the strict version $<$.\n", "Our task is to determine the greatest value in the interval which fulfills the $\\beta$ condition. Also determine an element of the interval, where function $f$ evalutes to this greatest value.\n", "Formally:\n", "\\begin{equation*}\n", "\t\\begin{aligned}\n", "\t\tl = (\\exists i \\in [m..n]:\\beta(i)) \\\\\n", "\t\tl \\rightarrow (\\beta(ind) \\wedge max = f(ind) \\wedge \\forall i \\in [m..n]: \\beta(i) \\rightarrow f(i) \\le f(ind))\n", "\t\\end{aligned}\n", "\\end{equation*}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Theoretical way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = None\n", "index = -1\n", "for i in range(0, len(data)):\n", " if is_even(data[i]) and (result == None or data[i] > result):\n", " result = data[i]\n", " index = i\n", "print(\"Max: %d, Index: %d\" % (result, index))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pythonic way" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = None\n", "index = -1\n", "for idx, value in enumerate(data):\n", " if is_even(value) and (result == None or value > result):\n", " result = value\n", " index = idx\n", "print(\"Max: %d, Index: %d\" % (result, index))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Built-in function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = max(filter(is_even, data))\n", "print(\"Max: %d\" % result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task:** read the area and population data from `02_neighbours.csv`. Calculate the population density for each neighbouring country. Determine which country has the highest population density." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Approach 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By manually iterating through all the row with the `iterrows()` function. Example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "neighbours_df = pd.read_csv(\"02_neighbours.csv\")\n", "for index, row in neighbours_df.iterrows():\n", " print(\"Index: %d, Name: %s\" % (index, row[\"Country\"]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now calculate the maximum density:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "max_density = 0\n", "max_index = -1\n", "neighbours_df = pd.read_csv(\"02_neighbours.csv\")\n", "display(neighbours_df) # display is a special Jupyter Notebook function to provide a pretty display of complex data\n", "\n", "for index, row in neighbours_df.iterrows():\n", " density = row['Population'] / row['Area']\n", " if density > max_density:\n", " max_density = density\n", " max_index = index\n", " \n", "print(\"Maximum density: %s, %f\" % (neighbours_df['Country'][max_index], max_density))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Approach 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By adding a new column to the *Pandas DataFrames*:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "def population_density(row):\n", " return row['Population'] / row['Area']\n", "\n", "neighbours_df = pd.read_csv(\"02_neighbours.csv\")\n", "neighbours_df['Density'] = neighbours_df.apply(population_density, axis = 1)\n", "\n", "display(neighbours_df) # display is a special Jupyter Notebook function to provide a pretty display of complex data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then use the `max()` and `idxmax()` function of the *Pandas DataFrames*:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "max_density = neighbours_df['Density'].max()\n", "max_index = neighbours_df['Density'].idxmax()\n", "max_name = neighbours_df['Country'][max_index]\n", "\n", "print(\"Max density: %f\" % max_density)\n", "print(\"Index: %d\" % max_index)\n", "print(\"Name: %s\" % max_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Logarithmic search (optional)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Also called binary search.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let $f: [m..n] \\rightarrow H$ be a *monotonically increasing* function. Over the elements of $H$ let a total ordering relation be defined (reflexivity, antisymmetry, tranzitivity and connexcity), with a symbol $\\le$, for the strict version $<$.\n", "Determine whether function $f$ evaluates to a given value $h \\in H$ at any location. If yes, specify such a location.\n", "Formally:\n", "$$l = (\\exists i \\in [m..n]: f(i) = h) \\wedge l \\rightarrow f(ind) = h$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def log_search(elements, value):\n", " first = 0\n", " last = len(elements) - 1\n", " while first <= last:\n", " i = (first + last) // 2\n", " if elements[i] == value:\n", " return i\n", " elif elements[i] < value:\n", " first = i + 1\n", " else:\n", " last = i - 1\n", " return -1\n", "\n", "data_sorted = sorted(data)\n", "print(\"Sorted data: %s\" % data_sorted)\n", "\n", "index = log_search(data_sorted, data[0])\n", "print(\"Logaritmic search: value=%d, index=%d\" % (data[0], index))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }