{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lecture 3: Basic algorithms"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Random generation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Random numbers can be generated with the `random` module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "number = random.randint(1, 100)\n",
    "print(number)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generate 10 numbers between 1 and 100:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "numbers = []\n",
    "for i in range(10):\n",
    "    numbers.append(random.randint(1, 100))\n",
    "print(numbers)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Same with using Python's list generator expressions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "numbers = [random.randint(1, 100) for _ in range(10)]\n",
    "print(numbers)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Help:*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "numbers = [10, 20, 30, 40, 50]\n",
    "print(numbers)\n",
    "\n",
    "# iteration\n",
    "halves = []\n",
    "for x in numbers:\n",
    "    halves.append(x / 2)\n",
    "print(halves)\n",
    "\n",
    "# list generation\n",
    "halves = [x / 2 for x in numbers]\n",
    "print(halves)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pseudo-random numbers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There is no real randomicity in computer science (unless you build a device which measures e.g. cosmic radiation). These numbers generated are so called pseudo-random numbers. They are generated by a deterministic mathematical algorithm along a uniform distribution.\n",
    "\n",
    "The following algorithm will always generate the same number, because we seeded the algorithm a fix initial value:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "random.seed(42) # To reproduce the same results\n",
    "data = [random.randint(1, 100) for _ in range(10)]\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default the seed value is related to the milliseconds passed since the start of the computer, hence it will look like real \"random\" numbers."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let $f: [m..n] \\rightarrow H$ be a function. Let the addition operator $+$ be defined over the elements of $H$, which is an associative operation with a left identity element $0$. Our task is to summarize the values of $f$ function over the interval.\n",
    "Formally:\n",
    "$$s = \\sum_{i=m}^{n} f(i)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Theoretical way"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = 0\n",
    "for i in range(0, len(data)):\n",
    "    result += data[i]\n",
    "print(\"Sum: %d\" % result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pythonic way"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = 0\n",
    "for value in data:\n",
    "    result += value\n",
    "print(\"Sum: %d\" % result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Built-in function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = sum(data)\n",
    "print(\"Sum: %d\" % result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Counting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given the $[m..n]$ interval, count the number of items inside it.\n",
    "Formally:\n",
    "$$s = \\sum_{\\substack{i=m}}^{n} 1$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Theoretical way"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = 0\n",
    "for i in range(0, len(data)):\n",
    "    result += 1\n",
    "print(\"Count: %d\" % result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pythonic way"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = sum([1 for _ in data])\n",
    "print(\"Count: %d\" % result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Built-in function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = len(data)\n",
    "print(\"Count: %d\" % result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Maximum search"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let $f: [m..n] \\rightarrow H$ be a function, $m \\le n$. Over the elements of $H$ let a total ordering relation be defined (reflexivity, antisymmetry, tranzitivity and connexcity), with a symbol $\\le$, for the strict version $<$.\n",
    "Our task is to determine the greatest value in the interval. Also determine an element of the interval, where function $f$ evalutes to this greatest value.\n",
    "Formally:\n",
    "$$max = f(ind) \\wedge \\forall i \\in [m..n]: f(i) \\le f(ind)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Theoretical way"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = data[0]\n",
    "index = 0\n",
    "for i in range(1, len(data)):\n",
    "    if data[i] > result:\n",
    "        result = data[i]\n",
    "        index = i\n",
    "print(\"Max: %d, Index: %d\" % (result, index))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pythonic way"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = data[0]\n",
    "index = 0\n",
    "for idx, value in enumerate(data[1:], start = 1):\n",
    "    if value > result:\n",
    "        result = value\n",
    "        index = idx\n",
    "print(\"Max: %d, Index: %d\" % (result, index))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Built-in function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = max(data)\n",
    "print(\"Max: %d\" % result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Linear search"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let $\\beta: [m..n] \\rightarrow \\mathbb{L}$ condition be defined. Determine the first element of the interval which fulfills the condition (if any).\n",
    "Formally:\n",
    "\\begin{equation*}\n",
    "\t\\begin{aligned}\n",
    "\t\tl = (\\exists i \\in [m..n]:\\beta(i)) \\\\\n",
    "\t\tl \\rightarrow (ind \\in [m..n] \\wedge \\beta(ind) \\wedge \\forall i \\in [m..ind-1]: \\neg\\beta(i))\n",
    "\t\\end{aligned}\n",
    "\\end{equation*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Beta condition"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Introduce an `is_even` function which determine whether a number is even or not:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def is_even(number):\n",
    "    return number % 2 == 0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Theoretical way"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = 0\n",
    "index = 0\n",
    "found = False\n",
    "i = 0\n",
    "while not found and i < len(data):\n",
    "    if is_even(data[i]):\n",
    "        result = data[i]\n",
    "        index = i\n",
    "        found = True\n",
    "    i += 1\n",
    "if found:\n",
    "    print(\"Linear search: %d, Index: %d\" % (result, index))\n",
    "else:\n",
    "    print(\"Linear search did not found an appropriate item\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pythonic way"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = [x for x in data if is_even(x)]\n",
    "if len(result) > 0:\n",
    "    print(\"Linear search: %d\" % result)\n",
    "else:\n",
    "    print(\"Linear search did not found an appropriate item\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Built-in function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = filter(is_even, data)\n",
    "print(\"Linear search: %d\" % next(result, None))\n",
    "#print(\"Linear search: %s\" % list(result))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conditional summation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The algorithm of summation can be further generalized when a $\\beta: [m..n] \\rightarrow \\mathbb{L}$ condition is defined to restrict the set of elements.\n",
    "\n",
    "Let $f: [m..n] \\rightarrow H$ be a function and $\\beta: [m..n] \\rightarrow \\mathbb{L}$ a condition. Let the addition operator $+$ be defined over the elements of $H$, which is an associative operation with a left identity element $0$. Our task is to summarize the values of $f$ function over the interval where the $\\beta$ condition is fulfilled.\n",
    "Formally:\n",
    "$$s = \\sum_{\\substack{i=m\\\\ \\beta(i)}}^{n} f(i)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Theoretical way"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = 0\n",
    "for i in range(0, len(data)):\n",
    "    if is_even(data[i]):\n",
    "        result += data[i]\n",
    "print(\"Sum: %d\" % result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pythonic way"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = 0\n",
    "for value in data:\n",
    "    if is_even(value):\n",
    "        result += value\n",
    "print(\"Sum: %d\" % result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Built-in function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = sum(filter(is_even, data))\n",
    "print(\"Sum: %d\" % result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conditional counting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let $\\beta: [m..n] \\rightarrow \\mathbb{L}$ be a condition. Count how many items of the interval fulfills the condition!\n",
    "Formally:\n",
    "$$s = \\sum_{\\substack{i=m\\\\ \\beta(i)}}^{n} 1$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Theoretical way"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = 0\n",
    "for i in range(0, len(data)):\n",
    "    if is_even(data[i]):\n",
    "        result += 1\n",
    "print(\"Count: %d\" % result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pythonic way"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = sum([1 for x in data if is_even(x)])\n",
    "print(\"Count: %d\" % result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Built-in function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = len(list(filter(is_even, data)))\n",
    "print(\"Count: %d\" % result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conditional maximum search"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The algorithm of maximum search can be further generalized with combining the $\\beta: [m..n] \\rightarrow \\mathbb{L}$ condition used in *linear search* as a restriction. Note that now the existence of a maximum value is not guaranteed.\n",
    "\n",
    "Let $f: [m..n] \\rightarrow H$ be a function and $\\beta: [m..n] \\rightarrow \\mathbb{L}$ a condition. Over the elements of $H$ let a total ordering relation be defined (reflexivity, antisymmetry, tranzitivity and connexcity), with a symbol $\\le$, for the strict version $<$.\n",
    "Our task is to determine the greatest value in the interval which fulfills the $\\beta$ condition. Also determine an element of the interval, where function $f$ evalutes to this greatest value.\n",
    "Formally:\n",
    "\\begin{equation*}\n",
    "\t\\begin{aligned}\n",
    "\t\tl = (\\exists i \\in [m..n]:\\beta(i)) \\\\\n",
    "\t\tl \\rightarrow (\\beta(ind) \\wedge max = f(ind) \\wedge \\forall i \\in [m..n]: \\beta(i) \\rightarrow f(i) \\le f(ind))\n",
    "\t\\end{aligned}\n",
    "\\end{equation*}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Theoretical way"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = None\n",
    "index = -1\n",
    "for i in range(0, len(data)):\n",
    "    if is_even(data[i]) and (result == None or data[i] > result):\n",
    "        result = data[i]\n",
    "        index = i\n",
    "print(\"Max: %d, Index: %d\" % (result, index))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pythonic way"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = None\n",
    "index = -1\n",
    "for idx, value in enumerate(data):\n",
    "    if is_even(value) and (result == None or value > result):\n",
    "        result = value\n",
    "        index = idx\n",
    "print(\"Max: %d, Index: %d\" % (result, index))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Built-in function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = max(filter(is_even, data))\n",
    "print(\"Max: %d\" % result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task:** read the area and population data from `02_neighbours.csv`. Calculate the population density for each neighbouring country. Determine which country has the highest population density."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Approach 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By manually iterating through all the row with the `iterrows()` function. Example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "neighbours_df = pd.read_csv(\"02_neighbours.csv\")\n",
    "for index, row in neighbours_df.iterrows():\n",
    "    print(\"Index: %d, Name: %s\" % (index, row[\"Country\"]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now calculate the maximum density:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "max_density = 0\n",
    "max_index = -1\n",
    "neighbours_df = pd.read_csv(\"02_neighbours.csv\")\n",
    "display(neighbours_df) # display is a special Jupyter Notebook function to provide a pretty display of complex data\n",
    "\n",
    "for index, row in neighbours_df.iterrows():\n",
    "    density = row['Population'] / row['Area']\n",
    "    if density > max_density:\n",
    "        max_density = density\n",
    "        max_index = index\n",
    "        \n",
    "print(\"Maximum density: %s, %f\" % (neighbours_df['Country'][max_index], max_density))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Approach 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By adding a new column to the *Pandas DataFrames*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "def population_density(row):\n",
    "    return row['Population'] / row['Area']\n",
    "\n",
    "neighbours_df = pd.read_csv(\"02_neighbours.csv\")\n",
    "neighbours_df['Density'] = neighbours_df.apply(population_density, axis = 1)\n",
    "\n",
    "display(neighbours_df) # display is a special Jupyter Notebook function to provide a pretty display of complex data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then use the `max()` and `idxmax()` function of the *Pandas DataFrames*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "max_density = neighbours_df['Density'].max()\n",
    "max_index = neighbours_df['Density'].idxmax()\n",
    "max_name = neighbours_df['Country'][max_index]\n",
    "\n",
    "print(\"Max density: %f\" % max_density)\n",
    "print(\"Index: %d\" % max_index)\n",
    "print(\"Name: %s\" % max_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Logarithmic search (optional)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Also called binary search.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let $f: [m..n] \\rightarrow H$ be a *monotonically increasing* function. Over the elements of $H$ let a total ordering relation be defined (reflexivity, antisymmetry, tranzitivity and connexcity), with a symbol $\\le$, for the strict version $<$.\n",
    "Determine whether function $f$ evaluates to a given value $h \\in H$ at any location. If yes, specify such a location.\n",
    "Formally:\n",
    "$$l = (\\exists i \\in [m..n]: f(i) = h) \\wedge l \\rightarrow f(ind) = h$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def log_search(elements, value):\n",
    "    first = 0\n",
    "    last = len(elements) - 1\n",
    "    while first <= last:\n",
    "        i = (first + last) // 2\n",
    "        if elements[i] == value:\n",
    "            return i\n",
    "        elif elements[i] < value:\n",
    "            first = i + 1\n",
    "        else:\n",
    "            last = i - 1\n",
    "    return -1\n",
    "\n",
    "data_sorted = sorted(data)\n",
    "print(\"Sorted data: %s\" % data_sorted)\n",
    "\n",
    "index = log_search(data_sorted, data[0])\n",
    "print(\"Logaritmic search: value=%d, index=%d\" % (data[0], index))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}