{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lecture 2: Basic data structures" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A list in Python is a heterogeneous container for items. It may remind you of an array in many other languages (like C++, Java or C#); Python has no built-in array type, so lists are used in that role. The initial items of a list are defined between square brackets." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neighbours = ['Austria', 'Slovakia', 'Ukraine', 'Romania', 'Serbia', 'Croatia', 'Slovenia']\n", "print(neighbours)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The items of a list can be accessed by their numerical index. (The first item has index *zero*.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(neighbours[0])\n", "print(neighbours[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also access a range of elements (a *slice*); the start index is inclusive, the end index is exclusive:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(neighbours[2:5])\n", "print(neighbours[2:])\n", "print(neighbours[:5])\n", "print(neighbours[:])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The number of items in a list (its length) can also be easily fetched:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "len(neighbours)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using a *for* loop, the items of a list can be iterated over:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for country in neighbours:\n", "    print(country)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists are mutable, meaning the items of a list and the number of items it contains can change dynamically after 
its initial definition. We can remove elements:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neighbours.remove('Slovakia')\n", "print(neighbours)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Add new ones:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neighbours.append('Czechoslovakia')\n", "print(neighbours)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Elements can also be removed from, or inserted at, a specific index:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neighbours.pop(3)  # removes and returns the item at index 3\n", "del neighbours[3]  # removes the item at index 3 without returning it\n", "neighbours.insert(3, 'Yugoslavia')  # inserts before the item currently at index 3\n", "print(neighbours)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Copying a list can be a bit tricky: plain assignment only creates a second name (an alias) for the same list, while `copy()` or a full slice creates a new, independent (shallow) copy:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alias_list = neighbours  # same list object, no copy is made\n", "copied_list_1 = neighbours.copy()\n", "copied_list_2 = neighbours[:]\n", "\n", "alias_list.clear()  # also empties neighbours, but not the copies\n", "print(neighbours)\n", "print(copied_list_1)\n", "print(copied_list_2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tuples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tuples are also sequences of heterogeneous elements. The initial elements of a tuple are defined as a comma-separated list, surrounded by parentheses." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neighbours = ('Austria', 'Slovakia', 'Ukraine', 'Romania', 'Serbia', 'Croatia', 'Slovenia')\n", "print(neighbours)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The elements or even a range of elements can also be accessed by their index:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(neighbours[0])\n", "print(neighbours[2:5])\n", "print(len(neighbours))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The elements of a tuple can also be fetched by *tuple unpacking* (the number of variables on the left must match the length of the tuple):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a, b, c, d, e, f, g = neighbours\n", "print(a, b, c, d, e, f, g)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While lists are mutable, tuples are immutable, meaning that the elements cannot be modified:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neighbours[0] = 'Renamed country'  # raises TypeError: tuples do not support item assignment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "New elements cannot be added to a tuple either. Removing existing elements is also not possible." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neighbours.append('New country')  # raises AttributeError: tuples have no append() method" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Though tuples may seem similar to lists, they are often used in different situations and for different purposes. Tuples are immutable, and usually contain a heterogeneous sequence of elements. Lists are mutable, and their elements are usually homogeneous and are accessed by iterating over the list." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have seen that lists, tuples and even strings have many common properties, such as indexing and slicing operations. They are **sequence data types**." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python also includes a data type for *sets*. A set is an unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations like union, intersection, difference, and symmetric difference.\n", "\n", "Curly braces or the `set()` function can be used to create sets. Note: to create an empty set you have to use `set()`, not `{}`; the latter creates an empty dictionary, a data structure that we discuss in the next section." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neighbours = {'Austria', 'Slovakia', 'Ukraine', 'Romania', 'Serbia', 'Croatia', 'Slovenia'}\n", "print(neighbours)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Membership testing:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Serbia' in neighbours)\n", "print('Germany' in neighbours)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sets are guaranteed to contain no duplicate entries:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neighbours.add('Ukraine')  # already present, so the set does not change\n", "print(neighbours)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Demonstration of basic set operations:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "german_speakers = {'Germany', 'Austria', 'Switzerland'}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Union: %s\" % (neighbours | german_speakers))\n", "print(\"Intersection: %s\" % (neighbours & german_speakers))\n", "print(\"Difference: %s\" % (neighbours - german_speakers))\n", "print(\"Symmetric 
difference: %s\" % (neighbours ^ german_speakers))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dictionaries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by *keys*. It is best to think of a dictionary as a set of *key: value* pairs, with the requirement that the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: `{}`. Placing a comma-separated list of key:value pairs within the braces adds initial key:value pairs to the dictionary; this is also the way dictionaries are written on output.\n", "\n", "Dictionaries are sometimes found in other languages as “associative memories” or “associative arrays”." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "areas = {'Austria': 83871,\n", "         'Slovakia': 49037,\n", "         'Ukraine': 603500,\n", "         'Romania': 238397,\n", "         'Serbia': 88361,\n", "         'Croatia': 56594,\n", "         'Slovenia': 20273}\n", "print(areas)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `dict()` constructor builds dictionaries directly from sequences of key-value pairs:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "areas = dict([('Austria', 83871),\n", "              ('Slovakia', 49037),\n", "              ('Ukraine', 603500),\n", "              ('Romania', 238397),\n", "              ('Serbia', 88361),\n", "              ('Croatia', 56594),\n", "              ('Slovenia', 20273)])\n", "print(areas)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Elements of a dictionary can be accessed through their key:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(areas['Croatia'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Keys can be any immutable type; strings and numbers can always be keys. 
Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key. You can’t use lists as keys, since lists can be modified." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dictionaries are also mutable:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "areas['Croatia'] += 2\n", "areas['Serbia'] -= 2\n", "print(areas['Croatia'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "areas['Hungary'] = 93028\n", "print(areas)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "del areas['Slovakia']\n", "print(areas)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can still use a *for* loop to iterate through the *key: value* pairs in a dictionary:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for key, value in areas.items():\n", "    print(\"%s: %d km2\" % (key, value))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Accessing all the *key: value* pairs, *keys* or *values* at once is also possible (each call returns a view of the dictionary rather than a list):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(areas.items())\n", "print(areas.keys())\n", "print(areas.values())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stacks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The list methods make it very easy to use a list as a stack, where the last element added is the first element retrieved (*last-in, first-out*). To add an item to the top of the stack, use `append()`. To retrieve an item from the top of the stack, use `pop()` without an explicit index." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stack = [1, 2, 3, 4, 5]\n", "stack.append(6)\n", "stack.append(7)\n", "print(stack)\n", "print(stack.pop())\n", "print(stack.pop())\n", "print(stack)\n", "stack.append(8)\n", "stack.append(9)\n", "print(stack)\n", "\n", "print(\"Process all the elements of the stack:\")\n", "while len(stack) > 0:\n", "    print(stack.pop())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Queues" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is also possible to use a list as a queue, where the first element added is the first element retrieved (*first-in, first-out*); however, **lists are not efficient for this purpose**. While appends and pops from the end of a list are fast, doing inserts or pops from the beginning of a list is slow (because all of the other elements have to be shifted by one)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Short outlook on modules" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Python, a logical unit of definitions (*variables, functions, classes*) should be placed in a standalone file to support easy reuse of the code. Such a file is called a *module*; definitions from a module can be *imported* into other modules or into the *main* module.\n", "\n", "There are many built-in modules, e.g. 
the `math` module:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import math\n", "print(math.pi) # using a variable definition from module math\n", "print(math.factorial(10)) # using a function definition from module math" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can easily get the documentation of a module:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "help(math)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To implement a queue, use `collections.deque`, which was designed to have fast appends and pops from both ends." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from collections import deque\n", "\n", "queue = deque([1, 2, 3, 4, 5])\n", "queue.append(6)\n", "queue.append(7)\n", "print(queue)\n", "print(queue.popleft())\n", "print(queue.popleft())\n", "print(queue)\n", "queue.append(8)\n", "queue.append(9)\n", "print(queue)\n", "\n", "print(\"Process all the elements of the queue:\")\n", "while len(queue) > 0:\n", "    print(queue.popleft())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary exercise on basic data types" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task:** request numbers from the user until the text *quit* is typed in. Ignore any other non-numeric input. Place the inputted numbers into a list and display them in reversed order." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "numbers = []\n", "user_input = input('Next number: ')\n", "\n", "while user_input != 'quit':\n", "    try:\n", "        num = int(user_input)\n", "        numbers.append(num)\n", "    except ValueError:  # int() raises ValueError for non-numeric text\n", "        print('It is not a number, skipped!')\n", "\n", "    user_input = input('Next number: ')\n", "\n", "# Iterate through the list with a:\n", "# - start index: len(numbers) - 1\n", "# - stop value: -1 (exclusive, so the last index visited is 0)\n", "# - incremental step: -1\n", "print('Numbers in reversed order:')\n", "for i in range(len(numbers) - 1, -1, -1):\n", "    print(numbers[i])\n", "\n", "# Or we can simply reverse a list:\n", "print('Numbers in reversed order: %s' % list(reversed(numbers)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tabular data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Pandas** is a high-level data manipulation tool for Python. Its key data structure is called the *DataFrame*. DataFrames allow you to store and manipulate tabular data in rows of observations and columns of variables." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are several ways to create a DataFrame. One way is to use a list. 
For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "neighbours = ['Austria', 'Slovakia', 'Ukraine', 'Romania', 'Serbia', 'Croatia', 'Slovenia']\n", "\n", "# Calling DataFrame constructor on list\n", "df = pd.DataFrame(neighbours)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or a dictionary to have multiple columns:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "areas = {'Country': ['Austria', 'Slovakia', 'Ukraine', 'Romania', 'Serbia', 'Croatia', 'Slovenia'],\n", "         'Area': [83871, 49037, 603500, 238397, 88361, 56594, 20273]}\n", "\n", "# Calling DataFrame constructor on dictionary\n", "df = pd.DataFrame(areas)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read a CSV file" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neighbours_df = pd.read_csv(\"02_neighbours.csv\")\n", "neighbours_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Plotting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Matplotlib* is the most popular 2D plotting library in Python. Using matplotlib, you can create pretty much any type of plot. \n", "\n", "*Pandas* has **tight integration** with *matplotlib*." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "neighbours_df.plot(kind='bar', x='Country', y='Area')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary exercise on tabular data and plotting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task:** read Hungary's historical population data from `02_population.csv`. Show the data on a line diagram!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "population_df = pd.read_csv(\"02_population.csv\")\n", "display(population_df) # display is a special Jupyter Notebook function to provide a pretty display of complex data\n", "\n", "population_df.plot(kind='line',x='Year',y='Population')\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }