{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lecture 8: Graph algorithms II. - minimum spanning tree" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The sample dataset for this lecture is given in the `08_telepulesek.shp` and it contains the metadata and location of Hungarian cities and towns." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read the Shapefile into a Pandas DataFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Open a shapefile (.shp). \n", "dBase file of attributes (.dbf) is automatically detected by name convention.\n", "\n", "*Source: https://gis.inf.elte.hu/files/public/elte-telepulesek*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import shapefile\n", "\n", "sf = shapefile.Reader('08_telepulesek.shp', encoding = 'latin1')\n", "\n", "# Check whether files contains points\n", "if sf.shapeType == shapefile.POINT:\n", " print(\"This file contains points\")\n", " \n", " # Print some file level metadata\n", " print(\"Number of cities: %d\" % len(sf.shapes()))\n", " print(\"Attributes: %s\" % sf.fields)\n", "else:\n", " print(\"This file does not contain points\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Step 1**: Collect the field names into a list." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fields = [x[0] for x in sf.fields][1:]\n", "print(fields)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Step 2**: Fetch the records, then collect the X and Y coordinates into separate lists." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "records = sf.records()\n", "print(records[0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "locations_y = [shp.points[0][0] for shp in sf.shapes()]\n", "locations_x = [shp.points[0][1] for shp in sf.shapes()]\n", "print((locations_x[0], locations_y[0]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Step 3**: Create a pandas *DataFrame* from the lists." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# Create DataFrame, set columns and records\n", "df = pd.DataFrame(columns=fields, data=records)\n", "# Add location as addational columns\n", "df = df.assign(coord_x=locations_x)\n", "df = df.assign(coord_y=locations_y)\n", "display(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a minimum spanning tree" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Step 1:** Create an undirected graph with the towns as the nodes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import networkx as nx\n", "\n", "# Create empty, undirected graph\n", "graph = nx.Graph()\n", "\n", "# Iterate through all vector features (cities)\n", "for sr in sf.shapeRecords():\n", " shape = sr.shape # geometry\n", " record = sr.record # attributes\n", "\n", " # We could see from the attribute list that the name is the 3rd attribute \n", " name = record[2]\n", "\n", " # Add the city to the graph with its name\n", " # Store its county, postal code, category and geolocation as additional data\n", " graph.add_node(name, \n", " county = record[1],\n", " postal_code = record[3],\n", " category = record[13],\n", " location = shape.points[0]\n", " )\n", "\n", "# Check results\n", "print(graph.nodes['Esztergom'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Step 2:** Create a complete graph (add all possible edges)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import math\n", "\n", "for city_from in graph.nodes:\n", " location_from = graph.nodes[city_from]['location']\n", " for city_to in graph.nodes:\n", " location_to = graph.nodes[city_to]['location']\n", " if city_from < city_to: # we do not need to add all edges twice\n", " # Add edge to the graph with distance as its cost\n", " graph.add_edge(city_from, city_to, \n", " distance = math.sqrt(\n", " pow(location_from[0] - location_to[0], 2) + \n", " pow(location_from[1] - location_to[1], 2)))\n", "\n", "# Check results\n", "print(graph['Esztergom']['Debrecen'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Step 3:** Calculate the minimum spanning tree as a new graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Number of nodes in original graph: %d' % graph.order())\n", "print('Number of edges in original graph: %d' % graph.size())\n", "\n", "spanning_tree = nx.minimum_spanning_tree(graph, weight = 'distance')\n", "\n", "print('Number of nodes in spanning tree: %d' % spanning_tree.order())\n", "print('Number of edges in spanning tree: %d' % spanning_tree.size())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Step 4:** Visualize results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "# Start new plot figure\n", "plt.figure(figsize=[15,10])\n", "\n", "# Plot all edges as black lines in the MST\n", "for edge in spanning_tree.edges:\n", " city_from = edge[0]\n", " city_to = edge[1]\n", "\n", " location_from = spanning_tree.nodes[city_from]['location']\n", " location_to = spanning_tree.nodes[city_to]['location']\n", " plt.plot([location_from[0], location_to[0]], [location_from[1], location_to[1]], color='black')\n", "\n", "# Plot all cities as red dots\n", "for city in spanning_tree.nodes:\n", " location = spanning_tree.nodes[city]['location']\n", " plt.plot(location[0], location[1], color='red', marker='o', markersize=1)\n", "\n", "# Display plot\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }