{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lecture 9: Spatial indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Install the pyqtree package for Quad-tree support in Python!**\n", "\n", "If you have Anaconda installed, open the *Anaconda Prompt* and type in:\n", "```\n", "pip install pyqtree\n", "```\n", "\n", "If you have standalone Python3 and Jupyter Notebook install, open a command prompt / terminal and type in:\n", "```\n", "pip3 install pyqtree\n", "```\n", "\n", "*Also install the `scipy` package (for Kd-tree support). If you use Anaconda, you have it already pre-installed.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Opening the shapefile" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Open a shapefile (.shp). \n", "dBase file of attributes (.dbf) is automatically detected by name convention.\n", "\n", "*Source: https://gis.inf.elte.hu/files/public/elte-telepulesek*\n", "\n", "The attributes in the dBase (.dbf) file are in the ISO-8859-2 Central European character encoding for this file. Since the default encoding would be Unicode, we have to override this setting." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import shapefile\n", "\n", "sf = shapefile.Reader('08_telepulesek.shp', encoding = 'latin1')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculating the minimal bounding box for all the points!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Fetch points for cities\n", "points = [shape.points[0] for shape in sf.shapes()]\n", "# Calculating the minimal bounding box\n", "min_x = min(points, key = lambda p: p[0])[0]\n", "max_x = max(points, key = lambda p: p[0])[0]\n", "min_y = min(points, key = lambda p: p[1])[1]\n", "max_y = max(points, key = lambda p: p[1])[1]\n", "\n", "print(\"Number of points: %d\" % len(points))\n", "print(\"Bounding box: (%.1f, %.1f) - (%.1f, %.1f)\" % (min_x, min_y, max_x, max_y))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## KdTree" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Step 1:** Create a point which we will query later" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#query_point = points[10] # BAD! No cloning, but reference copying\n", "query_point = list(points[10]) # clone the point!\n", "query_point[0] += 1\n", "query_point[1] += 2\n", "print(\"Query point: %s\" % query_point)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Step 2:** Construct KD-Tree" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import scipy.spatial # pip3 install scipy\n", "kdtree = scipy.spatial.KDTree(points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Step 3:** Query the closest neighbor to the query point" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dist, idx = kdtree.query(query_point)\n", "print(\"Closest neighbor: distance = %.4f, index = %d, point = %s\" % (dist, idx, points[idx]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Step 4:** Query the 3 closest neighbors to the query point" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "distances, indices = kdtree.query(query_point, k = 3)\n", "print(\"3 closest neighbors:\")\n", "for i in range(len(indices)): \n", " idx = indices[i]\n", " dist = distances[i]\n", " print(\"%d. neighbor: distance = %.4f, index = %d}, point = %s\" % (i+1, dist, idx, points[idx]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Step 5:** Query the 50 closest neighbors to the query point within 10km" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "distances, indices = kdtree.query(query_point, k = 50, distance_upper_bound = 10000)\n", "print(\"Distance list: %s\" % distances)\n", "print(\"Index list: %s\" % indices)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will only find 13 neighbors in a 10km range, but the idx list has 100 elements.\n", "For the invalid 87 elements the `indices[i]` is not a valid index, but instead equals to `len(points)`.\n", "So with a simple check we can detect the end of the valid results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "valid_indices = [idx for idx in indices if idx < len(points)]\n", "print(valid_indices)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"50 closest neighbors within 10km:\")\n", "for i in range(len(valid_indices)):\n", " idx = valid_indices[i]\n", " dist = distances[i]\n", " print(\"%d. neighbor: distance = %.1f, index = %d, point = %s\" % (i+1, dist, idx, points[idx]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## QuadTree" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Step 1:** Create a 10x10km query area around a point" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "query_area_size = 10000\n", "query_area = (\n", " points[10][0] - query_area_size/2,\n", " points[10][1] - query_area_size/2,\n", " points[10][0] + query_area_size/2,\n", " points[10][1] + query_area_size/2\n", ")\n", "print(\"Query area: %s, size = %.1f km\" % (query_area, query_area_size / 1000))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Step 2:** Construct the Quad-tree" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pyqtree # pip3 install pyqtree\n", "\n", "quadtree = pyqtree.Index(bbox=(min_x, min_y, max_x, max_y))\n", "for i in range(len(points)):\n", " obj = { \"id\": i, \"point\": points[i] }\n", " quadtree.insert(obj, points[i]) # object, bbox" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* for a polygon, the first argument should be the indexed object (e.g. the polygon itself), and the second argument should be the bounding box of the polygon." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Step 3:** Query the points in the area" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "matches = quadtree.intersect(query_area)\n", "print(\"Matches: %s\" % matches)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }