{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lecture 4: Shapefile handling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Install the pyshp Python package!** \n",
"\n",
"If you have Anaconda installed, open the *Anaconda Prompt* and type in:\n",
"```\n",
"pip install pyshp\n",
"```\n",
"\n",
"If you have a standalone Python 3 and Jupyter Notebook installation, open a command prompt / terminal and type in:\n",
"```\n",
"pip3 install pyshp\n",
"```\n",
"\n",
"*If you have the pyshp package already installed, make sure its version is >= 2.0*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Opening a shapefile"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Open a shapefile (.shp).  \n",
"The dBase attribute file (.dbf) with the same base name is detected automatically."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shapefile\n",
"\n",
"sf = shapefile.Reader('04_megye_region.shp', encoding = 'latin1')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
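"The attributes in the dBase (.dbf) file of this shapefile are stored in the ISO-8859-2 Central European character encoding. Since pyshp decodes attribute text as UTF-8 by default, we have to override the `encoding` setting. Note that Python's `latin1` codec is ISO-8859-1; the Python codec for ISO-8859-2 is `latin2` (`iso8859_2`), which is worth trying if letters such as ő or ű are displayed incorrectly."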
]
},
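{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of the codec choice can be checked without the shapefile. The following is a minimal sketch, assuming a sample county name (typed in here, not read from the file) that was stored as ISO-8859-2 bytes. It decodes the same bytes with both codecs. Both calls succeed for these bytes and only the resulting letters differ, so a mismatched codec does not necessarily raise an error."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sample value for illustration only; it is not read from the .dbf file\n",
"raw = 'GYŐR-MOSON-SOPRON'.encode('iso8859_2')\n",
"\n",
"# Decode the same bytes with two different single-byte codecs\n",
"as_latin1 = raw.decode('latin1')     # ISO-8859-1\n",
"as_latin2 = raw.decode('iso8859_2')  # ISO-8859-2\n",
"\n",
"print('latin1:', as_latin1)\n",
"print('latin2:', as_latin2)"
]
},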
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading a shapefile"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check whether the file contains polygons:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Geometry type: %d\" % sf.shapeType)\n",
"if sf.shapeType == shapefile.POLYGON:\n",
" print(\"This file contains polygons\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The library defines the following geometry types:\n",
" - NULL = 0\n",
" - POINT = 1\n",
" - POLYLINE = 3\n",
" - POLYGON = 5\n",
" - MULTIPOINT = 8\n",
" - POINTZ = 11\n",
" - POLYLINEZ = 13\n",
" - POLYGONZ = 15\n",
" - MULTIPOINTZ = 18\n",
" - POINTM = 21\n",
" - POLYLINEM = 23\n",
" - POLYGONM = 25\n",
" - MULTIPOINTM = 28\n",
" - MULTIPATCH = 31"
]
},
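{
"cell_type": "markdown",
"metadata": {},
"source": [
"The numeric codes listed above are hard to read in printed output. As a small sketch, the list can be turned into a lookup table that translates a code into its name. The names `SHAPE_TYPE_NAMES` and `shape_type_name` are made up for this example and are not part of pyshp."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical lookup table, copied from the list of geometry types above\n",
"SHAPE_TYPE_NAMES = {\n",
"    0: 'NULL', 1: 'POINT', 3: 'POLYLINE', 5: 'POLYGON', 8: 'MULTIPOINT',\n",
"    11: 'POINTZ', 13: 'POLYLINEZ', 15: 'POLYGONZ', 18: 'MULTIPOINTZ',\n",
"    21: 'POINTM', 23: 'POLYLINEM', 25: 'POLYGONM', 28: 'MULTIPOINTM',\n",
"    31: 'MULTIPATCH',\n",
"}\n",
"\n",
"def shape_type_name(code):\n",
"    # Return the readable name, or 'UNKNOWN' for a code missing from the table\n",
"    return SHAPE_TYPE_NAMES.get(code, 'UNKNOWN')\n",
"\n",
"print(shape_type_name(5))\n",
"print(shape_type_name(99))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With an opened reader, `shape_type_name(sf.shapeType)` would give the name of the geometry type stored in the file."
]
},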
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Print the number of shapes (geometries) in the file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Number of counties: %d\" % len(sf.shapes()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Print the available attributes, their type and order:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Attributes: %s\" % sf.fields)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Ignore the \"DeletionFlag\" field for now.*"
]
},
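{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each entry of the field list describes one attribute as name, type, length and number of decimal places. The sketch below uses a made-up field list in that layout (the `NAME` and `AREA` fields are assumptions, not the real attributes of this file) to show how the attribute names and their positions within a record could be collected. The \"DeletionFlag\" entry is skipped because the records do not contain a value for it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical field list in the layout of sf.fields:\n",
"# (name, type, length, decimal places); the values are made up\n",
"fields = [\n",
"    ('DeletionFlag', 'C', 1, 0),\n",
"    ['NAME', 'C', 40, 0],\n",
"    ['AREA', 'N', 12, 3],\n",
"]\n",
"\n",
"# Skip the first entry (DeletionFlag) and keep only the attribute names\n",
"field_names = [field[0] for field in fields[1:]]\n",
"print(field_names)\n",
"\n",
"# Map each attribute name to its index within a record\n",
"field_index = {name: i for i, name in enumerate(field_names)}\n",
"print(field_index['NAME'])"
]
},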
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read all shapes (geometries) and records (attribute table):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"shapes = sf.shapes()\n",
"records = sf.records()\n",
"print(\"Number of geometries: %d\" % len(shapes))\n",
"print(\"Number of records: %d\" % len(records))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Iterate through each shape-record pair and print each county's name and the number of points in its geometry:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for i in range(0, len(shapes)):\n",
" # Get the name of the county, which is the first attribute (index 0)\n",
" name = records[i][0]\n",
" \n",
" # The shape is a closed polygon, the first and the last points are the same\n",
" point_count = len(shapes[i].points) - 1\n",
" \n",
" # Print out the name of the counties and the number of points in their polygons\n",
" print(\"{0}: {1} points\".format(name.title(), point_count))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternative way to read all shapes and records at the same time:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"shape_records = sf.shapeRecords()\n",
"print(\"First county: %s\" % shape_records[0].record[0])\n",
"print(\"Number of points: %d\" % (len(shape_records[0].shape.points) - 1))\n",
"print()\n",
"\n",
"for sr in shape_records:\n",
" name = sr.record[0]\n",
" point_count = len(sr.shape.points) - 1\n",
" print(\"{0}: {1} points\".format(name.title(), point_count))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Closing an opened file"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sf.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Naturally, you cannot read from a file after it has been closed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary exercise on shapefile reading"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task:** calculate the perimeter of each county."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What can you observe? Are all values approximately correct? \n",
"*Hint: pay special attention to Pest county!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Polygon parts"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Pest county* is a polygon with a hole, and the points of both the external ring and the internal hole ring are given in the `points` list.\n",
"\n",
"We can check how many parts a shape has through its `parts` list, which stores the index of the first point of each part. In this file the external ring is the first part, followed by the inner holes, if any."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sf = shapefile.Reader('04_megye_region.shp', encoding = 'latin1')\n",
"\n",
"shape_records = sf.shapeRecords()\n",
"for sr in shape_records:\n",
" name = sr.record[0]\n",
" print(\"%s: %d parts (%s)\" % (name, len(sr.shape.parts), sr.shape.parts))\n",
" \n",
"sf.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can observe, *Pest county* has two parts, and the second part starts at index 1126 of the `points` list. So the external ring consists only of the points with indices 0-1125."
]
},
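{
"cell_type": "markdown",
"metadata": {},
"source": [
"How the `parts` indices divide the `points` list can be illustrated on toy data. The following is a minimal sketch, assuming a small square with a square hole instead of a real county; the coordinates are made up."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy data, not taken from the county file: a square with a square hole\n",
"points = [\n",
"    (0, 0), (0, 4), (4, 4), (4, 0), (0, 0),  # external ring, indices 0-4\n",
"    (1, 1), (2, 1), (2, 2), (1, 2), (1, 1),  # hole ring, indices 5-9\n",
"]\n",
"parts = [0, 5]  # start index of each ring in the points list\n",
"\n",
"# Each ring runs from its start index up to the start of the next ring,\n",
"# or to the end of the points list for the last ring\n",
"ends = parts[1:] + [len(points)]\n",
"rings = [points[start:end] for start, end in zip(parts, ends)]\n",
"\n",
"for ring in rings:\n",
"    print(len(ring), 'points, closed:', ring[0] == ring[-1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a shape read with pyshp, `shape.points` and `list(shape.parts)` would take the place of the toy `points` and `parts` lists."
]
},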
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task:** fix the previous perimeter computation by only taking the external ring into consideration!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use the previously introduced *Matplotlib* library to draw the polygons as line diagrams:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import shapefile\n",
"\n",
"# Special Jupyter Notebook command, so that plots made by Matplotlib are displayed inside the notebook\n",
"%matplotlib inline\n",
"\n",
"sf = shapefile.Reader('04_megye_region.shp', encoding = 'latin1')\n",
"\n",
"# Start new plot figure\n",
"plt.figure()\n",
"# Iterate through all the shapes\n",
"for shape in sf.shapes():\n",
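" # Only consider the first ring (the external boundary) if multiple parts are defined.\n",
" # The slice below excludes index 'end', so 'end' is the start index of the second part.\n",
" end = len(shape.points) if len(shape.parts) == 1 else shape.parts[1]\n",
"\n",
" # Get the X and Y positions into separate lists\n",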
" xs = [coord[0] for coord in shape.points[:end]]\n",
" ys = [coord[1] for coord in shape.points[:end]]\n",
"\n",
" # Add polygon to plot\n",
" plt.plot(xs, ys)\n",
"\n",
"# Display plot ...\n",
"plt.show() \n",
"# ... or save plot\n",
"#plt.savefig('04_map.png')\n",
"\n",
"sf.close()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}