Yukon Place Names

    6 September 2006


    The Canadian geographic placenames board now publishes all its data for free, in a variety of formats and services, via the Canadian Geographical Names Service (CGNS) (yay!). I decided to try to build a script that could be run once a year, or as needed, to update a Yukon Gazetteer. The automation part was a failure, but the data part is okay. What follows are my notes to myself, so I don't know how much you'll be able to get out of them.  -- Matt Wilkie

    There are 3,937 placenames in the database, though some are withdrawn or rescinded. Consult the user guide and the specifications for what that means (GNSS_Users_Guide.pdf, http://cgns-dev.nrcan.gc.ca/cgns_web/standards_spec.html).

    Yukon_Placenames.shp is the result of my efforts. It's pretty much ready to use. Some work remains to be done to substitute the special characters for labeling. (See the update from 15-nov-2007 at the end of the page.)

    Original_yk-names.txt is the original data as downloaded and before cleaning.

    yk-names_cleaned.csv is the cleaned and now true CSV file.

    yk-geonames_cvs.shp is the cleaned file converted into a point shapefile.

    yk-geonames_gml.shp is the output from the Web Feature Server. The CSV and the GML files have the same records but different, and useful, attributes so ideally they should be merged together. That's a whole 'nother project though.

    Core_fields.txt has all the nitty gritty details on the attribute schemas and values.

    The download archive is yk_placenames_distrib.zip and is about 2 MB.


    If you don't care what trials and tribulations created this dataset stop reading now. :)



    15 November 2007

    We can use the Gentium, Charis & Doulos fonts for accurate rendering of the native placenames, especially with this helpful character picker as a selection tool: http://people.w3.org/rishida/scripts/pickers/latin/ Soooo much easier than any other method I've seen to find the characters one needs! Characters are shown in order of visual similarity. No more constant jumping back and forth from one section to another trying to find that special X! (Use the special ones at the bottom too, and copy/paste the results!)

    Next task: script to convert geonames {32} codes to the appropriate stacked diacriticals: ǭ̈



    Ugly Details

    To download the entire Yukon in CSV format, use this url http://gnss.nrcan.gc.ca/gnss-srt/api?bbox=-142.0,59.0:-123.0,72&regionCode=60&output=csv (be nice to their server. We don't need to be getting it more than once or twice a year. Also be patient. It takes about three minutes for the entire file to be sent). Saved as original_yk-names.csv

    Huh. The data is there but not in CSV format: there are pipe symbols (|) as field delimiters and HTML line breaks (<br>) as record delimiters. A fairly simple job for regular expression search and replace if you have a decent text editor. Fixed version: yukon-placenames.csv. I submitted a bug report in July and one of the developers responded. I gave some more detail and haven't heard back. When I checked again this morning the CSV output was still broken. Oh, there are data problems too, things like 105O typed as 105 zero.
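
    The same pipe-and-<br> cleanup can be scripted so it is repeatable. This is only a minimal sketch: the function name pipes_to_csv is made up for illustration, and it assumes that fields never contain a literal pipe and that every record ends with some form of <br> tag.

```python
import csv
import io
import re


def pipes_to_csv(raw):
    """Convert pipe-delimited, <br>-terminated text into true CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Records are separated by HTML line breaks; accept <br>, <br/> and <BR />.
    for record in re.split(r"<br\s*/?>", raw, flags=re.IGNORECASE):
        record = record.strip()
        if not record:
            continue  # skip the empty piece after the final <br>
        # Fields are separated by pipes; csv.writer quotes any embedded commas.
        writer.writerow([field.strip() for field in record.split("|")])
    return out.getvalue()


if __name__ == "__main__":
    sample = "NAME|NTS|REMARK<br>Whitehorse|105D|city, capital<br>"
    print(pipes_to_csv(sample), end="")
```

    Going through the csv module rather than a plain search and replace means values that already contain commas get quoted instead of silently splitting into extra columns.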

    Went looking for a script to easily convert lat/long to utm. Haven't found an ArcGIS one yet, but this python library is very easy to use: http://pygps.org/. Now I need to figure out how to tell it to get the UTM zone by itself. There's this one too: http://starship.python.net/crew/jhauser/Gproj.html

    What about GDAL/OGR? Asked the fwtools mailing list. Answer from Frank Warmerdam:


    The OGR Projections Tutorial might be helpful for you, though it mostly
    addresses stuff from the C++ point of view. http://www.gdal.org/ogr/osr_tutorial.html

    The Python script http://www.gdal.org/srctree/pymod/samples/tolatlong.py
    should show a bit of how to use projections stuff in Python. In your
    case you want to go from lat/long to utm. There is nothing pre-baked
    in OGR to identify the optimal UTM zone for a given point, but it is
    relatively easy to find the nearest central meridian since they are all
    in six degree increments.

    Sorry I don't have something a bit more specific!
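
    The "nearest central meridian" idea from the reply above works out to a little arithmetic on the longitude. A minimal sketch, assuming the standard 6-degree zone grid (the function names utm_zone and central_meridian are made up here, and the special-case zones around Norway and Svalbard are ignored, which doesn't matter for the Yukon):

```python
import math


def utm_zone(lon):
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    return min(zone, 60)  # lon == 180 would otherwise come out as zone 61


def central_meridian(zone):
    """Return the central meridian (degrees east) of a UTM zone."""
    return zone * 6 - 183


if __name__ == "__main__":
    # -135.05 is roughly the longitude of Whitehorse.
    for lon in (-141.0, -135.05, -128.0, -124.0):
        zone = utm_zone(lon)
        print(lon, zone, central_meridian(zone))
```

    Zones are counted eastward from 180 degrees west, so the Yukon bounding box used for the download falls in zones 7 through 10.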

    Bah humbug. If ArcCatalog starts crashing every time you start it, before it even finishes drawing the GUI, try deleting/renaming %appdata%/ESRI/ArcCatalog/ArcCatalog.gx. Ahhh, there's a better fix: just rename/move the last opened directory or add a new data file to it. (bug logged, incident #75755)


    Code to calc UTMX/Y for a point shapefile loaded in ArcMap.

    Procedure: set data frame coordinate system to desired UTM zone > Select only those points in the Zone (requires point-on-poly overlay with utm_zones poly) > Open Attributes > Select UTM_X column (which is Longitude) > Calc Values > Advanced > paste code block from below > Set Output to equal X or Y depending on which column you are doing. Lather, Rinse, Repeat until done. (courtesy of http://forums.esri.com/Thread.asp?c=93&f=982&t=54791#135972)


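    ' Get the active map from the current ArcMap document
    Dim pMxDoc As IMxDocument
    Set pMxDoc = ThisDocument
    Dim pMap As IMap
    Set pMap = pMxDoc.FocusMap
    ' Project this row's shape into the data frame's coordinate system
    Dim pGeometry As IGeometry
    Set pGeometry = [Shape]
    pGeometry.Project pMap.SpatialReference
    ' Read the projected coordinates; set the calculator output to X or Y
    Dim pPoint As IPoint
    Set pPoint = pGeometry
    X = pPoint.X
    Y = pPoint.Y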


    Code to grab Yukon names from the CGNS Web Feature Server:


    What the heck am I doing trying to convert broken CSV to shape, when they have a server which can spit the same thing out already baked into a spatial format? This will chop Excel/OpenOffice Calc out of the loop and then we won't have to fix the broken NTS names (those fine programs like to change 105e15 into 1.05e+15)

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <GetFeature srsName="EPSG:4269">
    <Query typeName="GEONAMES">

    The user guide says one can stick output="SHAPE" in the GetFeature line, but I get an error with that:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <ServiceExceptionReport version="1.1.3" xmlns="http://www.opengis.net/ows"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.opengis.net/ows http://schemas.cubewerx.com/schemas/wms/1.1.3/ServiceExceptionReport.xsd">
    CubeSERV-00002: Syntax error detected in XML stream "(stdin)" on line 2 char pos
                   46 (raised in function CwXmlScanText_ReadString() of file
                   "cw_xmlscan.c" line 2243)
    CubeSERV-00002: Hit unexpected character #x94 while scanning XML token (raised
                   in function CwXmlScanText_ReadString() of file "cw_xmlscan.c"
                   line 2200)

    Oh, that's why not to use WFS: it's broken (or I'm not using it properly). Okay, spending too much time on that. Go back to kludge-ville and regex search & replace the NTS names:
    open yk_names_25jul2006.dbf in Excel, copy the NTS column to Vim (we really should be doing this in python to make it easily repeatable), then:

    # match 115A08 and delete last trailing two digits
    # strip MCR130
    # fix false exponents (delete periods and trailing +0##)
    # change incorrect 105zerozerozero... to 105o 
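
    For the "do it in python" version, the four steps above might look something like the sketch below. It is one reading of those notes, not the exact regexes used in Vim: the function name clean_nts is made up, and it assumes the false exponents look like 1.05E+17 and that a zero in the fourth position always stands for the letter O.

```python
import re


def clean_nts(token):
    """Normalise one NTS map sheet value; return None for values to drop."""
    t = token.strip().upper()
    # Strip MCR entries (e.g. MCR130); they are not NTS sheets.
    if t.startswith("MCR"):
        return None
    # Fix false exponents: the spreadsheet turned a sheet such as 105E15 into
    # 1.05E+17. Drop the period and the trailing "+digits".
    m = re.fullmatch(r"(\d)\.(\d+)E\+\d+", t)
    if m:
        return m.group(1) + m.group(2) + "E"
    # A digit zero where the sheet letter belongs is really the letter O.
    t = re.sub(r"^(\d{3})0", r"\1O", t)
    # Match e.g. 115A08 and delete the trailing two digits.
    t = re.sub(r"^(\d{3}[A-P])\d{2}$", r"\1", t)
    return t


if __name__ == "__main__":
    for raw in ("115A08", "MCR130", "1.05E+17", "105008", "105D"):
        print(raw, "->", clean_nts(raw))
```

    The order matters: the zero-to-O fix has to run before the trailing digits are stripped, otherwise 105008 never looks like a sheet number.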

    Next problem is to merge dupes (105a,105a,105a,105b ---> 105a,105b). Hmmm. I think I've gone beyond what's easy in vim, and now there's no choice but to learn the python way.
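
    The python way for the dupes turns out to be short. A minimal sketch, assuming the sheets in a field are comma-separated as in the example above (merge_dupes is a made-up name):

```python
def merge_dupes(nts_field, sep=","):
    """Collapse repeated sheet names in a delimited field, keeping first-seen order."""
    sheets = [s.strip() for s in nts_field.split(sep) if s.strip()]
    # dict.fromkeys keeps one key per distinct value, in insertion order
    # (guaranteed for dicts since Python 3.7).
    return sep.join(dict.fromkeys(sheets))


if __name__ == "__main__":
    print(merge_dupes("105a,105a,105a,105b"))
```

    Order is preserved rather than sorted, so a name that spans several sheets keeps them in the order the source listed them.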

    Going back and looking at some of the intermediate WFS request outputs, I see that there is inconsistency in the attributes. This needs a more studied look, but the one of immediate relevance is the Relevance At Scale (r_value) field: in the API CSV file it is filled with many blanks, while in the GML output the same field is fully populated. That's enough to tell me it is foolish to rely on the CSV as an authoritative source, so I'm backing up and going to start from the GML.

    Try #2 at downloading from Geonames WFS server

    1. download with wget, using example command line from section 4.3 of the GNSS User Guide. It failed before because of line length limitations in CMD. Workaround is to save the command into a text file and run with:

    wget -O output_file.gml -i http_command.txt

    2. We use ogr2ogr to convert GML to shape, but shape has an attribute name length limit. To get the proper attribute names in Arc we need to dance around a little: run ogrinfo output_file.gml to generate the attribute schema (output_file.gfs), then edit the .gfs and strip the leading "GEONAMES." from each <Name>. Convert to shape using ogr2ogr, which will generate an empty shapefile with the correct headings. Open the .dbf file in Excel and copy the column headings. Undo the edits to the .gfs (or delete it altogether), and convert to shape again. Open the second .dbf in Excel, paste the proper field headings, save and exit.

    ogrinfo yk_names.gml
    vim yk_names.gfs # :%s/GEONAMES\.//g; save
    ogr2ogr -a_srs EPSG:4269 yk_names yk_names.gml
    # excel yk_names/geonames.dbf; copy 1st row; close dbf
    del yk_names.gfs
    ogr2ogr -a_srs EPSG:4269 yk_names/ yk_names.gml
    # excel yk_names/geonames.dbf; paste 1st row; close dbf
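
    The .gfs edit could also be scripted instead of done by hand in Vim. A rough sketch only: strip_name_prefix is a made-up name, and unlike the :%s command above, which removes GEONAMES. everywhere in the file, this touches only the <Name> elements as described in step 2. Whether OGR then fills in the attribute values has not been checked here.

```python
import re


def strip_name_prefix(gfs_text, prefix="GEONAMES."):
    """Remove a leading prefix from the text of every <Name> element."""
    pattern = re.compile(r"(<Name>)" + re.escape(prefix))
    # Keep the opening tag (group 1) and drop only the prefix after it.
    return pattern.sub(r"\1", gfs_text)


if __name__ == "__main__":
    # Hypothetical fragment in the general shape of an OGR .gfs schema file.
    sample = (
        "<PropertyDefn>"
        "<Name>GEONAMES.NAME</Name>"
        "<ElementPath>GEONAMES.NAME</ElementPath>"
        "</PropertyDefn>"
    )
    print(strip_name_prefix(sample))
```

    To apply it to a real file, read yk_names.gfs as text, pass the contents through the function and write the result back before running ogr2ogr.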


    and that's all for now folks!


    Geographic Information,
    Information Management and Technology,
    Yukon Department of Environment
    10 Burns Road * Whitehorse, Yukon * Y1A 4Y9
    867-667-8133 Tel * 867-393-7003 Fax
