Papa Baiden

I did a project studying homelessness data in England as part of a voluntary programme. I created a data processing pipeline in Python, and an interactive data visualisation in React/D3. You can find the visualisation here.


I found a link to the programme on Twitter. They always have a visualisation project underway. The project I worked on was for Papa Baiden, a charity based in London, and had this brief: "We need an in-depth dashboard that helps people understand the homelessness issues. The data visualisations will be featured on the website, Facebook group, Twitter, and Instagram."

The raw data

The raw data came from two sources, both spreadsheets of official UK government figures: Rough_Sleeping_Autumn_2016_Final_Tables, which contains statistics on rough sleeping, and LT_615, which contains information on vacant properties in each region.

These spreadsheets were fairly tricky to work with. The tabular data only really makes sense to a human reading it: there's a hierarchy in the regional data that is conveyed by empty cells and gaps in the layout. The codes for the regions changed at one point, which causes the blank gaps in the data in Figure 1. The description of this change was recorded, in quite bureaucratic language, in a cell at the bottom of the spreadsheet, so it was not machine readable.

Figure 1: Problems in the spreadsheet data caused by region codes changing.

I think this is a really good illustration of how hard it is to maintain a publicly accessible dataset. The UK government invests a lot of time and effort in making this data available, and they generally do an excellent job. Even they find it difficult to make a correct and complete dataset in a machine-readable format. I wrote Python code to parse these spreadsheets so that the processing is repeatable. The code is complex, with lots of logic to catch edge cases. I'm not sure that I would take this approach again: it makes processing this particular data repeatable, but it is hard to generalise.
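The hierarchy-by-blank-cells problem can be sketched in plain Python. This is a toy illustration of the forward-fill idea, not the actual parsing code; the row layout, names, and numbers are made up.

```python
# Toy illustration of recovering a region hierarchy that a spreadsheet
# encodes with blank cells: a region name appears once, and blank cells
# below it mean "same region as above". Rows and values are invented.

def fill_region_hierarchy(rows):
    """Propagate the last-seen region name into blank region cells."""
    filled = []
    current_region = None
    for region, authority, value in rows:
        if region:                 # a non-empty cell starts a new region group
            current_region = region
        filled.append((current_region, authority, value))
    return filled

raw = [
    ("London", "Camden", 12),
    ("", "Ealing", 7),             # blank cell: still London
    ("North East", "Durham", 3),
]
print(fill_region_hierarchy(raw))
```

In the real pipeline this kind of logic sits alongside a lot of edge-case handling, for example around the footnote cell that describes the code change.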


Since this is a project I'm doing in my spare time, I allowed myself to get lost in interesting ideas that don't have a clear payoff. The biggest example is that I tried to avoid using a choropleth map. These are the nice-looking maps where geographic areas are coloured to represent a variable. The problem with these maps is that while they look nice, they do a bad job of presenting information: bigger areas look more important simply because they cover more space, so you get a distorted picture.

Illustration from "Not that many people read The Scotsman".

To get around this problem, one solution is to use a hexagonal grid. You assign each region to a hexagon, and then try to arrange the hexagons on the grid so that they represent the geographic locations. Hexagons are good because they are a regular shape, they can all have the same area (removing the distortion), and each hexagon has six immediate neighbours. The downside is that geometric calculations are more complex with hexagons than with a square grid. Fortunately there is a fascinating tutorial from redblobgames that takes you through the calculations.
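The axial coordinate system from that tutorial can be sketched in a few lines. This follows the tutorial's pointy-top conventions; it's a sketch of the geometry, not code from my package.

```python
import math

# Axial hex coordinates (q, r), following the redblobgames conventions.
# Each hex has six immediate neighbours, reached by adding these offsets:
AXIAL_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_neighbours(q, r):
    return [(q + dq, r + dr) for dq, dr in AXIAL_DIRECTIONS]

def hex_distance(a, b):
    # Grid distance (number of steps) between two hexes in axial coordinates.
    (aq, ar), (bq, br) = a, b
    return (abs(aq - bq) + abs(aq + ar - bq - br) + abs(ar - br)) // 2

def axial_to_pixel(q, r, size=1.0):
    # Centre of a pointy-top hex in pixel space, for drawing.
    x = size * (math.sqrt(3) * q + math.sqrt(3) / 2 * r)
    y = size * (3 / 2 * r)
    return x, y

print(hex_neighbours(0, 0))
print(hex_distance((0, 0), (2, -1)))
```

The distance formula is what makes "nearest hex" queries cheap once regions are on the grid.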

I worked on a python package (hexgridmap) to convert from shapefiles to hexgridmaps. The grid geometry works well, and converting a shapefile to python objects also works well. Where it falls down is locating the regions on the hex grid. I tried a few different approaches.

  • Assigning each region naively to the nearest hex on the grid. Where there are overlaps (multiple regions assigned to the same hex) sort them out by pushing them outwards. This sort of works, but doesn't keep neighbouring regions close together (Figure 3).
  • Defining a function that describes a good placement, and using optimisation techniques like simulated annealing. This is less time consuming, but I couldn't find a function that captures what a good placement looks like. You get an optimiser that is very happy once it sorts out all the overlaps, and is happy to achieve that by scattering regions all over the place.
  • Creating contiguous islands of regions, and then placing the islands next to each other. I stopped myself after just starting to explore this on a branch, and this is where I would pick up again.
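The first approach can be sketched as a greedy nearest-free-hex assignment. This is a toy version with invented centroids; the real pushing-outwards logic in hexgridmap is more involved.

```python
# Toy version of the naive assignment: each region takes the nearest hex
# centre that is still free, so a collision pushes the later region out
# to its next-nearest hex. Centroids and hex positions are made up.

def nearest(point, candidates):
    px, py = point
    return min(candidates, key=lambda c: (c[0] - px) ** 2 + (c[1] - py) ** 2)

def assign_regions(centroids, hex_centres):
    taken = {}
    for name, point in centroids.items():
        free = [h for h in hex_centres if h not in taken.values()]
        taken[name] = nearest(point, free)
    return taken

centroids = {"Camden": (0.1, 0.0), "Ealing": (0.0, 0.1)}
hex_centres = [(0, 0), (1, 0), (0, 1)]
print(assign_regions(centroids, hex_centres))
```

This reproduces the failure mode in Figure 3: nothing in the assignment rewards keeping neighbouring regions together once one of them has been pushed out.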

Figure 3: Results from a naive assignment and then pushing overlaps out. The general shape looks okay, but note that the regions in red are in Greater London and get spread out too much; we end up with Ealing on the coast.

I stopped this thread of work after a few sessions because I wanted to meet the deadline.

Building the visualisation

I used React and D3 to build the visualisation. All the interaction and DOM creation and handling is done by React; D3 is only used for convenience functions, such as scales that map between data and pixel space, or generating SVG path specifications from geographic data.

How the final visualisation looks.

What worked well

I like the interactivity; it works smoothly. Having React handle the state is a very good way of working, and the app wasn't complex enough for me to need to reach for Redux. In particular I like the interaction between the table and the maps.

I like the scrolly bit on the left: as you scroll through it, different parts of the visualisation are highlighted. That's quite neat.

What didn't work well

I didn't do a good job of figuring out beforehand how the interactivity was going to work, and I didn't make it clear which things can be clicked on. The interactivity is unclear.

I think the display is quite full; I wanted there to be more white space and for it to look neater. Both of these point to me needing to learn more about UI and design.