Summer of Code Ideas List 2019

Welcome to our Summer of Code Ideas List for 2019. This year Free UK Genealogy have applied to be part of the Rails Girls Summer of Code initiative (NB: this is not-just-Rails, and not-just-Women).

These are suggested ideas that would make good Summer of Code (SoC) projects. Take a look through them and see if there is anything you might wish to tackle. Treat these ideas as a starting point; you are more than welcome to ask any questions you may have.

Our priorities in 2019 are the first two ideas, plus anything on the FreeBMD2 project - please see https://waffle.io/FreeUKGen/FreeBMD2

Interactive Search (FreeCEN)

The aim is to explore and experiment with Machine Learning/AI routines in order to build a search interface that replaces the interim interface we currently have on FreeCEN. The search would start by asking for the person’s name, then add further questions one at a time until it arrives at a unique individual, or a small number of individuals (ideally 20 or fewer), who may be the person being searched for.
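The "one question at a time" idea can be illustrated with a minimal sketch. This is a plain greedy heuristic, not the Machine Learning approach the project would explore, and the record fields (surname, forename, birth_county, age) are hypothetical rather than the real FreeCEN schema. It assumes each candidate is a simple dictionary and picks, as the next question, the field whose most common answer leaves the fewest candidates.

```python
from collections import Counter


def best_question(candidates, asked):
    """Return the unasked field whose largest answer group is smallest.

    Asking about that field guarantees the biggest reduction in the
    worst case. Returns None when there is nothing left to ask.
    """
    fields = {field for record in candidates for field in record} - set(asked)
    best, best_score = None, None
    for field in sorted(fields):  # sorted so ties are broken predictably
        counts = Counter(record.get(field) for record in candidates)
        largest_group = max(counts.values())
        if best_score is None or largest_group < best_score:
            best, best_score = field, largest_group
    return best


def narrow(candidates, field, answer):
    """Keep only the candidates whose value for `field` matches the answer."""
    return [record for record in candidates if record.get(field) == answer]


# Hypothetical records that all matched the name first entered.
records = [
    {"surname": "Smith", "forename": "John", "birth_county": "Kent", "age": 34},
    {"surname": "Smith", "forename": "John", "birth_county": "Devon", "age": 34},
    {"surname": "Smith", "forename": "Mary", "birth_county": "Kent", "age": 29},
    {"surname": "Smith", "forename": "John", "birth_county": "Kent", "age": 61},
]

next_field = best_question(records, asked={"surname"})
remaining = narrow(records, next_field, 34)
```

A real interface would also have to cope with uncertain or approximate answers (for example, ages that are a year or two out), which is where learned models could do better than this exact-match heuristic.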

Review FreeCEN2 Site Usage Patterns

A task to investigate how best to optimise our MongoDB sharding strategy based on how the database is actually used. This is a good opportunity to learn more about how non-relational databases operate and how to optimise them based on user data and searches. To decide on a sharding strategy and shard key (#260), we need to know how users are really using the new system, so that the shard key is chosen from typical usage patterns rather than from a guess at what those patterns will be. Of particular interest is whether most searches are restricted to specific census years or run across "all years", since "all years" was not even an option in FreeCEN1 (the original version of the website).
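As a starting point, the census-year question could be answered with a simple tally over logged searches. The sketch below assumes a hypothetical log format in which each search is a dictionary with a census_year key that is missing or None when the user searched all years. The real FreeCEN2 search log may be structured quite differently.

```python
from collections import Counter


def year_usage(search_log):
    """Return the share of searches per census year, with 'all years' as its own bucket."""
    counts = Counter(
        entry.get("census_year") or "all years" for entry in search_log
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {year: count / total for year, count in counts.items()}


# Hypothetical log entries, for illustration only.
log = [
    {"census_year": "1891"},
    {"census_year": None},
    {"census_year": "1891"},
    {},
]
shares = year_usage(log)
```

If one bucket dominates, that is evidence for or against including the census year in the shard key. The same tally can be repeated for other candidate fields, such as county.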

Data Visualisation - Showing Our Data

We would like to explore innovative new ways of showing our data to users. We think that maps and timelines would be particularly useful for showing results, but we are open to ideas and welcome your creativity.

NB: We do not want you to create family trees or create new ways of showing family trees. If you want to work on family tree apps, please head over to WikiTree.

Our mapping ideas are included in this Epic.

Mapping ideas include:

Create a map displaying the nearby places searched - As an example of how we visualise our data on FreeREG, we would like to take the list of nearby places searched and display the results on a map. To be useful to researchers, major watercourses, railways and roads should be shown. The base map should be Open.

Create maps of completion at Census Registration District level - We want to enable researchers to see the completeness of census transcriptions for a given place (registration district) at a glance, with a choice of views - country as a whole for one census year, or one "ancient" county in detail.

Provide a Map to enable access to the correct places in the DAP and maps to show coverage - We would like to provide a map of the counties/parishes with colouring to show the percentage of the registers (including birth, marriage and death) that we have transcribed. Alternatively, we might colour code them based on the number of records transcribed. This will show researchers whether the place they are looking at is likely to be useful.
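For the completion and coverage maps, the core calculation is turning counts into a percentage and then into a colour band. The sketch below shows one way this might look. The thresholds and colour names are illustrative assumptions, not an agreed scheme, and places with no known total are shown in a neutral colour rather than as 0%.

```python
# Hypothetical bands: (minimum percentage, colour), highest first.
BANDS = [
    (75, "dark green"),
    (50, "light green"),
    (25, "yellow"),
    (0, "red"),
]


def coverage_percent(transcribed, total):
    """Percentage of items transcribed, or None when the total is unknown or zero."""
    if total <= 0:
        return None
    return 100.0 * transcribed / total


def coverage_colour(percent):
    """Map a coverage percentage to a colour band for the map."""
    if percent is None:
        return "grey"  # no data for this place
    for threshold, colour in BANDS:
        if percent >= threshold:
            return colour
    return BANDS[-1][1]


colour = coverage_colour(coverage_percent(30, 40))
```

Keeping the bands in a single table makes it easy to switch to colouring by number of records instead: only the thresholds and the input value change.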

Microvolunteering Interfaces - Quality Control

We would like a volunteer-facing interface that allows users to review data, flag problems, and suggest corrections. This would utilize some of the techniques developed for #1342, but would be retrospective, letting volunteers analyze challenges across the system and tracking review status of the data (whether that be on a per-file or per-place basis).

Also:

Surnames - Microvolunteering Interface - We have been limited in how we can record surnames in baptisms; we have assumed the classic English model, where the child has the father's surname. This is not always the case (see issue #655). Because other arrangements could not be recorded before we implemented flexible CSV, older records need revisiting and updating. We need a tool so that micro-volunteers can revisit records and update the use of surname fields.

Write an Automated Computer Vision Program to Remove "Bleed-Through"

We require software that will remove, or at least reduce, ink "bleed-through" on images when the offending page is available. Existing programs try to identify what looks like bleed-through from the affected page alone, but we also have access to the page that is causing it. It should be possible to take the offending page, mirror it left to right, freely rotate and resize it to overlay it on the front page, adjust its transparency until the bleed-through is eliminated, and then save the result. This should be much more accurate than working on the front page alone. This needs to be a stand-alone program for transcribers.
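The mirror-and-overlay step can be sketched in a few lines. This is a deliberately simplified illustration, not a proposed implementation: it assumes both pages are greyscale images of the same size, held as lists of rows of pixel values (0 for black, 255 for white), and that they are already aligned, so the rotate and resize steps are skipped. The strength parameter is a hypothetical stand-in for the transparency setting.

```python
def mirror(page):
    """Flip a page left to right, as seen through the paper."""
    return [row[::-1] for row in page]


def reduce_bleed_through(front, back, strength=0.3):
    """Lighten the front page wherever the mirrored back page has ink.

    `strength` is the assumed fraction of the back page's ink that
    shows through; each front pixel is brightened by that amount and
    clipped at white (255).
    """
    mirrored = mirror(back)
    cleaned = []
    for front_row, back_row in zip(front, mirrored):
        cleaned.append([
            min(255, round(f + strength * (255 - b)))
            for f, b in zip(front_row, back_row)
        ])
    return cleaned


# One-row example: ink at the left of the back page shows faintly
# at the right of the front page.
front = [[255, 255, 195]]
back = [[55, 255, 255]]
cleaned = reduce_bleed_through(front, back, strength=0.3)
```

This simple version also lightens genuine front-page ink wherever it overlaps ink on the back, so a real tool would need a more careful blend as well as the alignment step.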

Develop Open Genealogy Data API

We would like to develop a simple API to access and query our MongoDB databases on FreeCEN. We will need to consider our approach to Open Data and employ sufficient controls for accessing the data based on a number of variables. The API would ideally use JSON, but we are open to suggestions.

Other ideas

You can also find other ideas labelled ‘soc’ for FreePROBATE, FreeCEN and FreeREG projects. Please do discuss these with us first before writing your proposal.

We also use ZenHub boards; you are welcome to comment there too (they are synchronised with GitHub). If you have your own idea or proposal, absolutely let us know! 

FreeCEN Board on ZenHub

FreeREG Board on ZenHub

Next Steps...

Find out how to proceed here.