Welcome to our Google Summer of Code Ideas List for 2019. This year, Free UK Genealogy has applied to take part in the Google Summer of Code initiative again.
These are suggested ideas that would make good Google Summer of Code (SoC) projects. Take a look through them and see if there is anything you might wish to tackle. Treat these ideas as a starting point, and you are more than welcome to ask any questions you may have.
We would like to explore and experiment with building machine learning/AI routines to power a new search interface on FreeCEN. It would replace the current interim interface, which starts by asking for the person's name and then adds further questions one at a time until it arrives at a unique individual, or a small number (ideally 20 or fewer) of individuals who may be the person being looked for.
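One way to think about the "one more question" loop is to choose, at each step, the question whose answers split the remaining candidates most evenly, measured by the Shannon entropy of the answers. The sketch below is a greedy heuristic swapped in for illustration, not FreeCEN's current method. It assumes each candidate is a dict with hypothetical attributes such as birth_county and census_year, and that `ask` is any callable returning the user's answer.

```python
import math
from collections import Counter

def answer_entropy(candidates, attribute):
    """Shannon entropy (in bits) of the answers to `attribute` across candidates.

    Higher entropy means the question is expected to split the pool more evenly.
    """
    counts = Counter(c.get(attribute) for c in candidates)
    total = len(candidates)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def next_question(candidates, attributes):
    """Choose the attribute expected to narrow the candidate pool the most."""
    return max(attributes, key=lambda a: answer_entropy(candidates, a))

def interactive_search(candidates, attributes, ask, limit=20):
    """Ask one question at a time until at most `limit` candidates remain.

    `ask` is a hypothetical callable: it takes an attribute name and
    returns the user's answer for that attribute.
    """
    remaining = list(attributes)
    while len(candidates) > limit and remaining:
        attribute = next_question(candidates, remaining)
        remaining.remove(attribute)
        answer = ask(attribute)
        # Keep only the people consistent with the answer given.
        candidates = [c for c in candidates if c.get(attribute) == answer]
    return candidates
```

Entropy ignores whether users actually know the answer to a question. A learned model could improve on this by also weighting each question by how often users are able to answer it.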
This task is to investigate how best to optimise our MongoDB sharding strategy based on how the database is used. It is a good opportunity to learn more about how non-relational databases operate and how to optimise them based on user data and searches. To decide on a sharding strategy and shard key (#260), we need to know how users are actually using the new system, so that the shard key is chosen from typical usage patterns rather than from a guess at what those patterns will be. Of particular interest is whether most searches are restricted to a specific census year or cover "all years", as "all years" was not even an option in FreeCEN1 (the original version of the website).
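A first step could be to summarise real search parameters, taken either from application logs or from the MongoDB database profiler, which records operations in the system.profile collection. The sketch below assumes each search has already been reduced to a dict with hypothetical keys such as surname, county and census_year. A census_year of None means an "all years" search.

```python
from collections import Counter

ALL_YEARS = "all years"

def year_share(searches):
    """Fraction of searches per census year; a missing year counts as 'all years'."""
    counts = Counter(s.get("census_year") or ALL_YEARS for s in searches)
    total = sum(counts.values())
    return {year: n / total for year, n in counts.most_common()}

def field_combinations(searches):
    """Count which sets of search fields are supplied together.

    Frequently combined fields are candidates for a compound shard key.
    """
    return Counter(
        tuple(sorted(field for field, value in s.items() if value is not None))
        for s in searches
    )
```

If "all years" searches dominate, a shard key led by census year would likely send most queries to every shard. Trade-offs of this kind are what the usage data should inform.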
We would like to explore innovative new ways of presenting our data to users. We think that maps and timelines would be particularly useful for showing results. However, we are open to ideas and welcome your creativity.
NB: We do not want you to create family trees or create new ways of showing family trees. If you want to work on family tree apps, please head over to WikiTree.
Our mapping ideas are included in this Epic.
Mapping ideas include:
Create a map displaying the nearby places searched - As an example of how we could visualise our data on FreeREG, we would like to take the list of nearby places searched and display the results on a map. To make the map useful to researchers, it should show major watercourses, railways and roads. The base map should be openly licensed.
Create maps of completeness at census Registration District level - We want to let researchers see at a glance how complete the census transcriptions are for a given place (registration district), with a choice of views: the country as a whole for one census year, or one "ancient" county in detail.
Provide a map to enable access to the correct places in the DAP, and maps to show coverage - We would like to provide a map of the counties/parishes, coloured to show the percentage of the registers (including birth, marriage and death) that we have transcribed. We might also consider colour coding them by the number of records transcribed. This will show researchers whether the area they are looking at is likely to be useful.
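The coverage ideas in the last two items both come down to the same step: compute a completeness percentage for each area and map it to a colour band. The sketch below assumes hypothetical inputs (an area name, a GeoJSON geometry, and counts of transcribed and total records) and illustrative colour thresholds. It produces a GeoJSON FeatureCollection (RFC 7946, coordinates in longitude, latitude order) that an open web-mapping library could style over an openly licensed base map.

```python
# Hypothetical colour bands: (minimum percentage, hex colour), highest first.
BANDS = [(90, "#1a9850"), (50, "#fee08b"), (0, "#d73027")]

def completeness(transcribed, total):
    """Percentage of records transcribed; 0 when the total is unknown or zero."""
    return 100.0 * transcribed / total if total else 0.0

def colour_for(percent):
    """Return the colour of the first band whose threshold the percentage reaches."""
    for threshold, colour in BANDS:
        if percent >= threshold:
            return colour
    return BANDS[-1][1]

def coverage_feature(name, geometry, transcribed, total):
    """Build one GeoJSON Feature carrying the completeness figures as properties."""
    percent = completeness(transcribed, total)
    return {
        "type": "Feature",
        "geometry": geometry,
        "properties": {"name": name, "percent": round(percent, 1), "fill": colour_for(percent)},
    }

def coverage_collection(areas):
    """Wrap per-area dicts (name, geometry, transcribed, total) in a FeatureCollection."""
    return {"type": "FeatureCollection", "features": [coverage_feature(**a) for a in areas]}
```

The same output can show either a whole country for one census year or one county in detail, depending on which areas are passed in. The result can be serialised with json.dumps for the browser.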
We would like a volunteer-facing interface that allows users to review data, flag problems and suggest corrections. This would use some of the techniques developed for #1342, but would work retrospectively, letting volunteers analyse problems across the system and track the review status of the data (whether per file or per place).
Surnames - Micro-volunteering Interface - We have been limited in how we can record surnames in baptisms, because we assumed the classic English model in which the child takes the father's surname. This is not always the case (see issue #655), and recording anything else was not possible before we implemented flexible CSV, so older records need revisiting and updating. We need a tool that lets micro-volunteers revisit these records and update how the surname fields are used.
We require software that will remove or reduce "ink bleed-through" on images when the offending page is available. Existing programs try to identify bleed-through by examining the affected page alone, but we have access to the page that is causing the bleed-through. It should be possible to take the offending page, mirror it left to right, freely rotate and resize it to overlay it on the front page, adjust its transparency to eliminate the bleed-through, and then save the result. This should be much more accurate than working on the front page alone. This needs to be a stand-alone program that transcribers can use.
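The overlay approach can be sketched on a small greyscale image held as a list of rows (0 = black ink, 255 = white paper). This is a minimal illustration under several assumptions. Alignment is reduced to an integer shift standing in for free rotation and resizing. The "transparency" step is modelled as a linear lightening of the front page in proportion to how dark the mirrored back page is. The alpha and ink_threshold values are hypothetical. A real tool would more likely load images with a library such as Pillow, which provides Image.transpose, Image.rotate and Image.resize for the geometric steps.

```python
def mirror(image):
    """Flip a greyscale image (list of rows, 0 = black, 255 = white) left to right."""
    return [list(reversed(row)) for row in image]

def shift(image, dx, dy, fill=255):
    """Translate an image by (dx, dy) pixels, padding with white paper.

    Stands in for the free rotate/resize alignment step (an assumption).
    """
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = x - dx, y - dy
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = image[sy][sx]
    return out

def remove_bleed_through(front, back, alpha=0.5, dx=0, dy=0, ink_threshold=80):
    """Lighten the front page where the mirrored, aligned back page has ink.

    Both images must have the same dimensions. Pixels darker than the
    hypothetical `ink_threshold` are treated as genuine front-side ink and kept.
    """
    aligned = shift(mirror(back), dx, dy)
    result = []
    for front_row, back_row in zip(front, aligned):
        row = []
        for f, b in zip(front_row, back_row):
            if f < ink_threshold:
                row.append(f)
            else:
                # Add back the darkening attributed to the reverse side.
                row.append(min(255, round(f + alpha * (255 - b))))
        result.append(row)
    return result
```

The ink_threshold guard assumes genuine front-side writing is darker than bleed-through. Where the two overlap at similar darkness, this simple model can still lighten real text, so a stand-alone tool would probably let transcribers tune alpha and the alignment interactively.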
We would like to develop a simple API to access and query our MongoDB databases on FreeCEN. We will need to consider our approach to Open Data and apply sufficient access controls based on a number of variables. We would ideally use JSON, but we are open to suggestions.
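One way to frame the access controls is to accept a JSON request and check it against a per-tier policy before building any database query. In this sketch, the tiers, field names and result limits are hypothetical placeholders for whatever the Open Data policy decides. The returned filter and projection use the dict shapes that pymongo's Collection.find accepts, but the code does not connect to a database.

```python
import json

# Hypothetical access tiers: which fields each may filter on, which fields
# are returned, and the maximum number of records per request.
POLICY = {
    "public": {
        "filters": {"surname", "census_year", "county"},
        "fields": {"surname", "forename", "census_year", "county"},
        "max_results": 50,
    },
    "volunteer": {
        "filters": {"surname", "forename", "census_year", "county", "birth_place"},
        "fields": {"surname", "forename", "census_year", "county", "birth_place", "age"},
        "max_results": 500,
    },
}

class AccessError(Exception):
    """Raised when a request is not permitted for the caller's tier."""

def build_query(params, tier):
    """Check request parameters against the tier's policy and return a
    MongoDB-style filter, projection and result limit."""
    policy = POLICY.get(tier)
    if policy is None:
        raise AccessError(f"unknown access tier: {tier}")
    if not isinstance(params, dict):
        raise AccessError("request body must be a JSON object")
    disallowed = set(params) - policy["filters"] - {"limit"}
    if disallowed:
        raise AccessError(f"not searchable at this tier: {sorted(disallowed)}")
    query_filter = {k: v for k, v in params.items() if k != "limit"}
    # Reject nested values so callers cannot smuggle in operators such as $regex.
    if any(isinstance(v, (dict, list)) for v in query_filter.values()):
        raise AccessError("only plain values may be searched")
    requested = int(params.get("limit", policy["max_results"]))
    limit = max(1, min(requested, policy["max_results"]))
    projection = {field: 1 for field in sorted(policy["fields"])}
    projection["_id"] = 0  # hide MongoDB's internal identifier
    return {"filter": query_filter, "projection": projection, "limit": limit}

def handle_request(body, tier):
    """Parse a JSON request body and return a JSON response string."""
    try:
        query = build_query(json.loads(body), tier)
    except (AccessError, ValueError, TypeError) as exc:
        return json.dumps({"ok": False, "error": str(exc)})
    return json.dumps({"ok": True, "query": query})
```

Keeping the policy as plain data means the Open Data decisions can change without touching the request-handling code.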
We also use Waffle boards, and you are welcome to comment there too (they are synchronised with GitHub). If you have your own idea or proposal, absolutely let us know!