Welcome to our ideas page. Here we give a quick overview of how to get involved with our projects as a technical volunteer or as a member of a student developer initiative, e.g. Google Summer of Code or Rails Girls Summer of Code. Please note: this page will be revised shortly with the projects for 2019.
Please begin by filling in our new tech volunteer form and letting us know a bit about you.
Please let us know which idea you would like to help develop, or if you have a new idea you would like to discuss. We will update the ideas as often as possible; however, it is always a good idea to check them out on GitHub too (see below for instructions).
If you are interested in Google Summer of Code, we'll put a link here when applications open.
We use a range of tech here; however, Ruby on Rails and MongoDB are the core technologies for web development across both FreeREG and FreeCEN.
We use GitHub and an Agile way of working, operating in two-week sprints. You do not need experience of any of the above, but you must be willing to learn and have access to reliable internet (we work remotely).
These are suggested ideas that would make good Summer of Code (SoC) projects. Feel free to take a look through these below and see if there is anything you might wish to tackle. These are not reserved just for SoC (or similar coding initiatives) and we welcome applications at any time.
Consider these ideas a starting point. You are more than welcome to ask any questions you may have, or to discuss the ideas with us:
Data Visualisation - showing our data
We would like to explore innovative new ways of showing our data to users. We think that maps and timelines would be particularly useful for showing results; however, we are open to ideas and welcome your creativity. NB: we are not a family tree website and have no plans to become one. There are others out there, https://www.wikitree.com/ for example.
Computer vision: identify census records on images (FreeCEN)
Explore and experiment with building machine learning/AI routines to improve our transcription tools and reduce the human input required for census records.
Create a Quality Control Microvolunteering interface for FreeREG
We would like a volunteer-facing interface that allows users to review data, flag problems, and suggest corrections. This would use some of the techniques in #1342, but would be retrospective, letting volunteers analyse problems across the system and track the review status of the data (whether on a per-file or per-place basis).
Investigate and solve the issue of corrupted birth counties in FreeCEN
A good first issue for getting to know the code. It appears that a non-UTF-8 accented character in a person's name causes the field offsets to get mixed up. For example, the county of birth of a person born in Yorkshire is recorded in the file as YKS (Yorkshire) but is corrupted and reported as KS- (which is not a valid abbreviation). When the invalid birth county is reported as an error, an exception occurs because of invalid UTF-8 in the error message.
The first task is investigation. For example, in LAN/rg092981.vld entry 863, the accented e in the first name is a 1-byte Windows-1252 character that is not valid UTF-8. It gets replaced with the multi-byte UTF-8 replacement character (shown as a question mark icon), causing all subsequent fields to start at a 1-byte offset from where they should start in the byte stream.
Solutions could include converting the input from Windows-1252 encoding to UTF-8, or adding code that checks for invalid characters in the input and tries to map them to valid UTF-8.
Review FreeCEN2 site-usage patterns as a prelude to sharding MongoDB
A task to investigate how best to optimise our MongoDB sharding strategy based on how the database is used. This is a good opportunity to learn more about how non-relational databases operate and how best to optimise them based on user data and searches. To decide on a sharding strategy and shard key (#260), we need to know how users are actually using the new system, so that the shard key is chosen from typical usage patterns rather than from a guess at what those patterns will be. Of particular interest is whether most searches are done by specific census years or by "all years", as "all years" was not even an option in FreeCEN1 (FC1).
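As a starting point for the investigation, the conversion approach could be sketched in Ruby as below. This is only an illustration under stated assumptions: the helper name `to_utf8` and the two-field record layout are hypothetical, and the real VLD field positions are not shown here. The idea is to slice fields by byte position while the record is still binary, and only then convert each field, treating any bytes that are not valid UTF-8 as Windows-1252.

```ruby
# Hypothetical helper: return a valid UTF-8 copy of a raw field.
def to_utf8(raw)
  candidate = raw.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?

  # Assumption: input that is not valid UTF-8 was written as Windows-1252.
  # Bytes that Windows-1252 leaves undefined become the replacement character.
  raw.dup.force_encoding("Windows-1252")
     .encode("UTF-8", invalid: :replace, undef: :replace)
end

# Invented fixed-width layout for illustration only:
# a 4-byte forename followed by a 3-byte birth county.
record = "Ren\xE9YKS".b # \xE9 is "é" in Windows-1252

# Slice on byte positions first, then convert each field.
forename = to_utf8(record.byteslice(0, 4))
county   = to_utf8(record.byteslice(4, 3))

puts forename
puts county
```

Because the slicing happens on bytes before any character conversion, the accented character cannot shift the position of the fields that follow it.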
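A first pass at the "specific year versus all years" question could be a simple tally over recorded searches. The sketch below is hypothetical: it assumes each logged search is available as a hash with a `:census_year` key, where `nil` stands for an "all years" search. The real search log may store this differently.

```ruby
# Return the percentage of searches per census year.
# Assumption: a nil :census_year means the user searched "all years".
def search_share_by_year(searches)
  counts = searches.each_with_object(Hash.new(0)) do |search, tally|
    tally[search[:census_year] || "all years"] += 1
  end
  total = searches.size.to_f
  counts.transform_values { |n| (100.0 * n / total).round(1) }
end

# Invented sample data for illustration only.
sample_searches = [
  { census_year: "1891" },
  { census_year: nil },
  { census_year: "1891" },
  { census_year: "1881" }
]

shares = search_share_by_year(sample_searches)
shares.each { |year, percent| puts "#{year}: #{percent}%" }
```

If "all years" turned out to dominate, a shard key based only on census year would probably spread most queries across every shard, which is the kind of finding this task is meant to surface.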
Interactive Search (FreeCEN)
The census data includes a lot of information, and researchers often come with a lot of knowledge they can use to search by. If they are looking for an individual, they will usually know the person’s name (e.g. Mary Price), but they may also know Mary’s birth date, her husband’s or child’s name, and the part of the country they expect her to be living in. They may also know her occupation, where she was born, whether or not she had a disability, and whether she spoke English, Welsh, or both. There are 645 people called Mary Price in the 1891 census we have transcribed so far (and this is far from the most common name).
We know that many people put all the information they have into www.freecen.org.uk … and then are told that no records match their request. This can happen for various reasons. The researcher may have said she was born in Acomb, but she told the enumerator that she was born in York. Or they have said that Mary Price was disabled, but she didn’t want to tell the enumerator that she was deaf. Or they have said that Mary Price was a dressmaker, but the transcriber was not able to interpret the enumerator’s bad handwriting.
The aim is to explore and experiment with building machine learning/AI routines for a search interface that replaces the interim interface we have on freecen2.freecen.org.uk. The new interface would start by asking for the person’s name, and then add further questions one at a time to arrive at a unique individual or a small number (ideally 20 or fewer) of individuals who may be the person being looked for.
We also use Waffle boards and you are welcome to comment there too (they are synchronised with GitHub). If you have your own idea or proposal, absolutely let us know!
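To make the one-question-at-a-time idea concrete, here is a minimal Ruby sketch. It uses a simple hand-written heuristic rather than machine learning: it picks the field whose most common value covers the fewest remaining candidates, so that any answer is likely to rule out many records. The field names, the sample records, and the helpers `next_question` and `narrow` are all hypothetical.

```ruby
TARGET_SIZE = 20 # stop asking once this few candidates remain

# Choose the field whose largest group of identical values is smallest.
def next_question(candidates, fields)
  fields.min_by do |field|
    candidates.group_by { |c| c[field] }.values.map(&:size).max || 0
  end
end

# Keep only the candidates that match the researcher's answer.
def narrow(candidates, field, answer)
  candidates.select { |c| c[field] == answer }
end

# Invented records for illustration only.
candidates = [
  { name: "Mary Price", birth_place: "York",  occupation: "Dressmaker" },
  { name: "Mary Price", birth_place: "York",  occupation: "Servant" },
  { name: "Mary Price", birth_place: "York",  occupation: "Scholar" },
  { name: "Mary Price", birth_place: "Leeds", occupation: "Dressmaker" }
]

field = next_question(candidates, [:birth_place, :occupation])
remaining = narrow(candidates, field, "Dressmaker")
puts "Asked about #{field}, #{remaining.size} candidates remain"
```

Exact matching like this would still fail in the Acomb/York case described above, so a real interface would need more tolerant matching; that is where the machine learning exploration could come in.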