After many decades, Hungary finally received data from Russia in 2019 on Hungarian prisoners of war and deported civilians. Once the data of some 682,000 people had been processed, the database managed by the Hungarian National Archives was opened on 25 February 2021. The database can be considered complete, which makes it an important resource for research. It also matters to the general public: anyone interested can consult the available information and search for relatives and family members who were held in Soviet camps. The automated transcription of the Cyrillic database into Hungarian was carried out by researchers at the ELKH Center for Linguistic Research (NYTK) under the direction of Bálint Sass.
In 2019, the Hungarian National Archives purchased from the Russian State Military Archives the digitized, scanned images of the index cards containing the basic data of some 682,000 Hungarian prisoners of war and deported civilians, as well as the database compiled from them. The database contains the most important information that can be linked to each person: the surname and first name of the person registered as a prisoner, the father's name, the rank, the place and date of birth, the place and date of capture, the date of departure and, if the person died, the date of death.
Everything on the cards is, of course, written in Cyrillic letters: not only the Russian-language data but also the Hungarian elements, namely surnames, first names and geographical names such as the place of birth and the place of capture. The linguistic problem in processing was that the Hungarian personal data dictated by the prisoners survive only in Cyrillic, as written down by the registering soldier, usually a Russian speaker, after hearing them. The data were distorted further during the 2010s, when Russian colleagues built the database from the cards: they were keying in Hungarian text that they did not understand and that had been written down in Cyrillic letters 70 years earlier.
It could not simply be transcribed automatically
The automatic Russian-to-Hungarian transcription and restoration of the data was carried out by NYTK staff led by Bálint Sass. The task was to produce transcriptions such as “Ковач Йожеф – Kovács József”. The difficulty is that, because of the distortions, letter-by-letter correspondence yields the correct result only in the rarest cases. Cases that are hard to handle algorithmically occur in large numbers, for example: Цилбауер – Zielbauer, Дейло – Béla, Саотморской – Szatmár, Гонграмеде – Csongrád or Кишкупфьистьгазл. In many cases there is no single unambiguous answer: several solutions are equally possible, and it is not worth choosing among them automatically, for example: Эрин – Ernő; Ervin; Erik.
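To make the problem concrete, here is a minimal sketch in Python. It does not reproduce the NYTK tool: it only shows a naive letter-by-letter mapping followed by fuzzy lookup in a list of known names. The mapping table `CYR2LAT`, the helper names, the tiny `NAMES` list and the similarity cutoff are all assumptions made for illustration.

```python
import difflib
import unicodedata

# Hypothetical, deliberately incomplete Cyrillic-to-Hungarian letter table.
CYR2LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ж": "zs",
    "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "sz", "т": "t", "у": "u", "ф": "f",
    "х": "h", "ц": "c", "ч": "cs", "ш": "s", "э": "e", "ю": "ju", "я": "ja",
    "ы": "i", "ь": "", "ъ": "",
}

def transliterate(cyrillic):
    """Naive letter-by-letter mapping; unknown characters pass through."""
    return "".join(CYR2LAT.get(ch, ch) for ch in cyrillic.lower())

def fold(text):
    """Lowercase and strip diacritics so that 'Kovács' compares as 'kovacs'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def candidates(cyrillic, known_names, cutoff=0.6):
    """Return every known name whose folded form is close to the raw mapping.

    All sufficiently similar names are returned, so ambiguous inputs
    yield several candidates instead of one forced choice.
    """
    raw = transliterate(cyrillic)
    by_key = {}
    for name in known_names:
        by_key.setdefault(fold(name), []).append(name)
    keys = difflib.get_close_matches(raw, list(by_key), n=5, cutoff=cutoff)
    return [name for key in keys for name in by_key[key]]

# Illustrative name list only; a real system would need a full name inventory.
NAMES = ["Kovács", "József", "Béla", "Ernő", "Ervin", "Erik"]

print(candidates("Ковач", NAMES))
print(candidates("Эрин", NAMES))
print(candidates("Дейло", NAMES))
```

In this sketch an undistorted form such as Ковач maps straight to a single name, Эрин comes back with several candidates rather than one answer, and a heavily distorted form such as Дейло finds no match at all under this cutoff, which is why plain letter mapping plus string similarity is not enough on its own.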
Details of the work can be found in the talk given at this year’s Hungarian Conference on Computational Linguistics and the related publication, as well as in the talk given at the 2020 Hungarian Science Festival. The automatic transcription and restoration tool is available on GitHub.
The freely searchable public database, which opened on 25 February 2021, the Day of Remembrance of the Victims of Communism, is available on the Hungarian National Archives website.
[Source: Eötvös Loránd Kutatási Hálózat (Eötvös Loránd Research Network)]
(Cover image: Prisoners in the construction of the Belomor Canal (1932). Photo: Wikipedia)