thimbronion: One way to tackle all of the junk data in the encyclopedia entries is to run through each entry and check each word against a dictionary and a name dictionary, generating a list of OCR junk associated with each entry.
whaack: seems like that'll help a lot, but it'll leave the most confusing OCR mistakes for readers (the ones that accidentally map to another real word)
thimbronion: True. Idk how to handle that case.
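A minimal Python sketch of the dictionary check thimbronion describes. It assumes the entries are available as a mapping from title to text and that the word list and name list are plain collections of strings. `find_ocr_junk` and the sample data are hypothetical, not part of any existing tool.

```python
import string

def find_ocr_junk(entries, dictionary, names):
    """Flag tokens in each entry that appear in neither the word list nor the name list.

    entries: {title: text}; dictionary, names: iterables of known words / proper names.
    Returns {title: sorted list of suspect tokens}, omitting entries with no suspects.
    """
    # Case-insensitive lookup set built from both lists (hypothetical inputs).
    known = {w.lower() for w in dictionary} | {n.lower() for n in names}
    junk = {}
    for title, text in entries.items():
        suspects = set()
        for raw in text.split():
            # Strip surrounding punctuation, e.g. "Germany," -> "Germany".
            token = raw.strip(string.punctuation)
            # Skip empty leftovers and plain numbers such as years.
            if not token or token.isdigit():
                continue
            if token.lower() not in known:
                suspects.add(token)
        if suspects:
            junk[title] = sorted(suspects)
    return junk

entries = {"Aachen": "Aachen is a city in Germany, founded in tlie year 1, by Charlemagne."}
dictionary = {"is", "a", "city", "in", "founded", "the", "year", "by"}
names = {"Aachen", "Germany", "Charlemagne"}
print(find_ocr_junk(entries, dictionary, names))  # {'Aachen': ['tlie']}
```

As whaack points out, this only catches non-words like "tlie". An OCR error that happens to produce another valid word passes the check unnoticed.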
whaack: thimbronion: All I can think of is running two OCRs and then flagging all the mismatches.
thimbronion: whaack: good idea.
whaack: but I imagine the OCR algorithm itself probably has this type of check built in, so you'd need a substantially different OCR engine
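A minimal sketch of whaack's two-OCR idea. It assumes you already have two transcriptions of the same page from different OCR engines. It aligns them word by word with Python's standard `difflib.SequenceMatcher` and returns every span where they disagree. `ocr_mismatches` is a hypothetical name, and word-level alignment is just one possible choice of granularity.

```python
import difflib

def ocr_mismatches(text_a, text_b):
    """Align two OCR transcriptions word by word; return (a_span, b_span) pairs where they differ."""
    a, b = text_a.split(), text_b.split()
    # autojunk=False so common words are never discarded from the alignment.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(" ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

print(ocr_mismatches("the modern city was founded in 1850",
                     "the modem city was founded in 1850"))  # [('modern', 'modem')]
```

Unlike the dictionary check, this flags "modem" versus "modern", a real-word error. As whaack notes, it only helps if the two engines fail differently.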
thimbronion: If only I had like 20 slavegirls.
whaack: lulz
thimbronion: I could just reward them for finding errors with whippings.
thimbronion: http://logs.nosuchlabs.com/log/alethepedia/2021-04-25#1002457 << available here: http://thimbron.com/wp-content/uploads/2021/06/log_with_index_of_linked_sites.sql.gz
snsabot: Logged on 2021-04-25 12:40:59 thimbronion: cgra: I also hope to provide a dump of the db soon. Currently stumped by some permissions issues that won't allow me to do a dump.