thimbronion: One way to tackle all of the junk data in the encyclopedia entries is to run through each entry and check each word against a dictionary and a name dictionary to generate a list of OCR junk associated with entries.
    
    whaack: seems like that'll help a lot but it'll leave the most confusing OCR mistakes for readers (ones that accidentally map to another word)
    
    thimbronion: True.  Idk how to handle that case.
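
A minimal sketch of the dictionary check discussed above, in Python. The function name `find_ocr_junk`, the tokenizing regex, and the tiny word sets are assumptions made for illustration; the log does not say how the entries or dictionaries are actually stored.

```python
import re


def find_ocr_junk(entry_text, dictionary, names):
    """Return words from entry_text found in neither word list, in order of first appearance."""
    # Compare case-insensitively so "The" matches "the".
    known = {w.lower() for w in dictionary} | {n.lower() for n in names}
    junk = []
    # Assumption: a "word" is a run of ASCII letters and apostrophes.
    for token in re.findall(r"[A-Za-z']+", entry_text):
        if token.lower() not in known and token not in junk:
            junk.append(token)
    return junk


# Hypothetical sample data, not taken from the encyclopedia.
dictionary = {"the", "quick", "brown", "fox", "fax", "met"}
names = {"Darwin"}

print(find_ocr_junk("The qnick brown fox met Darwin", dictionary, names))
print(find_ocr_junk("The quick brown fax met Darwin", dictionary, names))
```

The second call illustrates whaack's objection: "fax" is a real word, so an OCR error that turns "fox" into "fax" passes the check unflagged.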
    
    whaack: thimbronion: All I can think of is running two OCRs and then flagging all the mismatches.
    
    thimbronion: whaack: good idea.
    
    whaack: but I imagine that the OCR algorithm itself probably has this type of check built in, so you'd probably need a really different OCR
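
One possible way to flag mismatches between two OCR passes is a word-level diff. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the function name `flag_mismatches` and the sample strings are assumptions, and it presumes both OCR outputs of the same entry are already available as plain text.

```python
import difflib


def flag_mismatches(ocr_a, ocr_b):
    """Return (word_index_in_a, words_from_a, words_from_b) for every span where the outputs differ."""
    a = ocr_a.split()
    b = ocr_b.split()
    # autojunk=False turns off the heuristic that ignores very frequent tokens in long inputs.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    flagged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            flagged.append((i1, a[i1:i2], b[j1:j2]))
    return flagged


# Hypothetical outputs from two different OCR engines for the same line.
first_pass = "the cat sat on the mat"
second_pass = "the eat sat on the mat"

for index, words_a, words_b in flag_mismatches(first_pass, second_pass):
    print(index, words_a, words_b)
```

This only helps where the two engines disagree; if both make the same mistake, as whaack worries, nothing is flagged.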
    
    thimbronion: If only I had like 20 slavegirls.
    
    whaack: lulz
    
    thimbronion: I could just reward them for finding errors with whippings.
    
    thimbronion: http://logs.nosuchlabs.com/log/alethepedia/2021-04-25#1002457 << available here: http://thimbron.com/wp-content/uploads/2021/06/log_with_index_of_linked_sites.sql.gz
    
    snsabot: Logged on 2021-04-25 12:40:59 thimbronion: cgra: I also hope to provide a dump of the db soon.  Currently stumped by some permissions issues that won't allow me to do a dump.