← 2021-06-11 | 2021-06-13 →
thimbronion: One way to tackle all of the junk data in the encyclopedia entries is to run through each entry and check each word against a dictionary and a name dictionary to generate a list of OCR junk associated with entries.
whaack: seems like that'll help a lot but it'll leave the most confusing OCR mistakes for readers (ones that accidentally map to another word)
thimbronion: True. Idk how to handle that case.
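A minimal sketch of the dictionary check thimbronion describes, assuming entries are available as a title-to-text mapping and that both word lists fit in memory. The function name `find_ocr_junk`, the data layout, and the sample words are hypothetical and not taken from the actual database.

```python
import re

# Matches runs of ASCII letters, optionally with one inner apostrophe (e.g. "don't").
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def find_ocr_junk(entries, dictionary, names):
    """Return {entry title: [words found in neither word list]}.

    entries    -- hypothetical mapping of entry title to OCR'd text
    dictionary -- iterable of ordinary words
    names      -- iterable of proper names
    """
    # Compare case-insensitively so "The" and "the" count as the same word.
    known = {w.lower() for w in dictionary} | {n.lower() for n in names}
    junk = {}
    for title, text in entries.items():
        unknown = []
        for word in WORD_RE.findall(text):
            # Keep first-seen order and skip repeats within one entry.
            if word.lower() not in known and word not in unknown:
                unknown.append(word)
        if unknown:
            junk[title] = unknown
    return junk

if __name__ == "__main__":
    entries = {"Abacus": "The abacus is a cornting frame used by Pythagoras"}
    dictionary = {"the", "abacus", "is", "a", "counting", "frame", "used", "by"}
    names = {"Pythagoras"}
    print(find_ocr_junk(entries, dictionary, names))
```

As whaack points out, a misread that happens to be a real word (say "modem" for "modern") passes this check untouched.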
whaack: thimbronion: All I can think of is running two OCRs and then flagging all the mismatches.
thimbronion: whaack: good idea.
whaack: but I imagine that the OCR algorithm itself probably has this type of check built in, so you'd probably need a really different OCR
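A rough sketch of the two-OCR comparison, assuming each engine's output for the same page is already available as a plain string. It uses the standard library's `difflib.SequenceMatcher` to align the two word sequences; the function name `flag_mismatches` and the sample sentences are hypothetical. As noted above, it only helps if the two engines are different enough to make different mistakes.

```python
import difflib

def flag_mismatches(text_a, text_b):
    """Return (words from A, words from B) pairs where two OCR outputs disagree."""
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False turns off the popularity heuristic, which could otherwise
    # ignore very frequent words on long pages.
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    flagged = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != "equal":
            # One side is an empty string when a word was dropped or inserted.
            flagged.append((" ".join(words_a[a0:a1]), " ".join(words_b[b0:b1])))
    return flagged

if __name__ == "__main__":
    ocr_a = "the modem world is large"
    ocr_b = "the modern world is large"
    for left, right in flag_mismatches(ocr_a, ocr_b):
        print(f"A: {left!r}  B: {right!r}")
```

Each flagged pair still needs a human (or the source scan) to decide which reading is right.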
thimbronion: If only I had like 20 slavegirls.
whaack: lulz
thimbronion: I could just reward them for finding errors with whippings.
snsabot: Logged on 2021-04-25 12:40:59 thimbronion: cgra: I also hope to provide a dump of the db soon. Currently stumped by some permissions issues that won't allow me to do a dump.