#ossasepia | 2020-01-24

whaack: argh it looks like asdf is the memory hogging culprit, not quicklisp itself

auctionbot: S#1077 O=17mn LB=None E=2020-01-29 05:43:03.567830 (117h28) >>> Dell R610 PE Server ships from U.S. (Server-A) http://blog.mod6.net/2020/01/physical-specifications-for-the-bitcoin-foundations-servers/

auctionbot: S#1078 O=17mn LB=18mn E=2020-01-29 05:43:42.679910 (117h28) >>> Dell R610 PE Server ships from Uruguay (Server-B) http://blog.mod6.net/2020/01/physical-specifications-for-the-bitcoin-foundations-servers/

auctionbot: --- end of auction list, 18mn total bids ---

whaack: diana_coman: EOD Report: Stuck to schedule, except I didn't have time for my Spanish study. For TheFleet I ran into a dead end trying to find a way to reduce the memory my program uses. asdf at a minimum consumes ~25MB. I tried to load asdf -> load my program -> unload asdf, but afaik there's no way to "unload" in CL. On another front, I investigated and discovered the reason for the join/disconnect dance. I was disconnecting because the servers were sending malformed irc replies. My bot's logic is to reconnect when it hits an error, so it would get stuck in a cycle of connect->disconnect from bad reply->reconnect. Going forward I plan to not reconnect when I receive a malformed reply. Also, I will put a limit on num reconnects per time window.
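The "limit on num reconnects per time window" idea can be sketched as a sliding-window counter. This is only an illustration in Python, not the bot's actual code (which is Common Lisp); the class name `ReconnectLimiter`, its `allow` method, and the injectable `clock` parameter are all hypothetical.

```python
import time
from collections import deque


class ReconnectLimiter:
    """Allow at most max_attempts reconnects within any window_seconds span.

    Hypothetical helper: the name and interface are illustrative only.
    """

    def __init__(self, max_attempts, window_seconds, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.attempts = deque()     # timestamps of recent reconnect attempts

    def allow(self):
        now = self.clock()
        # Forget attempts that have aged out of the window.
        while self.attempts and now - self.attempts[0] >= self.window_seconds:
            self.attempts.popleft()
        if len(self.attempts) >= self.max_attempts:
            return False            # too many recent reconnects: stop and flag it
        self.attempts.append(now)
        return True
```

A caller would check `limiter.allow()` before each reconnect and, when it returns `False`, report the situation instead of looping.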

whaack: And I read through a good chunk of cl-irc. I was going through some of the utility functions in the code that handle formatting, string parsing, encoding/decoding, etc. That was mind-numbingly boring. There were certain large functions/macros that I did not grok and will have to tackle again. Tomorrow I will read more of the high level code.

jfw facing the puzzle of reporting two fails while not being too hard on self nor being too slow about the reporting.

jfw: first: I'd intended to start before dinner or at least directly thereafter, which didn't happen

jfw: *start writing. I ended up doing first all manner of other things that had been put off - laundry, dishwashing, a phone call home, a walk. (perhaps that displeasure being relative point is proven already! - not that those things are especially displeasurable, in fact some the opposite so idk)

jfw: I did start though and not so late as to be tired, so, second: the log summarizing went poorly. I found myself going back through the text, trying to note each point of conversation briefly yet faithfully. I did not get far at all, by input line count or output word count

jfw: I thought maybe I should instead try to work from memory, but... there's so many unconnected / loosely connected points, I think I'd just end up with some kind of random sample of what happened to stick

jfw: another thought - ignored butterfly perhaps - was to (re)read one of the ossasepia log summary articles to get a better sense of how it could be done

diana_coman: http://logs.ossasepia.com/log/ossasepia/2020-01-24#1015980 - this type of situation (too many/too quick reconnects) sounds like something you'd want reported to you/flagged too.

ossabot: Logged on 2020-01-24 00:58:41 whaack: se the servers were sending malformatted irc replies. My bots logic is to reconnect when it hits an error, so it would get stuck in a cycle of connect->disconnect from bad reply->reconnect. Going forward I plan to not reconnect when I receive a malformatted reply. Also, I will put a limit on num reconnects per time window.

diana_coman: http://logs.ossasepia.com/log/ossasepia/2020-01-24#1015984 - ahaha; well, if you get to cleaning ovens, the whole house and start looking for maybe digging as well the garden you don't have, *then* you'll know for sure you're truly avoiding there something!

ossabot: Logged on 2020-01-24 01:46:20 jfw: *start writing. I ended up doing first all manner of other things that had been put off - laundry, dishwashing, a phone call home, a walk. (perhaps that displeasure being relative point is proven already! - not that those things are especially displeasurable, in fact some the opposite so idk)

diana_coman: jfw: summarising is not an easy task at best of times (because it means basically that you fully digested the whole thing and are then able to re-tell any/all of it at whatever level of detail you choose; there's a LOT going on to end up with a good summary, let alone an excellent one)

diana_coman: add to that the fact that #t logs are not an easy read either and so take the time to appreciate and enjoy the steps you took on that long road really

diana_coman: yes, it takes time; no, there's no hurry nor pressure on *this* front so don't add it yourself where it doesn't belong

diana_coman: jfw: so listen, do your reading of #t to catch-up and then for writing go ahead with the practice of summarising what you read of #t but *without* pressure on the result being "a faithful summary of all points" or whatever else; let it be exactly a summary of your recollection (if you need to, brand it as such, it's your own blog and you can write there "what happened to stick" too, what!)

diana_coman: jfw: even better, take the writing on the #t logs as the fun part of the catchup and so write it whichever way you'd like to, be it picking and pointing what you enjoyed reading or poking fun at parts or whatever else!

diana_coman: the log-summary can turn into log-satire even; I'll probably enjoy reading it all the more for that!

diana_coman: whaack: how long do you think it would take you to write down a schema of what your bots/full setup do exactly? e.g. the pseudocode of it all.

diana_coman: jfw: btw, from the sounds of it, what makes you avoid/postpone/drag your feet there is not perceived difficulty but outright perfectionism getting in the way.

diana_coman: now why does my blog spam keep asking me -and in Romanian too! - to order birthday cake in Kazan of all places and things to order.

whaack: diana_coman: I'm not sure, I would say 4-6 hours to be safe. It'd be a good exercise. Though I first want to apply some fixes for the problems discovered during the test run.

diana_coman: whaack: all right, apply the fixes first and then schedule somewhere the write-up of what's in there too.

whaack: diana_coman: okay i scheduled the writeup for tomorrow. i'll try to begin later today though if time permits

diana_coman: whaack: wait, so are you done with the fixes and otherwise figuring out the cl part and everything else?

whaack: diana_coman: I have not applied the fixes, but I have a good idea of what needs to be done and I don't think it will take that long.

diana_coman: ok.

whaack: The difficult tasks left are (1) fixing the memory issue, or figuring out a way to sidestep the problem (2) creating the function (get-all-channels-i-am-connected-to)

whaack: For (2) my plan is to parse the db and determine that we are connected to a channel if a JOINED-CHANNEL row for that channel appeared more recently than the latest DISCONNECTED-FROM-CHANNEL row. The problem with this is that if there is an unclean disconnect, then the function will return false positives.
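The plan for (2) can be sketched as "keep the most recent event per channel". This is a minimal Python illustration, assuming each db row can be read as a `(timestamp, kind, channel)` tuple; that row shape and the function name `connected_channels` are assumptions, and only the two event names come from the discussion.

```python
def connected_channels(events):
    """Return channels whose latest event is a JOINED-CHANNEL row.

    events: iterable of (timestamp, kind, channel) rows, in any order.
    The row layout is assumed for illustration.
    """
    latest = {}  # channel -> (timestamp, kind) of its most recent event
    for ts, kind, channel in events:
        if channel not in latest or ts > latest[channel][0]:
            latest[channel] = (ts, kind)
    return sorted(
        channel
        for channel, (_, kind) in latest.items()
        if kind == "JOINED-CHANNEL"
    )
```

As noted above, a channel dropped by an unclean disconnect never gets its DISCONNECTED-FROM-CHANNEL row, so this approach would still report it as connected.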

whaack: jfw: Can you explain, "You'll need to decide whether to apply initial or future rounds of yum updates. " ?

whaack: diana_coman: Would it be an appropriate time to write my own V, and then use it to install items on my new comp? Or should I use an existing tested V and leave the exercise of writing my own for a later time?

diana_coman: whaack: uhm, re that plan on get-all-channels etc - it seems to me you should first set out to write properly your understanding of the whole thing and only then look at that and identify the best way to do what you want done

whaack: diana_coman: ok

diana_coman: solving a problem is always a matter of a. properly identifying what the problem IS in the first place b. scoping it out c. figuring out the options to address the problem d. choosing one option (with clear reasons too!)

diana_coman: it should never be "oh, this looks like it might solve it, let's go"

diana_coman: whaack: how about manually v-ing for now, it's not like you need a ton of presses of very complicated trees, is it?

diana_coman: and damn it, now I really need to recall the name of that rrmartin story.

diana_coman: whaack: if you are done with reading&understanding that cl part you are using, then write up what you have and then add to it what problems you are trying to solve + what options you see

whaack: it should never be "oh, this looks like it might solve it, let's go" << this gave me a real chuckle

whaack: ^ do you mean the cl-irc part?

jfw: diana_coman: will give it a try with the low-pressure log summary, thanks

diana_coman: whaack: yes, cl-irc part

diana_coman: jfw: cool.

whaack: alright, no i am not done reading & understanding it, i will continue today

diana_coman: whaack: good; that comes first; then the write-up; then decision/discussion; only after all that any implementation.

whaack: diana_coman: ack

jfw: whaack: re yum updates - Red Hat publishes errata in the form of advisories, updated source RPMs, and binaries for paid customers. CentOS grabs the updated sources, applies the occasional rebranding type patch, and builds their free binary RPMs. These live in a separate "updates" repository while the base repository in theory doesn't change from release time. Sometimes the problems fixed are minor; sometimes they're security related; sometimes those might even be exploitable in your usage. Obviously this is not a V-like process, so the decision is necessarily kinda blind: do you trust the old packages with more-known holes or new ones with unknown holes?

jfw: Standard "best practices" in "the industry" based on vendor's advice is to always install updates timely - if you're compromised by a known bug it's seen as lazy system administration, whereas if compromised by the inherent bug of the 'always take updates' policy then who coulda possibly predicted.

jfw: That clarify the mess at least?

whaack: jfw: yes it does, thank you.

jfw: and would you have anything to add/correct there diana_coman? I recall you distinguish 'updates' from 'patches' but not sure how that works out in CentOS case

diana_coman: jfw: re CentOS tbh I do not upgrade and that's that; the whole old-holes vs new-holes can easily be stated in quite a few interesting ways (perhaps the most appropriate being the good ol' are you with the party -and so we'll commiserate at your loss though ofc you'll lose- or are you not with the party and so you are to blame for (all) loss(es)).

diana_coman: re updates vs patches what/where do you mean?

jfw: ah, vpatch vs upgrade was the distinction rather; http://logs.ossasepia.com/log/ossasepia/2019-11-05#1008861

ossabot: Logged on 2019-11-05 16:02:55 diana_coman: jfw: out of pure curiosity now: do you find yourself upgrading software much those days?

diana_coman: at any rate and to state it clearly: it's not that I consider CentOS 6 to be some great OS or anything of the sort. It's just that I don't currently have anything better to put in its place and it is at least not changing aka not creating more work for no reason.

jfw: right. I like the 'party' comparison

jfw: whaack, would you like any more input from me on the CL memory thing or are you set for now with tracking down what the code does and what might be the problem?

whaack: jfw: I would love input on the CL memory problem

jfw: ok, well 25MB supposedly due to asdf is an improvement on 70MB noted previously, do I understand right?

diana_coman: jfw: btw, your description above was fun to read anyway; my "write it as satire and have fun" earlier was not random either.

jfw: ty diana_coman

whaack: jfw: yes, and that was from running my script with 'sbcl --script my_program.lisp' (with (require "asdf") as the first line) instead of running my script with 'sbcl --load my_program.lisp' (a plain --load run also loads the init files, which --script skips)

jfw: whaack: a tricky thing with that 25MB, depending how you're measuring - might not be asdf itself but parts of sbcl that got swapped in to load it (just guessing, I'm not up on sbcl internals), or garbage collection overhead

jfw: one way to look at it though, is that a CL environment is like an operating system in its own right, with memory-resident compiler even

jfw: so the way it 'wants to be used' is with internal scheduling - internal processes/threads or some async thing

jfw: so that overhead is paid once and shared by all processes. Running multiple UNIX processes, as IIRC you're doing, works but is kinda like loading a separate Linux kernel for each shell

jfw: Doesn't this bot support threads?

whaack: jfw: yes. And if I can't reduce the memory overhead I am planning on redesigning the system to run all in one process

jfw: why would that be a 'redesign'? too much state in global variables or something?

whaack: (i meant redesign the system to run in one unix process, still using multiple threads)

jfw: well I guess the way it spawns up bots for the many networks would have to change at least.

jfw: (understood)

diana_coman: whaack: why did you choose separate unix processes vs single process?

jfw: If it really is ASDF eating all that RAM though, the next step on that route could be removing that and explicitly specifying the whole buncha .lisp files to load and in what order, as I think you were planning anyway.

whaack: diana_coman: Two major reasons. One was for fault tolerance, since my program is writing to log files, parsing messages from random networks, etc. I figured there was a lot of opportunity for something to go wrong, and I didn't want a problem with one network crashing all the other networks.

whaack: diana_coman: The second reason is that there is a cap on threads-per-unix-process and web-sockets-per-unix-process. This problem is likely fixable - I believe there are commands / settings I can change to increase these caps.

jfw: 'web-sockets'? O.o

diana_coman: uhm, for the first reason, if it crashes, it's the code that crashes so the fault is in there and as a result, not much gain from separating.

diana_coman: it's not the network that crashes after all, but your code.

whaack: diana_coman: right. it's not a great reason. it's similar to the ~ "i'm going to throw this all in a try/catch cuz i can't conceptualize all the things that go wrong"

diana_coman: anyways, for now do the figuring out and the write up and the problem(s) description + options you see and we'll work out a proper process from there.

whaack: (which I do anyways, surrounding the code that reads messages from the network)

diana_coman: whaack: thing is, they will go wrong whether you conceptualize them or not; and the apparent-defense is worse than no defense in that it masks the problem until it becomes a beast.

whaack: jfw: yup i could figure out the order of the .lisp files, but I think I'm going to pass on that task. and web-sockets i guess is the wrong terminology lol. there's a max number of sockets-per-process

whaack: diana_coman: yup. the reconnect-disconnect dance was an example of a masked problem that bit me

diana_coman: myeah.

jfw: whaack: ah ok, saw the dashes and figured those were the literal symbols. There's also a Unix-level limit on file descriptors per process (which category includes sockets): ulimit -n
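The per-process descriptor limit that `ulimit -n` reports can also be read from inside a program. A small Python sketch using the standard `resource` module (Unix only); the function name `fd_limits` is just for illustration.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = fd_limits()
    # The soft limit is the one that applies right now; the hard limit is
    # the ceiling an unprivileged process may raise the soft limit to.
    print(f"soft limit: {soft}, hard limit: {hard}")
```

Since sockets are file descriptors too, the soft limit also caps how many connections one process can hold open at once.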

whaack: jfw: I have to review what a file descriptor is / how "files" work under the hood. My understanding is that there is a table mapping integers to files. The integers in this table are the file descriptors. Files are anything that can be written to or read from. So they can be a block of address space in storage, or a buffer from a connection over a network.

jfw: not a bad approximation; one missing level is the 'open file object'; these can be shared between processes. The file descriptor table is per-process

whaack did not know the file descriptor table was per process

jfw: 'block of address space in storage' - not if you're thinking of that as a contiguous region on disk, it's more indirect than that, but the kernel presents it as contiguous address space, yes

whaack: jfw: ok, i wasn't sure either way

jfw: 'buffer from a connection' - the 'buffer' doesn't quite belong, it's more of an implementation detail. Could be as small as one byte (perhaps on a serial port)

jfw: example of per-process FDs: FD 1 is considered 'standard output'; one process might have it connected to an xterm (via pseudo-tty driver), another to a regular file (as in a shell redirection), another to /dev/tty1, another to /dev/null etc.
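To make the per-process point concrete, here is a hedged Python sketch: a child process writes to its own FD 1, which the parent has connected to a regular file (much like a shell redirection), while the parent's FD 1 stays wherever it was. The function name `child_output_via_fd1` and the message text are made up for the example.

```python
import os
import subprocess
import sys


def child_output_via_fd1(path):
    """Run a child whose FD 1 is a regular file at `path`; return what it wrote."""
    child_code = "import os; os.write(1, b'hello from fd 1\\n')"
    with open(path, "wb") as out:
        # stdout=out makes the open file the child's descriptor 1.
        # The parent's own descriptor 1 is untouched.
        subprocess.run([sys.executable, "-c", child_code], stdout=out, check=True)
    with open(path, "rb") as f:
        return f.read()
```

The child only ever refers to the integer 1; which open file that integer maps to is decided by its own descriptor table.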

jfw bbl.

whaack: ok ty

jfw is back btw.

whaack: I am getting an error, "Determining IP information for eth0... failed." [Failed], after running "service network start". I guess I am missing a driver. My /etc/sysconfig/network-scripts/ifcfg-eth0 file has the following contents: http://paste.deedbot.org/?id=aX_B

whaack: I also get spammed RTNETLINK answers: File exists

jfw: whaack: I suspect you've got some basics to learn here too. ifconfig and route commands, DHCP

whaack: yes i do

jfw: the sysconfig scripts are fine but merely automate lower level tools that you will still need for figuring out what's going on.

jfw: add 'ping' for sure too, then in the more advanced level of the toolkit there's arp, netstat, tcpdump. Then there's name resolution which is controlled principally by /etc/hosts and /etc/resolv.conf and there's various tools for DNS testing.

jfw: iptables will be a thing to learn about down the line I expect.

whaack: jfw: alright, i will make a note to look into all those tools. for now am I correct in thinking that the error I'm receiving is because I am missing a driver?

jfw: I think not.

jfw: ifconfig will have the answer to that though.

whaack wishes he had man pages

jfw: lol. well there's man pages online. Or on your mac, though there are slight differences.

whaack: lol the man pages on the mac are dangerous! i got tripped up by a slight difference for some command a while back

whaack: (specifically: the mac man page for crontab doesn't mention that crontab will fail if the crontab file doesn't have a newline at the end, nor does the tool give a warning.)

whaack: the result from ifconfig brings up eth0 with RX: packets:3259, so it looks like there is some life there, i will investigate

jfw: possibly very simple problem: did you try 'service network restart' ? perhaps it was already working but the scripts don't detect the double-start?

whaack: yeah i did try that

whaack: looks like the problem is with dhclient not being able to get an ip address. the output from "ip link" and "mii-tool" makes me believe i am connected to my router

jfw: First base to cover is whether it's even expected to get one, i.e. that the router is serving DHCP. I would imagine so if it's a typical home setup.

jfw: Next, you could run dhclient in the foreground to see its output (would also go to syslog normally but uncertain as yet that you have that captured)

jfw: re 'makes me believe', I'm less familiar with the 'ip' suite, it's a newer linuxism; the expected ifconfig output would include UP and RUNNING. Also worth checking for blinky lights on the network port.

whaack: jfw: lights are blinky. "dhclient eth0 -d" gives me a few "DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 7 (xid=0x1745f5ef)" lines, and then "No DHCPOFFERS received."

whaack: I went to my router's page and I believe it is serving dhcp

jfw: well, we could rule out DHCP problems for now by manually setting an otherwise unused IP/mask in the correct subnet and see if router pings
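Before setting a manual address, it can help to check that the candidate really falls in the router's subnet. A small Python sketch with the standard `ipaddress` module; the function name `same_subnet` is hypothetical, and the prefix length has to be read off the router's own configuration rather than guessed.

```python
import ipaddress


def same_subnet(host, router, prefix_len):
    """True if `host` lies in the network that `router` and `prefix_len` define."""
    # strict=False lets us pass the router's own address rather than the
    # network address; the host bits are masked off for us.
    network = ipaddress.ip_network(f"{router}/{prefix_len}", strict=False)
    return ipaddress.ip_address(host) in network
```

For instance, with an assumed /24 mask, `same_subnet("192.168.1.9", "192.168.1.1", 24)` holds while `same_subnet("192.168.2.9", "192.168.1.1", 24)` does not.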

jfw: hm, I'm also recalling that centos has some firewall rules enabled by default, though not especially strict. Possibly SELinux too

jfw: on the former, 'iptables -F' should flush all the rules until next boot; on the latter, check 'getenforce', either Disabled or Permissive is ok

whaack: jfw: I set a static ip address and ran "ping 192.168.1.1" then I got "from 192.168.1.9 icmp_seq=2 Destination Host Unreachable" for four lines and then after that "ping: sendmsg: No buffer space available"

jfw: the no buffer space thing is odd, does make me wonder about the driver or hardware. Maybe do some looking through dmesg for clues related to eth0

jfw: anyone else watching have ideas at this point? I'm running low

whaack: I see "eth0: no IPv6 routers present" in dmesg

jfw: that's normal (for an ipv6-loaded kernel)

jfw: something in there should name the chipset/driver which could be an avenue of research.

jfw: I'm going afk; might be one of those times to break on this, do more reading, come back later with fresh mind

whaack: getenforce was set to Enforcing, I switched it to Permissive and restarted my connection, but that did not help

whaack: yup i'm going to put this down until tomorrow

whaack: thanks for the suggestions

jfw: ahh ok. Well that'll avoid other troubles at any rate.

jfw: note 'setenforce' only changes the running mode and does not persist across boots; the boot-time setting is SELINUX= in /etc/selinux/config

jfw: and you're welcome.
