#saltr
/2025/07/16
mareki2p Somebody is definitely scanning all known b32s for HTTP servers. I got a GET request after running my I2CP app (and announcing my b32) for 3 days. I never mentioned my b32 anywhere else. Now I maybe sort of understand the portal guy (the CSS redirect guy).
dr|z3d no doubt. there are at least 3 people I know of.
dr|z3d amusingly, they're all in this channel.
snex i turned my scanner off a while ago
dr|z3d make that two :)
snex but go to #scanners if you want to learn more about it
Leopold_ mareki2p: that's notbob.i2p
cumlord yeah he got kinda weird after i told him i was doing that
Leopold_ It seems like he's hiding something
Leopold_ we need to find out
Leopold_ I saw in your news about his thousandth post
dr|z3d I don't think there's anything dubious happening, other than some of the hosted content.
cumlord lol he's disappeared
cumlord yeahhhh big 1k :D
dr|z3d which brings me back to the idea snex mentioned a while ago.. some sort of subscription blocklist for floodfills to exclude said dubious sites.
snex yeah i had 2 ideas on implementation (only one original to me) but both have downsides
cumlord glad there’s so few of them anyway
dr|z3d well that's a relief.
snex 1. share only a partial b32 string, enough to avoid most false positives but not enough to let users brute force the real one
snex 2. share hashed strings and have routers hash each b32 before allowing it. could be too slow maybe
snex hash and check before allowing*
cumlord I pointed a squirt gun at one of them a couple months ago
dr|z3d sure, either a partial string of the b32 and/or b64, or a hash.
cumlord I kinda like the partial, maybe b64 though
dr|z3d *** laughs at the notion of a squirt gun. ***
snex yeah doing partial b64 should reduce false positives
snex i bet theres even an equation that can tell us the exact number of characters to obscure to minimize the chance of false positives
snex while maximizing the cpu required to brute force
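A minimal Python sketch of the two ideas above: a hashed blocklist checked by hashing each candidate b32, and a rough estimate of false positives for a partial-string list. All function names are hypothetical, SHA-256 is an assumed choice of hash, and the estimate assumes addresses are uniformly distributed over the alphabet. None of this is an existing router feature.

```python
import hashlib

def entry_hash(b32: str) -> str:
    """Hash a b32 hostname (lowercased, optional '.b32.i2p' suffix stripped).
    SHA-256 is an assumption; the chat does not name a hash function."""
    host = b32.lower().removesuffix(".b32.i2p")
    return hashlib.sha256(host.encode("ascii")).hexdigest()

def build_blocklist(addresses):
    """Publishable list: only hashes, never the addresses themselves."""
    return {entry_hash(a) for a in addresses}

def is_blocked(b32: str, blocklist) -> bool:
    """Idea 2: hash the candidate, then check before allowing it."""
    return entry_hash(b32) in blocklist

def expected_false_positives(known_chars, alphabet_size, blocklist_size, network_size):
    """Idea 1: expected number of innocent addresses matching a partial entry.
    Assumes each revealed character is uniform over the alphabet
    (32 symbols for b32, 64 for b64)."""
    p_match = alphabet_size ** -known_chars
    return blocklist_size * network_size * p_match
```

Revealing more characters lowers the false-positive estimate but leaves fewer characters for an attacker to brute force, which is the trade-off being discussed.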
Leopold_ What are dubious sites?
Leopold_ Are you referring to cases when an obviously private resource is found, or is it about content censorship?
cumlord Guess in this case to brute force it you’d have to do trial and error and at that point you might as well just scan for them
cumlord it’s about cp
snex it's about whatever you want to block your hardware from serving
snex it's up to each individual to decide what they want to block and allow
snex but you want to be able to share your blocks to others without revealing the actual sites
cumlord true, could see it working just like hosts.txt kinda thing
snex exactly
cumlord don’t want it don’t follow it
cumlord I think it could be more forgiving than I thought because of needing to check all of them, so maybe it could have some leeway like string…string so it adjusts the omitted chars
not_bob mareki2p: Likely me. I should not be hitting very often.
not_bob Leopold_: I am simply collecting data. Nothing more.
not_bob For stats: there are eight sites serving unwanted content out of the 3100 that I have identified. That's a very small number.
cumlord yup most likely, I pause mine now and then
not_bob Mine has been running solid for about two months now?
not_bob But, I use a backoff so if a site does not respond, while it will get tried again, eventually that chance becomes almost zero.
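A minimal Python sketch of a probabilistic backoff like the one described: the chance of retrying a silent address shrinks with each failure, and an address that has responded is not probed again. The halving factor and the function names are assumptions for illustration, not not_bob's actual scanner logic.

```python
import random

def retry_probability(failures: int, base: float = 0.5) -> float:
    """Chance of probing again after `failures` unanswered attempts.
    The base of 0.5 is an assumed value, not taken from the chat."""
    return base ** failures

def should_probe(failures: int, responded: bool, rng=random.random) -> bool:
    """Never re-probe an address that already answered; otherwise retry
    with a probability that approaches zero as failures accumulate."""
    if responded:
        return False
    return rng() < retry_probability(failures)
```

With these assumed values, an address that has failed ten times is retried on roughly one scheduling pass in a thousand.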
cumlord it’s a small number at least, more than I have
Leopold_ not_bob: Do you have crawlers for content search? I often see crawlers in the logs for wordpress and basic web server panels such as status-server and admin
not_bob Leopold_: Nope. I grab the site data from /, strip the html and never touch that site again. This is just so I can categorize them.
not_bob And, only the html.
dr|z3d more info on the zzzot client bug, zzz: Caused by: org.eclipse.jetty.util.Utf8Appendable$NotUtf8Exception: Not valid UTF8! byte 85 in state 0
not_bob Leopold_: I get those wordpress and related scans on my sites as well.
cumlord I might do some of that.
not_bob Leopold_: I also have no plans to share my data with anyone, other than as general stats. And, if I do share b32s, they will be examples, not the actual b32 addresses.
not_bob "Wall of shame"
dr|z3d Leopold_: if you're seeing non-published urls being accessed, that's likely a vuln scanner.
Leopold_ eight eepsites...
dr|z3d And in +, we have mitigations against vuln scanners.
not_bob That I'm aware of, yes. Out of 3100 or so.
dr|z3d (You have to activate them, though)
not_bob Note that due to the way I2P works, I likely have only a fraction of a percent view of the network at any given time.
snex seems like given enough time you'll eventually see everything
not_bob If you are curious, I've seen just over half a million unique b32 addresses.
cumlord I try not to run the vuln scanner constantly pegging the same sites though
not_bob snex: Possibly.
not_bob Anyway, once my scanner finds something, it stops on that address. And, if nothing is found, there is a backoff.
cumlord That’s what I think, even just running it intermittently is enough to get all the stable ones, I think
not_bob cumlord: Exactly.
snex i still wanna know why there are so many ntp servers
not_bob I have some of that logic in my reports. "hosts seen per hour" and then another graph that shows "hosts seen per hour that have been seen more than x times"
mareki2p Few things, some unrelated to others. The scrape was from 6f5ufkq6636k423ravrzxqsskmehyj2htloyp3dm4bflj32y63pq, crypto type elgamal+eddsa-sha512-ed25519. I2P looks to me like the early internet, so there will be the same problems and challenges as there were back then. So, a suggestion for people running scrapers: do you want to respect robots.txt? I can easily protect myself by dropping the initial packet anyway. Or
mareki2p so many packets until I realize I don't want to serve that connection. Or use a different port. Or something, there are plenty of options. I already discovered one child porn site and I was not even looking for any.
not_bob_afk mareki2p: I do not bother with robots.txt. Why? If a site responds with http, then my scanner will never hit it again. If you have a robots.txt or not doesn't matter. Either way it will stop poking you.
dr|z3d if you ask cumlord, Leopold_, he might tell you how to block and ban vuln scanners.
dr|z3d (in +, with the http_blocklist file)
snex if your service is only for your use, use encrypted leasesets
mareki2p No no, I will deal with this myself. And I'm running this from my own I2CP app, not using Jetty from router (if this is relevant).
mareki2p Encrypted LeaseSets ... I haven't even started reading the docs about them yet.
cumlord mine doesn’t do any crawling, it just saves the html from the index page
cumlord oh yeah very good for that I think I had a guide for that, had to clear out the wall of shame the other day
cumlord yeah I think you’d need to do it yourself then with some throttling and blocking endpoints
mareki2p ok, I read the blog post about encrypted leasesets by idk from 2021. It is ideal for me. But I didn't get the DH part. Who is allowing the clients to obtain the encrypted data? I'm guessing not the final destination as it wants to remain hidden. So I guess netDB then?
mareki2p Oh yeah, I'm stupid, the clients must already know the server's destination, of course.
mareki2p Another question. Are there any UDP HTTP servers (quic/http2.0/http3.0)? I think nothing (in I2P itself) is preventing me from running one.
dr|z3d at the bare minimum, http 2 or 3 requires https.
mareki2p Oh, web browsers refuse to connect to nonHTTPS over UDP.
dr|z3d we had a discussion about http/2 a while back, it's technically supported in jetty but there's no plan to implement it.
mareki2p Yes, I remembered, HTTP spec allows it, but browsers decided to not do it.