Hey folks- network engineer here, looking for some help reverse-engineering how this (really powerful) free service works. The site I am talking about is ipv4info.com, and the unique ability it provides is a database searchable by keyword (organization, contact, etc.) that covers even the tiniest network allocations.
FIRST: If you suspect I am about to spam you to sell a service, please see the last line of the post and also read the content here carefully and judge for yourself; you can no longer pay for data from this site, and it seems to be moribund and not making any revenue off of any obvious advertising (though I guess the traffic rankings do boost the value- whatever...)
SECOND: If you still think I'm spamming you / advertising but you have some helpful technical answers for me about how it might work, feel free to send me a direct message if you don't feel comfortable potentially sending more traffic to this site.
THIRD: If anyone is able to help me figure out how this works, I'm happy to put up a clone as a free service and open-source the tooling/backend- though the fact that it's shutting down and based in Russia makes me wonder if it is up to something that can be considered a "gray-area" ...
... back to the point. I am very aware of whois/rwhois. I pull down the latest raw rwhois database files from all of the RIRs (ARIN, APNIC, LACNIC, ...) every night over FTP, parse them, and run keyword searches for some organizations I work for on contract, so I can give them data on their Internet assets when they need third-party verification of "inventory".
What I found at this specific site when I stumbled upon it just a few months ago (it is apparently defunct, not taking new customers, and starting to have stale data) is a bunch of /28, /29 and /30 CIDR blocks that can't be found via any paid or free service I've been able to find over the years. This includes searching via the individual RIRs themselves using their keyword search mechanisms, as I mentioned.
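For anyone who wants to reproduce the kind of nightly keyword search described above, here is a minimal sketch. It assumes the dump is RPSL-style text (blank-line-separated objects made of `key: value` lines, as in the RIPE and APNIC split files); other registries' bulk formats may differ, and the function names and sample records are made up for illustration.

```python
import ipaddress
from typing import Dict, Iterable, Iterator, List

Obj = Dict[str, List[str]]

def parse_objects(lines: Iterable[str]) -> Iterator[Obj]:
    """Yield one dict per blank-line-separated object (assumed RPSL-style)."""
    obj: Obj = {}
    last_key = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():              # blank line ends the current object
            if obj:
                yield obj
            obj, last_key = {}, None
            continue
        if line.startswith(("#", "%")):   # comment lines in the dump
            continue
        if line[0] in " \t+" and last_key:  # continuation of the previous value
            obj[last_key][-1] += " " + line[1:].strip()
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        last_key = key.strip().lower()
        obj.setdefault(last_key, []).append(value.strip())
    if obj:
        yield obj

def search(objects: Iterable[Obj], keyword: str) -> Iterator[Obj]:
    """Yield inetnum objects where any field contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    for obj in objects:
        if "inetnum" in obj and any(
            needle in v.lower() for values in obj.values() for v in values
        ):
            yield obj

def to_cidrs(inetnum: str) -> List[str]:
    """Convert 'first - last' into the CIDR blocks that cover that range."""
    first, _, last = inetnum.partition("-")
    nets = ipaddress.summarize_address_range(
        ipaddress.ip_address(first.strip()), ipaddress.ip_address(last.strip())
    )
    return [str(n) for n in nets]

# Made-up sample records using documentation address space.
SAMPLE = """\
inetnum:        192.0.2.8 - 192.0.2.15
netname:        EXAMPLE-REMOTE-SITE
descr:          Example Corp branch office

inetnum:        198.51.100.0 - 198.51.100.255
netname:        OTHER-NET
descr:          Unrelated organisation
"""

hits = list(search(parse_objects(SAMPLE.splitlines()), "example corp"))
for hit in hits:
    print(to_cidrs(hit["inetnum"][0]), hit.get("netname"))
```

A search like this can only find what the registry actually publishes in the dump, which is exactly why the tiny blocks on that site are so puzzling.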
I'm wondering how the heck it is possible that they have this data in this keyword-searchable format. I understand I can always rwhois an IP address and get back the fields that contain the keywords I am looking for (e.g. the name of a customer's business), but for that you obviously need to know the IP address first. So, chicken and egg. And I'm not going to try to actively whois every IP on the Internet and get banned from every rwhois service before I get 0.05% of the way through. I figure even excluding RFC1918 addresses, there's just too much search space, since you have to search within the larger blocks to discover these smaller ones. Brute-force searching just doesn't seem feasible unless it's part of some hybrid approach.
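To put rough numbers on the brute-force problem, here is a quick back-of-the-envelope sketch. It only subtracts the RFC1918 ranges, as in the paragraph above (other reserved space would shrink the total a bit further), and the query rate is purely an assumption for illustration, not a known limit of any whois server.

```python
import ipaddress

RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def public_address_count() -> int:
    """All IPv4 addresses minus the three RFC1918 private ranges."""
    return 2**32 - sum(n.num_addresses for n in RFC1918)

def probes_needed(prefix_len: int) -> int:
    """One query per block of the given size, e.g. 30 for one query per /30."""
    return public_address_count() // 2 ** (32 - prefix_len)

def days_to_scan(prefix_len: int, queries_per_second: float) -> float:
    """Wall-clock days at a sustained (assumed) query rate."""
    return probes_needed(prefix_len) / queries_per_second / 86400

print(public_address_count())   # 4277075968 addresses
print(probes_needed(30))        # 1069268992 queries at one per /30
print(round(days_to_scan(30, 1.0)), "days at an assumed 1 query/second")
```

Even at one query per /30 instead of one per address, that is over a billion lookups, which is why I don't see plain brute force working.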
To be clear- I am not talking about a standard searchable ASN database site- these are a dime a dozen and I've been using them for years. I'm not even talking about the equally common sites/services that let you find smaller CIDR blocks, typically /25 and larger via keyword searches. None of these sites turn up the networks that this site finds.
Example: As an example of how this is different, go to the ARIN site and search for "Exxon" (AND NO, I AM NOT AFFILIATED WITH EXXON EITHER!)
Using ARIN's keyword search, you'll be able to track down a handful of their network blocks, especially a few of their big ones. Great. Now go to the site I mentioned in the first paragraph and type "Exxon". The list of networks it returns is significantly longer and contains a relatively large number of really small networks, down to /30. These obviously are not all Exxon corporate- but at least a few are probably small Exxon corporate remote sites, maybe egress points, out-of-band management for DR, or surveillance gear. They are not all just random little gas stations or loosely affiliated entities. I guess Exxon wasn't a great example, but still, it works to demonstrate the point.
Can anyone speculate as to how they are getting this data in such a way that it is searchable by keyword?
After talking to a few friends, some ideas came up about monitoring for route announcements passively, then performing some active rwhois queries, and then continuously updating via this basic approach. But none of us are quite sure if this is actually practical.
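Here is a rough sketch of what that hybrid idea could look like, purely as speculation and not a claim about how the site actually works. It assumes you already have a list of announced prefixes (from passive route monitoring) and a `lookup` function that returns the most specific registered range covering an address. `lookup` and `make_fake_lookup` are hypothetical stand-ins for a real whois/rwhois client, so nothing here touches the network. The walk steps through a prefix one /30 at a time and jumps past any sub-allocation it finds.

```python
import ipaddress
from typing import Callable, Iterator, List, Optional, Tuple

Range = Tuple[ipaddress.IPv4Address, ipaddress.IPv4Address]
Lookup = Callable[[ipaddress.IPv4Address], Optional[Range]]

def walk_prefix(prefix: str, lookup: Lookup, stride: int = 4) -> Iterator[ipaddress.IPv4Network]:
    """Yield sub-allocations registered strictly inside an announced prefix.

    Limitation: records nested inside a found sub-allocation are skipped;
    finding those would need a second pass over each result.
    """
    net = ipaddress.ip_network(prefix)
    start, end = int(net.network_address), int(net.broadcast_address)
    cursor = start
    while cursor <= end:
        found = lookup(ipaddress.IPv4Address(cursor))
        if found is not None:
            first, last = int(found[0]), int(found[1])
            inside = start <= first and last <= end and (first, last) != (start, end)
            if inside:
                yield from ipaddress.summarize_address_range(
                    ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
                )
                cursor = max(last, cursor) + 1   # jump past the whole record
                continue
        cursor += stride                          # nothing new here: next /30

def make_fake_lookup(records: List[str]):
    """Hypothetical registry: returns the smallest record covering an address."""
    nets = [ipaddress.ip_network(r) for r in records]
    calls = {"n": 0}

    def lookup(addr: ipaddress.IPv4Address) -> Optional[Range]:
        calls["n"] += 1
        covering = [n for n in nets if addr in n]
        if not covering:
            return None
        best = max(covering, key=lambda n: n.prefixlen)
        return best.network_address, best.broadcast_address

    return lookup, calls

# Made-up registry contents using documentation address space.
lookup, calls = make_fake_lookup(["203.0.113.0/24", "203.0.113.8/29", "203.0.113.64/30"])
found = [str(n) for n in walk_prefix("203.0.113.0/24", lookup)]
print(found, "in", calls["n"], "queries")
```

In this toy registry the walk needs far fewer queries than one per address, but the cost still grows with the amount of announced space, so it doesn't really settle the practicality question.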
I'm not necessarily looking for an equivalent service/site (though I would be very interested if there were any) but I am very interested in figuring out how they do this as it would help me in my work quite a bit- these small networks are often not well known to the organizations that are responsible for them, and they end up being the source of an outage or a security incident eventually, so having them discoverable so easily is really significant.
Before jumping to any conclusions and telling me that this site/service provides no unique capability, please give it a shot with any large international corporation as an example- find a small block (maybe a /29), see if you can find it via a keyword search on ARIN's site or in ARIN's rwhois DB snapshot files (or any other RIR's, for that matter), and then do a whois on it to see that the data is in fact correct.
BTW- In case anyone suspects I'm advertising for the site, I'll point out that it seems to have been a pay service at one time but is no longer accepting payments/subscriptions. It also says that the domain is for sale. It seems to be based in Russia. I assure you I'm not based in Russia, nor am I trying to attract attention to a moribund site in Russia.