Saturday, February 29, 2020

Machine learning on IP addresses - group by ASN? Other?

I'm analysing my company's Internet log data and I'd like to make use of IP data in my model. I'd like to use some kind of grouping of IPs as a feature rather than individual addresses, both because I don't know the lease timeout of any given address and because I expect people on the same block (e.g. Spectrum home internet customers, or a company's IP block) to behave similarly to one another.
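For illustration, the crudest grouping of this kind is to truncate each address to a fixed prefix length. The sketch below uses only Python's standard `ipaddress` module; the function name `prefix_group` and the default prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions chosen for the example, not a recommendation.

```python
import ipaddress

def prefix_group(ip: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Return the enclosing network of `ip` at a fixed prefix length.

    The prefix lengths are illustrative defaults, not values from any standard.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False masks off the host bits instead of raising an error.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network)

# Addresses from the documentation ranges, used here as placeholders.
print(prefix_group("192.0.2.77"))
print(prefix_group("2001:db8:1:2::5"))
```

The resulting string can be used directly as a categorical feature, though a fixed prefix says nothing about who actually operates the block.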

The problem I'm running into is that there doesn't seem to be any one canonical entity to group by. The addresses in a subnet (like a /24) aren't guaranteed to all be assigned to the same organization. ASNs as seen in BGP announcements (e.g. at https://iptoasn.com/) seem like a good option, but some ranges aren't announced. For example, T-Mobile has 26.0.0.0/8, but it doesn't seem to belong to an AS. It does appear as a netrange in ARIN (https://whois.arin.net/rest/net/NET-26-0-0-0-1/), but I don't know of a good way to do a bulk lookup of thousands or millions of IPs, or whether a netrange necessarily belongs to just one organization.
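To make the bulk-lookup part concrete, here is a minimal sketch of one possible approach: load a table of (range start, range end, label) rows into memory once, sort it, and binary-search each address. It assumes the ranges are IPv4 and do not overlap. The rows, the labels (ASNs from the range reserved for documentation), and the names `build_index` and `lookup` are all made up for the example; addresses that fall in no range, such as unannounced space, get a default label.

```python
import bisect
import ipaddress

def build_index(rows):
    """Build a sorted index from (range_start, range_end, label) rows.

    Assumes IPv4 ranges that do not overlap; the row layout is hypothetical.
    """
    parsed = sorted(
        (int(ipaddress.IPv4Address(start)), int(ipaddress.IPv4Address(end)), label)
        for start, end, label in rows
    )
    starts = [row[0] for row in parsed]
    return starts, parsed

def lookup(index, ip, default="unknown"):
    """Return the label of the range containing `ip`, or `default` if none does."""
    starts, parsed = index
    value = int(ipaddress.IPv4Address(ip))
    # Rightmost range whose start is <= the address.
    i = bisect.bisect_right(starts, value) - 1
    if i >= 0 and parsed[i][1] >= value:
        return parsed[i][2]
    return default

# Placeholder rows using documentation address space and documentation ASNs.
SAMPLE_ROWS = [
    ("192.0.2.0", "192.0.2.255", "AS64500"),
    ("198.51.100.0", "198.51.100.127", "AS64501"),
]

index = build_index(SAMPLE_ROWS)
for ip in ["192.0.2.10", "198.51.100.200", "203.0.113.5"]:
    print(ip, lookup(index, ip))
```

Each lookup is a binary search over the sorted range starts, so millions of addresses can be labelled locally without querying a whois service per IP.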

I'm obviously not a network engineer and I'm missing a lot of information - what would be a good way to roll up IPs into similar groups?


