This project uses the DNS queries logged by a Pi-hole to list the top-level domains (TLDs) contacted from a network. Sharing for anyone who may find this useful... the results may surprise you. This one-day project came out of noticing my digital scale talking back to China.
Using the FTL database, we can list every domain queried from 365 days ago up through one minute ago.
# Get list of domains from pihole-FTL database.
sqlite3 "/etc/pihole/pihole-FTL.db" "SELECT domain FROM queries GROUP BY domain ORDER BY domain" > domains.txt
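To narrow the database query to a recent window instead of the full history, the timestamp column can be filtered. This is a minimal sketch, assuming the queries table stores timestamp as Unix epoch seconds; the helper name domains_since is made up for illustration.

```shell
# Hypothetical helper: list distinct domains queried in the last N days.
# Assumes queries.timestamp holds Unix epoch seconds.
domains_since() {
    db=$1
    days=$2
    # Accept only digits so the value is safe to place in the SQL text.
    case $days in
        ''|*[!0-9]*) echo "days must be a whole number" >&2; return 1 ;;
    esac
    sqlite3 "$db" "SELECT DISTINCT domain FROM queries
                   WHERE timestamp >= CAST(strftime('%s', 'now', '-$days days') AS INTEGER)
                   ORDER BY domain;"
}

# Usage: domains_since /etc/pihole/pihole-FTL.db 7 > domains.txt
```

SELECT DISTINCT returns the same set of domains as the GROUP BY form above; only the time filter is new.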
Using the logs, we can list all domains from a specific day in the past week.
# Get list of domains from today's pihole.log; yesterday's is pihole.log.1
# awk splits on runs of spaces, so space-padded dates such as "Jan  5" do not shift the field.
awk '{print $6}' /var/log/pihole.log | sort -u > domains.txt
# Older logs are compressed, so use zcat or zgrep to read or search them.
zcat /var/log/pihole.log.2.gz | awk '{print $6}' | sort -u > domains.txt
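Taking a fixed field from every log line also picks up reply and forward lines. A sketch that keeps only the name following a query[...] token is shown here; it assumes the usual dnsmasq layout of timestamp, dnsmasq[pid]:, query[TYPE], domain, from, client. The helper name extract_query_domains is made up for illustration.

```shell
# Hypothetical helper: print the unique domains named in "query[...]" lines on stdin.
# Assumed line layout: "Jan  5 10:00:01 dnsmasq[123]: query[A] example.com from 192.168.1.10"
extract_query_domains() {
    awk '{
        # Find the query[TYPE] token wherever it sits, then print the next field.
        for (i = 1; i < NF; i++) {
            if ($i ~ /^query\[/) { print $(i + 1); break }
        }
    }' | sort -u
}

# Usage, one day at a time:
#   extract_query_domains < /var/log/pihole.log > domains.txt
# Or the whole week at once (GNU zcat -f passes uncompressed files through unchanged):
#   zcat -f /var/log/pihole.log* | extract_query_domains > domains.txt
```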
This could have been combined into one step without using a file, but saving the results makes them easy to query. Look through your TLD list and drill down with grep.
# List all TLDs.
awk -F'.' '{print $NF}' domains.txt | sort -u
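To see which TLDs dominate rather than just which ones appear, the same split can be counted. A small sketch, assuming domains.txt holds one domain per line; the helper name tld_counts is made up for illustration.

```shell
# Hypothetical helper: count domains per TLD on stdin, most common first.
tld_counts() {
    # NF > 1 skips single-label names such as "localhost";
    # tolower folds "COM" and "com" together.
    awk -F. 'NF > 1 {print tolower($NF)}' | sort | uniq -c | sort -rn
}

# Usage: tld_counts < domains.txt
```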
# List all fully-qualified domains under the .cn (China) TLD.
grep '\.cn$' domains.txt
# List all fully-qualified domains with two-letter TLDs.
grep '\.[a-z][a-z]$' domains.txt
Example: Block all Chinese registered domains.
On the Pi-hole web interface, click Blacklist, type cn for Domain and China for Comment. Click Add domain as wildcard and Add to Blacklist.
# Same block from the command line; the regex matches cn itself and any name ending in .cn, such as domain.cn
pihole --regex '(\.|^)cn$'
Not counting their .com domains, Amazon was contacting 17 international servers and Google was contacting servers in 27 countries. One can only guess why, but blocking them does not appear to break functionality.
# Regexes match names ending in a two-letter TLD, e.g. google.ae, google.co.za, google.com.ua, but not google.com (likewise for amazon)
pihole --regex 'amazon[\.a-z]*\.[a-z][a-z]$'
pihole --regex 'google[\.a-z]*\.[a-z][a-z]$'
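Before adding a regex, it can help to preview what it would catch in the domains already collected. A rough sketch, assuming grep -E interprets these patterns the same way Pi-hole's regex engine does (both are POSIX extended style, but this is not guaranteed for every pattern); the helper name preview_regex is made up for illustration.

```shell
# Hypothetical helper: show which collected domains a candidate regex would match.
#   $1: the regex, written as it would be passed to pihole --regex
#   $2: a file with one domain per line, such as domains.txt
preview_regex() {
    grep -E -- "$1" "$2"
}

# Usage:
#   preview_regex '(\.|^)cn$' domains.txt
#   preview_regex 'google[\.a-z]*\.[a-z][a-z]$' domains.txt
```

Note that these patterns are not anchored at the start, so a name that merely contains google or amazon before a two-letter TLD would also be listed.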