Be forewarned that I haven't had time to do much antispam work
recently, and thus the page below is (variably)
out of date. I say "variably" because I just revised it to
put things in the past tense, but I may have missed some things, in which
case they are extremely out of date...
Spam (Unsolicited Bulk Email), antispam, and related matters
I have written on my computer page about part of
why I am a spamfighter, among my other activities in what little spare
time I have as a graduate student. (If you're wondering about a
test message, go further below, although I have to wonder why you
would be wondering now, given that I last sent a relay test
without prior specific permission from the sysadmins of a system over a
year ago.) Below are some of the things that I
have done (or, where specifically noted, am currently doing) as part of
this:
- Examining current blacklists for their characteristics, such as how
often they list a "known-good" host. By
"known-good", I mean a host that we would need to
whitelist if we used that blacklist, and moreover a host whose listing may
indicate that the blacklist, while probably fine for its
maintainer(s)' intended usage, is too eager to list for
ours. It does not mean agreement between my viewpoint
and that of the organization or individual running that host (as might
be guessed from the fact that I am checking some governmental sites, given
my political viewpoint), but
rather that blocking email from that host may have distinctly
negative consequences that override any spamblocking or other (see
below) motivations. My judgement on what is a
"known-good" host can vary over time, including in reaction
to what I learn via the checks of a host versus blacklists (see
below for one example of this). Among the
things that I do with this:
- Using this information to decide what blacklists we use and
what whitelisting is necessary;
- Notifying appropriate people in
known-good domains of correctable problems such as open relays or
bad WHOIS data. If the responsible people in a domain refuse to
do anything about such problems, then this is likely to result in
my concluding that they should not be in the "known-good"
category - the likelihood of spam from that domain is too high
for it to be worth whitelisting that domain.
- Giving information to others, including below, on such statistics
to help them in their decisions regarding using blacklists, such
as which blacklist to use.
- Keeping an eye on blacklists to make sure they are not
deliberately used to limit speech from
advocates of individual freedom, or otherwise politically biased.
(I haven't seen such in my personal experience, and hope
that I will not, but since using a blacklist is giving someone else
control over one's ability to receive email, keeping an eye on
this is needed. Given my
politics, I find
blacklists preferable to (most) governmental anti-spam efforts,
since the former are (or at least can, and
should, be, if not imposed by network monopolies
or oligopolies (of which I likewise disapprove)) voluntary in
usage, whereas the latter have definite problems in terms of
freedom of speech/press (e.g., with attempting to quash
Unsolicited Commercial Email (UCE)
instead of Unsolicited Bulk Email (UBE))... but such a voluntary
choice is only possible with adequate information, and I am
trying to help provide said information. See
below for more discussion of this.) Please
note the word "deliberately"
in the above. If those of a given point of view (e.g.,
political) tend to:
- Make technological mistakes such as having
open relays, failing to update information in databases
properly, etcetera (I've noticed this myself for technophobic
(e.g., most environmentalist) groups, with no real surprise);
- Act in such a way that they are submitted (to lists taking
user submissions) more frequently when a listable problem
is happening; or
- More frequently cause problems for others in a way that
is clearly within a blacklist's publicly declared listing
criteria
then said groups being listed more often is not an indication of
bias in a blacklist.
- Getting quite a bit of data, which I am still contemplating how
to deal with (although I have constructed an alpha version of a
perl "lamers" script), on how (unfortunately) common
invalid DNS information (lame delegation, bad hostnames or syntax
in SOAs et al, etcetera) is.
Available are:
- The current (perl)
scripts (tarred and gzipped; last
modified Saturday, 27-Apr-2002 22:13:24 EDT) that I am
using (which are in a very alpha state as of yet) for evaluating
blacklists vs known-good addresses and extracting said addresses
from sources such as email, DNS, etcetera. (Incidentally, I must
apologize to anyone who tried to download these very recently -
due to an error (by me) in our httpd.conf file, this was mistaken
for a worm going after files in a "/scripts" directory...
sorry!) These scripts use files named (by default)
known.good.init.txt,
known.good.domains.init.txt,
known.good.hosts.init.txt, and
known.good.ips.init.txt - I have now provided
example copies of these files in the tarfile above. These scripts
are named:
- test.list.merge.2.pl
- This script takes the known.good.domains.init.txt,
known.good.hosts.init.txt, and
known.good.ips.init.txt files and merges
them into a known.good.init.txt file. Along the way, it checks:
whether the hosts in the second file exist; whether
PTRs exist for their corresponding IP addresses which point
to a different host that is in a known-good domain (or
is simply a longer hostname for a known-good host) and does not
appear to be a dialup, in which case that host is added as a new
known-good host; whether MXes for already known-good hosts qualify as
known-good under the same definition as for PTRs; whether
known-good hosts are actually aliases which are CNAMEd to
known-good hosts; and whether any known-good host matches
the last part of the name of another known-good host,
in which case it is treated as a known-good domain also.
- test.list.email.3.pl
- This script takes a known.good.init.txt file and
splits it into known.good.domains.init.txt,
known.good.hosts.init.txt, and
known.good.ips.init.txt files, with additional
input of new known-good hosts from email files named on the
command line. Headers from these are extracted using
formail (part of
procmail),
although I will probably eventually change this to use the
Mail::Header Perl
module.
Recognized hostnames, plus rDNS names from IP addresses, from
some mail headers that match
known-good domains and do not appear to be dialups are added to
known.good.hosts.init.txt.
(Message headers that are not scanned
include Subject: and In-Reply-To:,
to avoid picking up hostnames from discussion of spammers.
In addition, Message-Id: and
References: headers which, contrary to
standards,
lack a '@' sign separating a per-message code from a hostname
are also skipped. (This last step is to avoid having
"hostnames" like
200204280759.g3S7u127.broken.mailer.rutgers.edu.))
- test.list.split.pl
- This script simply splits a known.good.init.txt
into the equivalent known.good.domains.init.txt,
known.good.hosts.init.txt, and
known.good.ips.init.txt files. Normally,
test.list.email.3.pl would be used
instead, in order to pick up new information from email.
- test.list.known.good.pl
- This script is what does the checks versus blacklists of
hosts from a known.good.init.txt file. Both
IP-based and domain-name-based blacklists can be checked.
- The current summary and
detailed results (last modified
Saturday, 04-Jan-2003 23:07:36 EST) of said
checks. (I am working on improving the presentation of these,
including providing more
historical information and,
eventually, an alerting service for blacklists (and blacklist
users) on changes between runs (suggested by the maintainer of
blackholes.intersil.net and flowgoaway.com - thanks!).) Please
be aware that the most recent runs include some data regarding IP
addresses without rDNS which are associated with a few
"known-good" sites (e.g.,
http://www.anonymizer.com
and spamcop.net). Moreover, I
have now started doing some checking so as to differentiate
between IP addresses returned by blacklists
([blacklist]:[IP address]). Together, these changes have resulted
in a considerable expansion in the size of the detailed results
file(s). (Incidentally, the
test.list.known.good.pl script takes several
days to run. I'm working on programming it to query blacklists in
parallel (provided they are served from different sets of
nameservers, of course - I don't want to overload the servers for
a group of blacklists (e.g.,
rbl.cluecentral.net)).
I am also interested in getting local copies of blacklists - and,
for some, may be able to provide
public secondary service - to help speed
this up. A thank you goes to those who have already made this
available to me, including Derek at
http://www.rfc-ignorant.org,
RFG at
http://www.monkeys.com,
the blacklist.spambag.org
maintainer, and (prior to ORBZ' demise) Ian of ORBZ (now at
http://www.dsbl.org).) I am
also now tracking how often the script has to retry a request for
a blacklist (with the exception of those with local mirrors,
problems with which would be purely local), so that people can
keep this in mind when using said blacklist (how frequently
mail may be delayed due to a temporary failure, for instance).
The results do not include, for privacy (e.g., because of
extracting hostnames from email) and other
reasons, the current listing of known-good hosts and domains, except
for those hosts and domains listed by a given blacklist that I am
reporting on. Some other examples of (semi-)automated evaluation of
blacklists are:
Other pages about blacklists include:
[I had hoped to do some statistical comparisons of these results as
soon as I had adequate data to base such on, but haven't had the
time and probably won't.] Please note that
inclusion of a blacklist in the analysis does not in any way, shape,
or form indicate that I am endorsing its use. (Indeed, there are some
of them that I wouldn't use if you paid me to do it... I am
considering supplementing the above semi-automated review with
some more-subjective commentary on the available blacklists.)
Incidentally, to answer one question that has come up, except
as otherwise noted on this page, I am not associated with running
any blacklist.
- In the past (the last time for a host without specific permission
from its sysadmin(s) was over a year ago), scanning hosts which:
- Send email to us, particularly spam (or when someone makes a
mistake and sends email to a spamtrap
here instead of to, say, me).
- Connect to this webserver (or to others of our machines) in
ways characteristic of worms or other attacks.
- Are on blacklists as being open proxies or relays (partially so
that hosts which have been fixed can be removed from said lists) or
as otherwise having problems indicating that such are likely
(e.g., if you have made it impossible to notify you
of mail relay problems by
not having a postmaster address).
- Are on lists of open proxies used by spammers, crackers,
those who disrupt IRC channels, etcetera.
I did this via adaptations of programs from
dsbl.org. These scans include checks
for open proxies and open relays. If you've gotten such
a scan and are wondering why:
- Check dsbl.org for whether
you are showing up as having a problem.
- Check other blacklists via, for instance,
openrbl.org to see if you
have a problem. Include checks of the IP address scanned,
the hostname involved, and the domain name involved.
- Check to see if you may have a worm (we do scans in such cases
because what looks like a worm can also be a cracker using an open
proxy of some variety, and/or the host may have other insecurities
that spammers can take advantage of).
- Check the history of the IP address scanned at
spamcop.net to
see if it's sent spam that has been forwarded to spamcop.
- Do a web search via altavista or google for the host or IP
address in question. That's one way I find lists used by crackers
and spammers.
- I actively report spam that I receive to
SpamCop, which helps in tracing back
spam and notifying those who are supposed to be responsible for
stopping it: postmasters, abuse departments at ISPs, administrators or
technical support people for a given IP address or IP address
block (gotten from WHOIS records), etcetera. [I'm still doing this.]
- If I see that addresses I report spam to, either gotten from WHOIS
myself or found via Spamcop, are bouncing, or (when I have time) I
check the
statistics pages and
notice such addresses are bouncing, I report them to the
RFC-Ignorant project for
use in blacklists (including ones that we use such as
ipwhois.rfc-ignorant.org). (If you wish to
do WHOIS checking yourself and want some perl scripts for this purpose,
you may wish to take a look at
http://www.geektools.com/software.php
and http://whois.bw.org/.) [I'm
still doing this.]
- I have proposed a perl subroutine to check whether ipwhois
submissions are valid, which may also be of use for other blacklist
maintainers to avoid goofs. It is available enclosed in a (partial)
testing script as
ipwhois.check.validity.pl.
- I have proposed an
alternate set of pages
for www.rfc-ignorant.org, which should be more readable by those
with disabilities or with browsers other than Netscape or Micro$oft.
(These use Server Side Includes to get equivalent presentation with
and without frames - see rfci.tar.gz for
a complete copy of the files before SSIs.) [The RFCI web page has
been updated - thanks, Derek!] I feel that the
RFC-Ignorant project is important, and maintaining
WHOIS
(despite the wishes of some on the
ietf-whois
list, including one of its heads, Eric Brunner-Williams, who is
threatening to use his power to exclude from the list anyone with a
different view, such as myself) is important, not only because of its
use by spamcop (and other methods of tracing spam, such as
SamSpade), but because there is
a need for responsible people - by which I include those who can
take care of problems, and those who can be held responsible
for problems - to be locatable, by admins and users at all levels,
in case of all types of trouble. (Said trouble includes both
inadvertent (e.g., a malfunctioning network-active program or router)
and deliberate (e.g., Denial of Service attacks like spam) problems
that people on one portion of the network are causing others.) This
is what WHOIS was originally invented for (see
RFC 954), and
this need still remains. (I am sympathetic to the privacy concerns
regarding this - as is unsurprising, given my
political views, and that I have
gotten spam traceable to domain names I have registered - but do not
feel they are both valid and unsatisfiable through other means. If,
for example, a limited few (law enforcement, for instance) are given
access to WHOIS (or similar databases), this will simply make naive
network users feel that their information is less available than it
actually is (to people in positions of power to abuse it). Moreover,
other solutions are available, such as an ISP or other responsible
contact substituting for an individual.) If there are those who are
so ignoring of their responsibility to the larger Internet that this
need for responsible contacts goes unmet - including registries who are
willing to delete information for users, or countries which feel that
their privacy standards override the needs of the Internet as a whole - I
do not wish to be on the same Internet as these people.
- I have a few spamtrap addresses set up on
local machines, including
meatcan2@beatrice.rutgers.edu
(meatcan = canned meat = spam). Spammers frequently gather email
addresses from webpages. If they do so with this one, their emails
are trapped and used for analysis (e.g., adding to blacklists like
dnsbl.njabl.org,
relays.ordb.org,
and bl.spamcop.net). I am
working on getting better at spamtrapping and plan on adding more
spamtrap addresses. I am also willing to make available one of the
Perl scripts (and eventually others) that I use in spamtrapping,
namely to forward IP addresses from Received headers of spam to relay
testing services:
received.forward.pl. (This program
has been changed from the original version, which had sent email to
ORBZ. Unfortunately, ORBZ was taken down due to legal problems
(thanks to Lotus Domino servers having a
bug
that can cause an accidental crash if they are (adequately)
relay-tested). I have revised this to send email to
ORDB instead. Incidentally, I have a version of
this under test that also sends, via the
scanning mentioned above, to
dsbl.org.) The
input to this script is from the program formail, part of the
procmail package. As well
as some standard Perl 5 modules, it requires
Net::Netmask, available from
CPAN (I may revise this to use
Net::CIDR instead). It automatically exempts
private IP addresses, multicast IP addresses, local IP addresses (if
determinable!), and addresses already in the reported-to blacklists. Do
make sure that any spamtrap address for which you use it will not
spamtrap email from the services you are submitting IP addresses to!
- I have made available our local nameserver,
dogberry.rutgers.edu, for usage as a public secondary for the
blacklists:
and may in the future make it available for use as a public secondary
for other blacklists, depending on server load (this is not, after
all, the primary task of the server).
- I have been working on a DNS server which can be used to check
whether an IP address is from a host or domain known to block
legitimate relay testing services, or otherwise to cause problems for
such services (ranging from preferring being blocked over being tested
to attempts to shut down the service), so that
this information may be used in blacklisting (especially as the equivalent of an
"outputs" blacklist, to indicate a need to check Received addresses
for where that email has "been" in the past). (Networks known to do
this include rr.com/rr.net and above.net/mfnx.net.) The data would be
created on-the-fly, using both known CIDRs for such hosts and
networks, and domain names gotten via PTR lookups (or NS servers for
said PTRs, if the lookup fails).
- I am doing my best to help with a network database project,
NIDB, which is in the
process (slow as yet, mainly due to other obligations on the part of
those involved) of getting started. Currently, the two major problems
appear to be:
- Figuring out the best way to distribute the database to DNS
servers (the zone files are currently inconveniently large, for
instance) and otherwise how to organize the database files for
maximal utility;
- Getting better ways than CC'd mail to organize participants,
namely a proper mailing list server - we're waiting on this
before publicizing the project much further.
- In a long-term project [now on hold], working on ways to combine
blacklist and other information (e.g., domain names vs listings of
"good" and "bad" domains, ISPs, countries (in terms of level of spam
from that country versus likelihood of desired mail from that
country), and other characteristics (e.g., presence of a valid PTR
domain name)) to do a more nuanced evaluation of whether incoming mail
should be bounced or not. I plan on distributing any resulting
programs, which others can use for mail filtering at a site (or
beyond), and may allow access to a resulting
blacklist-et-al-summarizing/integrating server's output beyond our
hosts (it will certainly be allowed to other Rutgers hosts once it's
stable).
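As a rough illustration of two mechanics mentioned in the list above
(checking a host against IP-based and domain-name-based blacklists, and
exempting private or multicast addresses before reporting), here is a
minimal Python sketch. It is not the Perl scripts themselves: the function
names are hypothetical, the zone names are simply ones mentioned on this
page and serve only as example strings, and the lookup helper assumes the
usual DNS blacklist convention of reversed octets prepended to the zone
name.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in an IP-based list.

    Conventionally the octets are reversed and the list's zone is appended,
    e.g. 192.0.2.10 in bl.spamcop.net -> 10.2.0.192.bl.spamcop.net
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def rhsbl_query_name(domain: str, zone: str) -> str:
    """Build the DNS name to look up for a domain in a domain-name-based list."""
    return f"{domain.rstrip('.').lower()}.{zone}"


def is_exempt(ip: str) -> bool:
    """Return True for addresses that should never be reported or tested.

    Assumption: "exempt" here means private, multicast, loopback, or
    link-local; a real script would also exempt its own local addresses.
    """
    addr = ipaddress.ip_address(ip)
    return (addr.is_private or addr.is_multicast
            or addr.is_loopback or addr.is_link_local)


def lookup_listing(ip: str, zone: str):
    """Return the address the list answers with, or None if nothing came back.

    Caveat: socket.gaierror does not distinguish "not listed" from a
    temporary DNS failure, so a real checker would need to retry and to
    track those retries separately.
    """
    if is_exempt(ip):
        return None
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return None
```

For instance, dnsbl_query_name("192.0.2.10", "bl.spamcop.net") builds the
name that would be queried for that address; the answer returned by
lookup_listing is what would distinguish "[blacklist]:[IP address]" results
from one another.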
Incidentally, spamfighting is obviously one area in
which I disagree with many of those who are otherwise my allies in
political matters. I fear that
EFF (as in the Intel v. Hamidi case)
and others are forgetting that the right to freedom of speech/press
includes the right not to listen to or read what someone else says/writes,
and that freedom of the press in terms of publication is freedom for those
who own the press - not for everyone else to use that press (in
this case, that computer system) when the owner desires otherwise. The
only time that I would view use of a blacklist or other filtration as
censorship is if it is government-mandated, done by a (usually
government-supported) monopoly or oligopoly, or otherwise backed by
governments or bodies of an equivalent degree of power. (Examples of
indirect backing of such would be by tolerating a violation of
contract (e.g., failure by a company to transmit data that they have
contracted to transmit) or fraud (e.g., claiming in
BGP transmissions
that one will transmit data to/from an IP address block that will in
actuality be blackholed, if the data would otherwise take another route
that would succeed).) Note, incidentally, that past experience has shown
that monopolies and oligopolies, such as in Costa Rica
(RACSA) and
China,
tend to host lots of spammers (and lots of incompetent admins who allow
spamming through open relays, proxies, etcetera)... and a monopoly gives
those wishing to block this a difficult choice:
- Block spam, by blocking the monopolized region, but also block the
email of many innocent people who can't move to a better ISP,
due to said monopoly; or
- Don't block the spammers, and be immersed in unwanted email that
requires time to sort out (and/or to work on - inevitably fallible -
programs to sort it for one) and increases the risk of missing wanted
email.
Sometimes, action can be taken that targets directly those perpetuating
the bad situation. This was done with blockades of RACSA's corporate
email servers, for instance. I encourage people to do this with areas
such as China and (South) Korea, targeting both the governments and the
incompetent RIRs/NICs like KRNIC (which has listed, as of last count, some
private IP addresses
as being under its registrational jurisdiction, as well as all the other
problems with their WHOIS servers). Other means include refusing to
support those, like APNIC et al, who support effectively monopoly RIRs
(Regional Internet Registries) such as KRNIC - and ccTLD NICs like the
ones causing the ccTLDs for which they have responsibility to be listed in
whois.rfc-ignorant.org
- despite the incompetence - and possibly corruption - rife there. Their
real lack of caring about the welfare of the Internet community - the
arrogance of their monopoly status - is shown in the
contracts
they and ICANN have proposed, with their lack of obligation to
anyone except their "members" (as in the ISPs and governments
with large blocks of IP address space bought from that
RIR - they even lack a concern for the effects of their actions on
ISPs and governments elsewhere in the world, much less individual
Internet users). I don't want governmental control either, incidentally,
whether by the US or by multiple governments (as
Stuart Lynn
has proposed - completely ignoring
information warfare, especially as
carried on by governments (including that of the US), and the resultant,
to put it mildly, conflict of interest; it is clearly irrational to
believe that governments which are attacking the network to gain
advantage over each other are going to behave responsibly in matters of
Internet management) - but this behavior is no better, and no more
legitimate.
Blacklists are means of automatically propagating reputations (opinions
about past behavior, for instance), just as with a
PGP/GnuPG
web of trust.
Similarly to said web of trust, they ideally serve as much as possible
in place of the authoritarian arrangements (governmental regulation et
al) which would be otherwise necessary for the survival of the Internet.
Those like John Gilmore (with whom I otherwise agree in many respects,
and indeed respect and honor for his role in the founding of
EFF) who believe that better filtering
with no distributed reputational data will do the job are wanting AI -
indeed, are wanting more than AI on the human level, since they seem
to have forgotten that human beings use reputational data to make up for
lack of prescience. Blacklists are needed. But people also need to know
what they are using and how they will affect their electronic
communications.
This is viewable in
Any Browser and is
Valid HTML 4.01.
Page written by Allen Smith (send mail to
meatcan2@beatrice.rutgers.edu,
substituting easmith for meatcan2 -
see above for why; if it is blocked/bounces,
and you have some legitimate reason for sending email to me, send email
to actual at-symbol spamcop dot net).
I am not responsible for any pages linked from these, except for
those that I have written. Neither is the
Molecular Modeling Laboratory,
the Department of Biochemistry and Microbiology, Cook College,
or Rutgers University responsible for (or have any copyright on) pages
that I have written. My webpages are not
official Rutgers webpages.
This webpage is licensed (copyright 2005) under a Creative Commons
Attribution-ShareAlike 2.5 License.