Cleaner
The 1st Mailinglist Manager Cleaner extracts e-mail addresses
from raw data files and turns them into standard 1st Mailinglist
Manager lists (sorted, one item per line, no duplicates).
The Cleaner can process
several input files simultaneously. To select the files press
the “Input files” button. How to make a selection is described
in the section “General notes”, subsection “Input files
selection”.
In short, to turn raw text files that contain e-mail addresses
mixed with other text into sorted, de-duplicated e-mail lists:
- Open the Cleaner’s tab in the 1st Mailinglist Manager
application;
- Choose processing modes, indicate the necessary settings and
specify output files;
- Click the “Go” button and wait for the processing to complete;
- Check the log and enjoy the results.
Processing modes and options
The Cleaner offers several processing modes, which are turned on
in the “Output files and Options” section of the Cleaner’s
window. All processing modes can be active at the same time;
each generates a separate output file.
Processing mode “E-mail addresses”
If this mode is chosen, the Cleaner extracts all syntactically
valid e-mail addresses and also attempts to “correct” addresses
that contain illegal characters. The output file is sorted
alphabetically, one item per line, without duplicates.
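The extraction step can be approximated with a short Python sketch. The regular expression and the function name `extract_emails` are assumptions made for illustration; the Cleaner’s actual matching and “correction” rules are not described in this manual and are likely stricter.

```python
import re

# Simplified address pattern (an assumption, not the Cleaner's real rule set).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def extract_emails(text):
    """Return the addresses found in text, without duplicates, sorted alphabetically."""
    return sorted(set(EMAIL_RE.findall(text)))

raw = "Write to mary@company.com or <alex@magazine.com>; cc mary@company.com."
for address in extract_emails(raw):
    print(address)  # one item per line
```

The set removes duplicates and `sorted` produces the alphabetical order described above.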
The E-mail addresses mode has some additional options:
Only strip out addresses preceded by: check this option and
enter one or several words so that the output list contains only
the e-mail addresses that follow this word (or these words) in
the input file.
By default, the Cleaner
places into the output file all syntactically valid e-mail
addresses it finds. But in some cases you may need to get only
those ones which follow a certain word (or words) in the input
file.
A typical use of this feature is building unsubscribe lists
(also called remove lists). In this case you usually have e-mail
messages from people asking you not to mail them any more.
Export these messages into one file and have the Cleaner process
it with the option “Only strip out addresses preceded by” checked
and the word “From:” entered in the adjacent field. The result
is your remove list.
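The remove-list scenario can be sketched in Python as follows. This is only an illustration: the function name `extract_preceded_by` is hypothetical, and the sketch assumes the address must come directly after the word (optional whitespace and an optional “<” are allowed). How much text the Cleaner tolerates between the word and the address is not stated here.

```python
import re

ADDRESS = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"

def extract_preceded_by(text, keyword):
    """Return sorted, unique addresses that directly follow keyword (assumed rule)."""
    pattern = re.compile(re.escape(keyword) + r"\s*<?(" + ADDRESS + ")")
    return sorted(set(pattern.findall(text)))

# Hypothetical export of two unsubscribe requests.
messages = """From: jane@market.com
To: news@sender.example
Please remove me.
From: nicky@yahoo.com
To: news@sender.example
"""
remove_list = extract_preceded_by(messages, "From:")
print(remove_list)
```

Only the senders end up in the list; the recipient address after “To:” is ignored.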
Reject any addresses longer than: check this option and enter a
maximum length (up to 80 characters); the Cleaner will reject
e-mail addresses longer than this value. The default maximum
length is 45 characters.
No duplicate domains: check this option if you need only one
e-mail address from each domain present in the input file. For
example, the input list is:
mary@company.com
alex@magazine.com
snail@yahoo.com
smith@market.com
jane@market.com
twiggy@yahoo.com
info@company.com
job@magazine.com
nicky@yahoo.com
If the option “No duplicate domains” is not checked, the Cleaner
will provide the following result:
alex@magazine.com
info@company.com
jane@market.com
job@magazine.com
mary@company.com
nicky@yahoo.com
smith@market.com
snail@yahoo.com
twiggy@yahoo.com
If the option “No duplicate domains” is checked (and the option
“Except mail services” unchecked, see below), the result will be
as follows:
alex@magazine.com
mary@company.com
smith@market.com
snail@yahoo.com
The option “No duplicate domains” is most useful when a list
contains addresses in corporate domains: it keeps you from
mailing the same message to the same company repeatedly. Its
drawback is that it also keeps only one address per web-based
mail service (domains yahoo, hotmail, msn, etc.), although each
address in such a domain belongs to a different person.
To solve this problem, the option “No duplicate domains” has a
sub-option, Except mail services. Check “Except mail services”
together with “No duplicate domains” to keep all e-mail
addresses that belong to the domains listed in the “Mail
Services” box on the “Options” page of the 1st Mailinglist
Manager. For every other domain only one e-mail address is kept.
For example, the input list is the same:
mary@company.com
alex@magazine.com
snail@yahoo.com
smith@market.com
jane@market.com
twiggy@yahoo.com
info@company.com
job@magazine.com
nicky@yahoo.com
If the options “No duplicate domains” and “Except mail services”
are both checked (and the yahoo domain is listed under
“Options”, “Mail Services”), the result will be as follows:
alex@magazine.com
mary@company.com
nicky@yahoo.com
smith@market.com
snail@yahoo.com
twiggy@yahoo.com
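Both example results can be reproduced with a small Python sketch. It assumes that the first address seen for each domain (in input order) is the one kept, which is what the example results suggest. The function name `one_per_domain` and the full-domain form of the “Mail Services” entry (`yahoo.com`) are illustrative assumptions.

```python
def one_per_domain(addresses, mail_services=()):
    """Keep the first address seen per domain; keep every address whose
    domain is listed in mail_services. Returns a sorted list."""
    services = {domain.lower() for domain in mail_services}
    seen_domains = set()
    kept = set()
    for address in addresses:
        domain = address.rsplit("@", 1)[1].lower()
        if domain in services:
            kept.add(address)          # mail service: keep every address
        elif domain not in seen_domains:
            seen_domains.add(domain)   # first address for this domain
            kept.add(address)
    return sorted(kept)

emails = [
    "mary@company.com", "alex@magazine.com", "snail@yahoo.com",
    "smith@market.com", "jane@market.com", "twiggy@yahoo.com",
    "info@company.com", "job@magazine.com", "nicky@yahoo.com",
]
print(one_per_domain(emails))
print(one_per_domain(emails, mail_services=["yahoo.com"]))
```

The first call corresponds to “No duplicate domains” alone, the second to both options checked.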
Allow embedded spaces in AOL usernames: check this option to
turn on extended processing of AOL e-mail addresses that include
spaces, such as “john smith@aol.com” or “write me@aol.com”.
If the option is turned off, the Cleaner does not treat a space
as a valid e-mail address character and therefore reads these
addresses as “smith@aol.com” and “me@aol.com”.
If the option is turned on, the Cleaner reads these addresses in
full but removes the spaces before placing them into the output
file. The result is “johnsmith@aol.com” and “writeme@aol.com”,
which remain valid according to AOL rules.
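The difference between the two settings can be shown with a minimal Python sketch. It works on a candidate string that has already been isolated from the surrounding text; how the Cleaner decides where such a candidate begins is not described here, and the function name `normalize_aol` is hypothetical.

```python
def normalize_aol(candidate, allow_spaces):
    """Return the address as read with the option on (spaces removed)
    or off (only the last word before "@" is kept)."""
    local, _, domain = candidate.rpartition("@")
    if allow_spaces:
        return local.replace(" ", "") + "@" + domain
    return local.split(" ")[-1] + "@" + domain

print(normalize_aol("john smith@aol.com", allow_spaces=True))
print(normalize_aol("john smith@aol.com", allow_spaces=False))
```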
Save rejected e-mails into: check this option and specify a file
name to get the list of addresses that the Cleaner rejected
(i.e. did not place into the output file) for reasons such as:
- the length exceeds the maximum allowed value;
- there is only one character before the “@” symbol;
- the option “Only strip out addresses preceded by” is turned on
and the address does not follow the specified word;
- etc.
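Two of the rejection rules listed above (maximum length and a one-character name before “@”) can be sketched in Python. The function name `split_rejected` is hypothetical, and the sketch assumes the length limit applies to the whole address; the Cleaner applies further checks that are not covered here.

```python
def split_rejected(addresses, max_length=45):
    """Return (accepted, rejected), each sorted and without duplicates."""
    accepted, rejected = set(), set()
    for address in addresses:
        local = address.partition("@")[0]
        if len(address) > max_length or len(local) < 2:
            rejected.add(address)   # would go to the "rejected e-mails" file
        else:
            accepted.add(address)   # would go to the regular output file
    return sorted(accepted), sorted(rejected)

candidates = ["a@x.com", "mary@company.com", "x" * 50 + "@long.example"]
accepted, rejected = split_rejected(candidates)
print(accepted)
print(rejected)
```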
Processing mode “IP addresses”
If this mode is chosen, the Cleaner places into the output file
all syntactically valid IP addresses it finds in the input file.
A valid IP address consists of 4 numbers separated by dots, each
number not greater than 255 (e.g. 230.121.1.0).
The output file is sorted in numeric order, one item per line,
without duplicates.
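The validity rule and the numeric ordering can be illustrated with a Python sketch. The regular expression and the name `extract_ips` are assumptions; note also that this sketch rewrites each number without leading zeros, which the Cleaner may or may not do.

```python
import re

# Four groups of 1-3 digits; the look-arounds avoid cutting a match
# out of a longer run of digits and dots.
IP_RE = re.compile(
    r"(?<!\d)(?<!\d\.)(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?!\d)(?!\.\d)"
)

def extract_ips(text):
    """Return valid IP addresses, without duplicates, in numeric order."""
    found = set()
    for match in IP_RE.finditer(text):
        octets = tuple(int(part) for part in match.groups())
        if all(octet <= 255 for octet in octets):
            found.add(octets)
    # Sorting tuples of integers gives numeric, not alphabetical, order.
    return [".".join(str(octet) for octet in octets) for octets in sorted(found)]

sample = "hosts 230.121.1.0, 9.8.7.6 and 300.1.1.1; again 9.8.7.6."
print(extract_ips(sample))
```

In alphabetical order “230.121.1.0” would come before “9.8.7.6”; comparing the numbers as integers puts “9.8.7.6” first.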
Processing mode “IP addresses in ()”
If this mode is chosen, the Cleaner places into the output file
only those syntactically valid IP addresses that are enclosed in
brackets or parentheses in the input file.
Valid bracket/parentheses types are: () {} [] <>.
The output file is sorted in numeric order, one item per line,
without duplicates.
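A possible way to express the bracket rule in Python is shown below. The sketch assumes that the opening and closing characters must be of the same type and must touch the address directly; whether the Cleaner accepts mixed pairs or inner spaces is not stated. The name `extract_bracketed_ips` is hypothetical.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"      # a number from 0 to 255
IP = rf"{OCTET}(?:\.{OCTET}){{3}}"           # four such numbers separated by dots
PAIRS = [("(", ")"), ("{", "}"), ("[", "]"), ("<", ">")]

# One alternative per bracket type, each capturing the enclosed address.
BRACKETED_IP_RE = re.compile(
    "|".join(re.escape(o) + "(" + IP + ")" + re.escape(c) for o, c in PAIRS)
)

def extract_bracketed_ips(text):
    """Return bracketed IP addresses, without duplicates, in numeric order."""
    found = {m.group(m.lastindex) for m in BRACKETED_IP_RE.finditer(text)}
    return sorted(found, key=lambda ip: tuple(int(n) for n in ip.split(".")))

sample = "seen (230.121.1.0), [9.8.7.6] and 10.0.0.1; also <9.8.7.6>"
print(extract_bracketed_ips(sample))
```

The unbracketed address 10.0.0.1 is skipped, which is the only difference from the plain “IP addresses” mode.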
Processing mode “Proxies”
If this mode is chosen, the Cleaner extracts from the input file
all syntactically valid proxies it can find. A valid proxy
consists of 4 numbers separated by dots (each number not greater
than 255) immediately followed by a colon and a port number from
10 to 65535 (e.g. 230.121.1.0:80).
The output file is sorted in numeric order, one item per line,
without duplicates.
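The proxy rule adds a port check to the IP address rule. A minimal Python sketch, with an assumed regular expression and the hypothetical name `extract_proxies`:

```python
import re

PROXY_RE = re.compile(
    r"(?<!\d)(?<!\d\.)(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})(?!\d)"
)

def extract_proxies(text):
    """Return valid address:port items, without duplicates, in numeric order."""
    found = set()
    for match in PROXY_RE.finditer(text):
        *octets, port = (int(group) for group in match.groups())
        if all(octet <= 255 for octet in octets) and 10 <= port <= 65535:
            found.add((tuple(octets), port))
    return [".".join(map(str, octets)) + ":" + str(port)
            for octets, port in sorted(found)]

sample = ("230.121.1.0:80 10.0.0.1:8080 10.0.0.1:5 "
          "10.0.0.1:70000 256.1.1.1:80 10.0.0.1:8080")
print(extract_proxies(sample))
```

Ports 5 and 70000 fall outside the 10 to 65535 range, and 256 is not a valid number in the address part, so only two items remain.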
Processing mode “Proxies in ()”
If this mode is chosen, the Cleaner places into the output file
only those syntactically valid proxies that are enclosed in
brackets or parentheses in the input file.
Valid bracket/parentheses types are: () {} [] <>.
The output file is sorted in numeric order, one item per line,
without duplicates.
Processing mode “Phone numbers”
If this mode is chosen, the Cleaner extracts from the input file
phone numbers in North American format: area code (3 digits),
exchange (3 digits), local number (4 digits).
The allowed separators between the parts of the number are
space, parentheses or hyphen. Examples: 305 120 5067;
(903) 701-3018.
The output file is sorted in numeric order, one item per line,
without duplicates.
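The number format can be matched with a regular expression, as in the following Python sketch. The pattern, the name `extract_phones` and the hyphenated output form are assumptions; the manual does not say how the Cleaner writes the numbers to the output file.

```python
import re

# Area code (optionally in parentheses), exchange, local number,
# separated by a space or a hyphen.
PHONE_RE = re.compile(r"(?<!\d)\(?(\d{3})\)?[ -]?(\d{3})[ -](\d{4})(?!\d)")

def extract_phones(text):
    """Return phone numbers as AAA-EEE-NNNN, without duplicates, sorted."""
    numbers = {"-".join(match.groups()) for match in PHONE_RE.finditer(text)}
    # All items have the same length, so string order equals numeric order.
    return sorted(numbers)

sample = "Fax: (903) 701-3018, tel 305 120 5067, again 305-120-5067"
print(extract_phones(sample))
```

Writing every number in one form lets differently formatted copies of the same number collapse into a single item.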
The option Only strip out addresses preceded by is also
available for this processing mode. It works the same way as in
the E-mail addresses mode (see above).
A typical use of this feature is extracting fax or mobile phone
numbers from e-mail messages. Export the messages into one file
and have the Cleaner process it with the option “Only strip out
addresses preceded by” checked and the word “Fax” or “Mobile”
entered in the adjacent field. The result is a list of fax or
mobile numbers.
Processing features
The Cleaner provides some additional processing features.
- AOL addresses processing. Additional checks are performed on
AOL e-mail addresses to meet AOL standards:
  - the address must begin with a letter;
  - the part to the left of the “@” sign must be 3 to 16
  characters long.
- “Agglutinate” addresses processing. The Cleaner can single out
an e-mail address that is joined with a repeated domain name
(like “mary@hotmail.comhotmail.com”). Such combinations are
quite common in large e-mail databases.
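Both features can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch covers only the simplest “agglutinate” case, in which the domain is repeated exactly once; the Cleaner may recognize other variants.

```python
def aol_local_part_ok(address):
    """Check the two AOL rules above for the part before "@"."""
    local = address.partition("@")[0]
    first = local[:1]
    return first.isascii() and first.isalpha() and 3 <= len(local) <= 16

def split_agglutinated(address):
    """Drop an exact repeat of the domain, e.g. x@a.coma.com -> x@a.com."""
    local, at, rest = address.partition("@")
    half, remainder = divmod(len(rest), 2)
    repeated = remainder == 0 and rest[:half] == rest[half:]
    if at and repeated and "." in rest[:half]:
        return local + "@" + rest[:half]
    return address

print(split_agglutinated("mary@hotmail.comhotmail.com"))
print(aol_local_part_ok("johnsmith@aol.com"))
```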
A small tip: you can use the Cleaner to single out the invalid
e-mail addresses for which you received “undeliverable”
messages. Export these messages into a single text file and
process it with the Cleaner. The result is a sorted,
de-duplicated list of invalid addresses, which can then be
processed with the 1st Mailinglist Manager Remover to get rid of
the useless addresses.
Processing results
You should specify a separate output file for each Cleaner
processing mode you are going to use. There is a field for the
output file name next to each processing mode’s option. How to
indicate an output file is described in the section “General
notes”, subsection “Output files selection”.
After the Cleaner has processed your raw file and extracted the
items you need (e-mails, IPs, phones, etc.), the results are
placed into the specified files as sorted lists (in alphabetical
or numeric order, depending on the item type), one item per
line, no duplicates.
When specifying output file names, note that if no file with the
specified name exists in the specified path, it will be created.
If a file with that name already exists in the specified folder,
it will be overwritten. In that case a backup copy of the file
may be created, depending on the application options (the same
applies to any other output file).
The original input file remains unchanged.
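The create, overwrite and backup behavior can be pictured with the following Python sketch. It is an assumption-based illustration only: the function name `write_list`, the `keep_backup` switch and the “.bak” suffix are hypothetical and do not describe how the application names or stores its backups.

```python
import shutil
import tempfile
from pathlib import Path

def write_list(path, items, keep_backup=True):
    """Write items sorted, one per line, without duplicates.
    An existing file is copied to <name>.bak first (assumed naming)."""
    path = Path(path)
    if keep_backup and path.exists():
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    lines = "".join(item + "\n" for item in sorted(set(items)))
    path.write_text(lines, encoding="utf-8")  # creates or overwrites the file

# Usage in a temporary folder, so no real file is touched.
with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "emails.txt"
    write_list(target, ["b@x.com", "a@x.com", "a@x.com"])  # file is created
    write_list(target, ["c@x.com"])                        # overwritten, backup kept
    print(sorted(p.name for p in Path(folder).iterdir()))
```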
Important note
Always have all your raw files processed by the Cleaner before
you process them with any other 1st Mailinglist Manager task.
This ensures that your lists of e-mail addresses, IP addresses
and phone numbers are not unpredictable text but fully
normalized: one item per line, sorted, without duplicates. This
is the only form suitable for processing by the Merger, Remover,
Keeper and others.
Skipping the Cleaner may lead to error messages and to unwanted,
useless results.