[SAC] Mailman Postfix tuning - Urgent!!

Alex Mandel tech_dev at wildintellect.com
Sun Apr 20 11:34:51 PDT 2014


So Apache on Mail is still off. I tried turning it on a couple of times for a
few minutes, and the second it comes up we get hammered by the Bing/Google
crawlers and by someone using wp.pl and hotmail addresses to attempt mass
subscriptions to all our lists. This drives the load to 20+ instantly and
keeps it there. For those wondering, this is the smoking-gun performance
killer that causes trouble for all the other VMs on osgeo4 (not that some
other things don't contribute a little).

** Until we fix this the list subscription pages will be unavailable **

Why? Because it's blocking I/O on the whole machine.

The only ideas I've seen online so far are:
1. mod_security - blocking of known-bad requests using the OWASP rules
2. mod_evasive - behavioral blocking by IP activity, i.e. too many hits
from the same IP within a given time
3. a honeypot field in the subscription pages that prevents successful
submission by a bot
4. some other QoS-type throttling of Apache by IP.
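
To make ideas 2 and 3 concrete, here is a rough Python sketch of the logic
only. It is not how mod_evasive or Mailman implement anything; the class
name, the limits (5 hits per 10 seconds) and the hidden field name "website"
are all made up for illustration.

```python
import time
from collections import defaultdict, deque


class PerIpThrottle:
    """Sliding-window hit counter per client IP (the idea behind item 2)."""

    def __init__(self, max_hits=5, window_seconds=10.0):
        # Both limits are illustrative assumptions, not tuned values.
        self.max_hits = max_hits
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent hits

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        recent = self.hits[ip]
        # Forget hits that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False  # over the limit: reject and do not record the hit
        recent.append(now)
        return True


def looks_like_bot(form, honeypot_field="website"):
    """Honeypot check (item 3): the field is hidden from humans, so any
    non-empty value suggests an automated submission."""
    return bool(form.get(honeypot_field, "").strip())
```

A subscribe handler would call allow(client_ip) and looks_like_bot(form)
before doing any real work, so rejected requests cost almost no I/O.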


I've also noticed that, now that the disks aren't blocking, our active mail
queue is maxed out, the deferred queue is shrinking over time, and the
deferred addresses appear to all be @wp.pl.
If we leave Apache off, I think this will clear in a day or so and the
active queue should drop back down to real mail levels.
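
A quick way to check the "all @wp.pl" impression is to count recipient
domains in the queue listing. The sketch below assumes the usual mailq /
postqueue -p text layout (sender on the queue-ID line, recipients on indented
lines, errors in parentheses); the sample data and addresses are invented.
The qshape tool from the Postfix docs does this job properly, so treat this
as a stopgap.

```python
from collections import Counter


def deferred_domains(mailq_text):
    """Count recipient domains in mailq / postqueue -p style output."""
    counts = Counter()
    for line in mailq_text.splitlines():
        entry = line.strip()
        # Recipient lines are indented; sender lines start with the queue ID
        # and delivery-error lines are wrapped in parentheses.
        if not line[:1].isspace() or entry.startswith("("):
            continue
        if "@" in entry and " " not in entry:
            counts[entry.rsplit("@", 1)[1].lower()] += 1
    return counts


# Invented sample in the assumed layout, for illustration only.
SAMPLE = """\
-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
3F2A11C0D2     2213 Sat Apr 19 08:01:12  sac-bounces@lists.example.org
(connect to mx.example.pl[192.0.2.10]:25: Connection timed out)
                                         someone@wp.pl
                                         other@WP.pl

7B9C22D1E3     1980 Sat Apr 19 08:03:40  discuss-bounces@lists.example.org
                                         user@example.net
"""

for domain, count in deferred_domains(SAMPLE).most_common():
    print(domain, count)
```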

HELO and sender restrictions might help prevent us from being used by
spammers: http://wiki.centos.org/HowTos/postfix_restrictions
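
For anyone else new to this: as far as I understand it, the restrictions in
question (smtpd_helo_restrictions and smtpd_sender_restrictions, with checks
such as reject_non_fqdn_helo_hostname and reject_non_fqdn_sender) boil down
to refusing clients whose HELO name or envelope sender is not a proper
domain. The Python below is only my illustration of that kind of syntactic
test, with made-up function names. It is not Postfix's code, and it does no
DNS lookup the way reject_unknown_sender_domain would.

```python
import re

# One DNS label: letters, digits, hyphens; no leading or trailing hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_fqdn(name):
    """True if name is syntactically a fully qualified domain name."""
    labels = name.rstrip(".").split(".")
    return (
        len(labels) >= 2
        and not labels[-1].isdigit()  # rules out bare IPv4 addresses
        and all(_LABEL.match(label) for label in labels)
    )


def helo_ok(helo_name):
    """Accept only HELO names that look like real hostnames."""
    return is_fqdn(helo_name)


def sender_ok(envelope_sender):
    """Accept the null sender or an address with a fully qualified domain."""
    if envelope_sender == "":
        return True  # null sender is used by bounces and must be accepted
    local, sep, domain = envelope_sender.rpartition("@")
    return bool(sep and local) and is_fqdn(domain)
```

Note that the wp.pl addresses would pass a check like this, so it would
mainly filter out sloppier spam clients rather than the subscription flood.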

I really need help on this one. Everything I've posted over the last few
days is stuff I've learned about mail servers since Friday. I have zero
experience configuring mail servers.

Thanks,
Alex

On 04/18/2014 07:21 PM, Alex Mandel wrote:
> More progress: based on some advice, I looked at Apache.
> It turns out Apache is the cause of much of the I/O, along with bouncing
> emails. I enabled buffered logging, but that didn't change much.
> 
> I've added a robots.txt with crawl delay since google and bing are all
> over the logs. But there's also strange subscription patterns. For now
> apache is off on Mail. I'll probably leave it off until the disk rebuild
> finishes (eta end of weekend). So no new subscriptions till then. Any
> volunteers to look more into this?
> 
> Looks like mail queue is now catching up to real time...
> 
> FYI started modifying grub to include elevator=noop to change the
> scheduler (makes huge difference on QGIS and Projects) and picking
> swappiness between 10-30 on various machines since we have no shortage
> of ram.
> 
> Thanks,
> Alex
> 
> PS: All the tips I'm putting in emails are coming from various sources
> other than me, they are appreciated, so keep them coming.
> 
> 
> On 04/18/2014 06:36 PM, Alex Mandel wrote:
>> http://wiki.centos.org/HowTos/postfix_restrictions
>>
>> Helo and Sender restrictions might help, by validating that senders have
>> valid domains to start and follow standard practices.
>>
>> We do have recipient restrictions enabled already.
>>
>> FYI, I just tried upping queue_run_delay to 600 from the default of 300,
>> to try resending less often.
>>
>> Thanks,
>> Alex
>>
>> On 04/18/2014 06:03 PM, Alex Mandel wrote:
>>> Disk rebuild on osgeo4 is taking longer than expected. Not sure why, but
>>> even with all the VMs off it was going at 2%/hour.
>>>
>>> After a couple of hours I turned on Mail so that we wouldn't impact the
>>> mailing lists for too long. The problem here is that Mail is the biggest
>>> i/o user on osgeo4.
>>>
>>> Poking around and reading up, I think there are some things that can be
>>> done to make Mail behave better but would really prefer someone who
>>> knows postfix step in and help out here.
>>>
>>> Ideas and observations:
>>> Ram-disk for the queues? (Yes we can allocate more ram to the instance)
>>>
>>> We seem to have a high deferred queue, do we need to clean out bad
>>> addresses? Perhaps increase the delay time, or spin off the deferred
>>> queue to another "graveyard" server?
>>>
>>> Is apache being hammered by bots scanning the archives?
>>>
>>> Are there bot-like things trying to subscribe? I see some odd @wp.pl
>>> addresses in the Apache logs, trying to hit many lists in a few seconds
>>> and then rotating the username.
>>>
>>> Are we using a local DNS cache?
>>>
>>> http://www.postfix.org/QSHAPE_README.html#deferred_queue
>>> http://www.postfix.org/postconf.5.html#queue_run_delay
>>>
>>> Who's up for the challenge?
>>>
>>> Thanks,
>>> Alex
>>>
>>> PS: I'll turn QGIS and Projects back on in a few hours. Hopefully this
>>> can be tuned before that. Adhoc will likely stay off for the weekend.
>>> _______________________________________________
>>> Sac mailing list
>>> Sac at lists.osgeo.org
>>> http://lists.osgeo.org/mailman/listinfo/sac



