Ticket #8 (closed defect: fixed)

Opened 7 months ago

Last modified 5 months ago

DB freeze, greyfix doesn't work!

Reported by: marco.favero@csi.it Assigned to: kmkaplan
Priority: major Version:
Keywords: Cc:

Description (Last modified by kmkaplan)

After a few minutes of operation my Berkeley DB freezes. "greyfix --dump-triplets" shows no output. All Postfix processes wait for access to the DB and never get it:

nobody   12075     1  0 Jan08 ?        00:00:00 /usr/local/sbin/greyfix -/ 32 --greylist-delay 300 -r 451%sWait%s%d%sseconds%sand%stry%sagain%p.

nobody   12091     1  0 Jan08 ?        00:00:00 /usr/local/sbin/greyfix -/ 32
[...]

My configuration is: postfix 2.3.4, greyfix 0.3.7, db4-4.1.25-8.1, Red Hat Enterprise Linux ES release 3 (Taroon Update 7)

When frozen, the DB directory looks like this:

[root]# ll /usr/local/var/lib/greyfix/
total 7616
-rw-------    1 nobody   nobody      16384 Jan  8 22:12 __db.001
-rw-------    1 nobody   nobody     278528 Jan  8 22:12 __db.002
-rw-------    1 nobody   nobody     458752 Jan  8 22:12 __db.003
-rw-------    1 nobody   nobody    7401472 Jan  8 22:12 triplets.db

Only if I delete all the "__db.00*" files does it work again, and then only for a few minutes.

Attachments

greyfix-8.patch (1.8 kB) - added by kmkaplan on 01/15/08 15:32:11.
BUGFIX detect deadlocks during run_expiry.

Change History

01/09/08 07:38:39 changed by anonymous

Here is the Postfix log from when the DB stops working:

postfix/smtpd[24919]: warning: timeout on private/greyfix while reading input attribute name
postfix/smtpd[24919]: warning: problem talking to server private/greyfix: Connection timed out

(follow-up: ↓ 5 ) 01/10/08 13:10:44 changed by anonymous

I reread the ticket "Corrupted DB on x64 Solaris 10 U4" more carefully, and I think this is the same behaviour. But here the OS is Red Hat.

01/11/08 07:14:48 changed by anonymous

When the DB stops working, I notice that manually running "db_recover" is not sufficient, even if I "reload" Postfix. I need to manually kill -9 all greyfix processes. Otherwise Postfix still replies with "451 4.3.5 Server configuration problem".

bye
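
For anyone hitting the same freeze, the manual workaround described above could be scripted roughly as follows. This is only a sketch, not part of greyfix: the helper names `run` and `recover_greyfix_db` and the `DRY_RUN` variable are made up for illustration, the default directory is the one from the listing in the description, and it assumes `pkill` and `db_recover` are on the PATH. It kills the processes before recovering because Berkeley DB recovery expects no other process to be attached to the environment. On a busy server Postfix may spawn new greyfix processes in between, so stopping Postfix first may be necessary.

```shell
#!/bin/sh
# Hypothetical helper functions, not part of greyfix or Postfix.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# recover_greyfix_db [DIR]: kill stuck greyfix processes, then run
# Berkeley DB recovery on the environment directory DIR.
recover_greyfix_db() {
    # Assumption: default path taken from the directory listing in this ticket.
    dir=${1:-/usr/local/var/lib/greyfix}

    # Per the report, reloading Postfix is not enough; the stuck
    # processes have to be killed. pkill exits non-zero when nothing
    # matched, which is not an error for our purposes.
    run pkill -KILL -x greyfix || true

    # Recover the environment (the __db.00* region files live in DIR).
    run db_recover -h "$dir"
}
```

Running `DRY_RUN=1 recover_greyfix_db` first prints the two commands without executing them, which is a cheap way to check the path before doing it for real as root.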

01/14/08 00:30:42 changed by kmkaplan

  • description changed.

(in reply to: ↑ 2 ) 01/14/08 09:45:40 changed by kmkaplan

  • owner set to kmkaplan.
  • status changed from new to assigned.

Replying to anonymous:

I reread the ticket "Corrupted DB on x64 Solaris 10 U4" more carefully, and I think this is the same behaviour.

No. The problem you describe seems very much deadlock related. I've had a similar report by private mail. I am investigating this.

01/15/08 15:32:11 changed by kmkaplan

  • attachment greyfix-8.patch added.

BUGFIX detect deadlocks during run_expiry.

01/15/08 15:37:14 changed by kmkaplan

  • status changed from assigned to closed.
  • resolution set to fixed.

The attached patch should fix this. It will be in the next release.