Ticket #8 (closed defect: fixed)

Opened 11 years ago

Last modified 10 years ago

DB freeze, greyfix doesn't work!

Reported by: marco.favero@… Owned by: kmkaplan
Priority: major Version:
Keywords: Cc:

Description (last modified by kmkaplan)

After a few minutes of operation my Berkeley DB freezes. "greyfix --dump-triplets" shows no output. All postfix processes wait for access to the DB without ever getting it:

nobody   12075     1  0 Jan08 ?        00:00:00 /usr/local/sbin/greyfix -/ 32 --greylist-delay 300 -r 451%sWait%s%d%sseconds%sand%stry%sagain%p.

nobody   12091     1  0 Jan08 ?        00:00:00 /usr/local/sbin/greyfix -/ 32
[...]

My configuration is: postfix 2.3.4, greyfix 0.3.7, db4-4.1.25-8.1, Red Hat Enterprise Linux ES release 3 (Taroon Update 7).

When the DB is frozen, its directory looks like this:

[root]# ll /usr/local/var/lib/greyfix/
total 7616
-rw-------    1 nobody   nobody      16384 Jan  8 22:12 __db.001
-rw-------    1 nobody   nobody     278528 Jan  8 22:12 __db.002
-rw-------    1 nobody   nobody     458752 Jan  8 22:12 __db.003
-rw-------    1 nobody   nobody    7401472 Jan  8 22:12 triplets.db

It only starts working again, for a few minutes, if I delete all the "__db.00*" files.

Attachments

greyfix-8.patch (1.8 kB) - added by kmkaplan 11 years ago.
BUGFIX detect deadlocks during run_expiry.

Change History

  Changed 11 years ago by anonymous

Here is the postfix log from when the DB stops working:

postfix/smtpd[24919]: warning: timeout on private/greyfix while reading input attribute name
postfix/smtpd[24919]: warning: problem talking to server private/greyfix: Connection timed out

follow-up: ↓ 5   Changed 11 years ago by anonymous

I read the ticket "Corrupted DB on x64 Solaris 10 U4" more carefully, and I think this is the same behaviour. But here the OS is Red Hat.

  Changed 11 years ago by anonymous

When the DB stops working, I notice that manually running "db_recover" is not sufficient, even if I "reload" postfix. I also need to manually kill -9 all greyfix processes. Otherwise postfix still replies with "451 4.3.5 Server configuration problem".

bye
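
The manual recovery described in this comment could be scripted roughly as follows. This is only a sketch under assumptions: the environment directory is the one from the listing in the description, `postfix`, `pkill` and `db_recover` are assumed to be on the PATH, and the helper names `run` and `recover_greyfix` are made up for illustration. Stopping postfix first (rather than reloading it) is a guess at how to keep new greyfix processes from being spawned during recovery. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of the manual recovery steps from the comment above.
# Assumptions (not confirmed by the ticket): the DB environment lives in
# /usr/local/var/lib/greyfix and postfix, pkill and db_recover are on PATH.
# DRY_RUN=1 (the default) only prints the commands instead of running them.

DB_HOME="${DB_HOME:-/usr/local/var/lib/greyfix}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: recover the Berkeley DB environment in directory $1.
recover_greyfix() {
    # Stop postfix so that no new greyfix processes start mid-recovery.
    run postfix stop
    # Kill the stuck greyfix processes; per the report, db_recover plus a
    # reload is not enough while they are still blocked on the DB.
    run pkill -9 -x greyfix
    # Run Berkeley DB recovery on the environment directory.
    run db_recover -h "$1"
    run postfix start
}

recover_greyfix "$DB_HOME"
```

Run it once as is to review the printed commands, then as root with `DRY_RUN=0` to actually perform them.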

  Changed 11 years ago by kmkaplan

  • description modified

in reply to: ↑ 2   Changed 11 years ago by kmkaplan

  • owner set to kmkaplan
  • status changed from new to assigned

Replying to anonymous:

I read the ticket "Corrupted DB on x64 Solaris 10 U4" more carefully, and I think this is the same behaviour.

No. The problem you describe looks very much like a deadlock. I have had a similar report by private mail. I am investigating this.

Changed 11 years ago by kmkaplan

BUGFIX detect deadlocks during run_expiry.

  Changed 11 years ago by kmkaplan

  • status changed from assigned to closed
  • resolution set to fixed

The attached patch should fix this. It will be in the next release.
