[SAC] Postmortem for LDAP troubles

Sandro Santilli strk at kbt.io
Tue Mar 2 02:26:03 PST 2021


This somehow got fixed, but I'm not sure whether it was due to one
of my actions. What I did:

1. Ran the /usr/local/bin/copy_ldap_certs_to_secure.sh
   script to update the SSL certificates, if needed

2. Found that slapd had failed to restart because of wrong
   permissions on the certificates

3. Fixed the certificate permissions and restarted slapd
   successfully

At the end of this process, things started working again.

I've created a pull request adding the permission tweaks to the
copy_ldap_certs_to_secure.sh script (please review):

  https://git.osgeo.org/gitea/sac/ansible-deployment/pulls/8

Why the copy_ldap_certs_to_secure.sh script was NOT invoked
automatically from tech_dev's crontab is yet to be understood;
I opened a ticket for it here:

  https://git.osgeo.org/gitea/sac/ansible-deployment/issues/9

Looking forward to the new sysadmin contract!

--strk;

On Tue, Mar 02, 2021 at 09:57:09AM +0100, Sandro Santilli wrote:
> Today the tracsvn container cannot connect to the LDAP server.
> 
> The LDAP client on that machine is currently configured to use the
> public DNS name for the service (ldap.osgeo.org), but attempts to
> reach that host on port 389 hang indefinitely. Reaching the host on
> port 636 with netcat works fine:
> 
>   tracsvn:~# nc -vz ldap.osgeo.org 636
>   DNS fwd/rev mismatch: ldap.osgeo.org != base.osgeo.osuosl.org
>   ldap.osgeo.org [140.211.15.57] 636 (ldaps) open
> 
> But ldapsearch on port 636 fails with "Can't contact":
> 
>   tracsvn:~# ldapsearch -H ldaps://ldap.osgeo.org:636 -x 'uid=strk'
>   ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)
> 
> The LXD configuration on osgeo7 listens on port 636 of the
> ldap.osgeo.org IP (140.211.15.57) and forwards connections to port
> 636 on 127.0.0.1 inside the "secure" container. Indeed, I cannot
> contact the server on that port from within secure either:
> 
>   secure:~# ldapsearch -H ldaps://127.0.0.1:636 -x 'uid=strk'
>   ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)
> 
> Yet I can see both ports (636 and 389) open:
> 
>   secure:~# netstat -tnlp | grep '\(389\|636\)'
>   tcp        0      0 0.0.0.0:636             0.0.0.0:*               LISTEN      29044/slapd
>   tcp        0      0 0.0.0.0:389             0.0.0.0:*               LISTEN      29044/slapd
>   tcp6       0      0 :::636                  :::*                    LISTEN      29044/slapd
>   tcp6       0      0 :::389                  :::*                    LISTEN      29044/slapd
> 
> The journal doesn't even show the connection attempts, but the
> startup messages do report some failures:
> 
>   secure:~# journalctl -x -u slapd.service -f
>   Mar 02 08:30:05 secure systemd[1]: slapd.service: Failed to reset devices.list: Operation not permitted
>   Mar 02 08:30:05 secure systemd[1]: slapd.service: Failed to set invocation ID on control group /system.slice/slapd.service, ignoring: Operation not permitted
> 
> Has anyone seen those messages before? Any idea what could be going on?
> Shall I blindly try a stop/start cycle on the LXD container?
> 
> --strk;
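The quoted diagnosis hinges on the difference between a TCP port being open, which is all `nc -vz` checks, and a TLS handshake succeeding on that port, which `ldapsearch -H ldaps://` also needs before it can bind. The following is a minimal sketch that separates the two steps. The function name `probe` is hypothetical, and certificate verification is deliberately disabled because only the handshake is being tested, not trust:

```python
import socket
import ssl


def probe(host: str, port: int, timeout: float = 5.0) -> tuple[bool, bool, str]:
    """Check TCP reachability and the TLS handshake separately.

    Returns (tcp_open, tls_ok, detail).
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:  # refused, unreachable, timed out, DNS failure
        return False, False, f"TCP connect failed: {exc}"

    ctx = ssl.create_default_context()
    # Only the handshake is tested here, not certificate trust.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        # wrap_socket takes over `sock` and closes it itself if the handshake
        # fails; the socket's timeout also bounds the handshake.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return True, True, f"TLS handshake ok ({tls.version()})"
    except OSError as exc:  # ssl.SSLError and timeouts are OSError subclasses
        return True, False, f"TLS handshake failed: {exc}"
```

Running `probe("ldap.osgeo.org", 636)` from tracsvn and `probe("127.0.0.1", 636)` from secure would show which layer fails. A result of `(True, False, ...)` is the pattern consistent with nc reporting the port open while ldapsearch cannot contact the server.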

