We ran into this same issue and just finished working with support to get it sorted out. I put together a cleaned-up version of the steps we went through to get the edge-management-server back to a healthy state.
In our situation our LDAP data wound up with incorrect “glue” attributes on some objects, likely due to a sync issue that we have yet to identify. You can check to see if you are running into the same issue by following these steps.
Stop apigee-openldap, export your LDAP data, then check for “glue” object types by running these commands on your apigee-openldap node:
apigee-service apigee-openldap stop
slapcat -F /opt/apigee/data/apigee-openldap/slapd.d -l /tmp/glue.ldif
grep glue /tmp/glue.ldif
If you see output like this, you might be in the same situation we were, and hopefully this information will help you repair your LDAP data.
macbook$ grep glue /tmp/glue.ldif
objectClass: glue
structuralObjectClass: glue
objectClass: glue
structuralObjectClass: glue
objectClass: glue
structuralObjectClass: glue
objectClass: glue
structuralObjectClass: glue
objectClass: glue
structuralObjectClass: glue
If you don’t see any, this fix unfortunately won’t apply to you.
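The grep output tells you that glue attributes exist, but not which entries carry them. Here is a rough sketch for listing the affected DNs. The function name list_glue_dns is made up for illustration, and it assumes entries in the export are separated by blank lines and that DNs are not folded across lines or base64-encoded.

```shell
# list_glue_dns FILE: print the dn line of every entry that has a glue class.
# Hypothetical helper; assumes blank-line-separated entries and unfolded DNs.
list_glue_dns() {
  awk '
    /^dn:/ { dn = $0 }
    /^(objectClass|structuralObjectClass): glue$/ {
      # Print each affected dn only once, even if both attributes match.
      if (dn != "" && !seen[dn]++) print dn
    }
    /^$/ { dn = "" }
  ' "$1"
}

# Example: list_glue_dns /tmp/glue.ldif
```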
Before doing anything else, please back up all your OpenLDAP data from all instances.
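One possible way to take that backup, as a sketch rather than an official procedure: keep both an LDIF export and a tarball of the raw files. The helper name backup_ldap and the APIGEE_LDAP_BASE variable are my own inventions, and the default path is simply the data directory used elsewhere in this post. Run it on each instance with apigee-openldap stopped.

```shell
# backup_ldap DEST_DIR: hypothetical helper that saves an LDIF export plus a
# tarball of the data directory. Assumes apigee-openldap is stopped.
backup_ldap() {
  dest="$1"
  # APIGEE_LDAP_BASE is an assumed override; the default is the path used above.
  base="${APIGEE_LDAP_BASE:-/opt/apigee/data/apigee-openldap}"
  stamp="$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dest" || return 1
  # Logical backup: the same slapcat export used for the glue check.
  slapcat -F "$base/slapd.d" -l "$dest/ldap-export-$stamp.ldif" || return 1
  # Physical backup: config and database files exactly as they are on disk.
  tar -czf "$dest/ldap-files-$stamp.tar.gz" -C "$base" . || return 1
}

# Example: backup_ldap /tmp/ldap-backup
```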
Summary of steps:
- Edit the ldif file to fix any objects that have objectClass or structuralObjectClass of glue.
- Shut down any other LDAP instances that are configured to replicate data.
- Stop OpenLDAP, remove the existing DB, load the new data using slapadd, and set file permissions.
- Start apigee-openldap and edge-management-server and check logs to verify health
- Remove the bad DB from other LDAP instances, and start the apigee-openldap service.
- Test functionality
1 - Editing the LDIF file
The goal of the object repair is to change the objectClass and structuralObjectClass attributes to the correct type and add any missing attributes. You can use a clean working export to figure out what is missing or wrong, but in our case Apigee support helped us out.
We had four objects we had to repair:
dn: dc=apigee,dc=com
dn: ou=global,dc=apigee,dc=com
dn: ou=users,ou=global,dc=apigee,dc=com
dn: ou=userroles,ou=global,dc=apigee,dc=com
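I’ll go through them one by one. In each entry, the lines with a value of glue are the ones to remove; the other objectClass and structuralObjectClass lines, along with dc, o, and ou, are what the repaired entry should contain.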
dn: dc=apigee,dc=com
creatorsName: cn=manager,dc=apigee,dc=com
entryUUID: cb3b3086-3c40-1039-90fd-5f32d4271a03
createTimestamp: 20190716181053Z
entryCSN: 20190822152107.007197Z#000000#001#000000
modifiersName: cn=manager,dc=apigee,dc=com
modifyTimestamp: 20190822152107Z
objectClass: dcObject
objectClass: organization
dc: apigee
o: Apigee
structuralObjectClass: organization
objectClass: top
structuralObjectClass: glue
contextCSN: 20200608174655.054236Z#000000#001#000000
contextCSN: 20200114150746.195003Z#000000#002#000000
dn: ou=global,dc=apigee,dc=com
entryUUID: cb3dc558-3c40-1039-90fe-5f32d4271a03
creatorsName: cn=manager,dc=apigee,dc=com
createTimestamp: 20190716181053Z
entryCSN: 20190716181053.290392Z#000000#000#000000
modifiersName: cn=manager,dc=apigee,dc=com
modifyTimestamp: 20190716181053Z
objectClass: top
objectClass: organizationalUnit
structuralObjectClass: organizationalUnit
ou: global
objectClass: glue
structuralObjectClass: glue
dn: ou=users,ou=global,dc=apigee,dc=com
entryUUID: cb3e43f2-3c40-1039-90ff-5f32d4271a03
creatorsName: cn=manager,dc=apigee,dc=com
createTimestamp: 20190716181053Z
entryCSN: 20190716181053.293634Z#000000#000#000000
modifiersName: cn=manager,dc=apigee,dc=com
modifyTimestamp: 20190716181053Z
objectClass: top
objectClass: organizationalUnit
structuralObjectClass: organizationalUnit
ou: users
objectClass: glue
structuralObjectClass: glue
dn: ou=userroles,ou=global,dc=apigee,dc=com
entryUUID: cb3e6422-3c40-1039-9100-5f32d4271a03
creatorsName: cn=manager,dc=apigee,dc=com
createTimestamp: 20190716181053Z
entryCSN: 20190716181053.294458Z#000000#000#000000
modifiersName: cn=manager,dc=apigee,dc=com
modifyTimestamp: 20190716181053Z
objectClass: top
objectClass: organizationalUnit
structuralObjectClass: organizationalUnit
ou: userroles
objectClass: glue
structuralObjectClass: glue
Save the file as /tmp/glue_fixed.ldif and go to the next step.
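Before moving on, it may be worth confirming that no glue classes survived the edit. This is only a sketch: check_fixed_ldif is a hypothetical helper name, and it checks for leftover glue values only, not for missing attributes.

```shell
# check_fixed_ldif FILE: succeed only if no glue class remains in FILE.
# Hypothetical helper; it does not validate the rest of the entry.
check_fixed_ldif() {
  if grep -Eq '^(objectClass|structuralObjectClass): glue$' "$1"; then
    echo "glue entries remain in $1" >&2
    return 1
  fi
  echo "no glue entries in $1"
}

# Example: check_fixed_ldif /tmp/glue_fixed.ldif
# To review every change you made: diff /tmp/glue.ldif /tmp/glue_fixed.ldif
```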
2 - Shut down any other replicated OpenLDAP instances
To prevent another instance from syncing with the one you are working on, stop the service on remote nodes. If you aren’t sure whether it’s replicating anywhere, you can search the LDAP config for ‘olcSyncrepl’:
grep -ir 'olcSyncrepl:' /opt/apigee/data/apigee-openldap/slapd.d/*
/opt/apigee/data/apigee-openldap/slapd.d/cn=config/olcDatabase={2}bdb.ldif:olcSyncrepl: {0}rid=001 provider=ldap://<remote_sync_ip>:10389/ binddn="cn=manage
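If there are several olcSyncrepl lines, pulling out just the provider URLs makes it easier to see which remote nodes need to be stopped. A small sketch, assuming the provider= value sits on the same line as olcSyncrepl: (as in the output above) and that your grep supports -o; list_sync_providers is a made-up name.

```shell
# list_sync_providers DIR: print the provider URL from each olcSyncrepl line.
# Hypothetical helper; misses providers that are folded onto a continuation line.
list_sync_providers() {
  grep -rhi 'olcSyncrepl:' "$1" | grep -o 'provider=[^ ]*' | sort -u
}

# Example: list_sync_providers /opt/apigee/data/apigee-openldap/slapd.d
```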
3 - Remove the existing data and load new data.
Note: make sure you’ve backed everything up and that apigee-openldap is stopped.
- Change to your LDAP data directory, which is probably: /opt/apigee/data/apigee-openldap/ldap
- Take a backup of the files in the directory. Once you’ve done that, remove everything but DB_CONFIG.
- Now load the fixed ldif file using slapadd.
slapadd -F /opt/apigee/data/apigee-openldap/slapd.d/ -l /tmp/glue_fixed.ldif
- Ensure that the new files created by the slapadd command are owned by the proper user. In our case I had to change ownership to the ‘apigee’ user and group. If permissions are not correct, apigee-openldap may fail to start.
chown apigee:apigee *
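Put together, the sub-steps of step 3 could look roughly like the sketch below. Treat it as an illustration, not a tested procedure: reload_ldap and APIGEE_LDAP_BASE are names I made up, the default path is the one mentioned above, and it assumes your find supports -maxdepth. It removes files, so only run it once your backup is safely stored somewhere else and apigee-openldap is stopped.

```shell
# reload_ldap FIXED_LDIF: hypothetical wrapper around the sub-steps above.
# The body runs in a subshell so the cd does not affect the calling shell.
reload_ldap() (
  ldif="$1"
  # APIGEE_LDAP_BASE is an assumed override for the data directory.
  base="${APIGEE_LDAP_BASE:-/opt/apigee/data/apigee-openldap}"
  [ -f "$ldif" ] || { echo "missing $ldif" >&2; exit 1; }
  cd "$base/ldap" || exit 1
  # Remove every regular file in the directory except DB_CONFIG.
  find . -maxdepth 1 -type f ! -name DB_CONFIG -exec rm -f {} +
  # Load the repaired export into the now-empty database.
  slapadd -F "$base/slapd.d/" -l "$ldif" || exit 1
  # Hand the newly created files to the service user.
  chown apigee:apigee ./*
)

# Example: reload_ldap /tmp/glue_fixed.ldif
```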
4 - Start the services to validate changes
At this point you can try starting the apigee-openldap and edge-management-server services. Both should now come up. If they do not, check the edge-management-server logs and see if the error has changed.
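For reference, the start-up could be wrapped like this. It is a sketch that assumes apigee-service is on your PATH and uses its start and status actions; start_and_check is a hypothetical name. It starts OpenLDAP first because the management server depends on it.

```shell
# start_and_check: hypothetical helper that starts both services in order
# and then asks apigee-service for their status.
start_and_check() {
  apigee-service apigee-openldap start || return 1
  apigee-service edge-management-server start || return 1
  apigee-service apigee-openldap status
  apigee-service edge-management-server status
}

# Example: start_and_check
```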
5 - Clean up and start any replicated OpenLDAP instances
If you are replicating to a remote instance, it needs to be cleaned up before you start it again. As always, back up all data before making changes. Follow the first two sub-steps of step 3 (change to the data directory, then back up and remove everything but DB_CONFIG) to give the node a clean DB, then start the service.
Do not load the ldif file. Replication will handle pushing data.
6 - Test functionality
At this point our environment was back up and appeared healthy. We tested logging in to the edge-ui to make sure our user accounts worked as expected.
Miscellaneous notes
I picked up a few useful things along the way, particularly enabling OpenLDAP verbose debugging. You can turn it on by modifying this file:
/opt/apigee/apigee-openldap/lib/settings
Change the value after -d at the end of this line to ‘-1’ and restart apigee-openldap:
EXTRA_ARGS="-h ldap://:$LDAP_PORT/ -F $APIGEE_APP_DATADIR/slapd.d/ -u $RUN_USER -d 64"
See this page about OpenLDAP debug levels: https://www.openldap.org/doc/admin24/slapdconfig.html
Note: This setting seems to revert to the default after the server starts, so it will not survive a later restart.
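Since the setting has to be reapplied, a small helper can save some typing. This is a sketch only: set_slapd_debug is a made-up name, it assumes the EXTRA_ARGS line ends with -d followed by a number and a closing quote as shown here, and it relies on a sed that supports -E. It keeps a .bak copy of the settings file next to the original.

```shell
# set_slapd_debug FILE LEVEL: rewrite the trailing "-d N" in EXTRA_ARGS.
# Hypothetical helper; leaves the previous file content in FILE.bak.
set_slapd_debug() {
  cp "$1" "$1.bak" || return 1
  # Only touch the EXTRA_ARGS line, and only the number right before the quote.
  sed -E "/^EXTRA_ARGS=/ s/-d -?[0-9]+\"\$/-d $2\"/" "$1.bak" > "$1"
}

# Example: set_slapd_debug <path to the settings file> -1
```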
This gave me some insights as to what the edge-management-server was trying to do when it throws the error:
‘Got a life cycle exception while starting service [SecurityService, Unable to add initialize users and resources’.
In the debug logs we see it try to add the userroles ou and fail because it already exists. I’m guessing it doesn’t recognize the object as being what it wants because it has object type(s) of “glue”.
Debug Logs:
5f396615 >>> dnPrettyNormal:
=> ldap_bv2dn(ou=userroles,ou=global,dc=apigee,dc=com,0)
<= ldap_bv2dn(ou=userroles,ou=global,dc=apigee,dc=com)=0
=> ldap_dn2bv(272)
<= ldap_dn2bv(ou=userroles,ou=global,dc=apigee,dc=com)=0
=> ldap_dn2bv(272)
<= ldap_dn2bv(ou=userroles,ou=global,dc=apigee,dc=com)=0
5f396615 <<< dnPrettyNormal: ,
5f396615 conn=1000 op=3 ADD dn="ou=userroles,ou=global,dc=apigee,dc=com"
5f396615 ==> bdb_add: ou=userroles,ou=global,dc=apigee,dc=com
5f396615 oc_check_required entry (ou=userroles,ou=global,dc=apigee,dc=com), objectClass "organizationalUnit"
5f396615 oc_check_allowed type "ou"
5f396615 oc_check_allowed type "objectClass"
5f396615 oc_check_allowed type "structuralObjectClass"
5f396615 slap_queue_csn: queueing 0x7fce34105e00 20200816170005.270911Z#000000#001#000000
5f396615 bdb_add: txn1 id: 800001d3
5f396615 bdb_dn2entry("ou=userroles,ou=global,dc=apigee,dc=com")
5f396615 => bdb_dn2id("ou=userroles,ou=global,dc=apigee,dc=com")
5f396615 <= bdb_dn2id: got id=0x4
5f396615 entry_decode: "ou=userroles,ou=global,dc=apigee,dc=com"
5f396615 <= entry_decode(ou=userroles,ou=global,dc=apigee,dc=com)
5f396615 send_ldap_result: conn=1000 op=3 p=3
5f396615 send_ldap_result: err=68 matched="" text=""
5f396615 send_ldap_response: msgid=4 tag=105 err=68
ber_flush2: 14 bytes to sd 16
0000: 30 0c 02 01 04 69 07 0a 01 44 04 00 04 00 0....i...D....
ldap_write: want=14, written=14
0000: 30 0c 02 01 04 69 07 0a 01 44 04 00 04 00 0....i...D....
5f396615 conn=1000 op=3 RESULT tag=105 err=68 text=
The lines that stand out are:
5f396615 send_ldap_result: err=68 matched="" text=""
5f396615 conn=1000 op=3 RESULT tag=105 err=68 text=
According to this page (https://ldapwiki.com/wiki/LDAP%20Result%20Codes), ldap error 68 translates to:
“LDAP_ALREADY_EXISTS - Indicates that the add operation attempted to add an entry that already exists, or that the modify operation attempted to rename an entry to the name of an entry that already exists.”
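The debug output is very noisy, so it can help to tally just the non-zero result codes before reading it line by line. A rough sketch, assuming you have captured the debug output to a file of your choosing (where it ends up depends on your setup) and that your grep supports -o; count_ldap_errors is a hypothetical name.

```shell
# count_ldap_errors FILE: count each non-zero err= code on RESULT lines.
# Hypothetical helper; FILE is wherever you saved the slapd debug output.
count_ldap_errors() {
  grep -o 'RESULT tag=[0-9]* err=[0-9]*' "$1" \
    | grep -v 'err=0$' \
    | sed 's/.*err=//' \
    | sort -n \
    | uniq -c
}

# Example: count_ldap_errors /tmp/slapd-debug.log
```

Each output line is a count followed by the result code, so a line ending in 68 points at the LDAP_ALREADY_EXISTS failure described above.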