In fact, it’s HIGHLY recommended…
Performance testing and stress testing are closely related and are essential tasks in any OpenAM deployment.
When conducting performance testing, you are trying to determine how well your system performs when subjected to a particular load. A primary goal of performance testing is to determine whether the system that you just built can support your client base (as defined by your performance requirements). Oftentimes you must tweak things (memory, configuration settings, hardware) in order to meet your performance requirements, but without executing performance tests, you will never know if you can support your clients until you are actually under fire (and by then, it may be too late).
Performance testing is an iterative process as shown in the following diagram:
Each of the states may be described as follows:
- Test – throw a load at your server
- Measure – take note of the results
- Compare – compare your results to those desired
- Tweak – modify the system to help achieve your performance results
During performance testing you may continue in this loop until such time that you meet your performance requirements – or until you find that your requirements were unrealistic in the first place.
Stress testing (aka “torture testing”) goes beyond normal performance testing in that the load you place on the system intentionally exceeds the anticipated capacity. The goal of stress testing is to determine the breaking point of the system and observe the behavior when the system fails.
Stress testing allows you to create contingency plans for those ‘worst-case scenarios’ that will eventually occur (thanks to Mr. Murphy).
Before placing OpenAM into production you should test to see if your implementation meets your current performance requirements (concurrent sessions, authentications per second, etc.) and have a pretty good idea of where your limitations are. The problem is that an OpenAM deployment consists of multiple servers – each of which may need to be tested (and tuned) separately. So how do you know where to start?
When executing performance and stress tests in OpenAM, there are three areas where I like to place my focus: 1) the protected application, 2) the OpenAM server, and 3) the data store(s). Testing the system as a whole may not provide enough information to determine where problems may lie and so I prefer to take an incremental approach that tests each component in sequence. I start with the data stores (authentication and user profile databases) and work my way back towards the protected application – with each iteration adding a new component.
Note: It should go without saying that the testing environment should mimic your production environment as closely as possible. Any deviation may cause your test results to be skewed and provide inaccurate data.
Data Store(s)
An OpenAM deployment may consist of multiple data stores – those used for authentication (Active Directory, OpenDJ, RADIUS server, etc.) and those used to build a user’s profile (LDAP and RDBMS). Both are core to an OpenAM deployment, and while they are typically the easiest to test, a misconfiguration here can have a big impact on overall performance. As such, I start my testing at the database layer and focus only on that component.
Performance of an authentication database can be measured by the average number of authentications that occur over a particular period of time (seconds, minutes, hours) and the easiest way to test these types of databases is to simply perform authentication operations against them.
You can write your own scripts to accomplish this, but there are many freely available tools that can be used as well. One tool that I have used in the past is the SLAMD Distributed Load Generation Engine. SLAMD was designed to test directory server performance, but it can be used to test web applications as well. Unfortunately, SLAMD is no longer being actively developed, but you can still download a copy from http://dl.thezonemanager.com/slamd/.
A tool that I have started using to test authentications against an LDAP server is authrate, which is included in ForgeRock’s OpenDJ LDAP Toolkit. Authrate allows you to stress the server and display some really nice statistics while doing so. The authrate command line tool measures bind throughput and response times and is perfect for testing all sorts of LDAP authentication databases.
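As a minimal sketch, an authrate run might look like the following. The host, port, and bind DN pattern are placeholders for your own environment: the %d in the bind DN is filled in from the random number generator given with -g, and -c controls the number of parallel connections (exact options vary by toolkit version; authrate --help lists them all).
$ authrate -h opendj.example.com -p 1389 -c 4 -g "rand(0,2000)" \
  -D "uid=user.%d,ou=people,dc=example,dc=com" -w password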
Performance of a user profile database is typically measured in search performance against that database. If your user profile database can be searched using LDAP (i.e. Active Directory or any LDAPv3 server), then you can use searchrate – also included in the OpenDJ LDAP Toolkit. searchrate is a command line tool that measures search throughput and response time.
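As a sketch (again with placeholder host, port, and base DN), a searchrate run that exercises an equality filter against randomly selected users might look like this:
$ searchrate -h opendj.example.com -p 1389 -b "dc=example,dc=com" \
  -g "rand(0,2000)" "(uid=user.%d)"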
The following is sample output from the searchrate command:
--------------------------------------------------------------------------------------
      Throughput                           Response Time
     (ops/second)                          (milliseconds)
 recent  average   recent  average    99.9%   99.99%  99.999%  err/sec  Entries/Srch
--------------------------------------------------------------------------------------
  188.7    188.7    3.214    3.214  306.364  306.364  306.364      0.0           0.0
  223.1    205.9    2.508    2.831   27.805  306.364  306.364      0.0           0.0
  245.7    219.2    2.273    2.622   20.374  306.364  306.364      0.0           0.0
  238.7    224.1    2.144    2.495   27.805  306.364  306.364      0.0           0.0
  287.9    236.8    1.972    2.368   32.656  306.364  306.364      0.0           0.0
  335.0    253.4    1.657    2.208   32.656  306.364  306.364      0.0           0.0
  358.7    268.4    1.532    2.080   30.827  306.364  306.364      0.0           0.0
The first two columns represent the throughput (number of operations per second) observed in the server. The first column contains the most recent value and the second column contains the average throughput since the test was initiated (i.e. the average of all values contained in column one).
The remaining columns represent response times with the third column being the most recent response time and the fourth column containing the average response time since the test was initiated. Columns five, six, and seven (represented by percentile headers) demonstrate how many operations fell within that range.
For instance, by the time we are at the 7th row, 99.9% of the operations are completed in 30.827 ms (5th column, 7th row), 99.99% are completed in 306.364 ms (6th column, 7th row), and 99.999% of them are completed within 306.364 ms (7th column, 7th row). The percentile rankings provide a good indication of the real system performance and can be interpreted as follows:
- 1 out of 1,000 search requests exceeds 30 ms
- 1 out of 100,000 requests exceeds 306 ms
Note: The values shown above were generated on an untuned, resource-limited test system. Results will vary depending on the amount of JVM memory, the system CPU(s), and the data contained in the directory. Generally, OpenDJ systems can achieve much better performance than the values shown above.
There are several factors that may need to be considered when tuning authentication and user profile databases. For instance, if you are using OpenDJ for your database you may need to modify your database cache, the number of worker threads, or even how indexing is configured in the server. If your constraint is operating system based, you may need to increase the size of the JVM or the number of file descriptors. If the hardware is the limiting factor, you may need to increase RAM, use high speed disks, or even faster network interfaces. No matter what the constraint, you should optimize the databases (and database servers) before moving up the stack to the OpenAM instance.
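For instance, if OpenDJ’s database cache or worker thread pool turns out to be the constraint, both can be adjusted with dsconfig. The connection parameters and values below are illustrative only, not recommendations – test and measure before settling on them:
$ dsconfig set-backend-prop --port 4444 --hostname ldap1.example.com \
  --bindDN "cn=Directory Manager" --bindPassword password \
  --backend-name userRoot --set db-cache-percent:50
$ dsconfig set-work-queue-prop --port 4444 --hostname ldap1.example.com \
  --bindDN "cn=Directory Manager" --bindPassword password \
  --set num-worker-threads:16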
OpenAM Instance + Data Store(s)
Once you have optimized any data store(s) you can now begin testing directly against OpenAM as it is configured against those data store(s). Previous testing established a performance baseline and any degradation introduced at this point will be due to OpenAM or the environment (operating system, Java container) where it has been configured.
But how can you test an OpenAM instance without introducing the application that it is protecting? One way is to generate a series of authentications and authorizations using direct interfaces such as the OpenAM API or REST calls. I prefer to use REST calls as this is the easiest to implement.
There are browser based applications such as Postman that are great for functional testing, but these are not easily scriptable. As such, I lean towards a shell or Perl script containing a loop of cURL commands.
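As a sketch of the idea, the loop below drives OpenAM’s REST authentication endpoint with cURL. The URL, realm, and credentials file are assumptions that will vary with your OpenAM version; this example assumes the /json/authenticate interface and its X-OpenAM-Username/X-OpenAM-Password headers.
#!/bin/bash
# Hypothetical driver: authenticate each user in users.txt (one "user password"
# pair per line) against OpenAM and append the responses to a log for review.
OPENAM="https://openam.example.com/openam"
while read USER PASS; do
  curl -s -X POST \
    -H "X-OpenAM-Username: ${USER}" \
    -H "X-OpenAM-Password: ${PASS}" \
    -H "Content-Type: application/json" \
    "${OPENAM}/json/authenticate" >> results.log
done < users.txt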
Note: You should use the same authentication and search operations in your cURL commands to be sure that you are making a fair comparison between the standalone database testing and the introduction of OpenAM.
You should expect some decrease in performance when the OpenAM server is introduced, but it should not be too drastic. If you find that it falls outside of your requirements, however, then you should consider updating OpenAM in one of the following areas:
- LDAP Configuration Settings (i.e. connections to the Configuration Server)
- Session Settings (if you are hitting limitations)
- JVM Settings (pay particular attention to garbage collection)
- Cache Settings (size and time to live)
Details behind each of these areas can be found in the OpenAM Administration Guide.
You may also find that OpenAM’s interaction with the database(s) introduces searches (or other operations) that you did not previously test for. This may require you to update your database(s) to account for this and restart your performance testing.
Note: Another tool I have started playing with is the Java Application Monitor (aka JAMon). While this tool is typically used to monitor a Java application, it provides some useful information to help determine bottlenecks working with databases, file IO, and garbage collection.
Application + OpenAM Instance + Data Store(s)
Once you feel comfortable with the performance delivered by OpenAM and its associated data store(s), it is time to introduce the final component – the protected application itself.
This will differ quite a bit based on how you are protecting your application (for instance, policy agents will behave differently from OAuth2/OpenID Connect or SAML2) but this does provide you with the information you need to determine if you can meet your performance requirements in a production deployment.
If you have optimized everything up to this point, then the combination of all three components provides a full end-to-end test of the entire system. At this point, network latency becomes the most likely remaining factor in your performance results.
To perform a full end-to-end test of all components, I prefer to use Apache JMeter. You configure JMeter to use a predefined set of credentials, authenticate to the protected resource, and look for specific responses from the server. Once it sees those responses, JMeter acts according to how you have configured it. This tool allows you to generate a load against OpenAM from login to logout and anything in between.
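Once the test plan (.jmx file) is built, the load itself is best driven from the command line rather than the GUI. A typical (illustrative) invocation looks like this, where login-logout.jmx is a hypothetical test plan:
$ jmeter -n -t login-logout.jmx -l results.jtl
The -n flag runs JMeter without the GUI, -t names the test plan, and -l writes the sample results to a file for later analysis.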
Keep in mind that any time that you introduce a monitoring tool into a testing environment, the tool (itself) can impact performance. So while the numbers you receive are useful, they are not altogether accurate. There may be some slight performance degradation (due to the introduction of the tool) that your users will never see.
You should also be aware that the client machine (where the load generation tools are installed) may become a bottleneck if you are not careful. Consider distributing your performance testing tools across multiple client machines so that the client environment does not become the limiting factor.
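JMeter supports this directly: run jmeter-server on each load-generation machine and drive them all from a single controller (the host names below are placeholders):
$ jmeter -n -t login-logout.jmx -l results.jtl -R loadgen1.example.com,loadgen2.example.com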
Like many other areas in our field, performance testing an OpenAM deployment may be considered as much of an art as it is a science. There may be as many methods for testing as there are consultants, and each varies based on the tools they use. The information contained here is just one approach to performance testing – one that I have used successfully in our deployments.
What methods have you used? Feel free to share in the comments, below.
Securing SAML Assertions
SAML assertions passed over the public Internet include a digital signature signed with the Identity Provider’s private key. Additionally, the assertion includes the IdP’s public key contained in the body of a digital certificate. A Service Provider receiving the assertion can verify that it has not been tampered with by using that public key to recover the hash embedded in the digital signature and comparing it with a hash of the message that the Service Provider computes itself using the same hashing algorithm.
The process can be demonstrated by the following diagram where the Signing process is performed by the IdP and the Verification process is performed by the SP. The “Data” referred to in the diagram is the assertion and the “Hash function” is the hashing algorithm used by both the Identity Provider and the Service Provider.
In order for an Identity Provider to sign the assertion, they must first have a digital certificate.
OpenAM includes a default certificate that can be used for testing purposes. This certificate is common to all installations and, while convenient, should not be used for production deployments. Instead, you should either use a certificate obtained from a trusted certificate authority (such as Thawte or Entrust) or generate your own self-signed certificate.
Note: For the purposes of this article, $CONFIG refers to the location of the configuration folder specified during the installation process. $URI refers to the URI of the OpenAM application; also specified during the installation process (i.e. /openam).
OpenAM’s Default Signing Key
OpenAM stores its certificates in a Java Keystore file located in the $CONFIG/$URI folder by default. This can be found in the OpenAM Console as follows:
- Log in to the OpenAM Console as the administrative user.
- Select the Configuration tab.
- Select the Servers and Sites subtab.
- In the Servers panel, select the link for the appropriate server instance.
- Select the Security tab.
- Select the Key Store link at the top of the page.
Here you will see the default location of the Java Keystore file, the keystore passwords, and the alias of the default test certificate.
Viewing the Contents of OpenAM’s Default Certificate
You can view the contents of this file as follows:
- Change to the $CONFIG/$URI folder.
- Use the Java keytool utility to view the contents of the file. (Note: The contents of the file are password protected. The default password is: changeit)
# keytool -list -v -keystore keystore.jks
Enter keystore password: changeit
Keystore type: JKS
Keystore provider: SUN
Your keystore contains 1 entry
Alias name: test
Creation date: Jul 16, 2008
Entry type: PrivateKeyEntry
Certificate chain length: 1
Owner: CN=test, OU=OpenSSO, O=Sun, L=Santa Clara, ST=California, C=US
Issuer: CN=test, OU=OpenSSO, O=Sun, L=Santa Clara, ST=California, C=US
Serial number: 478d074b
Valid from: Tue Jan 15 19:19:39 UTC 2008 until: Fri Jan 12 19:19:39 UTC 2018
Signature algorithm name: MD5withRSA
Replacing OpenAM’s Default Keystore
You should replace this file with a Java Keystore file containing your own key pair and certificate. This will be used as the key for digitally signing assertions as OpenAM plays the role of a Hosted Identity Provider. The process for performing this includes five basic steps:
- Generate a new Java Keystore file containing a new key pair consisting of the public and private keys.
- Export the digital certificate from the file and make it trusted by your Java installation.
- Generate encrypted password files that permit OpenAM to read the keys from the Java Keystore.
- Replace OpenAM’s default Java Keystore and password files with your newly created files.
- Restart OpenAM.
The following provides the detailed steps for replacing the default Java Keystore.
1. Generate a New Java Keystore Containing the Key Pair
a) Change to a temporary folder where you will generate your files.
# cd /tmp
b) Use the Java keytool utility to generate a new key pair that will be used as the signing key for your Hosted Identity Provider.
# keytool -genkeypair -alias signingKey -keyalg RSA -keysize 1024 -validity 730 \
-storetype JKS -keystore keystore.jks
Enter keystore password: cangetin
Re-enter new password: cangetin
What is your first and last name?
[Unknown]: idp.identityfusion.com
What is the name of your organizational unit?
[Unknown]: Security
What is the name of your organization?
[Unknown]: Identity Fusion
What is the name of your City or Locality?
[Unknown]: Tampa
What is the name of your State or Province?
[Unknown]: FL
What is the two-letter country code for this unit?
[Unknown]: US
Is CN=idp.identityfusion.com, OU=Security, O=Identity Fusion, L=Tampa, ST=FL, C=US correct?
[no]: yes
Enter key password for <signingKey>
(RETURN if same as keystore password): cangetin
Re-enter new password: cangetin
You have now generated a self-signed certificate but since it has been signed by you, it is not automatically trusted by other applications. In order to trust the new certificate, you need to export it from your keystore file, and import it into the cacerts file for your Java installation. To accomplish this, perform the following steps:
2. Make the Certificate Trusted
a) Export the self-signed certificate as follows:
# keytool -exportcert -alias signingKey -file idfSelfSignedCert.crt -keystore keystore.jks
Enter keystore password: cangetin
Certificate stored in file <idfSelfSignedCert.crt>
b) Import the certificate into the Java trust store as follows:
# keytool -importcert -alias signingKey -file idfSelfSignedCert.crt -trustcacerts \
-keystore $JAVA_HOME/jre/lib/security/cacerts
Enter keystore password: changeit
Owner: CN=idp.identityfusion.com, OU=Security, O=Identity Fusion, L=Tampa, ST=FL, C=US
Issuer: CN=idp.identityfusion.com, OU=Security, O=Identity Fusion, L=Tampa, ST=FL, C=US
Serial number: 34113557
Valid from: Thu Jan 30 04:25:51 UTC 2014 until: Sat Jan 30 04:25:51 UTC 2016
Signature algorithm name: SHA256withRSA
#1: ObjectId: 2.5.29.14 Criticality=false
0000: 12 3B 83 BE 46 D6 D5 17 0F 49 37 E4 61 CC 89 BE .;..F....I7.a...
0010: 6D B0 5B F5 m.[.
Trust this certificate? [no]: yes
OpenAM needs to be able to open the keystore (keystore.jks) and read the key created in step 1. Both the keystore and the private key, however, are locked with the passwords you entered while generating them. For OpenAM to be able to read this information you need to place these passwords in files on the file system.
3. Generate Encrypted Password Files
Note: The passwords start out as clear text, but will be encrypted to provide secure access.
a) Create the password file for the trust store as follows:
# echo "cangetin" > storepass.cleartext
b) Create the password file for the signing key as follows:
# echo "cangetin" > keypass.cleartext
c) Prepare encrypted versions of these passwords by using the OpenAM ampassword utility (which is part of the OpenAM administration tools).
# ampassword --encrypt keypass.cleartext > .keypass
# ampassword --encrypt storepass.cleartext > .storepass
Note: Use these file names as you will be replacing the default files of the same name.
4. Replace the Default OpenAM Files With Your New Files
a) Make a backup copy of your existing keystore and password files.
# cp $CONFIG/$URI/.keypass $CONFIG/$URI/.keypass.save
# cp $CONFIG/$URI/.storepass $CONFIG/$URI/.storepass.save
# cp $CONFIG/$URI/keystore.jks $CONFIG/$URI/keystore.jks.save
b) Overwrite the existing keystore and password files as follows:
# cp .keypass $CONFIG/$URI/.keypass
# cp .storepass $CONFIG/$URI/.storepass
# cp keystore.jks $CONFIG/$URI/keystore.jks
5. Restart the container where OpenAM is currently running.
This will allow OpenAM to use the new keystore and read the new password files.
Verifying Your Changes
You can use the keytool utility to view the contents of your Keystore as previously mentioned in this article. Alternately, you can log in to the OpenAM Console and see that OpenAM is using the new signing key as follows:
- Log in to OpenAM Console.
- Select the Common Tasks tab.
- Select the Create Hosted Identity Provider option beneath the Create SAMLv2 Providers section.
Verify that you now see your new signing key appear beneath the Signing Key option.
You have now successfully replaced the default OpenAM Java Keystore with your own custom version.
Suppose that you have an OpenDJ directory server with 300,000 entries. And further suppose that the space consumed on your disk for said directory is 1.2 GB and made up of 114 database (*.jdb) files. Suppose that you didn’t plan correctly and you are now running out of space on your hard drive. What should you do? Run to your local System Administrator and beg him to increase the size of your partition? Before promising to buy him lunch for the next year or offering your firstborn child to mow his lawn, look to see whether you actually need that much space in the first place.
In general, the size of your database is based on three things:
- The number of entries in your database
- The size of an average entry
- Your indexing strategy
The first two items are relatively straightforward as you probably have a good idea of your data profile, but an improper indexing strategy can take you by surprise and may actually cause more harm than good. Indexes are used to increase search performance based on application search filters. A lack of necessary indexes can impact performance, increase aggravation, and lead to calls in the middle of the night. But maintaining indexes that are never used can unnecessarily increase disk space consumption and impact the performance of write operations.
OpenDJ comes with a set of default indexes covering both operational and standard attributes.
Indexes on operational attributes are necessary to make OpenDJ run efficiently. You should never modify these unless instructed to do so by ForgeRock support. Standard attributes, however, are indexed to increase external application search performance, and their indexes should reflect the types of searches being performed by your own applications. The default attributes (and index types) are based on ForgeRock’s observations of what most of its customers use, but you may not be like most of their customers. While maintaining some index types is relatively benign, others (like SUBSTRING) can have a more dramatic effect.
Using Indexes to Increase Search Performance
From a high level perspective, indexes are used to identify likely candidates that might be found as a result of an application’s search filter. Assume, for instance, that you have a simple phone book application that allows you to search for phone numbers based on first name and last name. A filter to locate all entries that have a first name (givenname) of “Bill” would be:
(givenname=Bill)
But not every entry in your OpenDJ server has a givenname attribute with a value of “Bill”, so looking at every entry to see if it matches may take a lot of time. But how can you avoid looking at every entry? The answer is simple; you create an index for the givenname attribute to narrow down your search. Simply add an EQUALITY index for the givenname attribute and OpenDJ will map each value of that attribute to the entries containing it. The following is a conceptual representation of how OpenDJ makes this association:
givenname=Bill: 1, 3, 9, 22
givenname=Wild Bill: 12
This demonstrates that the givenname values for entries 1, 3, 9, and 22 are all “Bill”. When OpenDJ receives a search for all entries that have a first name of “Bill”, it immediately knows that a match is found in records 1, 3, 9, and 22. It doesn’t even look at the other entries. In a database that contains hundreds of thousands of entries, this can drastically increase search performance.
This is all well and fine, but how can indexes actually impact us?
How Unnecessary Indexes Can Hurt You
Imagine that you met a coworker at a party last night and you didn’t quite get his name. You seem to remember his name was Bill, but you heard people call him Bill, Billy, Wild Bill, and even Bill-O-Rama. You want to look him up, but you can’t because you really aren’t sure about his first name. Hopefully your same phone book application allows you to search for all entries that contain the string, “Bill”. If so, an EQUALITY index would not work as you really don’t know the specifics of what you are looking for. In this case you would create a SUBSTRING index for the givenname attribute. In so doing, OpenDJ will associate substrings with entries as follows:
givenname=*ild : 12
givenname=*ld B: 12
givenname=*d Bi: 12
Note: OpenDJ created entries for substrings consisting of four or more characters. These include the beginning of string (^) and end of string ($) characters; the shorter the string, the fewer entries that are created. Imagine how many entries would be generated if the attribute contained a value of ‘supercalifragilisticexpialidocious’!
There are times when maintaining indexes may actually be more costly than performing an unindexed search (i.e. evaluating every entry in the directory server). To prevent this, OpenDJ provides the ds-cfg-index-entry-limit configuration parameter, which defines an upper limit on the number of entries maintained for any one index key. There is a global (default) value of 4000 for this parameter, but it may also be configured on a per indexed attribute basis. A value of 4000 means that OpenDJ stops maintaining an index key once that key matches more than 4000 entries. A minor problem is that you can maintain up to 4000 index entries for attributes that are never included in a search filter. A bigger problem, however, is that each time a write operation includes an indexed attribute, the indexes for that attribute must be updated. If your OpenDJ server is subject to extensive write operations, then you may be constantly writing and rewriting your database files, which may impact write performance and ultimately overall server performance. (See “Unlocking the Mystery behind the OpenDJ User Database” for more information on how, when, and why the database files are rewritten on change operations.)
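If a particular attribute legitimately needs a different limit than the global default, the limit can be set per index with dsconfig. The command below is a sketch – the connection parameters are placeholders and the value 5000 is arbitrary – and, as discussed later in this article, any index configuration change must be followed by a rebuild of that index before it takes effect:
/opt/opendj/bin/dsconfig set-local-db-index-prop --port 4444 --hostname ldap1.example.com \
  --bindDN "cn=Directory Manager" --bindPassword password \
  --backend-name userRoot --index-name mail --set index-entry-limit:5000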
Determining Whether an Index is Necessary or Not
A recommendation is to maintain only those indexes for attributes that are included in your application search filters. The types of indexes selected should reflect the manner of searches being performed by your application. To determine this, you can review your LDAP-enabled applications and attempt to ascertain the types of filters they may be producing; but this may not be so obvious.
A more realistic approach is to come up with a “best guess” and then monitor your server to see if your guess was accurate or not. You can then add, delete, or modify attribute indexes based on your findings.
When to Add Indexes
You should monitor your access logs for searches that take a long time and consider adding indexes for search times that you find unacceptable. This can be seen in the etime (or elapsed time) value which is displayed in milliseconds (by default). This is subject to your own SLAs, but etimes greater than 5 milliseconds may be considered unacceptable. If you see etimes in the order of seconds (as shown below) then you definitely need to investigate further.
[31/Dec/2013:18:07:21 +0000] SEARCH RES conn=2231288 op=6 msgID=502 result=0 nentries=1 unindexed etime=5836
This access log entry indicates that the search took 5.8 seconds to complete. One reason why it took so long was that it was an unindexed search (as noted by the “unindexed” tag in the entry). To determine the filter associated with this search, you need to search backwards in the access log and find the corresponding SEARCH REQ for this connection (conn=2231288) and this operation (op=6).
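A simple grep is usually enough to pair the result with its request (the log file path below assumes a default OpenDJ layout; yours may differ):
$ grep "conn=2231288" /opt/opendj/logs/access | grep "op=6"
This returns both the SEARCH REQ and SEARCH RES lines for that operation: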
[31/Dec/2013:18:07:15 +0000] SEARCH REQ conn=2231288 op=6 msgID=502 base="ou=people,dc=example,dc=com" scope=wholeSubtree filter="(&(&(exampleGUID=88291000818)(objectclass=inetorgperson)))" attrs="*"
This access log entry indicates that the filter used to perform the search is:
(&(&(exampleGUID=88291000818)(objectclass=inetorgperson)))
OpenDJ contains a default EQUALITY index for objectclass so assuming that you have not modified the default indexes, then the unindexed attribute causing the problem is exampleGUID. Now that you have identified the culprit, should you run right out and create an EQUALITY index for this attribute? Not necessarily. It really depends on how often you see searches of this type appear in the access logs and what their impact might be. You don’t want to maintain exampleGUID indexes if your application only searches on this attribute once in a blue moon. If, however, you see this type of search on a consistent basis, you might want to consider adding an index.
When to Remove Unnecessary Indexes
It is relatively straightforward to determine when to add indexes, but how do you know when you are maintaining unnecessary indexes? Unfortunately, OpenDJ does not include utilities to tell you this, but it is possible to determine unused indexes by once again, reviewing the search filters in the access logs. One approach to accomplishing this would be to perform the following:
- Determine attribute names included in the search filter.
- Determine type of search being performed (EQUALITY, SUBSTRING, PRESENCE, etc.)
- Determine the frequency of the searches.
- Compare the searches to the already configured indexes.
- Remove unnecessary indexes (if desired).
It is pretty easy to write a script to perform these steps, and fortunately Chris Ridd has already written one, topfilters, that performs steps 1 through 4. Once armed with the information from his script, you simply compare it to what you already have configured in OpenDJ.
How to Determine Current Indexes and Index Types
Current indexes are reflected beneath the cn=config suffix of your OpenDJ server. You can either query this suffix as the rootDN user or you can simply view the contents of the config.ldif file to see what indexes have been configured.
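Alternately, dsconfig can list the configured indexes for a backend directly (the connection parameters below are placeholders for your own deployment):
/opt/opendj/bin/dsconfig list-local-db-indexes --port 4444 --hostname ldap1.example.com \
  --bindDN "cn=Directory Manager" --bindPassword password --backend-name userRoot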
Another method is to use the dbtest command to obtain a more detailed analysis on each index. The dbtest command can be found in the bin directory of your OpenDJ installation. An example execution of this command might be:
/opt/opendj/bin/dbtest list-index-status -b "dc=example,dc=com" -n userRoot
Execution of this command will return each index, its type, the database it is associated with, whether the index is valid or not, and the number of records associated with the index. It will also detail the undefined index keys that are not maintained due to the ds-cfg-index-entry-limit being reached for that attribute.
You can take the data returned from Chris’ script, compare it with the data found for those indexes you are currently maintaining and make an intelligent decision as to whether you want to modify your indexes in any way.
Should you delete any indexes that you believe are not being used? Again, not necessarily. Your access logs only reflect a point in time and may not provide a comprehensive listing of application search filters. You should always carefully consider removing existing indexes, but if you find that you have made a mistake, you can always monitor the access log for searches that are taking an unacceptably long time – or wait for that 3:00 am phone call to let you know.
If you do decide it is necessary to update your indexes, then the best approach is to do so using the OpenDJ Control Panel or the dsconfig command line tool. You should never update the config.ldif file directly.
The following provides an overview of how to add a new index for the exampleGUID attribute. The index type is set to EQUALITY.
/opt/opendj/bin/dsconfig create-local-db-index --port 4444 --hostname ldap1.example.com \
  --bindDN "cn=Directory Manager" --bindPassword password \
  --backend-name userRoot --index-name exampleGUID --set index-type:equality
The following provides an overview of how to remove an existing EQUALITY index type from an existing mail index.
/opt/opendj/bin/dsconfig set-local-db-index-prop --port 4444 --hostname ldap1.example.com \
  --bindDN "cn=Directory Manager" --bindPassword password \
  --backend-name userRoot --index-name mail --remove index-type:equality
If you would rather remove the entire mail index, use the following command, instead.
/opt/opendj/bin/dsconfig delete-local-db-index --port 4444 --hostname ldap1.example.com \
  --bindDN "cn=Directory Manager" --bindPassword password \
  --backend-name userRoot --index-name mail --trustAll
OpenDJ automatically updates indexes on LDAP operations that update the database. Adding or deleting an index or an index value is a configuration change, however, and does not affect index values already in the database. If you delete an index type, existing index values will remain in the database until you rebuild the index. The same is true if you add a new index or index type. Indexes will not be added for existing database entries until you rebuild the index.
As such, any configuration changes to indexes should be followed by a rebuilding of the appropriate index. The following provides an overview of how to rebuild the mail index once its configuration has changed.
/opt/opendj/bin/rebuild-index -p 4444 -D "cn=Directory Manager" -w password -b "dc=example,dc=com" --index mail --start 0 --trustAll
Note: It is not necessary to stop the OpenDJ instance before performing this task. It has been my experience, however, that if you are able to stop the server you might want to consider doing so. In that case you do not need to specify a start time, bind credentials, or the trust option, as the command connects immediately and directly to the database.
Debugging Index Problems
There are times when you may see performance problems that indicate that you are performing an unindexed search, but when you look at the indexes, you find that the appropriate index has been configured.
Note: This problem typically occurs when you do not rebuild the index after configuring it. Essentially, there was already data in the database when the index was applied. In such cases, OpenDJ will not trust (or use) the index until an initial rebuild-index has been performed.
One method of debugging this problem is to use the debugsearchindex capability in OpenDJ.
If you perform your search and request that the debugsearchindex attribute be returned as follows:
/opt/opendj/bin/ldapsearch -D "cn=Directory Manager" -w cangetin -b "ou=people,dc=example,dc=com" -s sub "(&(&(exampleGUID=88291000818)(objectclass=inetorgperson)))" debugsearchindex
OpenDJ will emulate the search, but will not actually perform it against the database. Instead, it will tell you how the search would be performed and whether or not the values are indexed, as follows:
debugsearchindex: filter=(&(&(exampleGUID=88291000818)[NOT-INDEXED](objectclass=inetorgperson)[LIMIT-EXCEEDED:30])[NOT-INDEXED]) scope=wholeSubtree[LIMIT-EXCEEDED:30] final=[NOT-INDEXED]
If you see something like this but your configuration tells you that the indexes have been configured, then it is time to send your LDAP administrator to training.
As with most middleware products, knowing when and how to configure indexes can be as much of an art as it is a science. You should follow best practices where possible, but as with other products, you should monitor your server to see if those practices apply to you and react where appropriate.
While teaching a recent ForgeRock OpenDJ class, a student of mine observed an interesting behavior that at first seemed quite odd. While rebuilding his attribute indexes, the student found that the overall database size seemed to grow each time he performed a reindex operation. What seems obvious to me now sure made me scratch my head as I scrambled for an answer. I am sharing my findings here in the hopes that others will either a) find this information useful or b) find comic relief as to my misfortune.
Note: If you are unclear about the information contained in OpenDJ’s database files, then I highly recommend that you read my posting entitled, Unlocking the Mystery behind the OpenDJ User Database. In that article I describe the overall structure of the Berkeley DB Java Edition database used by OpenDJ and how both entries (and indexes) are maintained in the same database.
In OpenDJ, the rebuild-index command is used to update any attribute indexes contained in the OpenDJ database. This is necessary after you make a configuration change that affects indexes (such as modifying the index entry limit). Indexes are database specific and you can elect to rebuild a single attribute index or rebuild all attribute indexes for a particular database.
The following syntax is used to rebuild ALL indexes associated with the dc=example,dc=com suffix and its use is what caused the frantic head-scratching to occur:
$ rebuild-index -h ldap.example.com -p 4444 -D "cn=Directory Manager" -w password -b dc=example,dc=com --rebuildAll
The student observed (and questioned) that every time he rebuilt the indexes, the aggregated size of the *.jdb files actually increased. In the case of a rebuild-all it was about 18 MB each time he ran the command; in the case of rebuilding a single index it was only about 3 MB each time. But the increase was consistent with each rebuild. This continued until the total reached a certain size, at which point the consumption fell back to its original level (in our observations this occurred at roughly 200 MB when using the rebuild-all option).
The following details the output of the du -sh command on the userRoot database each time the rebuild-index command was run:
- 124 MB
- 142 MB (+ 18MB)
- 160 MB (+ 18MB)
- 178 MB (+ 18MB)
- 200 MB (+ 22MB)
- 123 MB (- 77MB)
This trend was consistent over several iterations.
We continued testing and observed that in addition to the increasing size, the database files on the file system (*.jdb) were changing as well. What was once the 000000002.jdb file became the 000000002.jdb and 000000003.jdb files, and later the 000000003.jdb and 000000004.jdb files. This occurred at the same time that we dropped back down to the 123 MB size, and it was the clue that unlocked the mystery.
Unlike the Sleepycat Berkeley DB used by OpenDJ’s forefathers, the Berkeley DB Java Edition used by OpenDJ does not immediately remove modified data from the database. Instead, the old record is marked for removal and essentially becomes inactive. Updated records are then appended to the end of the database in a log file fashion.
This process continues until OpenDJ cleaner threads detect that a database file contains less than 50% active records. Once that occurs, the cleaner threads migrate all active records from the file and append them to the end of the last file in the OpenDJ database (a new file is created if necessary). Once migrated, the cleaner threads delete the database file containing the stale entries.
During the rebuild process, old index values in each of the *.jdb files are marked as inactive and new index values are appended to the database. Simply marking these values as inactive does not eliminate them from the database, however, and they continue to consume disk space. This continues until the cleaner threads detect that stale records account for more than 50% of a database file. At that point the migration process occurs: new *.jdb files are created to store the new indexes, the old stale *.jdb files are deleted (hence the *.jdb file name changes), and the disk space is returned.
When an index is rebuilt, the whole b-tree is marked as deleted. But since that b-tree actually consists of specific records within the database files, those records will only be collected when each file reaches the threshold that triggers cleaning. With small databases you will clearly see the behavior described above; with larger databases it is less noticeable, as the number of index records is larger and the cleanup point may be reached sooner.
So there you go, mystery solved!
I recently read a Computerworld article that discussed the reluctance of physicians to share patient data with the patients themselves. The article referenced a survey conducted by Accenture and Harris Interactive that found of the 3,700 physicians asked, only 31% felt that patients should have access to their own healthcare records.
“It found that 82% of U.S. physicians want patients to update their electronic health records with information about themselves, but only 31% believe patients should have full access to that record; 65% believe patients should have only limited access. Four percent said patients should have no access at all.”
This is best represented by a graphic in the Computerworld article that breaks down those numbers.
This is old school thinking and is akin to asking someone to “show me yours and I will ‘think’ about showing you mine” (but probably won’t). How very one-sided.
When I first joined the Personal Data Ecosystem Consortium (PDEC), I did so because I believed that people should be allowed to take control of their own data. To me, “personal data” was roughly defined as identity and PII data; this was largely due to my identity background. But over the past year this has shifted towards healthcare data and while many of the same thoughts apply, the ROI on managing healthcare data can be much higher as it directly correlates to a person’s primary asset – their health.
Google Health, Microsoft HealthVault, CareZone – there is no shortage of applications designed to assist people in managing their healthcare data. While some efforts have failed, others remain hopeful. But as this survey demonstrates, there is still a long way to go to change the minds of those who are diagnosing and managing this data – the physicians themselves. If only they understood that patients are uniquely capable of assisting in the management of their own healthcare; but to do so, patients need the data (and they need to understand what it means).
Over the past couple of years we have been developing applications that utilize the Lifedash platform. This allows our users to take control of their own data and selectively share it with others. Our latest application is CareSync and it is directly focused on healthcare. We are currently in a beta of the Web application and are piloting our Health Assistant services. Both of these offerings allow people to aggregate and manage their own healthcare in a collaborative environment, and to do so safely and securely. The feedback we have received from our participants has been overwhelmingly positive, as people are losing faith in the healthcare system. They either want to (or feel forced to) take an active role in managing their own (or their family’s) healthcare, but to do so, they need the data.
Given the reluctance of most physicians to share, obtaining this information is challenging at best. There are, however, techniques that you can use to obtain it, though they require persistence (the word “nagging” comes to mind). It should not be that way – after all, it is our data.
In the words of healthcare activist e-Patient Dave, just “give me my damn data!” Or as I would add, just “give me my damn data, help me to understand what you just gave me, and tell me how I compare to others in my situation!”