ForgeRock | Easy Identity

OpenDJ Indexes Explained

January 1, 2014 idmdude

Suppose that you have an OpenDJ directory server with 300,000 entries. And further suppose that the space consumed on your disk for said directory is 1.2 GB and made up of 114 database (*.jdb) files. Suppose that you didn’t plan correctly and you are now running out of space on your hard drive. What should you do? Run to your local System Administrator and beg for him to increase the size of your partition? Before promising to buy him lunch for the next year or offering your first born child to mow his lawn, look to see if you actually need that much space in the first place.

In general, the size of your database is based on three things:

The number of entries in your database
The size of an average entry
Your indexing strategy

The first two items are relatively straightforward as you probably have a good idea of your data profile, but an improper indexing strategy can take you by surprise and may actually cause more harm than good. Indexes are used to increase search performance based on application search filters. Lack of necessary indexes can impact performance, increase aggravation, and lead to calls in the middle of the night. But maintaining indexes that are never used can unnecessarily increase disk space and impact the performance of write operations.

Default Indexes

OpenDJ comes with the following default indexes:

Attribute          Presence  Equality  Substring  Ordering  Type
aci                x                                        operational
cn                           x         x                    standard
ds-sync-conflict             x                              operational
ds-sync-hist                                      x         operational
entryUUID                    x                              operational
givenName                    x         x                    standard
mail                         x         x                    standard
member                       x                              standard
objectclass                  x                              standard
sn                           x         x                    standard
telephonenumber              x         x                    standard
uid                          x                              standard
uniquemember                 x                              standard

Indexes on operational attributes are necessary to make OpenDJ run efficiently. You should never modify these unless instructed to do so by ForgeRock support. Indexes on standard attributes, however, are used to increase external application search performance and should reflect the types of searches being performed by your own applications. The default attributes (and index types) are based on ForgeRock’s observations of what most of its customers use, but you may not be like most of their customers. While maintaining some index types can be relatively benign, others (like SUBSTRING) may have a more dramatic effect.

Using Indexes to Increase Search Performance

From a high level perspective, indexes are used to identify likely candidates that might be found as a result of an application’s search filter. Assume, for instance, that you have a simple phone book application that allows you to search for phone numbers based on first name and last name. A filter to locate all entries that have a first name (givenname) of “Bill” would be:

(givenname=Bill)

But not every entry in your OpenDJ server has a givenname attribute with a value of “Bill”, so looking at every entry to see if it matches may take a lot of time. How can you avoid looking at every entry? The answer is simple: you create an index for the givenname attribute to narrow down your search. Simply add an EQUALITY index for the givenname attribute and OpenDJ will associate each givenname value with the entries that contain it. The following is a conceptual representation of how OpenDJ will make this association:

givenname=Bill: 1,3,9,22
givenname=Ralph: 2,11
givenname=Wally: 4,5,6,7,8,10
givenname=Wild Bill: 12
givenname=Billy: 13,15,21
givenname=Silly: 14
….

This demonstrates that the givenname value for entries 1, 3, 9, and 22 is “Bill”. When OpenDJ receives a search for all entries that have a first name of “Bill”, it immediately knows that a match is found in records 1, 3, 9 and 22. It doesn’t even look at the other entries. In a database that contains hundreds of thousands of entries, this can drastically increase search performance.
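
If it helps to picture it, an equality index behaves roughly like a map from a value to a list of entry IDs. The following Python sketch is purely conceptual: the entry IDs, the sample data, and the build_equality_index helper are made up for illustration and do not reflect how OpenDJ actually stores its indexes.

```python
from collections import defaultdict

def build_equality_index(entries, attribute):
    """Map each lower-cased attribute value to the list of entry IDs holding it."""
    index = defaultdict(list)
    for entry_id, attrs in sorted(entries.items()):
        for value in attrs.get(attribute, []):
            index[value.lower()].append(entry_id)
    return dict(index)

# Hypothetical entries keyed by entry ID.
entries = {
    1: {"givenname": ["Bill"]},
    2: {"givenname": ["Ralph"]},
    3: {"givenname": ["Bill"]},
    4: {"givenname": ["Wally"]},
    12: {"givenname": ["Wild Bill"]},
    13: {"givenname": ["Billy"]},
}

index = build_equality_index(entries, "givenname")

# A search for (givenname=Bill) only needs to look at the candidates
# listed under the "bill" key; every other entry is skipped.
candidates = index.get("bill", [])
print(candidates)
```

Note that “Wild Bill” and “Billy” are separate keys, which is why an equality index alone cannot answer a “contains Bill” search.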

This is all well and fine, but how can indexes actually impact us?

How Unnecessary Indexes Can Hurt You

Imagine that you met a coworker at a party last night and you didn’t quite get his name. You seem to remember his name was Bill, but you heard people call him Bill, Billy, Wild Bill, and even Bill-O-Rama. You want to look him up, but you can’t because you really aren’t sure about his first name. Hopefully your same phone book application allows you to search for all entries that contain the string, “Bill”. If so, an EQUALITY index would not work as you really don’t know the specifics of what you are looking for. In this case you would create a SUBSTRING index for the givenname attribute. In so doing, OpenDJ will associate substrings with entries as follows:

givenname=*Bill: 1,3,9,12,13,15,21,22
givenname=*illy: 13,14,15,21
givenname=*Wild: 12
givenname=*ild : 12
givenname=*ld B: 12
givenname=*d Bi: 12
givenname=*Ralp: 2,11
givenname=*alph: 2,11
givenname=*ally: 4,5,6,7,8,10
….

Note: OpenDJ created entries for substrings consisting of four or more characters. These include the beginning of string (^) and end of string ($) characters; the shorter the string, the fewer entries that are created. Imagine how many entries would be generated if the attribute contained a value of ‘supercalifragilisticexpialidocious’!
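
To get a feel for how quickly substring keys multiply, here is a small Python sketch that slides a fixed-size window across a value. It is an approximation only: the window length of 4 simply mirrors the conceptual example above (the key length the server really uses is configurable and may differ), the beginning-of-string and end-of-string keys are left out, and substring_keys is a hypothetical helper, not an OpenDJ function.

```python
def substring_keys(value, length=4):
    """Return every run of `length` consecutive characters in `value`."""
    if len(value) <= length:
        return [value]
    return [value[i:i + length] for i in range(len(value) - length + 1)]

print(substring_keys("Wild Bill"))

word = "supercalifragilisticexpialidocious"
# One attribute value, many index keys to maintain.
print(len(word), "characters ->", len(substring_keys(word)), "substring keys")
```

Every one of those keys has to be maintained whenever the value is added, changed, or removed, which is where the cost of a SUBSTRING index comes from.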

There are times when maintaining indexes may actually be more costly than performing an unindexed search (i.e. evaluating every entry in the directory server). To prevent this, OpenDJ provides the ds-cfg-index-entry-limit configuration parameter, which defines an upper limit on the number of entries that a single index key may reference. There is a global (default) value of 4000 for this parameter, but it may also be configured on a per indexed attribute basis. A value of 4000 means that once a given key matches more than 4000 entries, OpenDJ stops maintaining the entry list for that key. A minor problem is that you can maintain up to 4000 entry references per key for attributes that are never included in a search filter. A bigger problem, however, is that each time a write operation is performed that includes the indexed attribute, the affected index keys for that attribute must be updated as well. If your OpenDJ server is subject to extensive write operations, then you may be constantly writing and rewriting your database files, which may impact write performance and ultimately overall server performance. (See “Unlocking the Mystery behind the OpenDJ User Database” for more information on how, when, and why the database files are rewritten on change operations.)
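
One way to picture the entry limit is as a cap applied to each index key. The Python sketch below is a toy model under that assumption: the LimitedIndex class is hypothetical, the limit of 3 is chosen only to keep the example short, and none of this is OpenDJ code.

```python
class LimitedIndex:
    """Toy index that stops maintaining a key once it matches too many entries."""

    def __init__(self, limit):
        self.limit = limit
        self.keys = {}  # value -> set of entry IDs, or None once the limit is exceeded

    def add(self, value, entry_id):
        ids = self.keys.get(value, set())
        if ids is None:
            return  # key was already abandoned; nothing left to maintain
        ids.add(entry_id)
        self.keys[value] = None if len(ids) > self.limit else ids

    def candidates(self, value):
        """Return sorted entry IDs, or None when the index cannot help for this value."""
        if value not in self.keys:
            return []
        ids = self.keys[value]
        return None if ids is None else sorted(ids)

idx = LimitedIndex(limit=3)
for entry_id in range(1, 6):
    idx.add("inetorgperson", entry_id)  # a very common value: exceeds the limit
idx.add("bill", 1)
idx.add("bill", 3)

print(idx.candidates("bill"))           # still maintained
print(idx.candidates("inetorgperson"))  # None: the key is no longer maintained
```

In this model a None result means the search falls back to examining entries directly, which is the trade-off the limit is meant to manage.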

Determining Whether an Index is Necessary or Not

A recommendation is to maintain only those indexes for attributes that are included in your application search filters. The types of indexes selected should reflect the manner of searches being performed by your applications. To determine this, you can review your LDAP-enabled applications and attempt to ascertain the types of filters they may be producing, but this may not be so obvious.

A more realistic approach is to come up with a “best guess” and then monitor your server to see if your guess was accurate or not. You can then add, delete, or modify attribute indexes based on your findings.

When to Add Indexes

You should monitor your access logs for searches that take a long time and consider adding indexes where you find the search times unacceptable. This can be seen in the etime (or elapsed time) value, which is displayed in milliseconds (by default). This is subject to your own SLAs, but etimes greater than 5 milliseconds may be considered unacceptable. If you see etimes on the order of seconds (as shown below) then you definitely need to investigate further.

[31/Dec/2013:18:07:21 +0000] SEARCH RES conn=2231288 op=6 msgID=502 result=0 nentries=1 unindexed etime=5836

This access log entry indicates that the search took 5.8 seconds to complete. One reason why it took so long was that it was an unindexed search (as noted by the “unindexed” tag in the entry). To determine the filter associated with this search, you need to search backwards in the access log and find the corresponding SEARCH REQ for this connection (conn=2231288) and this operation (op=6).

[31/Dec/2013:18:07:15 +0000] SEARCH REQ conn=2231288 op=6 msgID=502 base="ou=people,dc=example,dc=com" scope=wholeSubtree filter="(&(&(exampleGUID=88291000818)(objectclass=inetorgperson)))" attrs="*"

This access log entry indicates that the filter used to perform the search is

(&(&(exampleGUID=88291000818)(objectclass=inetorgperson)))

OpenDJ contains a default EQUALITY index for objectclass so assuming that you have not modified the default indexes, then the unindexed attribute causing the problem is exampleGUID. Now that you have identified the culprit, should you run right out and create an EQUALITY index for this attribute? Not necessarily. It really depends on how often you see searches of this type appear in the access logs and what their impact might be. You don’t want to maintain exampleGUID indexes if your application only searches on this attribute once in a blue moon. If, however, you see this type of search on a consistent basis, you might want to consider adding an index.
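
Searching backwards through a large access log by hand gets old quickly. The Python sketch below pairs each SEARCH RES line tagged as unindexed with its SEARCH REQ by matching conn and op. It assumes the log lines look like the two samples above; the regular expressions and the slow_unindexed_searches helper are my own illustration and are not part of OpenDJ.

```python
import re

REQ = re.compile(r'SEARCH REQ conn=(\d+) op=(\d+) .*?filter="([^"]*)"')
RES = re.compile(r'SEARCH RES conn=(\d+) op=(\d+) .*?etime=(\d+)')

def slow_unindexed_searches(lines, min_etime=0):
    """Return (filter, etime) pairs for unindexed searches at or above min_etime."""
    pending = {}  # (conn, op) -> filter text from the request line
    found = []
    for line in lines:
        req = REQ.search(line)
        if req:
            pending[(req.group(1), req.group(2))] = req.group(3)
            continue
        res = RES.search(line)
        if res:
            search_filter = pending.pop((res.group(1), res.group(2)), "<request not seen>")
            etime = int(res.group(3))
            if " unindexed " in line and etime >= min_etime:
                found.append((search_filter, etime))
    return found

log = [
    '[31/Dec/2013:18:07:15 +0000] SEARCH REQ conn=2231288 op=6 msgID=502 '
    'base="ou=people,dc=example,dc=com" scope=wholeSubtree '
    'filter="(&(&(exampleGUID=88291000818)(objectclass=inetorgperson)))" attrs="*"',
    '[31/Dec/2013:18:07:21 +0000] SEARCH RES conn=2231288 op=6 msgID=502 '
    'result=0 nentries=1 unindexed etime=5836',
]

for search_filter, etime in slow_unindexed_searches(log, min_etime=1000):
    print(etime, search_filter)
```

Running something like this over a day or a week of logs gives you a sense of how often each unindexed filter really shows up before you decide to index anything.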

When to Remove Unnecessary Indexes

It is relatively straightforward to determine when to add indexes, but how do you know when you are maintaining unnecessary indexes? Unfortunately, OpenDJ does not include utilities to tell you this, but it is possible to determine unused indexes by once again reviewing the search filters in the access logs. One approach to accomplishing this would be to perform the following:

1. Determine attribute names included in the search filter.
2. Determine type of search being performed (EQUALITY, SUBSTRING, PRESENCE, etc.)
3. Determine the frequency of the searches.
4. Compare the searches to the already configured indexes.
5. Remove unnecessary indexes (if desired).

It is pretty easy to write a script to perform these steps, and fortunately Chris Ridd has already written one, topfilters, that performs steps 1 through 4. Once armed with the information from his script, you would simply compare it to what you already have configured for OpenDJ.
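
To make the first three steps concrete, here is a rough Python sketch of my own. It is not the topfilters script and makes no attempt to match its output. It only handles simple filter components (no extensible matching, no escaped parentheses), and the COMPONENT pattern, classify, and filter_usage names are hypothetical.

```python
import re
from collections import Counter

# Matches simple components such as (cn=Bill), (cn=*ill*), (mail=*) or (age>=21).
COMPONENT = re.compile(r"\(([A-Za-z][\w;.-]*)(>=|<=|~=|=)([^()]*)\)")

def classify(operator, value):
    """Guess which index type a filter component would rely on."""
    if operator in (">=", "<="):
        return "ordering"
    if operator == "~=":
        return "approximate"
    if value == "*":
        return "presence"
    if "*" in value:
        return "substring"
    return "equality"

def filter_usage(filters):
    """Count how often each (attribute, index type) pair appears in the filters."""
    usage = Counter()
    for text in filters:
        for attribute, operator, value in COMPONENT.findall(text):
            usage[(attribute.lower(), classify(operator, value))] += 1
    return usage

filters = [
    "(&(&(exampleGUID=88291000818)(objectclass=inetorgperson)))",
    "(givenname=*Bill*)",
    "(givenname=Bill)",
    "(mail=*)",
]

usage = filter_usage(filters)
for (attribute, kind), count in usage.most_common():
    print(attribute, kind, count)
```

The filter strings would normally come from the SEARCH REQ lines in your access log rather than a hard-coded list.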

How to Determine Current Indexes and Index Types

Current indexes are reflected beneath the cn=config suffix of your OpenDJ server. You can either query this suffix as the rootDN user or you can simply view the contents of the config.ldif file to see what indexes have been configured.

dn: ds-cfg-attribute=givenName,cn=Index,ds-cfg-backend-id=userRoot,cn=Backends,cn=config
objectClass: top
objectClass: ds-cfg-local-db-index
ds-cfg-index-type: equality
ds-cfg-index-type: substring
ds-cfg-attribute: givenName

Another method is to use the dbtest command to obtain a more detailed analysis on each index. The dbtest command can be found in the bin directory of your OpenDJ installation. An example execution of this command might be:

/opt/opendj/bin/dbtest list-index-status -b "dc=example,dc=com" -n userRoot

Execution of this command will return each index, its type, the database it is associated with, whether the index is valid or not, and the number of records associated with the index. It will also detail the undefined index keys that are not maintained due to the ds-cfg-index-entry-limit being reached for that attribute.

You can take the data returned from Chris’ script, compare it with the data found for those indexes you are currently maintaining and make an intelligent decision as to whether you want to modify your indexes in any way.
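
The comparison itself can be sketched in a few lines of Python. This example assumes index entries shaped like the givenName entry shown earlier, separated by blank lines and without LDIF line folding. The configured_indexes helper, the second (mail) entry, and the used set are all invented for illustration.

```python
def configured_indexes(ldif_text):
    """Collect (attribute, index type) pairs from ds-cfg-local-db-index entries."""
    pairs = set()
    for block in ldif_text.strip().split("\n\n"):
        attribute, types = None, []
        for line in block.splitlines():
            name, _, value = line.partition(":")
            name, value = name.strip().lower(), value.strip().lower()
            if name == "ds-cfg-attribute":
                attribute = value
            elif name == "ds-cfg-index-type":
                types.append(value)
        if attribute:
            pairs.update((attribute, index_type) for index_type in types)
    return pairs

ldif = """\
dn: ds-cfg-attribute=givenName,cn=Index,ds-cfg-backend-id=userRoot,cn=Backends,cn=config
objectClass: top
objectClass: ds-cfg-local-db-index
ds-cfg-index-type: equality
ds-cfg-index-type: substring
ds-cfg-attribute: givenName

dn: ds-cfg-attribute=mail,cn=Index,ds-cfg-backend-id=userRoot,cn=Backends,cn=config
objectClass: top
objectClass: ds-cfg-local-db-index
ds-cfg-index-type: equality
ds-cfg-index-type: substring
ds-cfg-attribute: mail
"""

configured = configured_indexes(ldif)

# Hypothetical (attribute, search type) pairs observed in the access logs.
used = {("givenname", "equality"), ("mail", "equality"), ("exampleguid", "equality")}

unused = sorted(configured - used)   # configured but never searched
missing = sorted(used - configured)  # searched but not configured
print("candidates for removal:", unused)
print("candidates for addition:", missing)
```

Treat the output as a list of candidates to investigate, not as a list of changes to make.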

Should you delete any indexes that you believe are not being used? Again, not necessarily. Your access logs only reflect a point in time and may not provide a comprehensive listing of application search filters. You should always carefully consider removing existing indexes, but if you find that you have made a mistake, you can always monitor the access log for searches that are taking an unacceptably long time – or wait for that 3:00 am phone call to let you know.

Configuring Indexes

If you do decide it is necessary to update your indexes, then the best approach is to do so using the OpenDJ Control Panel or the dsconfig command line tool. You should never update the config.ldif file directly.

The following provides an overview of how to add a new index for the exampleGUID attribute. The index type is set to EQUALITY.

/opt/opendj/bin/dsconfig create-local-db-index --port 4444 --hostname ldap1.example.com --bindDN "cn=Directory Manager" --bindPassword password --backend-name userRoot --index-name exampleGUID --set index-type:equality --trustAll

The following provides an overview of how to remove an existing EQUALITY index type from an existing mail index.

/opt/opendj/bin/dsconfig set-local-db-index-prop --port 4444 --hostname ldap1.example.com --bindDN "cn=Directory Manager" --bindPassword password --backend-name userRoot --index-name mail --remove index-type:equality --trustAll

If you would rather remove the entire mail index, use the following command, instead.

/opt/opendj/bin/dsconfig delete-local-db-index --port 4444 --hostname ldap1.example.com --bindDN "cn=Directory Manager" --bindPassword password --backend-name userRoot --index-name mail --trustAll

Rebuilding Indexes

OpenDJ automatically updates indexes on LDAP operations that update the database. Adding or deleting an index or an index type is a configuration change, however, and does not affect index values already in the database. If you delete an index type, existing index values will remain in the database until you rebuild the index. The same is true if you add a new index or index type: index values will not be added for existing database entries until you rebuild the index.

As such, any configuration changes to indexes should be followed by a rebuilding of the appropriate index. The following provides an overview of how to rebuild the mail index once its configuration has changed.

/opt/opendj/bin/rebuild-index -p 4444 -D "cn=Directory Manager" -w password -b "dc=example,dc=com" --index mail --start 0 --trustAll

Note: It is not necessary to stop the OpenDJ instance before performing this task. It has been my experience, however, that if you are able to stop the server, you might want to consider doing so. In that case you do not need to specify a start time, bind credentials, or the trust acceptance, because the tool works directly against the database rather than connecting to the running server.

Debugging Index Problems

There are times when you may see performance problems that indicate that you are performing an unindexed search, but when you look at the indexes, you find that the appropriate index has been configured.

Note: This problem typically occurs when you do not rebuild the index after you have configured it. Essentially, there was already data in the database when the index was configured. In such cases, OpenDJ will not attempt to update the index until an initial rebuild-index has been performed.

One method of debugging this problem is to use the debugsearchindex capability in OpenDJ.

If you perform your search and request that the debugsearchindex attribute be returned as follows:

/opt/opendj/bin/ldapsearch -D "cn=Directory Manager" -w cangetin -b "ou=people,dc=example,dc=com" -s sub "(&(&(exampleGUID=88291000818)(objectclass=inetorgperson)))" debugsearchindex

OpenDJ will emulate the search, but will not actually perform it against the database. Instead, it will tell you how the search would be performed and whether or not each part of the filter is indexed, as follows:

dn: cn=debugsearch
debugsearchindex: filter=(&(&(exampleGUID=88291000818)[NOT-INDEXED](objectClass=inetorgperson)[INDEX:objectClass.equality][LIMIT-EXCEEDED])[NOT-INDEXED]) [NOT-INDEXED] scope=wholeSubtree[LIMIT-EXCEEDED:30] final=[NOT-INDEXED]

If you see something like this but your configuration tells you that the indexes have been configured, then it is time to send your LDAP administrator to training.
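
If you find yourself reading a lot of these, a few lines of Python can pull out the interesting parts. The sketch below assumes the debugsearchindex value looks like the sample above, which may not hold for every OpenDJ version. The TAGGED pattern and both helper functions are hypothetical.

```python
import re

# A simple filter component immediately followed by its first [TAG].
TAGGED = re.compile(r"\(([^()=]+)=[^()]*\)\[([^\]]+)\]")

def index_usage(debug_value):
    """Return (attribute, tag) pairs for each simple filter component."""
    return [(attr.strip(), tag) for attr, tag in TAGGED.findall(debug_value)]

def final_verdict(debug_value):
    """Return the tag reported after final=, or None if it is absent."""
    match = re.search(r"final=\[([^\]]+)\]", debug_value)
    return match.group(1) if match else None

sample = (
    "filter=(&(&(exampleGUID=88291000818)[NOT-INDEXED]"
    "(objectClass=inetorgperson)[INDEX:objectClass.equality][LIMIT-EXCEEDED])"
    "[NOT-INDEXED]) [NOT-INDEXED] scope=wholeSubtree[LIMIT-EXCEEDED:30] "
    "final=[NOT-INDEXED]"
)

for attribute, tag in index_usage(sample):
    print(attribute, "->", tag)
print("final:", final_verdict(sample))
```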

Summary

As with most middleware products, knowing when and how to configure indexes can be as much of an art as it is a science. You should follow best practices where possible, but as with other products you should monitor your server to see if those practices apply to you and react where appropriate.


What do OpenDJ and McDonald’s Have in Common?

August 8, 2012 idmdude

The OpenDJ directory server is highly scalable and can process all sorts of requests from different types of clients over various protocols. The following diagram provides an overview of how OpenDJ processes these requests. (See The OpenDJ Architecture for a more detailed description of each component.)

Note: The following information has been taken from ForgeRock’s OpenDJ Administration, Maintenance and Tuning Class and has been used with the permission of ForgeRock.

Client requests are accepted and processed by an appropriate Connection Handler. The Connection Handler decodes the request according to the protocol (LDAP, JMX, SNMP, etc.) and either responds immediately or converts it into an LDAP Operation Object that is added to the Work Queue.

Analogy: I like to use the analogy of the drive-through window at a fast food restaurant when describing this process. You are the client making a request of the establishment. The Connection Handler is the person who takes your order; they take your request and enter it into their ordering system (the Work Queue). They do not prepare your food; their job is simply to take the order as quickly and efficiently as possible.

Worker Threads monitor and detect items on the Work Queue and respond by processing them in a first in, first out fashion. Requests may be routed or filtered based on the server configuration and then possibly transformed before the appropriate backend is selected.

Analogy: Continuing with the fast food analogy, the Worker Threads are similar to the people who prepare your food. They monitor the order system (Work Queue) for any new orders and process them in a first in, first out fashion.

Note: OpenDJ routing is currently limited to the server’s determination of the appropriate backend. In future versions, this may take on more of a proxy or virtual directory type of implementation.

The result is returned to the client by the Worker Threads using the callback method specified by the Connection Handler.

Analogy: Once your order is completed, the food (or the results of your request) is given to you by one of the Worker Threads who has been tasked with that responsibility. This is the only place where the analogy somewhat breaks down. In older fast food restaurants (ones with only one window) this may sometimes be the person who took your order in the first place. In our analogy, however, the Connection Handler never responds to your request. This model is more closely attuned to more recent fast food establishments where they have two windows and there is a clear delineation of duties between the order taker (Connection Handler) and the one who provides you with your food (the Worker Thread).
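
For those who prefer code to cheeseburgers, the same flow can be sketched with a queue and a couple of threads. This Python example is only a model of the pattern described above. The function names, the request format, and the "processing" step are invented, and OpenDJ's real implementation is in Java and far more involved.

```python
import queue
import threading

work_queue = queue.Queue()  # first in, first out, like the order system
results = {}
results_lock = threading.Lock()

def connection_handler(request_id, raw_request, callback):
    """Decode the request and enqueue it; the handler does no processing itself."""
    operation = {"id": request_id, "filter": raw_request.strip(), "callback": callback}
    work_queue.put(operation)

def worker():
    """Take operations off the queue, process them, and return results via the callback."""
    while True:
        operation = work_queue.get()
        if operation is None:  # sentinel telling this worker to stop
            work_queue.task_done()
            break
        result = "processed " + operation["filter"]  # stand-in for backend work
        operation["callback"](operation["id"], result)
        work_queue.task_done()

def deliver(request_id, result):
    """Callback supplied by the connection handler; hands the result to the client."""
    with results_lock:
        results[request_id] = result

threads = [threading.Thread(target=worker) for _ in range(2)]
for thread in threads:
    thread.start()

for request_id, text in enumerate(["(givenname=Bill)", "(sn=Smith) ", "(uid=jdoe)"]):
    connection_handler(request_id, text, deliver)

work_queue.join()            # wait until every queued operation is processed
for _ in threads:
    work_queue.put(None)     # one sentinel per worker
for thread in threads:
    thread.join()

print(sorted(results.items()))
```

The point of the split is the same as at the drive-through: the order taker stays free to accept the next request while someone else does the cooking.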

Other services such as access control processing (ACIs), Logging, and Monitoring provide different access points within the request processing flow and are used to control, audit, and monitor how the requests are processed.

So, what do OpenDJ and McDonald’s have in common? They are both highly efficient entities that have been streamlined to process requests in the most efficient manner possible.

Check out ForgeRock’s website for more information on OpenDJ or click here if you are interested in attending one of ForgeRock’s upcoming training classes.


The OpenDJ Architecture

July 23, 2012 idmdude

An understanding of the components that make up the OpenDJ Architecture is useful for administering, configuring, or troubleshooting the OpenDJ server.

The following information has been taken from ForgeRock’s OpenDJ Administration, Maintenance and Tuning Class and has been used with the permission of ForgeRock.

The OpenDJ server has been developed using a modular architecture in which most or all components are written to a well-defined specification. The image above provides an overview of these components. The following sections provide a brief description of some of the more prevalent components shown in this image.

Configuration Handler

The OpenDJ Configuration Handler is responsible for managing configuration information within OpenDJ’s configuration files (i.e. config.ldif). Configuration information may impact one or more components; as such, the Configuration Handler is responsible for notifying appropriate components when a configuration change occurs.

Connection Handlers

Connection and request handlers manage all interaction with LDAP clients. This includes accepting new connections and reading and responding to client requests. Connection handlers are responsible for any special processing that might be required for this communication, including managing encryption or performing protocol translation. It is possible to have multiple concurrent implementations active at any given time and as such, OpenDJ includes connection handlers which support various forms of communication that clients use to interact with the server (JMX, LDAP, LDAPS, LDIF, SNMP). Administrators have the ability to enable or disable these connection handlers to support their client environment.

Note: ForgeRock is currently working on REST and JSON interfaces to provide direct access to directory server data.

Work Queue

Connection handlers place client requests onto OpenDJ’s Work Queue. Worker threads detect requests placed on the work queue and are responsible for performing the processing necessary to respond to the request. Today’s directory servers must be able to handle a tremendous number of requests in a short period of time; as such, OpenDJ’s Work Queue has been built to be both highly efficient and provide high performance.

Backend Databases

A backend database serves as a repository for searching, retrieving, and storing directory data. OpenDJ supports multiple backends including those considered typical databases (such as Oracle, MySQL, and Berkeley DB) as well as file-based and memory-based backends. There can be multiple backend databases active at any given time, each of which handles a mutually exclusive subset of data (selection of the appropriate database is based on the root suffix specified in the operation). OpenDJ facilitates interaction with these backends and provides tools for enabling, disabling, creating, removing, backing up, and restoring the databases independently from each other without impacting other backends.

Note: Backends may consist of local or remote repositories (i.e. the database is stored on a remote machine). This can be found in cases where the backend interacts with a proxy or a virtual server. Support for proxy and virtual server backends is scheduled for a future release.

Loggers

OpenDJ has a robust logging capability that allows server information to be retained in various repositories. The most common loggers are as follows:

Access Logger – stores server operations (binds, searches, modifications, etc.)
Error Logger – stores warnings, errors, and significant events that occur with the server
Debug Logger – records debug information when the server is run with debugging enabled and Java assertions are active.

Multiple loggers can be configured for each of these and each logger may be actively storing different information (filtered or not) in different formats in different repositories.

Note: Some error loggers can be used as an alerting mechanism to actively notify administrators of potential problems.

SASL Handlers

The LDAP protocol supports two methods that clients may use to authenticate to the server:

LDAP simple authentication
Simple Authentication and Security Layer (SASL)

SASL is an authentication framework that supports multiple authentication mechanisms including ANONYMOUS, CRAM-MD5, DIGEST-MD5, EXTERNAL, GSSAPI, and PLAIN.

OpenDJ includes a set of handlers that implement each of these SASL mechanisms in order to determine the identity of the client.

Access Control

OpenDJ contains an access control module that is used to determine if a client is permitted to perform a particular request or not.

Password Storage

OpenDJ includes several password storage modules that can be used to obscure user passwords using a reversible or one-way algorithm. Password storage schemes encode new passwords provided by users so that they are stored in an encoded manner. This makes it difficult or impossible for someone to determine the clear-text passwords from the encoded values. They can also be used to determine whether a clear-text password provided by a client matches the encoded value stored in the server.
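
As a rough illustration of the one-way case, the Python sketch below salts and hashes a password and later checks a clear-text candidate against the stored value. It is a generic salted-hash example, not OpenDJ's code. The {SSHA256} label, the 8-byte salt, and the digest-then-salt layout are assumptions made for the example, and both function names are hypothetical.

```python
import base64
import hashlib
import hmac
import os

PREFIX = "{SSHA256}"  # illustrative scheme label

def encode_password(clear_text, salt=None):
    """Return an encoded value: label + base64(sha256(password + salt) + salt)."""
    salt = os.urandom(8) if salt is None else salt
    digest = hashlib.sha256(clear_text.encode("utf-8") + salt).digest()
    return PREFIX + base64.b64encode(digest + salt).decode("ascii")

def password_matches(clear_text, stored):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    if not stored.startswith(PREFIX):
        return False
    raw = base64.b64decode(stored[len(PREFIX):])
    digest, salt = raw[:32], raw[32:]  # SHA-256 digests are 32 bytes long
    candidate = hashlib.sha256(clear_text.encode("utf-8") + salt).digest()
    return hmac.compare_digest(candidate, digest)

stored = encode_password("secret")
print(stored)
print(password_matches("secret", stored))
print(password_matches("guess", stored))
```

Because the salt is random, encoding the same password twice produces two different stored values, yet both still verify against the original clear text.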

Password Complexity

OpenDJ includes a series of modules that define logic used to determine whether a user’s password meets minimum requirements or not.

Syntax and Matching Rules

Attributes must follow a particular syntax and search filters determine matches based on a set of matching rules. OpenDJ contains a set of syntaxes and matching rules that define the logic for dealing with different kinds of attributes.
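
A matching rule can be thought of as a normalization step applied to both the asserted value and the stored value before they are compared. The Python sketch below loosely imitates a case-insensitive and a case-sensitive string rule. It is a simplification (real LDAP string preparation involves more than trimming and lower-casing), and every function name here is hypothetical.

```python
def case_ignore_normalize(value):
    """Trim, collapse runs of whitespace, and lower-case the value."""
    return " ".join(value.split()).lower()

def case_exact_normalize(value):
    """Trim and collapse runs of whitespace, but keep the original case."""
    return " ".join(value.split())

def equality_match(assertion, stored, normalize):
    """Two values match when their normalized forms are identical."""
    return normalize(assertion) == normalize(stored)

print(equality_match("wild  bill", "Wild Bill", case_ignore_normalize))
print(equality_match("wild  bill", "Wild Bill", case_exact_normalize))
```

The same normalization idea applies when index keys are built, which is why a search for “bill” can hit an equality index entry created from “Bill”.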

Database Cache

Interacting with data in memory is much faster than interacting with data on disk. As such, OpenDJ includes a database caching module that loads directory data into memory.
