Moving to Relational model

An LDAP object may contain both single and multi valued attributes, the example below demonstrates a group object, which contains several single valued-attributes (cn, disinghuishedName, objectGUID, name) and a multi-valued attribute member. When transforming LDAP data into relational model, we will often need to separate single-valued data streams from multi-valued, the latter forming their own tables (see example below, where member attribute data is transformed into members table, and linked via a foreign key back to the group object. The disinguishedName attribute, which is guaranteed to be unique within an LDAP directory, plays the role of a glue, which links the multi-valued entities back to the primary object.

image image

 

In the next two sections we will explore the potential approaches of accomplishing this task.

Single Source with Conditional Split Approach

A single LDAP query may be configured to bring back both single and multi-valued attributes. The example below shows that a combination of single and multi valued attributes is included in the AttributesToLoad property of the LDAP source component.

image

 

image 

This configuration results in the output flow containing both attribute types. Note that each value in a multi-valued attribute receives its own row, and that each such row also contains the distinghuishedName value of the object in which the attribute resides. This allows us to establish the relationship between the object and its multi-valued attributes. Contrary to the multi-valued attributes, all single-valued attributes are placed in a single row.

 image

Typically, our next task is to separate the flow into multiple streams, where multi-valued attributes are routed to separate tables. In our example, all rows where the value of the member attribute is not NULL will be sent to a separate output, and consequently placed into the members table. The rows, which do not meet this criteria, will be assumed to be the rows containing the single-valued attribute and routed accordingly (group table).

image

The main disadvantage of this approach is the inefficient use of SSIS server memory (RAM), since the majority of the cells of the output will contain NULLs. In our example, each row representing a member value will contain 3 NULLs. If the number of multi-valued values per attribute is large, this may result in significant waste of memory resources.

Multiple Sources Approach

In this approach we will isolate the multi-valued attributes into their own queries, which in turn results in multiple LDAP Source Components. In other words, we will construct two LDAP queries, one bringing back all single-valued attributes and another one returning the member data. Of course, the second query also needs to include the distinghuishedName of the object, so that we can correlate the results of the two queries.

image

image

 

image

Unfortunately, we also need to attach a Conditional Split Transformation to the output of the Member Query component, since its output will also include a row per group object with NULL in the member column. This is the side-effect of including the single-valued distinghuishedName attribute in the query. We need this attribute to link the multi-valued data back to the object, but the source component is designed to produce a single row per object for all its single-valued attributes. Hence the need to filter-out such rows in this particular scenario.

image

As you can see, this approach accomplishes a significant reduction in NULL cells in the output, at the expense of a somewhat more complex data flow design (more components to configure and manage).

image

Last edited Feb 28, 2013 at 7:07 PM by atcherni, version 2

Comments

No comments yet.