I encounter a fair number of AD implementations as part of my work. Some are good, some bad and some just plain ugly. Here’s a more or less random collection of bad habits that I see quite regularly and some tips on how to avoid and/or kick them.
1. Poor or missing Active Directory monitoring
A number of organisations rely on monitoring Domain Controllers simply as servers. They will monitor things such as CPU, memory, disk utilisation, disk space, etc., but not AD as a service. If something goes bad within AD it might not be picked up by standard server monitoring and alerting. You need to ensure that all AD services are available and healthy. This involves monitoring items such as LDAP and GC port availability and response times, forest synchronisation with an authoritative time source, correctly published DNS SRV records, replication working, SYSVOL healthy, etc.
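As a rough illustration of the "port availability and response times" part, here is a minimal Python sketch that times a TCP connect to the LDAP (389) and GC (3268) ports of a DC. The function names are my own, and a successful TCP connect only shows that the port is listening, not that the directory behind it is answering queries correctly, so treat it as a starting point rather than a monitoring solution.

```python
import socket
import time

# Standard ports: 389 = LDAP, 3268 = Global Catalog.
AD_PORTS = {"LDAP": 389, "GC": 3268}


def check_port(host, port, timeout=3.0):
    """Return (reachable, seconds_taken) for a TCP connect to host:port."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False, time.monotonic() - start


def check_dc(host, ports=AD_PORTS, timeout=3.0):
    """Check each named port on one domain controller."""
    return {name: check_port(host, port, timeout) for name, port in ports.items()}
```

Calling `check_dc("dc01.example.com")` (a placeholder host name) returns something like `{"LDAP": (True, 0.004), "GC": (False, 3.0)}`, which you can feed into whatever alerting you already have.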
Implementing a monitoring and alerting solution for your AD service will allow problems to be detected and resolved early, rather than firefighting after the event has happened.
In addition to Microsoft’s Operations Manager Management Pack for AD, there are a number of 3rd party AD monitoring solutions. NetPro’s DirectoryAnalyzer is one of the more comprehensive.
2. Bad delegation
AD supports granular delegation to suit environments of all sizes. Why is it, then, that so many organisations end up with little or no delegation or security model? For example, I regularly see environments that have 20 or more accounts in the Domain Admins group. This appears to be because configuring the appropriate delegation is seen as too difficult and/or time-consuming. Once an account is put into a privileged group, there appears to be a reluctance to remove it “in case it breaks something”. Here are some general tips around delegation.
- Document your delegation model. Implement it, enforce it and monitor it.
- Separate standard user accounts from administrative accounts. Only allow administrative accounts to be members of privileged groups.
- Don’t allow service accounts to be members of the highly privileged groups (e.g. Domain Admins, Schema Admins, Enterprise Admins and built-in Administrators). If the documentation from a vendor says that this membership is required the information is probably wrong. 99% of the time there is a way to delegate without making the account a member of a privileged group.
- Apply the principle of least privilege. Give accounts the permissions they need to perform their tasks and no more.
- Keep the Schema Admins and Enterprise Admins groups empty. Only populate these groups temporarily when required for a specific task.
- Don’t mess with the built-in Administrators group. Leave it alone.
- Keep the membership of Domain Admins to a low number (should be no more than 5 trusted individuals, even in large environments).
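
The "monitor it" part of these tips can be partly automated. The sketch below is a hypothetical Python check that takes group memberships you have already exported from AD (how you export them is up to you) and flags the conditions listed above. The function name, the input layout and the limit of 5 are assumptions drawn from this list, not any Microsoft tool.

```python
# Limits taken from the tips above; adjust to your own documented model.
MUST_BE_EMPTY = {"Schema Admins", "Enterprise Admins"}
PRIVILEGED = MUST_BE_EMPTY | {"Domain Admins", "Administrators"}
MAX_DOMAIN_ADMINS = 5


def audit_privileged_groups(members_by_group, service_accounts=frozenset()):
    """Return a list of human-readable findings.

    members_by_group: dict mapping group name -> iterable of account names
    service_accounts: account names known to be service accounts
    """
    findings = []
    for group in sorted(PRIVILEGED):
        members = set(members_by_group.get(group, ()))
        if group in MUST_BE_EMPTY and members:
            findings.append(f"{group} should be empty but has {len(members)} member(s)")
        if group == "Domain Admins" and len(members) > MAX_DOMAIN_ADMINS:
            findings.append(
                f"Domain Admins has {len(members)} members (limit {MAX_DOMAIN_ADMINS})"
            )
        # Service accounts should not sit in any privileged group.
        for account in sorted(members & set(service_accounts)):
            findings.append(f"service account {account} is in {group}")
    return findings
```

An empty list means the exported memberships match the model. Anything else is a prompt to go and ask why.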
3. Abuse of the Default Domain Policy
I have seen a number of environments in which the Default Domain Policy and the Default Domain Controllers Policy are heavily used. It is considered best practice to leave both policies untouched and to create new GPOs, linked at the domain and at the Domain Controllers OU, to hold your required settings. The reason is that if the default policies become corrupt and you have no good backups, you at least have the option of restoring the defaults using DCGPOFIX.
4. No formal object lifecycle management
I often encounter environments that have little or no formal process for AD object provisioning, re-provisioning and deprovisioning. Amongst other issues, this can lead to a large number of inactive/unused accounts and other objects in the directory. Often the problem is only addressed during a migration or upgrade. The clean-up can be time-consuming, difficult and expensive. Try to associate each newly provisioned object with a human owner (guardian). This will help when making changes in your environment and when you need to remove inactive or unused objects from your directory.
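To make this concrete, here is a small Python sketch that works on account data you have already exported from the directory. It lists accounts that have not been used for a given number of days and accounts with no owner recorded. The field names (`name`, `last_logon`, `owner`) and the 90-day default are assumptions for illustration; use whatever your own process defines.

```python
from datetime import datetime, timedelta


def find_inactive(accounts, now, max_idle_days=90):
    """Return names of accounts not used within max_idle_days.

    accounts: iterable of dicts with "name", "last_logon" (datetime or None)
    and "owner" (str or None). A last_logon of None means never used.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return [
        a["name"]
        for a in accounts
        if a["last_logon"] is None or a["last_logon"] < cutoff
    ]


def find_unowned(accounts):
    """Return names of accounts with no human owner recorded."""
    return [a["name"] for a in accounts if not a.get("owner")]
```

Accounts that appear in both lists are the awkward ones: nobody uses them and nobody can tell you whether it is safe to remove them.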
5. No representative staging environment
When making changes to your AD environment, especially schema changes, it is important to have a representative staging environment. This will reduce the overall risk when making the change in your production environment. To make the environment representative, try to make sure at least the following items are the same in both environments:
- Schema extensions
- Domain Controller service pack and patch levels
- Domain and forest functional levels
- Number of domains
- GC availability
- FSMO role distribution
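
One simple way to keep an eye on drift is to record these items for each environment and compare them. The Python sketch below assumes you have collected the values into two dictionaries by whatever means you prefer; the key names in the usage note are made up for illustration.

```python
def compare_environments(production, staging):
    """Return {item: (production_value, staging_value)} for every mismatch."""
    differences = {}
    for item in sorted(set(production) | set(staging)):
        prod_value = production.get(item)
        stage_value = staging.get(item)
        # An item missing from one side shows up as None on that side.
        if prod_value != stage_value:
            differences[item] = (prod_value, stage_value)
    return differences
```

For example, comparing `{"domain_count": 3, "forest_level": "2008"}` with `{"domain_count": 1, "forest_level": "2008"}` returns `{"domain_count": (3, 1)}`, telling you the staging forest is missing two domains.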
6. No tracking of schema changes
There is nothing built in to AD that will keep track of what changes have been made to the default schema. Quite often I see environments in which the administrators have no idea what changes have been made to the schema. This can lead to risk and uncertainty when making future changes. If you have a formal change management system in place in your organisation, ensure that schema changes are included and fully documented. Try to maintain copies of the LDIF files that are used for the schema extensions. These are useful for preparing test environments as well as being self-documenting.
Even if you do have a formal change management system in place, consider keeping a separate change log somewhere inside your AD environment (e.g. in SYSVOL). Change management systems may come and go, but your AD infrastructure could be in place for 20 years or more.
7. Missing forest recovery plan
Given the importance of AD to most organisations, I am constantly amazed at how many have no forest recovery plan. When challenged on this, most just point to off-site DCs as an indication of the redundancy they have. But what if you lose forest-wide functionality? Microsoft’s excellent whitepaper on forest recovery lists the following failure/horror scenarios that might require a forest recovery:
- None of the domain controllers can replicate with its replication partner.
- Changes cannot be made to Active Directory at any domain controller.
- New domain controllers cannot be installed in any domain.
- All domain controllers have been logically corrupted or physically damaged to a point that business continuity is impossible (for instance, all business applications that depend on Active Directory are non-functional).
- A rogue administrator has compromised the Active Directory environment.
- An adversary intentionally or an administrator accidentally runs a script that spreads data corruption across the Active Directory forest.
- An adversary intentionally or an administrator accidentally extends the Active Directory schema with malicious or conflicting changes.
The whitepaper offers guidelines for building your own forest recovery plan and provides a sample roadmap for the recovery steps involved. Microsoft also recommends that you test your forest recovery at least once per year.
8. Missing subnet registrations
In a number of environments I have seen, AD subnets are registered and associated with their corresponding AD site when the infrastructure is first put in place. Subnets introduced afterwards are not always registered. When subnets are not registered, clients on those subnets will not find an in-site DC and/or GC to use, which can lead to slow responses and unnecessary bandwidth utilisation.
DCs detect connections from clients on unregistered subnets and log a NETLOGON warning in the System event log (Event 5807). The DC also writes the details to %windir%\debug\netlogon.log. You should regularly monitor your DCs for missing subnets and register them as required.
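Going through netlogon.log by hand gets tedious, so here is a rough Python sketch that pulls the client addresses out of it and groups them by subnet. It assumes each relevant entry contains `NO_CLIENT_SITE:` followed by the client name and an IPv4 address; check that against your own log before relying on it. Grouping by /24 is only a guess at the real subnet boundaries, and the function name is my own.

```python
import ipaddress
import re
from collections import Counter

# Assumed entry layout: "... NO_CLIENT_SITE: <client name> <IPv4 address>"
NO_SITE = re.compile(r"NO_CLIENT_SITE:\s+(\S+)\s+(\d{1,3}(?:\.\d{1,3}){3})")


def unregistered_clients(lines, registered_subnets):
    """Count NO_CLIENT_SITE hits per /24 not covered by a registered subnet."""
    known = [ipaddress.ip_network(s) for s in registered_subnets]
    hits = Counter()
    for line in lines:
        match = NO_SITE.search(line)
        if not match:
            continue
        try:
            address = ipaddress.ip_address(match.group(2))
        except ValueError:
            continue  # looked like an address but was not a valid one
        if any(address in net for net in known):
            continue  # already registered since the entry was logged
        # Group by /24 as a first guess at the missing subnet.
        hits[str(ipaddress.ip_network(f"{address}/24", strict=False))] += 1
    return hits
```

Pass it the lines of the log file and the subnets you have already registered (for example `["192.168.5.0/24"]`). The subnets with the highest counts are the ones to chase up first.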
9. No auditing of Directory Service Access events
If someone deletes an entire OU tree in your domain, you are very likely going to want to know who performed the deletion (or at least which account was used). That information will be captured in the security event log of the DC where the change was made, as long as auditing is enabled for the DCs via Group Policy and turned on in the appropriate SACLs of the objects within the directory. Quite often I see that one or both of these steps are missing.
I recommend defining and documenting an audit policy for your AD environment and then implementing the policy. Each environment will have different auditing requirements based on the type of organisation that it is, so it is important not to simply accept the default configuration.
10. No event log consolidation
This is linked to the previous entry. There is no point implementing an audit policy if you then lose the information you need simply because the events have been overwritten in the security event log. Microsoft doesn’t provide a built-in mechanism for consolidation of audit and other event log information. It does, however, include Audit Collection Services (ACS) as part of Operations Manager. A number of 3rd parties offer similar solutions that provide a centralised, consolidated view of event information. These systems have the advantage of storing the events more efficiently for much longer periods of time and allowing faster event searches. If the information is important to you (as it is for most organisations) then consider putting the money and resources aside to implement such a system.
Nice list, for me the bad delegation is the worst. I’ve seen 50 domain admins and people get offended if you try and take their rights away. I do wish Microsoft made delegation easier, it shouldn’t take 3rd party tools to make it pain free.