
Quick Tip: Filter on IP in Network Monitor 3.4


[Update 2014-04-17] Thanks to Steve’s comment I learned that the HEX notation is absolutely not a must. You can just use the IP address, but unlike simple filters like Destination or Source you must not use quotes around the IP! Using quotes for the IP will give you a valid filter, but no matches will be found. So there’s absolutely no benefit in using the HEX notation. Makes my post a bit useless, but at least I learned something out of it!
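To recap the update with the address used later in this post, these two display filters look almost identical but behave differently:

IPv4.Address == 10.6.69.121 (matches all traffic to or from that host)

IPv4.Address == "10.6.69.121" (accepted as a valid filter, but no frames will match)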

In the past I often used Wireshark to debug all kinds of issues. The last year I’ve been using Microsoft Network Monitor 3.4 more and more. I don’t think Network Monitor is better or worse than Wireshark, but Network Monitor is capable of using a trace file generated by the built-in tracing engine of Windows (see Network Tracing Awesomeness). That means I don’t have to install Wireshark all over the place!

Small side note: Network Monitor 3.4 has been out for a while and I’ve often wondered when a newer version would be released. I wasn’t missing any particular features, but hey, I’m an IT guy, I like new stuff! And today I happened to stumble upon this: http://blogs.technet.com/b/messageanalyzer/ Seems like Microsoft does have a successor: Message Analyzer. I installed it and did some quick tests, and it seems like there’s a lot of fancy stuff in there! Using it will take some time to learn though. Some fancy features: graphs (e.g. SMB performance) and remote live capturing!!! Awesome!

Either way, back to my good old Network Monitor. One of the things I often do is blindly capture everything and then try to filter the data that is displayed. One thing I often need is to ensure only traffic with a particular host is displayed. Typically I did this by adding a filter of the format “Source == IP and Destination == IP”. This mostly works, but sometimes it doesn’t when the IP is translated to the actual DNS name. Besides that, it always bothered me that I had to type such a long filter.

Next to “Source” and “Destination” there’s also IPv4.SourceAddress, IPv4.DestinationAddress and IPv4.Address. I tried those a few times in the past, e.g. IPv4.Address == “10.6.69.121”, and although the filter is accepted, no traffic is shown. Today I accidentally found out that those IPv4.XYZ filters expect an IP in HEX format!

If you have some captured data and you want to filter it, you can simply drill down into the IP packet information, right-click SourceAddress and choose Add to Display Filter. That will give you the hex notation.

image

Now this works perfectly if you want to do it for a display filter. But what if you want to limit the amount of data captured by using a capture filter?

clip_image002[4]

Sample filter:

clip_image002[6]

Well you can just use a simple IP to HEX converter site like this: http://www.miniwebtool.com/ip-address-to-hex-converter/?ip=10.6.69.121

Or you can use the following PowerShell oneliner:

"$(([Net.IPAddress]"10.6.69.121").GetAddressBytes() | ForEach-Object { '{0:x2}' -f $_ })" -Replace '\s'

clip_image002

The PowerShell oneliner will require you to add an x after the first 0 though.
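If you want the one-liner to emit the 0x prefix directly, a small variation on the same idea (just a sketch) is:

$ip = [Net.IPAddress]"10.6.69.121"

'0x' + (($ip.GetAddressBytes() | ForEach-Object { '{0:x2}' -f $_ }) -join '')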

Happy tracing!


Direct Access: No Security Associations


I’ve been working on a Direct Access deployment for quite some time now. The clients are Windows 7 SP1 and the DA servers are based on Windows Server 2012. We hand out DA capabilities using a GPO that is scoped to a Windows Active Directory group. Once the computer is in the group, both the DA settings are configured (GPO) and the client requests a computer certificate (auto-enrollment). From this point on, most of the clients have DA connectivity immediately.

Ironically, while creating a document for the helpdesk to be able to resolve basic DA connectivity issues, I ran into the following issue:

netsh dns show state

image

DA is configured and enabled, as expected…

ipconfig

image

Our IPHTTPS interface has a valid IP Address. I can even ping the IPv6 address of the DA DNS service (determined by netsh namespace show policy)…

netsh int httpstunnel show int

image

The IPHTTPS interface didn’t show any errors…

netsh advf monitor show mmsa

netsh advf monitor show qmsa

image

Ahah! No SAs match the specified criteria. So no security associations were being made. But… Why?!

After some googling I stumbled upon some threads where they asked: is the IKE and AuthIP IPsec Keying Modules service running?

And indeed, on my faulty client this service was stopped and had a startup type of manual. Starting the service just once seemed to immediately reconfigure the startup type to automatic as well. And in a matter of seconds the DA connectivity seemed fine. Security Associations were now successfully created:

image
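To quickly check and fix this on a single client from an elevated command prompt (IKEEXT is the short name of the IKE and AuthIP IPsec Keying Modules service):

sc query ikeext

sc config ikeext start= auto

sc start ikeext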

In the days after, more and more newly assigned clients seemed to suffer from this phenomenon. So, as a workaround, I configured the IKE and AuthIP IPsec Keying Modules service with an automatic startup type using GPO:

image001

And the actual setting:

image002

If anyone has an explanation as to why this suddenly stopped working I’d be happy to hear it. Perhaps some patch was released that “broke” the DA/IPsec logic that is supposed to get the service running?

Static Host Records Disappearing From DNS


Somewhere in the past year I started writing the stuff below. I had a specific DNS issue I was looking into. Sadly I never found the real solution, but I found the troubleshooting information interesting enough to save it for future use. The case: there are multiple servers which have one network interface but have multiple IP addresses on them, typically web servers. We prefer them to only register their “primary” IP address in DNS. In order to achieve this we uncheck “Register this connection’s addresses in DNS” and create a static A (and PTR) record for the hostname and the primary IP.

However, we are seeing that some of these records seem to disappear after a while. Here’s someone with the same problem: Serverfault.com: Disabling DNS registration on Server 2008 R2

In the end I was able to reproduce this for a given machine:

clip_image002

Enable DNS Client Events logging:

clip_image004

Enable DNS Debug Logging:

clip_image005

And in order to reproduce this I made sure both the A and PTR records were gone. Typically the A record was the one disappearing while the PTR remained in place, so I made sure to manually delete the PTR record as well.

Then we go ahead and create our static record:

clip_image006

And the resulting record:

clip_image007

So in theory, even if scavenging were enabled, it shouldn’t affect this record.
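For reference, the same static A and PTR records can also be created from the command line on the DNS server with dnscmd; the zone names, host name and IP below are just examples:

dnscmd . /RecordAdd contoso.com webserver01 A 10.1.2.3

dnscmd . /RecordAdd 2.1.10.in-addr.arpa 3 PTR webserver01.contoso.com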

Now we fast forward in time. To be more precise, I found out that the deletion for this specific record was occurring every hour. Using repadmin it’s pretty easy to see when exactly the deletion occurred:

clip_image009

And with AD Auditing enabled:

clip_image010

For more on enabling auditing: Acefekay: DNS Records disappearing and DNS auditing.aspx

So for example at 16:10:22 the DNS debug log shows us the following:

clip_image012

I highlighted the TKEY query as this made me suspect that Dynamic Updates were involved. TKEY is the negotiation for secure updates, if I’m not mistaken. You can read more about that here: Technet: DNS Processes and Interactions (Example: How dynamic update works)

On the server I could also see some related events appearing in the DNS Client Events log:

clip_image013

clip_image015

clip_image017

So Dynamic Updates seemed to be removing our statically created records, even though dynamic registration was disabled on that connection. How is that possible?!

For starters I stumbled upon this KB article: KB2520155: DNS Host record of a computer is deleted after you change the DNS server assignment But that didn’t seem to be applicable as we weren’t touching DNS server settings in any way.

As indicated at the beginning of this post, there was another post describing the same problem: Serverfault.com: Disabling DNS registration on server 2008 R2. The solutions provided didn’t seem interesting to me. I tried the netsh command, but it was already set to “none”. Setting a registry key seemed like a lot of hassle for something which should work right away. I was looking for either a hotfix or a proper way to configure it.

And then I stumbled upon this: Social Technet: Single machine will register all IP addresses in DNS - want to register only one

This led to the following KB articles:

· Vista/ Windows 2008: KB975808: All IP addresses are registered on the DNS servers when the IP addresses are assigned to one network adapter on a computer that is running Windows Server 2008 SP2 or Windows Vista SP2

· Win 7/ Windows 2008 R2: KB2386184: IP addresses are still registered on the DNS servers even if the IP addresses are not used for outgoing traffic on a computer that is running Windows 7 or Windows Server 2008 R2

As far as Windows 7 / Windows Server 2008 R2 is concerned, this hotfix is included in SP1.

After you install this hotfix, you can assign IP addresses that will not be registered for outgoing traffic on the DNS servers by using a new flag of the netsh command. This new flag is the skipassource flag.

>> This makes me wonder whether this only affects DNS registration or also outgoing TCP/IP traffic: e.g. can we assume that all outgoing traffic will use the primary IP? This would be useful in firewall scenarios. From what I read here (Technet: Set-NetIPAddress -SkipAsSource) I think it does.

Sidenote: IP Address selection for outgoing traffic (Blogs.technet.com: Source IP address selection on a Multi-Homed Windows Computer )

The server will use the 192.168.1.68 address because it has the longest matching prefix.

To see this more clearly, consider the IP addresses in binary:

11000000 10101000 00000001 00001110 = 192.168.1.14 (bits matching the gateway = 25)

11000000 10101000 00000001 01000100 = 192.168.1.68 (bits matching the gateway = 26)

11000000 10101000 00000001 01111111 = 192.168.1.127 (the gateway)

The 192.168.1.68 address has more matching high-order bits with the gateway address 192.168.1.127. Therefore, it is used for off-link communication.

In order to use SkipAsSource we have to add the additional address from the command line:

· Netsh int ipv4 add address <Interface Name> <ip address> <netmask> skipassource=true

In order to verify this we can execute the following command:

· Netsh int ipv4 show ipaddresses level=verbose
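On Windows 8/Windows Server 2012 and later you can do the same with PowerShell; a quick sketch (interface alias, address and prefix length are examples):

New-NetIPAddress -InterfaceAlias "Ethernet" -IPAddress 10.1.2.50 -PrefixLength 24 -SkipAsSource $true

Get-NetIPAddress -InterfaceAlias "Ethernet" | Select-Object IPAddress, SkipAsSource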

Important remark: there’s also a hotfix for this “feature”: KB2554859: The "skipassource" flag of IP addresses is cleared after you use the GUI to change IP settings of a network adapter in Windows 7 or in Windows Server 2008 R2. It seems that if you use the GUI to modify the “Register this connection’s addresses in DNS” setting it will actually clear the skipassource flag! This hotfix is NOT included in SP1.

Without us knowing it, this already seems to be active for some servers! In fact it seems that Windows Failover Clustering uses this to avoid the “VIPs” being registered under the hostname.

clip_image018

Contrary to most of my other blog posts this one isn’t as polished and doesn’t have a nice wrap-up, but to me it still contains some valuable bits of information regarding DNS troubleshooting, so it goes into my personal archive.

SCOM 2012 R2: Web Portal: 503 The Service is Unavailable


The other day one of my customers mentioned that their SCOM Web Portal has been broken for a while now. As I like digging into web application issues I took a quick look. Here’s what I came up with. It seems that the portal itself was loading fine, but viewing All Alerts or Active Alerts showed a Service Unavailable (“HTTP Error 503: The service is unavailable”).

image

One of the things about IIS based errors is that in most cases the Event Log on the web server can help you a great deal. In the System Event Log I found the following:

clip_image001

A process serving application pool 'OperationsManagerMonitoringView' reported a failure trying to read configuration during startup. The process id was '6352'. Please check the Application Event Log for further event messages logged by the worker process on the specific error. The data field contains the error number.

Checking the IIS Management Console I could indeed see that the Application Pool was stopped. Starting it succeeded, but viewing the page made it crash again. Looking a bit further I found the following in the Application Event Log:

clip_image001[5]

The worker process for application pool 'OperationsManagerMonitoringView' encountered an error 'Configuration file is not well-formed XML' trying to read configuration data from file '\\?\C:\Windows\Microsoft.NET\Framework64\v2.0.50727\CONFIG\web.config', line number '14'. The data field contains the error code.

Now that seems pretty descriptive! Using notepad I checked the contents of the file and tried to see why the XML was not well-formed. I checked the XML tags and the closings and such but I couldn’t find anything at first sight. Looking a bit longer I saw that the quotes (“) were different from the other quotes in the file. Here’s a screenshot of the bad line and the fixed line. You can simply erase and retype the “ on that line and you should be good to go.

image
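If you’d rather not eyeball the file, a quick PowerShell check can point out the offending characters; this is just a sketch that looks for curly quotes:

Select-String -Path 'C:\Windows\Microsoft.NET\Framework64\v2.0.50727\CONFIG\web.config' -Pattern '[\u201C\u201D\u2018\u2019]'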

Personally I like taking a backup copy before I perform manual fixes. After saving the file I did an IISReset just to be sure. And after that we were able to successfully view our alerts through the Web Portal again!

Failover Cluster: Generic Applications Fail with OutOfMemoryException


Recently I helped a customer who was having trouble migrating from a Windows 2003 cluster to a Windows 2012 cluster. The resources they were running on the cluster consisted of many in-house developed applications. There were about 80 of them and they were running as generic applications.

Due to Windows 2003 being end of life they started a phased migration towards Windows 2012 (in a test environment). At first the migration seemed to go smoothly, but at a given moment they were only able to start a limited number of applications. The applications that failed gave an Out Of Memory exception (OutOfMemoryException). Typically they could start about 25 applications, and from then on they weren’t able to start more. This number wasn’t exact; sometimes it was more, sometimes it was less.

As I suspected that this wasn’t really a failover clustering problem but more a Windows problem I googled for “windows 2012 running many applications out of memory exception”. I found several links:

HP: Unable to Create More than 140 Generic Application Cluster Resources

IBM: Configuring the Windows registry: Increasing the noninteractive desktop heap size

If the parallel engine is installed on a computer that runs Microsoft Windows Server, Standard or Enterprise edition, increase the noninteractive desktop heap size to ensure that a sufficient number of processes can be created and run concurrently

So it seems you can tweak the desktop heap size in the registry. Here is some background regarding the modification we did to the registry.

The key: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows

SharedSection has three values: (KB184802: User32.dll or Kernel32.dll fails to initialize)

  • The first SharedSection value (1024) is the shared heap size common to all desktops. This includes the global handle table, which holds handles to windows, menus, icons, cursors, and so forth, and shared system settings. It is unlikely that you would ever need to change this value.
  • The second SharedSection value (3072) is the size of the desktop heap for each desktop that is associated with the "interactive" window station WinSta0. User objects like hooks, menus, strings, and windows consume memory in this desktop heap. It is unlikely that you would ever need to change this second SharedSection value.
  • The third SharedSection value (512) is the size of the desktop heap for each desktop that is associated with a "noninteractive" window station. If this value is not present, the size of the desktop heap for noninteractive window stations will be same as the size specified for interactive window stations (the second SharedSection value).

The default on Windows 2012 seems to be 768.

Raising it to 2048 seems to be a workaround/solution. A reboot is required! After this we were able to start up to 200 generic applications (we didn’t test more). However after a while there were some failures, but at first sight quite limited. This might be due to the actual memory being exhausted. Either way, we definitely saw a huge improvement.
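To check the current configuration on a node you can read the Windows value of the SubSystems key; the SharedSection values are embedded in that (long) string:

(Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems').Windows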

Disclaimer: ASKPERF: Sessions, desktops and windows stations

Please do not modify these values on a whim. Changing the second or third value too high can put you in a no-boot situation due to the kernel not being able to allocate memory properly to even get Session 0 set up

Bonus info: why didn’t the customer have any issues running the same workload on Windows 2003? They configured the generic applications with “allow desktop interaction”, something which was removed from generic applications in Windows 2008. Because they had “allow desktop interaction” configured, the generic applications were running in an interactive session and thus were not limited by the much smaller non-interactive desktop heap size.

Quick Tip: Enumerate a User’s AD Group Memberships


Using the two following commands you can easily retrieve all the groups a user is a member of. This also takes into account group membership caused by nested groups. Here’s the first line; it’s a multi-line command that stores all of the groups the user is a member of in the $tokenGroups variable. The groups are represented by their SID.

$tokenGroups = Get-ADUser -SearchScope Base -SearchBase 'CN=thomas,OU=Admin Accounts,DC=contoso,DC=com' `
    -LDAPFilter '(objectClass=user)' -Properties tokenGroups | Select-Object `
    -ExpandProperty tokenGroups | Select-Object -ExpandProperty Value

In order to easily translate them to their AD AccountName you can use the following command I blogged about earlier (Quick Tip: Resolving an SID to a AccountName)

$groups = $tokengroups | % {((New-Object System.Security.Principal.SecurityIdentifier($_)).Translate( [System.Security.Principal.NTAccount])).Value}

Using the “-SearchScope Base -SearchBase …” approach seems to be necessary as you cannot simply use Get-ADUser username …

image001
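If you don’t want to hard-code the distinguished name, you can look it up first and reuse it as the SearchBase; a small sketch combining the commands above:

$dn = (Get-ADUser thomas).DistinguishedName

$tokenGroups = Get-ADUser -SearchScope Base -SearchBase $dn -LDAPFilter '(objectClass=user)' -Properties tokenGroups | Select-Object -ExpandProperty tokenGroups | Select-Object -ExpandProperty Value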

Active Directory: Lsass.exe High CPU Usage


Recently I had an intriguing issue at one of the customers I frequently visit. They saw their domain controllers having a CPU usage way above what we normally saw. Typically their domain controllers sit between 0 – 15% CPU usage, whereas now we saw all of their domain controllers running at 80 – 90% during business hours. A quick look showed us that the process which required this much CPU power was lsass.exe. Lsass.exe is responsible for handling all kinds of requests towards Active Directory. If you want you can skip to the end to find the cause, but I’ll write this rather lengthy post nevertheless so that others can learn from the steps I took before finding the answer.

Some background: the environment consists of approximately 20,000 users. Technologies in the mix: Windows 7, Windows 2008 R2, SCCM, Exchange, … Our domain controllers are virtual, run 2008 R2 x64 SP1, and have 4 vCPUs and 16 GB RAM. RAM is a bit oversized but CPU is really the issue here. There are 8 domain controllers.

Here you can see a screenshot of a more or less busy domain controller. It’s not at 80% in this screenshot, but it’s definitely much more than we are used to seeing. Task Manager shows us without doubt that lsass.exe is mainly responsible.

 _1 _2

Whenever a domain controller (or server) is busy and you are trying to find the cause, it’s a good idea to start with the built-in tool Perfmon. In articles from the Windows 2003 timeframe you’ll see SPA (Server Performance Advisor) mentioned a lot, but for Windows 2008 R2 and up you don’t need this separate download anymore. Its functionality is included in Perfmon. Just go to Start > Run > Perfmon

_3

Open up Data Collector Sets > System and right-click Active Directory Diagnostics > Start. By default it will collect data for 5 minutes and then compile a nice HTML report for you. And that’s where things went wrong for me. The compiling seemed to take a lot of time (20-30 minutes) and after that I ended up with no performance data and no report. I guess the amount of data to be processed was just too much. I found a workaround to get the report though:

While it is compiling: copy the folder mentioned in “Output” to a temporary location. In my example C:\PerfLogs\ADDS\20140909-0001

If the report fails to compile you will see that the folder gets emptied. However, we can try to compile the report from the data we copied by executing the following command:

  • tracerpt *.blg *.etl -df RPT3870.tmp -report report.html -f html

The .tmp file seems to be crucial for the command and it should be present in the folder you copied. Normally you’ll see again that lsass.exe is using a lot of CPU:

1.cpu

The Active Directory section is pretty cool and has a lot of information. For several categories you can see the top x heavy CPU queries/processes.

_4

A sample for LDAP queries, I had to erase quite some information as I try to avoid sharing customer specific details.

2. LDAPCPU

However in our case, for all of the possible tasks/categories, nothing stood out. So the search went on. We took a network trace on a domain controller using the built-in tracing capabilities (Setspn: Network Tracing Awesomeness)

  • Netsh trace start capture=yes
  • Netsh trace stop

In our environment taking a trace of 20 minutes or 1 minute resulted in the same. Due to the large amount of traffic passing by, only 1 minute of tracing data was available in the file. The file was approximately 130 MB.

Using Microsoft Network Monitor (Microsoft.com: Network Monitor 3.4) the .etl file from the trace can be opened and analyzed. However: 1 minute of tracing contained about 328041 frames (in my case). In order to find a needle in this haystack I used the Top Users expert (Network Monitor plugin): Codeplex: NMTopUsers. This plugin will give you an overview of all IPs that communicated and how much data / how many packets they exchanged. It’s an easy way to see which IP is communicating more than average. If that’s a server like Exchange that might be legit, but if it’s a client something might be off.

As a starting point I took an IP with many packets:

3. TOPusers

I filtered the trace to only show traffic involving that IP:

  • IPv4.Address == x.y.z.a

4. Trace

Going through the data of that specific client seemed to reveal a lot of TCP 445 (SMB) traffic. However in Network Monitor this was displayed as all kinds of protocols:

  • SMB2
  • LSAD
  • LSAT
  • MSRPC
  • SAMR

As far as I can tell, TCP 445 (SMB) is being used as a way for certain protocols to talk to Active Directory (lsass.exe on the domain controller). When a client logs on, you could explain TCP 445 traffic as group policy objects being downloaded from the SYSVOL share. However, in our case that definitely didn’t seem to be the case. Browsing a bit through the traffic, the LSAT and SAMR messages seemed more than interesting. So I changed my display filter:

  • IPv4.Address == x.y.z.a AND (ProtocolName == “LSAT” OR ProtocolName == “SAMR”)

image

This resulted in traffic being displayed which seemed pretty easy to read and understand:

image

It seemed that the client was doing a massive amount of LsarLookupNames3 requests. A typical request looked like this:

5.TraceDet

Moreover, by opening multiple requests I could see that each request held a different username to be looked up. Now why would a client be performing queries to AD that seemingly involve all (or a large subset) of our AD user accounts? In my case, in the 60-second trace I had, my top client was doing 1015 lookups in 25 seconds. That has to count for some load if potentially hundreds of clients are doing this.

As the traffic was identified as SMB, I figured the open files (Computer Management) on the domain controller might show something:

6.Open

Seems like a lot of clients have a connection to \lsarpc

7.open

And \samr is also popular. Now I’ve got to be honest: both lsarpc and samr were completely unknown to me. I always thought whatever kind of lookups have to be performed against AD were done over some kind of LDAP.

After using my favourite search engine I came up with a bit more background information:

It seems LsarOpenPolicy2, LsarLookupNames3, LsarClose are all operations that are performed against a pipe that is available over TCP 445. The LsarLookupNames3 operation seems to resolve usernames to SIDs.

Using the open files tool I tried identifying clients that were currently causing traffic. Keep in mind that there’s also traffic to those pipes that is absolutely valid. In my case I took some random clients and used “netstat -ano | findstr 445” to check if the SMB session remained open for more than a few seconds. Typically this would mean the client was actually hammering our domain controllers.

image

8.Trace

From this trace we can see that the process ID of the process generating the traffic is 33016. Using Task Manager or Process Explorer (SysInternals) you can easily identify the actual process behind this ID.

9.PE

We could see that the process was WmiPrvSe.exe and that its parent process was Svchost.exe (C:\WINDOWS\system32\svchost.exe -k DcomLaunch). If you see multiple WmiPrvSe.exe processes, don’t be alarmed. Each namespace (e.g. root\cimv2) has its own process. Those processes are only running when actual queries are being handled.

10.peHover

In the above screenshot you can see, by hovering over the process, that this instance is responsible for the root\CIMV2 namespace. The screenshot was taken from another problem instance so the PID does not reflect the one I mentioned before.

The properties of the process / The threads tab:

image image

You can clearly see that there’s a thread at the top which draws attention. If we open the stack:

image

Use the Copy All button to copy/paste into a text file:

RPCRT4.dll!I_RpcTransGetThreadEventThreadOptional+0x2f8

RPCRT4.dll!I_RpcSendReceive+0x28

RPCRT4.dll!NdrSendReceive+0x2b

RPCRT4.dll!NDRCContextBinding+0xec

ADVAPI32.dll!LsaICLookupNames+0x1fc

ADVAPI32.dll!LsaICLookupNames+0xba

ADVAPI32.dll!LsaLookupNames2+0x39

ADVAPI32.dll!SaferCloseLevel+0x2e26

ADVAPI32.dll!SaferCloseLevel+0x2d64

cimwin32.dll!DllGetClassObject+0x5214

framedynos.dll!Provider::ExecuteQuery+0x56

framedynos.dll!CWbemProviderGlue::ExecQueryAsync+0x20b

wmiprvse.exe+0x4ea6

wmiprvse.exe+0x4ceb

..

We can clearly see that this process is the one performing the lookups. Now the problem remains… Who ordered this (WMI) query with the WMI provider? WmiPrvSe is only the executor, not the one who ordered it…

In order to get to the bottom of this I needed to find out who performed the WMI query. So enabling WMI tracing was the way to go. In order to correlate the events, it is convenient to know when the WmiPrvSe.exe process was created so that I know I’m looking at the correct events. I don’t want to be looking at one of the many SCCM-initiated WMI queries! In order to know what time we should be looking for events, we’ll check the Security event log.

Using calc.exe we can easily convert the PID 33016 to HEX: 80F8. Just set it to programmer mode, enter the value with Dec selected and then select Hex.

image image
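If you prefer PowerShell over calc.exe, the format operator does the same conversion:

'{0:X}' -f 33016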

In our environment the security event log has entries for each process that is created. If that’s not the case for you, you can play around with auditpol.exe or update your GPOs. We can use the find option and enter the HEX code to find the appropriate entry in the security event log. You might find some unrelated events, but in my case using the HEX code worked out pretty well. The screenshot is from another problem instance, but you get the point.

11.ProcWmiCr

So now all we need is WMI tracing. Luckily, since Windows 7 a lot of this stuff is available from within the event viewer: Applications and Services Logs > View > Show Analytic and Debug Logs

image

Microsoft > Windows > WMI-Activity > Trace

image

One of the problems with this log is that it fills up quite fast. Especially in an environment where SCCM is active as SCCM relies on WMI extensively. This log will show each query that is handled by WMI. The good part is that it will include the PID (process identifier) of the process requesting the query. The challenge here was to have tracing enabled, make sure this specific log wasn’t full as then it would just drop new events, and have the issue, which we couldn't reproduce, occur… With some hardcore patience and a bit of luck I found an instance pretty fast.
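If you prefer enabling/disabling the trace log from the command line, wevtutil can do it as well (I believe the log name is Microsoft-Windows-WMI-Activity/Trace):

wevtutil sl Microsoft-Windows-WMI-Activity/Trace /e:true

wevtutil sl Microsoft-Windows-WMI-Activity/Trace /e:false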

So I correlated the events with the process creation time of the WmiPrvSe.exe process:

First event:

12 Tr1

Followed by:

13 Tv2

Basically it’s: connect to the namespace, execute a query, get some more information and repeat the process. Some of the queries:

  • SELECT AddressWidth FROM Win32_Processor
  • Select * from __ClassProviderRegistration
  • select __RELPATH, AddressWidth from Win32_Processor
  • select * from msft_providers where HostProcessIdentifier = 38668
  • SELECT Description FROM Win32_TimeZone

And the last ones:

14TR

15 Tr

GroupOperationId = 260426; OperationId = 260427; Operation = Start IWbemServices::ExecQuery - SELECT Name FROM Win32_UserAccount; ClientMachine = Computer01; User = NT AUTHORITY\SYSTEM; ClientProcessId = 21760; NamespaceName = \\.\root\CIMV2

16 Tr

Now the SELECT Name FROM Win32_UserAccount seems to be the winner here. That one definitely seems to be relevant to our issue. If we open up the MSDN page for Win32_UserAccount there’s actually a warning: “Note: Because both the Name and Domain are key properties, enumerating Win32_UserAccount on a large network can negatively affect performance. Calling GetObject or querying for a specific instance has less impact.”

Now the good part: PID 21760 actually leads to something:

17 PE

The process we found seems to be a service from our antivirus solution:

test

The service we found to be the culprit is the McAfee Product Improvement Program service (TelemetryServer, mctelsvc.exe). Some background information from McAfee: McAfee.com: Product Improvement Program

In theory this is the info they are gathering:

  • Data collected from client system
  • BIOS properties
  • Operating System properties
  • Computer model, manufacturer and total physical memory
  • Computer type (laptop, desktop, or tablet)
  • Processor name and architecture, Operating System architecture (32-bit or 64-bit), number of processor cores, and number of logical processors
  • Disk drive properties such as name, type, size, and description
  • List of all third-party applications (including name, version, and installed date)
  • AV DAT date and version and Engine version
  • McAfee product feature usage data
  • Application errors generated by McAfee Processes
  • Detections made by On-Access Scanner from VirusScan Enterprise logs
  • Error details and status information from McAfee Product Logs

I don’t see any reason why they would need to execute that query… But the event log doesn’t lie. In order to be absolutely sure that this is the query that resulted in such a massive amount of traffic we’ll try to execute the query we suspect using wbemtest.

Start > Run > WbemTest

image image

image image

image
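Alternatively, the same query can be fired from PowerShell instead of wbemtest; be careful in a large domain, as this is exactly the expensive enumeration we’re trying to get rid of:

Get-WmiObject -Query "SELECT Name FROM Win32_UserAccount" | Measure-Object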

If you have Network Monitor running alongside this test you’ll see a lot of SAMR traffic passing by. So it seemed to be conclusive: the McAfee component had to go. After removing the Product Improvement Program component from each PC we can clearly see the load dropping:

image

To conclude: I know this is a rather lengthy post and I could also just have said “hey, if you have lsass.exe CPU issues, just check if you have the McAfee Product Improvement Program component running”, but with this kind of blog entry I want to share my methodology with others. For me troubleshooting is not only fixing the issue at hand. It’s also about getting an answer (what? why? how?) and it’s always nice if you learn a thing or two on your quest. In my case I learned about WMI tracing and lsarpc/samr. As always feedback is appreciated!

S.DS.AM GetAuthorizationGroups() Fails on Windows 2008 R2/WIN7


Today I got a call from a colleague asking me to assist with an issue. His customer had a Windows 2008 R2 server with a custom .NET application on it. The application seemed to stop working from time to time. After a reboot the application seemed to work for a while.

The logging showed a stack trace that started at UserPrincipal.GetAuthorizationGroups and gave the message: An error (1301) occurred while enumerating the groups. The group's SID could not be resolved.

Exception information:
Exception type: PrincipalOperationException
Exception message: An error (1301) occurred while enumerating the groups. 
The group's SID could not be resolved.

at System.DirectoryServices.AccountManagement.SidList.TranslateSids(String target, IntPtr[] pSids)
at System.DirectoryServices.AccountManagement.SidList..ctor(SID_AND_ATTR[] sidAndAttr)
at System.DirectoryServices.AccountManagement.AuthZSet..ctor(Byte[] userSid, NetCred credentials, ContextOptions contextOptions, String flatUserAuthority, StoreCtx userStoreCtx, Object userCtxBase)
at System.DirectoryServices.AccountManagement.ADStoreCtx.GetGroupsMemberOfAZ(Principal p)
at System.DirectoryServices.AccountManagement.UserPrincipal.GetAuthorizationGroups()

The first thing that came to mind was that they deleted some groups and that the application wasn’t properly handling that. But they assured me that was not the case. The only thing they had changed that came to mind was adding a Windows 2012 domain controller.

I could easily reproduce the issue using PowerShell:

Function Get-UserPrincipal($cName, $cContainer, $userName){
    $dsam = "System.DirectoryServices.AccountManagement"
    $rtn = [reflection.assembly]::LoadWithPartialName($dsam)
    $cType = "domain" #context type
    $iType = "SamAccountName"
    $dsamUserPrincipal = "$dsam.userPrincipal" -as [type]
    $principalContext =
        new-object "$dsam.PrincipalContext"($cType,$cName,$cContainer)
    $dsamUserPrincipal::FindByIdentity($principalContext,$iType,$userName)
}

[string]$userName = "thomas"
[string]$cName = "contoso"
[string]$cContainer = "dc=contoso,dc=local"

$userPrincipal = Get-UserPrincipal -userName $userName `
    -cName $cName -cContainer $cContainer

$userPrincipal.getGroups()

Source: Hey Scripting Guy
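To hit the exact code path from the stack trace you can also call GetAuthorizationGroups() on the resulting principal:

$userPrincipal.GetAuthorizationGroups()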

Some googling led me to:

In short, it seems that when a 2012 domain controller was involved, the GetAuthorizationGroups() function would choke on two new groups (SIDs) that are added to a user by default. Patching the server running the application was enough in order to fix this.

The issue wasn’t really hard to solve as the solution was easy to find online, but I think it’s a great example of the type of application/code to give a special look when you’re testing your AD upgrade.


Error Loading Direct Access Configuration


This morning I wanted to have a quick look at our Direct Access infrastructure and when opening the console I got greeted with various errors all explaining that there was a configuration load error:

Capture2

In words: ICMP settings for entry point cannot be determined. Or:

Capture

In words: Settings for entry point Load Balanced Cluster cannot be retrieved. The WinRM client cannot process the request. It cannot determine the content type of the HTTP response from the destination computer. The content type is absent or invalid.

Because initially I only stumbled upon the ICMP settings … error I had to dig a bit deeper. I searched online for how to enable additional tracing capabilities, but I couldn’t find anything. From reverse engineering RaMgmtUI.exe I could see that more than enough tracing was available. Here’s how to enable it:

EnableTrace

Create a REG_DWORD called DebugFlag below HKLM\SYSTEM\CurrentControlSet\Services\RaMgmtSvc\Parameters. For our purpose we’ll give it a value of 8. For an overview of the possible values:

TraceLevels
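A quick way to create the value, assuming the Parameters key already exists (adjust the value to the trace level you want):

New-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\RaMgmtSvc\Parameters' -Name DebugFlag -PropertyType DWord -Value 8 -Force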

I’m not sure if you can combine those in some way. After finding this registry key, I was able to find the official article on how to do this: TechNet: Troubleshooting DirectAccess. I should have looked a bit better for that information perhaps… After closing the Remote Access Management Console and opening it again, the log file was being filled up:

Tracing

You can find the trace file in C:\Windows\Tracing; it’s called RaMgmtUIMon.txt. After opening the file I stumbled across the following error:

2112, 1: 2014-09-30 11:51:43.116 Instrumentation: [RaGlobalConfiguration.AsyncRefresh()] Exit
2112, 12: 2014-09-30 11:51:43.241 ERROR: The WinRM client cannot process the request. It cannot determine the content type of the HTTP response from the destination computer. The content type is absent or invalid.
2112, 9: 2014-09-30 11:51:43.242 Failed to run Get-CimInstance

I then used PowerShell to try to do the same: connect to the other DA node using WinRM:

WinRMerr

The command: winrm get winrm/config -r:HOSTNAME. The error:

WSManFault
    Message = The WinRM client cannot process the request. It cannot determine the content type of the HTTP response from the destination computer. The content type is absent or invalid.

Error number:  -2144108297 0x803380F7
The WinRM client cannot process the request. It cannot determine the content type of the HTTP response from the destination computer. The content type is absent or invalid.

Googling on the error number 2144108297 quickly got me to the following articles:

Basically I was running into this issue because my AD user account is a member of a large number of groups. The MaxTokenSize has been raised in Windows 2012 (R2), so that’s already covered, but http.sys, which WinRM depends on, hasn’t been adjusted accordingly. When running into Kerberos token bloat issues on web applications, typically the MaxRequestBytes and MaxFieldLength values have to be tweaked a bit.

RegHTTP

There are various ways to configure these. Using GPO or a manual .reg file that you can just double click:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HTTP\Parameters]

"MaxRequestBytes"=dword:0000a000

"MaxFieldLength"=dword:0000ff37

In my environment I’ve set MaxRequestBytes to 40960 and MaxFieldLength to 65335. But I am by no means saying those are the advised values. It’s advised to start with a lower value and slightly increase until you’re good to go.

Conclusion: if you run into any of the above errors when using the Direct Access management console, make sure to check whether WinRM is happy. In my case WinRM was in trouble due to the size of my token.

Configure Windows Logon With An Electronic Identity Card (EID)


Here in Belgium people have been receiving an Electronic Identity Card (EID) for years now. Every once in a while I have a customer who asks me whether this card can be used to log on to workstations. That would mean a form of strong authentication is applied. The post below describes the necessary steps to make this possible. It has been written using a Belgian EID and the Windows Technical Preview (Threshold) for both client and server.

In my lab I kept the infrastructure to a bare minimum.

  • WS10-DC: domain controller for threshold.local
  • WS10-CA2: certificate authority (enterprise CA)
  • W10-Client: client

The Domain Controller(s) Configuration

Domain Controller Certificate:

You might wonder why I included a certificate authority in this demo. Users will logon using their EID and those cards come with certificates installed that have nothing to do with your internal PKI. However, in order for domain controllers to be able to authenticate users with a smart card, they should have a valid certificate as well. If you fail to complete this requirement, your users will receive an error:

image

In words: Signing in with a smart card isn’t supported for your account. For more info, contact your administrator.

And your domain controllers will log these errors:

ErrorClientClue1

In words: The Key Distribution Center (KDC) cannot find a suitable certificate to use for smart card logons, or the KDC certificate could not be verified. Smart card logon may not function correctly if this problem is not resolved. To correct this problem, either verify the existing KDC certificate using certutil.exe or enroll for a new KDC certificate.

And

ErrorClientClue2

In words: This event indicates an attempt was made to use smartcard logon, but the KDC is unable to use the PKINIT protocol because it is missing a suitable certificate.

In order to give the domain controller a certificate that can be used to authenticate users using a smart card, we will leverage the Active Directory Certificate Services (AD CS) role on the WS10-CA2 server. This server is installed as an enterprise CA using more or less default values. Once the AD CS role is installed, your domain controller should automatically request a certificate based upon the “Domain Controller” certificate template. This is a V1 template. A domain controller is more or less hardcoded to automatically request a certificate based upon this template.

image 

In my lab this certificate was good enough to let my users authenticate using their EID. After restarting the KDC service and performing the first authentication, the following event was logged though:

image

In words: The Key Distribution Center (KDC) uses a certificate without KDC Extended Key Usage (EKU) which can result in authentication failures for device certificate logon and smart card logon from non-domain-joined devices. Enrollment of a KDC certificate with KDC EKU (Kerberos Authentication template) is required to remove this warning.

Besides the Domain Controller template there are also the more recent Domain Controller Authentication and Kerberos Authentication templates, which depend on auto-enrollment being configured.

Computer Configuration > Policies > Windows Settings > Security Settings > Public Key Policies

image

After waiting a bit (gpupdate and/or certutil -pulse might speed things up), we got our new certificates:

image

You can see that the original domain controller certificate is gone, replaced by its more recent counterparts. After testing we can confirm that the warning is no longer logged in the event log. That covers the certificate the domain controller requires; we’ll need to add a few more settings on the domain controllers for EID logons to work.

Domain Controller Settings

Below HKLM\SYSTEM\CurrentControlSet\Services\Kdc we’ll create two registry keys:

  • DWORD SCLogonEKUNotRequired 1
  • DWORD UseCachedCRLOnlyAndIgnoreRevocationUnknownErrors 1

Strictly speaking, the last one shouldn’t be necessary if your domain controller can reach the internet, or at least the URL where the CRLs used in the EIDs are hosted. If you use this registry key, make sure to remove the name mapping (more on that later) or disable the user when the EID is stolen or lost. An easy way to push these registry keys is group policy preferences.
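If you prefer a script over group policy preferences, here’s a minimal sketch that sets both values on a domain controller:

$kdc = 'HKLM:\SYSTEM\CurrentControlSet\Services\Kdc'

New-ItemProperty -Path $kdc -Name SCLogonEKUNotRequired -PropertyType DWord -Value 1 -Force

New-ItemProperty -Path $kdc -Name UseCachedCRLOnlyAndIgnoreRevocationUnknownErrors -PropertyType DWord -Value 1 -Force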

Domain Controller Trusted Certificate Authorities

In order for the domain controller to accept the EID of the user, the domain controller has to trust the full path in the issued certificate. Here’s my EID as an example:

image

We’ll add the Belgium Root CA2 certificate to the Trusted Root Certificate Authorities on the domain controller:

Computer Configuration > Policies > Windows Settings > Security Settings > Public Key Policies > Trusted Root Certification Authorities

image

And the Citizen CA to the Trusted Intermediate Certificate Authorities on the domain controller:

Computer Configuration > Policies > Windows Settings > Security Settings > Public Key Policies > Intermediate Certification Authorities

image

Now this is where the first drawback of using EIDs as smart cards comes in: there are many Citizen CAs to add and trust… Each month, sometimes more often, sometimes less, a new Citizen CA is issued and used to sign new EID certificates. You can find them all here: http://certs.eid.belgium.be/ So instead of using a GPO to distribute them, scripting a regular download and adding them to the local certificate stores might be a better approach.

The Client Configuration

Settings

For starters we’ll configure the following registry keys:

Below HKLM\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters we’ll create two registry keys:

  • DWORD CRLTimeoutPeriod 1
  • DWORD UseCachedCRLOnlyAndIgnoreRevocationUnknownErrors 1

Again, if your client is capable of reaching the internet you should not need these. I have to admit that I’m not entirely sure how the client will react when a forward proxy is in use. After all, the SYSTEM account doesn’t always know which proxy to use and the proxy might require authentication.

Besides the registry keys, there are also some regular group policy settings to configure. In some articles you’ll probably see these settings also being pushed out as registry keys, but I prefer to use the “proper” settings as they are available anyhow.

Computer Settings > Policies > Administrative Templates > Windows Components > Smart Cards

  • Allow certificates with no extended key usage certificate attribute: Enabled
    • This policy setting lets you allow certificates without an Extended Key Usage (EKU) set to be used for logon.
  • Allow signature keys valid for Logon: Enabled
    • This policy setting lets you allow signature key-based certificates to be enumerated and available for logon.

These two are required so that the EID certificate can be used. As you can see it has a usage attribute of Digital Signature

UsageAttr

In some other guides you might also find these Smart Card settings enabled:

  • Force the reading of all certificates on the smart card
  • Turn on certificate propagation from smart card
  • Turn on root certificate propagation from smart card

But my tests worked fine without these.

Drivers

Out of the box Windows will not be able to use your EID. If you don’t install the required drivers you’ll get an error like this:

SmartCardErrorNoDrivers

You can download the drivers from here: eid.belgium.be On the Windows 10 preview I got an error during the installation. But that probably had to do with the EID viewer software. The drivers seem to function just fine.

EidDriverError

Active Directory User Configuration

As these certificates are issued by the government, they don’t contain any specific information that allows Active Directory to figure out which user should be authenticated. In order to resolve that we can add a name mapping to a user. And this is the second drawback. If you want to put EID authentication in place you’ll have to have some sort of process or tool that allows users to link their EID to their Active Directory user account. The helpdesk could do this for them or you could write a custom tool that allows users to do it themselves.

In order to do it manually:

First we need the certificate from the EID. You can use Internet Explorer > Internet Options > Content > Certificates

EID

You should see two certificates. The one you want is the one with Authentication in the Issued To. Use the Export… button to save it to a file.

Open Active Directory Users and Computers > View > Advanced Features

NameMapping1

Locate the user the EID belongs to > Right-Click > Name Mappings…

NameMapping2

Add an X.509 Certificate

NameMapping3

Browse to the exported copy of the Authentication certificate from the EID

NameMapping35

Click OK

NameMapping4

Testing the authentication

You should now be able to logon to a workstation with the given EID. Either by clicking other user and clicking the smart card icon

EIDLogon

Or if the client has remembered you from earlier logons you can choose smart card below that entry.

EIDLogon2

An easy way to see whether a user logged on using a smart card or a username/password is to query the user’s group memberships on the client. When users log on with a smart card they get the This Organization Certificate group SID added to their logon token. This is a well-known group (S-1-5-65-1) that was introduced with Windows 7/Windows 2008 R2.

ThisOrgCertWhoami
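A quick way to check this from a command prompt on the client:

whoami /groups | findstr /i "S-1-5-65-1"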

Forcing smart card authentication

Now all of the above allows a user to authenticate using a smart card, but it doesn’t force the user to do so. Username/password will still be accepted by the workstations. If you want to force smart card logon there are two possibilities, each with their own drawbacks.

1. On the user level:

There’s a property Smart card is required for interactive logon that you can check on the user object in Active Directory. Once this is checked, the user will only be able to log on using a smart card. There’s one major drawback though. The moment you click apply, this will set the password of that user to a random value and password policies will no longer apply for that user. That means that if you have some applications that are integrated with Active Directory, but do so by asking credentials in a username/password form, your user will not be able to log on as they don’t know the password… If you configure this setting on the user you have to make sure all applications are available through Kerberos/NTLM SSO. If you were to use Exchange ActiveSync, you would have to change the authentication scheme from username/password to certificate based, for instance. So I’m not really sure enforcing this at the user level is a real option. This option seems more feasible for protecting high-privilege accounts.

RequireSm

2. On the workstation level:

There’s a group policy setting that can be configured on the computer level that enforces all interactive logons to require a smart card. It can be found under computer settings > Policies > Windows Settings > Security Settings > Local Policies > Security Options > Interactive logon: Require smart card

 InterActL

While you’re there, also look at Interactive logon: Smart card removal behavior. It allows you to configure a workstation to lock when a smart card is removed. If you configure this one, make sure to also configure the Smart Card Removal Policy service to be started on your clients. This service is stopped and set to manual by default.

Now the bad news. Just like with the first one, there’s also a drawback. This one could be less critical for some organisations, but it might require people to operate in a slightly different way. Once this setting is enabled, all interactive logons require a smart card:

  • Ctrl-alt-del logon like a regular user
  • Remote Desktop to this client
  • Right-click run as administrator (in case the user is not an administrator himself) / run as different user

For instance, right-clicking notepad and choosing run as different user will result in the following error if you try to provide a username/password:

AccountRest 

In words: Account restrictions are preventing this user from signing in. For example: blank passwords aren’t allowed, sign-in times are limited, or a policy restriction has been enforced. For us this is an issue as our helpdesk often uses remote assistance (built into the OS) to help users. From time to time they have to provide their administrative account in order to perform certain actions. As the user’s smart card is inserted, the helpdesk admin cannot insert his own EID. That would require an additional smart card reader. And besides that: a lot of the helpdesk tasks are done remotely and that means the EID is in the wrong client… There seem to be third-party solutions that tackle this particular issue: redirecting smart cards to the PC you’re offering remote assistance to.

Now there’s a possible workaround for this. The policy we configure in fact sets the following registry value to 1:

MACHINE\Software\Microsoft\Windows\CurrentVersion\Policies\System\ScForceOption

Using remote registry you could change it to 0 and then perform your run as different user again. Changing it to 0 immediately sets Interactive logon: Require smart card to disabled, effective immediately. Obviously this isn’t quite elegant, but you could create a small script/utility for it…
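A minimal sketch of that workaround, assuming the Remote Registry service is reachable on the client (CLIENT01 is a placeholder):

reg add "\\CLIENT01\HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System" /v ScForceOption /t REG_DWORD /d 0 /f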

Administrative Accounts (or how to link a smart card to two users)

If you use the same certificate (EID) in the name mapping of two users in Active Directory, your user will fail to log in:

DoubleMapping

In words: Your credentials could not be verified. The reason is quite simple. Your workstation is presenting a certificate to Active Directory, but Active Directory has two principals (users) that map to that certificate. Now which user does the workstation want?

Name hints to the rescue! Let’s add the following GPO setting to our clients:

Computer Settings > Policies > Administrative Templates > Windows Components > Smart Cards

  • Allow user name hint: enabled

After enabling this setting there’s an optional field called Username hint below the prompt for the PIN.

Hint

In this username hint field the person trying to log on using a smart card can specify which account name should be used. In the following example I’ll be logging on with my thomas_admin account:

HintAdmin

NTauth Certificate Store

Whenever you read up on the smart card logon subject you’ll see the NTauth certificate store being mentioned from time to time. It seems to be involved in some way, but it’s still not entirely clear to me how. All I can say is that in my setup, using an AD integrated CA for the domain controller certificates, I did not have to configure/add any certificates to the NTauth store. Not the Belgian Root CA, not the Citizen CA. My internal CA was in it of course.

I did some tests, and to my experience, the CA that issued your domain controllers certificate has to be in the NTAuth store on both clients and domain controllers. If you would remove that certificate you’ll be greeted with an error like this:

NtAuth

In words: Signing in with a smart card isn’t supported for your account. For more info, contact your administrator. And on the domain controller the same errors are logged as the ones from the beginning of this article.

Some useful commands to manipulate the NTauth store locally on a client/server:

  • Add a certificate manually: certutil -enterprise -addstore ntAuth .\ThresholdCA.cer
  • View the store: certutil -enterprise -viewstore ntAuth
  • Delete a certificate: certutil -enterprise -viewdelstore ntAuth

Keep in mind that the NTauth store exists both locally on the clients/servers and in Active Directory. An easy way to view/manipulate the NTauth store in Active Directory is the pkiview.msc management console which you typically find on a CA. Right-click the root and choose Manage AD Containers to view the store.

A second important fact regarding the NTauth store: whilst you might see the required CA certificate in the store in AD, your clients and servers will only download the content of the AD NTauth store IF they have auto-enrollment configured!

Summary:

There are definitely some drawbacks to using EID in a corporate environment:

  • No management software to link the certificates to the AD users. Yes, there’s Active Directory Users and Computers, but you’ll have to ask the users to either come visit your helpdesk or email their certificate. Depending on the number of users in your organisation this might be a hell of a task. A custom tool might be a way to solve this.
  • Regular maintenance: as described, quite regularly a new Citizen CA (subordinate certificate authority) is issued. You need to ensure your domain controllers have this CA in their trusted intermediate authorities store. This can be done through GPO, but this particular setting seems hard to automate. You might be better off with a script that performs this task directly on your domain controllers.
  • Helpdesk users will have to face the complexity if the require a smart card setting is enabled.
  • If an EID is stolen/lost you might have to temporarily allow normal logons for that user. An alternative is to have a batch of smart cards that you can issue yourself. An example vendor for such smart cards is Gemalto.
  • Another point that I didn’t have the chance to test: what about the password of the users? If they can’t use it to log on, but the regular password policies still apply, how will they be notified of the expiration? Or even better, how will they change it? Some applications might depend on the username/password to log on.

As always, feedback is welcome!

Windows Technical Preview: Cannot Update the System Reserved Partition


Last week a new build for Windows Technical Preview (“Windows 10”) was available. You can easily find out by going to PC settings: Windows + C > Settings > Change PC Settings > Update and Recovery > Preview Builds

1.UpdateAv 

My current version (build 9841):

beforeVersion 

Upon clicking install now I got the following error:

2.error

In words: Failed to install the new preview build, please try again later. 0x80246007

After rebooting and trying again:

Error

In words: Couldn’t install Windows Technical Preview. We couldn’t update the system reserved partition.

4.ERROR

In words: Failed to install the new preview build, please try again later. 0xC1900200

I opened up diskmgmt.msc to find out what was wrong with my system reserved partition:

5.Disk

As you can see the first partition (system reserved) was quite full. I assigned a drive letter and started looking around. The easiest way to do this is to use PsExec (http://technet.microsoft.com/en-us/sysinternals/bb897553.aspx) and start a command prompt as System (psexec –s cmd). If you use a regular command prompt you'll get some access denied errors here and there as your local administrator user might not have access to some system managed files/folders. Using dir /a you'll be able to drill down the structure. Eventually I came up with H:\Recovery\WindowsRE\ which contained a file WinRE.wim of 309 MB.

WinRE

This WinRE.wim contains a Windows Recovery Environment which you can boot when your system is having issues. It's not vital that this is stored on the system reserved partition, so I thought I'd move it. Using "reagentc.exe /info" or "bcdedit /enum all" you can also see this configuration:

Reagentc
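Side note: the documented way to point WinRE at a new location is reagentc itself. Roughly, the sequence would look like the sketch below (the D:\Recovery\WindowsRE path is just an example). As described further down, in my case it reported success but the configuration wasn't updated, which is why I ended up in Visual BCD.

rem disable WinRE before touching the files
reagentc /disable
rem move or copy the Recovery\WindowsRE folder to the new location first
reagentc /setreimage /path D:\Recovery\WindowsRE
reagentc /enable
reagentc /info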

I then started messing around with takeown and eventually I just used Windows Explorer and moved the Recovery folder to my second internal HDD (D:\) which I use as a data volume. After moving the files I could see that the WinRE configuration was disabled. I googled around a bit to find out how I could update the information to reflect the new location. There seemed to be a reagentc command available, but although it stated success my configuration wasn’t updated to reflect the new path. So I used Visual BCD (http://www.boyans.net/) to just easily change the BCD parameters:

I updated both Windows Recovery Device options (edit SdiDevice and chose D: as my partition)

visualBcd1

The same for the Windows Recovery Environment loaders (edit  ApplicationDevice and OSDevice)

visualBcd2

Now my configuration showed as enabled again:

reagentcafter

After making some free room I could now successfully install the latest build:

AfterVersion

Eventually it seemed that the update process also moved (or recreated?) the WinRE environment on my C:\ drive. The Recovery folder I moved was empty (besides the logs folder). Using reagentc /info I could also see that the WinRE.wim was coming from the C:\ partition. So I guess this worked out fine for me.

On a final note: there’s a new option available to set your preference as to how fast you want to receive new builds:

PreviewSpeed 

This is also explained on an official Microsoft blog: blogs.windows.com: We're rolling out our first new build to the Windows Insider Program

Ring2

3PAR: Connect to WebAPI using PowerShell


I’m currently involved in a CloudCruiser implementation. CloudCruiser is not one of my usual technologies, but as it’s something new to me it’s refreshing to do. CloudCruiser allows you to collect information from your infrastructure and then generate billing information. You could generate bills for virtual machine instances or storage usage. My customer has 3PAR storage and I had to write a script which runs frequently and collects volume information.

As far as I can tell there are two approaches:

  • Use the 3PAR CLI utilities
  • Use the 3PAR Web API

I wanted to avoid the CLI utilities. They need to be installed (or copied) to the server where you want to run the script, and integrating these tools and their data with PowerShell is less intuitive. I loved the idea of a Web API. This goes hand in hand with the Invoke-WebRequest cmdlet in PowerShell. This cmdlet does much of the heavy lifting and makes it really easy to talk with a given Web API. Here’s how I connected to the 3PAR device and how I got the volume information.

Calling a method of the 3PAR Web API is a two-part job: first you have to call the /credentials method using an HTTP POST and provide a valid username and password. The result of that call will be a session key that you can use in subsequent calls to the Web API.

Getting the session key:

#Credentials
$username = "3PAR user" 
$password = "3PAR user password" 

#IP of the 3PAR device
$IP = "10.0.0.1" 
#API URL
$APIurl = "https://$($IP):8080/api/v1" 

$postParams = @{user=$username;password=$password} | ConvertTo-Json 
$headers = @{} 
$headers["Accept"] = "application/json" 
$credentialdata = Invoke-WebRequest -Uri "$APIurl/credentials" -Body $postParams -ContentType "application/json" -Headers $headers -Method POST -UseBasicParsing 
$key = ($credentialdata.Content | ConvertFrom-Json).key

And that’s it! After this you should get a string in the $key variable which can be used in calls further down the script. But I have to take a step back. To be honest the above code didn’t work at first. The problem in my case was that I was accessing the API over HTTPS but the certificate couldn’t be validated. I was using the IP to access the device and it was a self-signed certificate. So reasons enough why the Invoke-WebRequest cmdlet was sad… I found the following workaround which you can place somewhere before your first Invoke-WebRequest cmdlet:

#avoid issues with an invalid (self-signed) certificate, try avoid tabs/spaces as this might mess up the string block
#http://stackoverflow.com/questions/11696944/powershell-v3-invoke-webrequest-https-error
add-type @"
    using System.Net;
    using System.Security.Cryptography.X509Certificates;
    public class TrustAllCertsPolicy : ICertificatePolicy {
        public bool CheckValidationResult(
            ServicePoint srvPoint, X509Certificate certificate,
            WebRequest request, int certificateProblem) {
            return true;
        }
    }
"@
 
[System.Net.ServicePointManager]::CertificatePolicy = New-Object TrustAllCertsPolicy

Calling the method:

And now on to the actual magic. Here we’ll do a GET to the /volumes method.

$headers = @{}
$headers["Accept"] = "application/json"
$headers["X-HP3PAR-WSAPI-SessionKey"] = $key
$volumedata = Invoke-WebRequest -Uri "$APIurl/volumes" -ContentType "application/json" -Headers $headers -Method GET -UseBasicParsing 
$volumedataPS = ($volumedata.content | ConvertFrom-Json).members
#also works:
#$volumedata = Invoke-RestMethod -Uri "$APIurl/volumes" -ContentType "application/json" -Headers $headers -Method GET

And that’s all there is to it! $volumedataPS now contains an array with objects you can iterate through. No need to work with intermediate CSV files or other tricks.
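As a quick usage sketch: the loop below just prints a line per volume. The property names (name, sizeMiB) are what I'd expect the WSAPI to return for volume objects, so verify them against your own output with $volumedataPS | Get-Member.

foreach ($volume in $volumedataPS) {
    #print name and size for each volume; adjust the property names to what your array actually returns
    Write-Output ("{0} : {1} MiB" -f $volume.name, $volume.sizeMiB)
}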

Some additional information:

The UseBasicParsing parameter: when running the PowerShell script as the user I was logged in with, I didn’t have any trouble. Once I started running it as SYSTEM (for a scheduled task), it gave the following error: Invoke-WebRequest : The response content cannot be parsed because the Internet Explorer engine is not available, or Internet Explorer's first-launch configuration is not complete. Specify the UseBasicParsing parameter and try again. UseBasicParsing seems to avoid using the IE engine altogether and thus the script runs fine under SYSTEM.

Invoke-WebRequest versus Invoke-RestMethod: it’s my understanding that both work for calling the Web API, but Invoke-WebRequest seems to return more information regarding the actual call whereas Invoke-RestMethod simply returns the requested data. I figured the former might help when adding additional logging.

The Web API might not be enabled by default. You could provide the following instructions to your 3PAR admin: Veeam: Enabling the HP 3PAR Web Services API Server. They are from Veeam but I found them to be accurate.

MaxTokenSize Implications for HTTP.SYS


One of my customers had problems with certain users being members of a lot of Active Directory groups. This resulted in several client side issues. There’s an easy and well-known “fix” for that: raise the MaxTokenSize registry key on all Windows operating systems in your domain. On Windows 8(.1) / 2012 (R2) the MaxTokenSize is already at its maximum (advised) value out of the box. That value is 48.000 bytes. In order to mitigate these users’ access problems we raised the MaxTokenSize to 48.000 bytes on all clients and servers that are running Windows 7 / Windows 2008 R2. After this change the typical issues were gone. However new ones came up:

From time to time, when HTTP is involved, issues were encountered:

  • Opening the Direct Access management console (depends on WinRM)
  • Open the FIM Portal
  • Streaming App-V packages over HTTP

Typically the user would receive several authentication prompts and even after specifying valid credentials another prompt would reappear. Example browser based issue:

pic01

As you can see the browser gives an HTTP 400 Bad Request error. Using a network trace we can easily see why it’s considered bad:

trace01

And the packet details:

trace02

The details clearly state that The size of the request headers is too long.

The problem here is that the token is allowed to be up to 48.000 bytes where it used to be 12.000 bytes. The HTTP subsystem of a Windows server has several parameters that are supposed to protect the server from oversized requests. However, as the token can now be a lot larger, the maximum request size has to be tuned as well:

From: KB820129

Below: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HTTP\Parameters there are two interesting values:

InfoIIS

And from KB2020943 we can find a formula to calculate the MaxFieldLength to set based on the MaxTokenSize.

If MaxTokenSize is 48.000 bytes (the default in Windows 2012 and configured by GPO for 2008 R2 / Win7):

  • (4/3 * 48000) + 200 = 64200

We’ll use the maximum allowed value for MaxFieldLength, 65534 (which covers the calculated 64200), to allow tokens up to 48.000 bytes. We’ll also use this value for MaxRequestBytes.

col

  • MaxFieldLength: we can take the maximum allowed value: 65534
  • MaxRequestBytes:  65534

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HTTP\Parameters

reg
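If you prefer to script this instead of clicking through regedit, a minimal sketch (run elevated; HTTP.sys only picks the values up after a reboot or a restart of the HTTP service, so plan for that):

$path = "HKLM:\SYSTEM\CurrentControlSet\Services\HTTP\Parameters"
#create the DWORD values if they don't exist yet, overwrite them if they do
New-ItemProperty -Path $path -Name MaxFieldLength -PropertyType DWord -Value 65534 -Force | Out-Null
New-ItemProperty -Path $path -Name MaxRequestBytes -PropertyType DWord -Value 65534 -Force | Out-Null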

Other useful information:

I specifically wanted to post this information as in many other online articles/posts I always see people just using the maximum allowed value for MaxRequestBytes and I don’t feel 100% comfortable with that. Second, in my opinion it’s advised to have these values pushed out to all your server systems. Especially now that Windows 2012 and up have a MaxTokenSize of 48.000 by default. If you don’t push these HTTP.sys parameters, you’ll end up troubleshooting the same phenomenon multiple times from different angles. Why waste time?

Azure Virtual Machines: Event 257: Defrag: Slab Consolidation/ Slab Analysis


I’ve got a customer running some virtual machines on the Azure IAAS platform, and when doing a quick checkup I found the following recurring events in the event log:

  • The volume (C:) was not optimized because an error was encountered: Neither Slab Consolidation nor Slab Analysis will run if slabs are less than 8 MB. (0x8900002D)
  • The volume Temporary Storage (D:) was not optimized because an error was encountered: Neither Slab Consolidation nor Slab Analysis will run if slabs are less than 8 MB. (0x8900002D)
  • The volume Data (F:) was not optimized because an error was encountered: Neither Slab Consolidation nor Slab Analysis will run if slabs are less than 8 MB. (0x8900002D)

Sample event:

image

I’m aware that Windows does maintenance on a regular basis all by itself. One of these tasks is a scheduled defrag:

image

You’ll probably see that there’s no trigger for this particular task, yet it runs regularly! That’s because there’s another scheduled task called “Regular Maintenance” which calls this specific task. An excellent writeup can be found here: DataToHelpThePeople: Maintenance at 3AM in the Morning

I did some googling and I quickly came across this: KB2964429: Storage Optimizer memory use increases when it runs on thin provisioned LUNs

From that article: There's no need to run Storage Optimizer on thin provisioned LUNs that use an allocation size (also known as slab size) of less than 8 MB. Thin provisioned LUNs that have a smaller slab size manage space more efficiently, and the benefits of defragmenting them are not as great.

So that got me curious: how big is the allocation size on these volumes? We can find this information using diskpart:

image

After selecting a given volume, execute “filesystem”. In our example we have 4 MB which is, as the error states, less than 8 MB. Both the OS disk and the Temporary Disk have 4 MB, which I guess is the default. My custom disk (F:) also has this value.

image

If we check the action of the scheduled task we can see that this is the command being executed: %windir%\system32\defrag.exe -c -h -k -g -$ If we execute that command manually, the same events are logged.

image

From defrag.exe /? I can tell that we could omit -k in order to avoid slab analysis and consolidation. However upon executing the command I’m not really sure much happens in the background.

image

In the following screenshot you can clearly see that for each volume an event with ID 258 is logged: The storage optimizer successfully completed retrim on XYZ. These events are NOT logged when I run the defrag command without the -k switch.

image

Your first reaction might be to disable this scheduled task altogether. You might not care about fragmentation. But you might care about your billing statement… If you care, check: fabriccontroller: Release unused space from your Windows Azure Virtual Hard Disks to reduce their billable size. Bottom line: whenever this maintenance task finds unused storage it reclaims it and lowers the overall VHD size. Which means you pay less for your storage account!

I’m not really sure how to move forward. On the one hand I really detest recurring, safe to ignore, errors in events logs. But I don’t like changing system stuff like this without some official guidance. So if you got something to add, feel free to comment!

Commvault REST API using PowerShell


A while ago I wrote a blog post about Connecting to the 3PAR WebAPI using PowerShell. Today I’m doing the same but now with the Commvault REST API. Connecting to that API is even easier! Commvault has a nice sandbox to get familiar with the REST API, and it’s definitely worth a look. It can be accessed at http://commvaultwebconsoleserver.contoso.com/webconsole/sandbox/ It looks like this:

2015-02-27_10-29-35_CommvaultRestAPI

Working with the API is similar to the 3PAR approach: first we authenticate and get a token. Then we use that token to call other services. In order to get this working I just looked at the C# sample code Commvault provides: Commvault Documentation: REST API - Getting Started Using C#. Here’s how you can authenticate using PowerShell:


#Credentials
$username = "Contoso\s_apiuser" 
$password = 'P@$$w0rd' 
#Commvault web console server
$SERVER = "srdccvws0001.rddcmgmt.local" 
#API URL
$APIURL = "http://$($SERVER):81/SearchSvc/CVWebService.svc" 


$APIURLaction = "$APIURL/login" 
$passwordB64byte = [System.Text.Encoding]::UTF8.GetBytes($password) 
$encodedPassword = [System.Convert]::ToBase64String($passwordB64byte) 
$loginReq = "<DM2ContentIndexing_CheckCredentialReq mode=""Webconsole"" username=""$username"" password=""$encodedPassword"" />" 
$result = Invoke-WebRequest -Uri $APIURLaction -Method POST -Body $loginReq -UseBasicParsing       
$token = (([xml]$result.content).SelectSingleNode("/DM2ContentIndexing_CheckCredentialResp/@token")).value

The code might seem cryptic but is actually pretty straightforward: we need to call the /login service and pass it some parameters. The service requires the password to be base64 encoded. As you might guess this information goes across the wire… So I would definitely advise using an account that only has permissions to the items it needs to access. Ideally we’d approach the service over SSL but I’m not sure if Commvault allows SSL to be configured for the services.

The username and encoded password have to be passed in a specific XML string that is passed as the body of the web request. The result is an XML string that contains a token value. We don’t need the whole string but only the part representing the token. Now that we managed to get a token we can start calling specific services. If I'm not mistaken this token will remain valid until there’s a period of inactivity (30’).

Here’s how you can get all Clients:


$service = "/client" 
$action = "GET" 
$APIURLaction = "$APIURL$service" 
$headers = @{} 
#default is XML
$headers["Accept"] = "application/json" 
$headers["Authtoken"] = $token 
$result = Invoke-WebRequest -Uri $APIURLaction -Headers $headers -Method $action -UseBasicParsing 
$clientInfo = ($result.content | ConvertFrom-Json).App_GetClientPropertiesResponse.clientProperties
 

If you go through the REST API documentation you’ll notice that some services are GET based and others are POST based. The /client service is GET based, the /JobDetails service is POST based. You’ll also notice that for the latter we pass an XML string as the body of the request.


$jobID = "3040" 
$body = "<JobManager_JobDetailRequest jobId=""$jobID""/>" 
$service =  "/JobDetails" 
$action = "POST" 
$result = Invoke-WebRequest -Uri $APIURLaction -Headers $headers -Method $action -UseBasicParsing -Body $body 
$jobDetails = ($result.content | ConvertFrom-Json).JobManager_JobDetailResponse.job.jobDetail

I hope that with these basic examples you can now successfully connect yourself. As you might see, the above code contains no error handling. Depending on how you will use scripts like this I would advise adding some logging and error handling. Try providing wrong credentials or entering a bad server name and see what happens.
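A minimal sketch of what such error handling could look like; nothing Commvault specific, just wrapping the web request:

try {
    $result = Invoke-WebRequest -Uri $APIURLaction -Headers $headers -Method $action -Body $body -UseBasicParsing -ErrorAction Stop
}
catch {
    #log the failure and bail out; you could write to a log file here instead
    Write-Error "Call to $APIURLaction failed: $($_.Exception.Message)"
    return
}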


Corrupt Local GPO Files


A while ago I looked into a laptop not being able to access anything on the network. As this customer has Direct Access deployed I knew I had to start my troubleshooting with the following command: netsh dns show state

1

As you can tell from the screenshot above, the laptop thinks it’s outside the corporate network and has Direct Access configured and enabled. I tried pinging various resources (on the domain) but they all failed. That would make sense as the client is trying to build a Direct Access tunnel, but fails to do so. Besides that, the name resolution policy also kicks in. The result is that neither remote or local connectivity is working. In such a situation one should suspect an issue with the Network Location Service that is deployed on the network. However this was an isolated case as no other clients were showing similar issues…

The reason name resolution and thereby all other domain related tasks are failing is the fact that the Direct Access name resolution policies are in place and force all DNS requests for the domain zone to be resolved by the Direct Access DNS service. That one is not reachable as we don’t have a valid Direct Access Connection… In order to mitigate this I thought I’d kill the name resolution policies locally and see if I’d be able to get it talking to the domain again.

3

Delete both DA-…. keys. They can be found below HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows NT\DNSClient\DnsPolicyConfig. Reboot the client afterwards. In my case I could see connectivity (and name resolution) was now working again. But processing GPO’s still failed:

4

In the event log:

5

Event 1096: The processing of Group Policy failed. Windows could not apply the registry-based policy settings for the Group Policy object LocalGPO. Group Policy settings will not be resolved until this event is resolved. View the event details for more information on the file name and path that caused the failure.

Some googling led me to the following information:

The instructions to fix this:

  • Rename (or delete) C:\Windows\System32\GroupPolicy\Machine\Registry.pol
  • Start > run > cmd (as admin)
  • Gpedit.msc
  • Below Administrative Templates, change a (no matter which) setting and then revert it. This will trigger the creation of a new registry.pol file
  • gpupdate /force
  • GPOs should process correctly now. A minimal scripted version of the first and last steps is sketched below.
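The sketch, to be run from an elevated prompt; the gpedit.msc toggle in the middle still has to be done by hand:

#back up the (presumably corrupt) local registry.pol
Rename-Item "$env:windir\System32\GroupPolicy\Machine\Registry.pol" "Registry.pol.bak"
#...toggle any Administrative Templates setting in gpedit.msc and revert it to recreate the file...
gpupdate /force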

6

Now you might wonder, how does this registry.pol get into such a condition that group policy processing starts to fail? I stumbled across the following post:

http://blogs.technet.com/b/systemcenterpfe/archive/2013/01/11/updated-system-center-2012-configuration-manager-antivirus-exclusions-with-more-details.aspx?pi47623=2

In the comments section there’s a comment from Mike Niccum which seems to be very interesting. We checked the exclusions on our Endpoint Protection and, as Mike explains, we too are missing the antivirus exclusion. We added it and in the coming weeks we’ll see whether new issues pop up or not.

  • Present (Wrong?) exclusion: C:\Windows\System32\GroupPolicy\Registry.pol
  • Missing exclusion: C:\Windows\System32\GroupPolicy\Machine\Registry.pol

Protecting a Domain Controller in Azure with Microsoft Antimalware


I’m getting more and more involved with customers using Azure to host some VM’s in an IaaS scenario. In some cases they like to have a domain controller from their corporate domain on Azure. I think it’s a best practice to have some form of malware protection installed. Some customers opt to use their on-premises solution, others opt to use the free Microsoft Antimalware solution. The latter comes as an extension which you can add when creating a virtual machine. Or just add it afterwards. One of the drawbacks is that there’s no central management. You push it out to each machine and that’s it.

Both the old and new portals allow you to specify this during machine creation:

Old portal wizard:

image

New portal wizard:

image

However, the new portal allows you to specify additional parameters:

image

As you can see you can also specify the exclusions. For certain workloads (like SQL) this is pretty important. From past experience I know that getting the exclusions right for a given application is pretty tedious work. You have to go through various articles and compose your list. I took a look at the software installed on an Azure VM and I noticed it was called System Center Endpoint Protection.

image

Second I went ahead and looked in the registry:

image

The easiest way to configure those exclusion settings is through PowerShell. The Set-AzureVMMicrosoftAntimalwareExtension cmdlet has a parameter called AntimalwareConfigFile that accepts both an XML or a JSON file. Initially I thought I’d just take the XML files from a System Center Endpoint Protection implementation and be done with it. Quickly I found out that the format for this XML file is different from the templates SCEP uses. So I thought I’d do some quick find and replace. But no matter what I tried, issues kept popping up inside the guest and the XML file failed to be parsed successfully. This guide explains it pretty well, but I still failed to get it working: Microsoft Antimalware for Azure Cloud Services and Virtual Machines

I preferred XML as that format allows for comment tags, which makes it easy to document certain exclusions. Now I had to resort to JSON, which is just a bunch of text in brackets/colons. Here are some sample config files based upon the files from SCEP:

A Regular Server

{
"AntimalwareEnabled": true,
"RealtimeProtectionEnabled": true,
"ScheduledScanSettings": {
"isEnabled": false,
"day": 1,
"time": 180,
"scanType": "Full"
},
"Exclusions": {
"Extensions": "",
"Paths": "%allusersprofile%\\NTUser.pol;%systemroot%\\system32\\GroupPolicy\\Machine\\registry.pol;%windir%\\Security\\database\\*.chk;%windir%\\Security\\database\\*.edb;%windir%\\Security\\database\\*.jrs;%windir%\\Security\\database\\*.log;%windir%\\Security\\database\\*.sdb;%windir%\\SoftwareDistribution\\Datastore\\Datastore.edb;%windir%\\SoftwareDistribution\\Datastore\\Logs\\edb.chk;%windir%\\SoftwareDistribution\\Datastore\\Logs\\edb*.log;%windir%\\SoftwareDistribution\\Datastore\\Logs\\Edbres00001.jrs;%windir%\\SoftwareDistribution\\Datastore\\Logs\\Edbres00002.jrs;%windir%\\SoftwareDistribution\\Datastore\\Logs\\Res1.log;%windir%\\SoftwareDistribution\\Datastore\\Logs\\Res2.log;%windir%\\SoftwareDistribution\\Datastore\\Logs\\tmp.edb",
"Processes": ""
}
}

A SQL Server

{
"AntimalwareEnabled": true,
"RealtimeProtectionEnabled": true,
"ScheduledScanSettings": {
"isEnabled": false,
"day": 1,
"time": 180,
"scanType": "Full"
},
"Exclusions": {
"Extensions": "",
"Paths": "%allusersprofile%\\NTUser.pol;%systemroot%\\system32\\GroupPolicy\\Machine\\registry.pol;%windir%\\Security\\database\\*.chk;%windir%\\Security\\database\\*.edb;%windir%\\Security\\database\\*.jrs;%windir%\\Security\\database\\*.log;%windir%\\Security\\database\\*.sdb;%windir%\\SoftwareDistribution\\Datastore\\Datastore.edb;%windir%\\SoftwareDistribution\\Datastore\\Logs\\edb.chk;%windir%\\SoftwareDistribution\\Datastore\\Logs\\edb*.log;%windir%\\SoftwareDistribution\\Datastore\\Logs\\Edbres00001.jrs;%windir%\\SoftwareDistribution\\Datastore\\Logs\\Edbres00002.jrs;%windir%\\SoftwareDistribution\\Datastore\\Logs\\Res1.log;%windir%\\SoftwareDistribution\\Datastore\\Logs\\Res2.log;%windir%\\SoftwareDistribution\\Datastore\\Logs\\tmp.edb",
"Processes": "%ProgramFiles%\\Microsoft SQL Server\\MSSQL10.MSSQLSERVER\\MSSQL\\Binn\\SQLServr.exe"
}
}

This one is almost identical to the server one, but here we exclude the SQLServr.exe process. The path to this executable might be different in your environment!
A Domain Controller

{
"AntimalwareEnabled": true,
"RealtimeProtectionEnabled": true,
"ScheduledScanSettings": {
"isEnabled": false,
"day": 1,
"time": 180,
"scanType": "Full"
},
"Exclusions": {
"Extensions": "",
"Paths": "%allusersprofile%\\NTUser.pol;%systemroot%\\system32\\GroupPolicy\\Machine\\registry.pol;%windir%\\Security\\database\\*.chk;%windir%\\Security\\database\\*.edb;%windir%\\Security\\database\\*.jrs;%windir%\\Security\\database\\*.log;%windir%\\Security\\database\\*.sdb;%windir%\\SoftwareDistribution\\Datastore\\Datastore.edb;%windir%\\SoftwareDistribution\\Datastore\\Logs\\edb.chk;%windir%\\SoftwareDistribution\\Datastore\\Logs\\edb*.log;%windir%\\SoftwareDistribution\\Datastore\\Logs\\Edbres00001.jrs;%windir%\\SoftwareDistribution\\Datastore\\Logs\\Edbres00002.jrs;%windir%\\SoftwareDistribution\\Datastore\\Logs\\Res1.log;%windir%\\SoftwareDistribution\\Datastore\\Logs\\Res2.log;%windir%\\SoftwareDistribution\\Datastore\\Logs\\tmp.edb;E:\\Windows\\ntds\\ntds.dit;E:\\Windows\\ntds\\EDB*.log;E:\\Windows\\ntds\\Edbres*.jrs;E:\\Windows\\ntds\\EDB.chk;E:\\Windows\\ntds\\TEMP.edb;E:\\Windows\\ntds\\*.pat;E:\\Windows\\SYSVOL\\domain\\DO_NOT_REMOVE_NtFrs_PreInstall_Directory;E:\\Windows\\SYSVOL\\staging;E:\\Windows\\SYSVOL\\staging areas;E:\\Windows\\SYSVOL\\sysvol;%systemroot%\\System32\\Dns\\*.log;%systemroot%\\System32\\Dns\\*.dns;%systemroot%\\System32\\Dns\\boot",
"Processes": "%systemroot%\\System32\\ntfrs.exe;%systemroot%\\System32\\dfsr.exe;%systemroot%\\System32\\dfsrs.exe"
}
}

Again a lot of familiar exclusions from the server template, but also specific exclusions for NTDS related files and DNS related files. Remark: one of the best practices for installing domain controllers in Azure is to relocate the AD database/log files and SYSVOL to another disk with caching set to none, which is why the NTDS/SYSVOL paths above point to E:\ instead of %systemroot%. So the above exclusions might be wrong for your setup: make sure these paths point at the drive actually containing your AD files!

Special remark: the SCEP templates have a bug where they add %systemroot%\\system32\\GroupPolicy\\Registry.pol, which in fact should be %systemroot%\\system32\\GroupPolicy\\Machine\\registry.pol. I’ve given an example of the issues that can cause here: Setspn.blogspot.com: Corrupt Local GPO Files

The templates above are in the JSON format. I saved the domain controller one as MicrosoftAntiMalware_DC.json and applied it like this:


$vm = Get-AzureVM -ServiceName "CoreInfra" -Name "SRVDC01"
$vm | Set-AzureVMMicrosoftAntimalwareExtension -AntimalwareConfigFile C:\Users\Thomas\Documenten\Work\MicrosoftAntiMalware_DC.json | Update-AzureVM

Now in the registry on the VM we can verify our exclusions are applied:

reg3

Some good references:

Synchronizing Time on Azure Virtual Machines


I’m currently setting up a small identity infrastructure on some Azure Virtual Machines for a customer. The components we’re installing consist of some domain controllers, a FIM server, a FIM GAL Sync server and an SQL server to support the FIM services. All of those are part of the CONTOSO domain. Besides the Azure virtual machines we also got two on-premises machines, also members of the CONTOSO domain. They communicate with the other CONTOSO servers across a site-to-site VPN with Azure.

Eventually I came to the task of verifying my time synchronization setup. Throughout the years there have been small variations in the recommendations. Initially I had configured time synchronization like I always do: configure a GPO that specifically targets the PDC domain controller. This GPO configures the PDC domain controller to use an NTP server for its time.

Administrative Templates > System > Windows Time Service > Global Configuration Settings:

image

Set AnnounceFlags to 5 so this domain controller advertises as a reliable time source. Besides that we also need to give a good source for the PDC domain controller:

Administrative Templates > System > Windows Time Service > Global Configuration Settings > Time Providers

image

In the above example I’m just using time.windows.com as a source and the type is set to NTP. Just for reference, the WMI filter that tells this GPO to only apply on the PDC domain controller:

image

Typically that’s all that’s needed. Keep in mind, the above was done on a 2012 R2 based domain controller/GPMC. If you use older versions you might have other values for certain settings; on 2012 R2 they are supposed to match the current recommendations out of the box. But that’s not the point of this post. For the above to work, you should make sure that the NTP client on ALL clients, servers and domain controllers OTHER than the PDC is set to NT5DS:

w32tm /query /configuration

image

Once the above is all set the following logic should be active:

Put simply: if you have a single domain, single forest topology:

  • The PDC domain controllers syncs from an internet/external source
  • The domain controllers sync from the PDC domain controller
  • The clients/member servers sync from A domain controller

You can verify this by executing w32tm /query /source:

On my PDC (DC001), on a DC (DC002) and on a member server (hosted in Azure):

time1

=> VM IC Time Synchronization Provider

On my DC (DC003)(hosted on premises on VMware):

time2

=> The PDC domain controller

On my member server (hosted on premises on VMware):

time3

=> A domain controller

As you can see, that’s a bit weird. What is that VM IC Time Synchronization Provider? If I’m not mistaken, it’s a component that gets installed with Windows and is capable of interacting with the hypervisor (e.g. on-premises Hyper-V or Azure Hyper-V). As far as I can tell, VMware guests ignore it. Basically it’s a component that helps the guest sync its time with the physical host it runs on. Now you can imagine that if guests run on different hosts, time might start to drift slowly. In order to mitigate this, we need to ensure the time is properly synchronized using the domain hierarchy.

Luckily it seems we can easily disable this functionality. We can simply set the Enabled registry key to 0 for this provider. The good news: setting it from 0 to 1 seems to require a Windows Time service restart, but I did some tests and setting it from 1 to 0 seems to become effective after a small period of time. The good news part 2: setting it to 0 doesn’t seem to have side effects for on-premises VM’s either.

In my case I opted to use group policy preferences for this:

time4

The registry path is SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\VMICTimeProvider; set the value Enabled to 0.
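If you’d rather set it directly (or from your own script) instead of through group policy preferences, a one-liner like this should do it when run elevated:

Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\W32Time\TimeProviders\VMICTimeProvider" -Name Enabled -Value 0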

And now we can repeat our tests again:

On my PDC (hosted in Azure):

time5

On my DC (hosted in Azure):

time6

On a member server (hosted in Azure):

time7

Summary

I’ll try to validate this with some people, and I’ll definitely update this post if I’m proven wrong, but as far as I can tell: whenever you host virtual machines in Azure that are part of a Windows Active Directory domain, make sure to disable the VM IC Time Provider component.

Imho this kind of information is definitely something that should be added to MSDN: Guidelines for Deploying Windows Server Active Directory on Azure Virtual Machines or Azure.microsoft.com: Install a replica Active Directory domain controller in an Azure virtual network

References:

Federating ADFS with the Belnet Federation


logo_federation

The Belnet federation is a federation that a lot of Belgian educational or education related institutions are joined to. I’m currently involved in a POC at one of these institutions. Here’s the situation we started from: they have an Active Directory domain for their employees, and are part of the Belnet federation through a Shibboleth server which is configured as an IDP with their AD. Basically this means that for certain services hosted on the Belnet federation, they can choose to log in using their AD credentials through the Shibboleth server.

Now they want to host a service themselves, a SharePoint farm, and they would like to provide users outside of their organization access to it. These users will have an account at one of the institutions federated with Belnet. After some research it became clear to us that we would need an ADFS instance to act as a protocol bridge between SAML and WS-FED, as SharePoint does not natively speak SAML. Now the next question: how do we get Belnet to trust our ADFS instance and how do we get our ADFS instance to trust the IDP’s that are part of the Belnet federation?

These are two different problems and both need to be addressed in order for authentication to succeed. We need to find out how we can let Belnet trust our ADFS instance. But first we zoom in on the part where we try to trust the IDP’s in the Belnet federation. This federation has over 20 IDP’s in it and its metadata is available at the following URL: Metadata XML file - Official Belnet federation. From my first contacts with the people responsible for this federation I heard that it would be hard to get ADFS to “talk” to this federation. They mentioned ADFS does speak SAML, but not all SAML specifications are supported. One of the things that ADFS cannot handle is creating a claims provider trust based upon a metadata file which contains multiple IDP’s. And guess what this Belnet metadata file contains…

Some research led me to the concept of federation trust topologies. Suppose you have two partners who want to expose their Identity Provider so that their users can authenticate at services hosted at either partner. In the Microsoft world you typically configure the other party’s ADFS instance as a claims provider trust, while they configure yours as a relying party trust, and vice versa. And that’s it. But what happens if you want to federate with 3 parties? Now each party has to add two claims provider trusts. And what happens when a new organization joins the federation? Each organization that is already active in the federation has to exchange metadata with, and add, the new organization. As the number of partners in the federation grows you can see that the Microsoft approach seems to scale badly for this…

Now after reading up a bit on this subject I learned that there are two types of topologies: full mesh and proxy based. In the proxy approach each party federates with the proxy and the proxy sits in the middle for authentication requests. In the full mesh topology each party federates with every other party. As I explained above, a full mesh approach scales badly. The Belnet setup is mostly based upon Shibboleth and each Shibboleth server gets updated automatically whenever an additional IDP or SP is added to the federation. So Belnet is only responsible for distributing the federation partner information to each member. So I came up with the following idea: if I were to take the Belnet XML file and chop it into multiple IDP XML files, I could add those one by one to the ADFS configuration. I got this idea here: Technet (Incommon Federation): Use FEMMA to import IDPs

Here’s a schematic view of the Federation Metadata exchanges. It might make things a bit clearer. On the schema you’ll see the Shibboleth server, but in fact, for the SharePoint/ADFS instance it’s irrelevant.

belnet

Adding Belnet IDP’s to ADFS

Search the Belnet federation XML file for something recognizable like part of the DNS domain (vub.ac.be) or (part of) the name of the IDP (Brussel). Once you’ve got the right entry, we need everything from this IDP that’s between the <EntityDescriptor> tags. So you should have something like this:


<EntityDescriptor entityID="https://idp.vub.ac.be/idp/shibboleth" xmlns="urn:oasis:names:tc:SAML:2.0:metadata" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:shibmd="urn:mace:shibboleth:metadata:1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
     
    <GivenName>Technical Support</GivenName> 
    <SurName>Technical Support</SurName> 
    <EmailAddress>support@vub.ac.be</EmailAddress> 
    </ContactPerson> 
</EntityDescriptor>

Copy this to a separate file and save it as FederationMetadata_VUB.xml

Now go to the ADFS management console and add a claims provider trust.

image

When asked, provide the XML file we just created. When you’re done, change the signature hash algorithm. You can find this on the Advanced tab of the claims provider trust properties. This might differ from trust to trust and you can try without changing it, but if your authentication results in an error, check your ADFS event logs and if necessary change this setting.

image

The error:

image

In words:

Authentication Failed. The token used to authenticate the user is signed using a weaker signature algorithm than expected.

And that’s it. Repeat for any other IDP’s you care about. Depending on the number of IDP’s this is a task you may or may not want to script. The InCommon federation guide contains a script written in Python which provides similar functionality.
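If you’d rather stay in PowerShell than use Python, a rough sketch per IDP could look like this. The folder and naming convention are just an example based on the FederationMetadata_VUB.xml file from above; you may still need to change the signature hash algorithm per trust afterwards, as described earlier.

#create a claims provider trust for every per-IDP metadata file we cut out of the Belnet metadata
Get-ChildItem "C:\Temp\BelnetIDPs\FederationMetadata_*.xml" | ForEach-Object {
    $idpName = $_.BaseName -replace '^FederationMetadata_', ''
    Add-AdfsClaimsProviderTrust -Name $idpName -MetadataFile $_.FullName
}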

Adding your ADFS as SP to the Belnet Federation

Now the first part seemed easy. We had to do some cutting and pasting, but for a smaller number of IDP’s this seems doable. Now we have to ensure all involved IDP’s trust our ADFS server. In the worst case we have to contact them one by one and exchange information. But that would mean we’re not benefitting from the Belnet federation. Our goal is to have our ADFS trusted by Belnet, which will ensure all Belnet partners trust our ADFS instance. This means we only have to exchange information with one party, simplifying the process a lot!

First we need the Federation Metadata from the ADFS instance: https://sts.contoso.com/FederationMetadata/2007-06/FederationMetadata.xml

Then we need to edit it a bit so that the Belnet application that manages the metadata is capable of parsing the file we give it. Therefore we’ll remove the blocks we don’t need or that the tooling at Belnet is not compatible with:

  • Signature block: <signature>…</signature>
  • WS-FED stuff: <RoleDescriptor xsi:type="fed:ApplicationServiceType … </RoleDescriptor>
  • Some more WS-FED stuff: <RoleDescriptor xsi:type="fed:SecurityTokenServiceType" … </RoleDescriptor>
  • SAML IDP stuff, not necessary as we’re playing SP: <IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"> … </IDPSSODescriptor>

We also need to add some contact information:

There should be a block present that looks like this: <ContactPerson contactType="support"/>

Replace it with:


<Organization>
    <OrganizationName xml:lang="en" xmlns:xml="http://www.w3.org/XML/1998/namespace"> Contoso </OrganizationName>
    <OrganizationDisplayName xml:lang="en" xmlns:xml="http://www.w3.org/XML/1998/namespace"> Contoso Corp </OrganizationDisplayName>
    <OrganizationURL xml:lang="en" xmlns:xml="http://www.w3.org/XML/1998/namespace"> http://www.contoso.com </OrganizationURL>
</Organization>
<ContactPerson contactType="technical">
    <GivenName>Thomas</GivenName>
    <SurName>Vuylsteke</SurName>
    <EmailAddress>adfs.admin@contoso.com</EmailAddress>
</ContactPerson>

Now you’re ready to upload your modified metadata at Belnet: https://idpcustomer.belnet.be/idp/Authn/UserPassword

After some time you’ll be able to log on using the IDP’s you configured. Pretty cool eh! Authentication will rely on the trusts shown below:

belnetAu

Some remarks:

Scoping: once you trust several IDP’s like this, you might be interested in a way to limit the users to the ones your organization works with. The customer I implemented this for has an overview of all users in their Active Directory. So we allow the user to log on at their IDP, but we have ADFS authorization rules that only issue a permit claim when we find the user as an enabled AD user in the customer’s AD. These users are there for legacy reasons and can now be seen as some form of ghost accounts.

Certificates: the manual nature of the above procedure also means you have to keep the certificates up to date manually! If an IDP starts using another certificate you have to update that IDP specific information. If you change the certificates on your ADFS instance you have to contact Belnet again and have your metadata updated. Luckily most IDP’s in the Belnet federation have expiration dates far in the future. But not all of them. Definitely a point of attention.

Just drop a comment if you want more information or if you got some feedback.

Working with PowerShell DSC and Azure VM’s based on Windows 2012


Mostly when I work with Azure VM’s I do the actual VM creation using the Azure PowerShell cmdlets. I like how you can have some template scripts that create VM’s from beginning to end. Typically I create a static IP reservation, I join them to an AD domain, I add one or more additional disks, I add the Microsoft Antimalware extension…. When the VM is provisioned I log on and I’m pretty much ready to go. One of the things I noticed is that the time zone was set to UTC where I like it to be GMT+1. Obviously this only requires two clicks but I wanted this to be done for me. Now there are various approaches: either use traditional tooling like SCCM or GPO (is there a setting/registry key?), or do it the Azure way. As far as Azure is concerned I could create a custom VM image or use PowerShell DSC (Desired State Configuration).

I prefer DSC over a custom image. The main reason is that I can apply these DSC customizations to whatever image from the gallery I feel like applying them. If the SharePoint team wants to take the latest SharePoint image from the gallery, I can just apply my DSC over it. If there’s a more recent Windows 2012 R2 image, I can just throw my DSC against it and I’m ready to go.

The following example shows how to apply a given DSC configuration to a VM.


$configurationArchive = "DSC_SetTimeZone.ps1.zip"
$configurationName = "SetTimeZone"
$VM = Get-AzureVM -ServiceName "contoso-svc" -Name "CONTOSO-SRV"
$VM = Set-AzureVMDSCExtension -VM $VM -ConfigurationArchive $configurationArchive -ConfigurationName $configurationName
$VM | Update-AzureVM

Now I won’t go into all of the details, but here are some things I personally ran into.

Creating and Uploading the DSC configuration archive

Initially I had some trouble wrapping my head around how to get my script to run on a target machine. I had this cool DSC script I found on the internet and tweaked it a bit:


#Requires -version 4.0
Configuration SetTimeZone
{
    Param
    (
        #Target nodes to apply the configuration
        [Parameter(Mandatory = $false)]
        [ValidateNotNullorEmpty()]
        [String]$SystemTimeZone="Romance Standard Time"
    )

    Import-DSCResource -ModuleName xTimeZone

    Node localhost
    {
        xTimeZone TimeZoneExample
        {
            TimeZone = $SystemTimeZone
        }
    }
}

This script depends on the xTimeZone DSC resource. As I already knew, those DSC resources, like xTimeZone, come in waves. Would my server have the latest version? Did I have to install that out of band? It seems not. All you need to do is create a configuration archive, a ZIP file, which contains both your script and the resource it depends on. The Azure cmdlets are an easy way to do this. They’ll also make sure all the dependent DSC resources are added to the package.

We’ve got our script in c:\users\thomas\onedrive\documenten\work\blog\DSC. A few steps further down I’ll give you the information on where to store the DSC resources.

DSC_1

By using the following command we can create and upload the package to the “setspn” storage account:


$subscriptionID = "af2f6ce8-e4f3-abcd-abcd-34ab4ce9c7d3"
$storageAccountName = "setspn"
Set-AzureSubscription -SubscriptionId $subscriptionID -CurrentStorageAccount $storageAccountName
Publish-AzureVMDscConfiguration -ConfigurationPath C:\Users\Thomas\OneDrive\Documenten\Work\Blog\DSC\DSC_SetTimeZone.ps1

We need to execute this from an Azure PowerShell prompt. I executed this command from a Windows 8.1 machine that is running PowerShell v4.

DSC_2

It seems to be complaining that we are running this from an x86 prompt instead of an x64 prompt. But the Azure PowerShell prompt is an x86 prompt… The error in words:

Publish-AzureVMDscConfiguration : Configuration script 'C:\Users\Thomas\SkyDrive\Documenten\Work\Blog\DSC\DSC_SetTimeZo
ne.ps1' contained parse errors:
At C:\Users\Thomas\SkyDrive\Documenten\Work\Blog\DSC\DSC_SetTimeZone.ps1:2 char:1
+ Configuration SetTimeZone
+ ~~~~~~~~~~~~~
Configuration is not supported in a Windows PowerShell x86-based console. Open a Windows PowerShell x64-based console,
and then try again.
At C:\Users\Thomas\SkyDrive\Documenten\Work\Blog\DSC\DSC_SetTimeZone.ps1:3 char:1
+ {
+ ~
Unexpected token '{' in expression or statement.
At C:\Users\Thomas\SkyDrive\Documenten\Work\Blog\DSC\DSC_SetTimeZone.ps1:21 char:1
+ }
+ ~
Unexpected token '}' in expression or statement.
At line:1 char:1

The error is quite misleading. I tried the various “DSC” cmdlets, like Get-DSCResource, and they all failed saying that the cmdlet could not be found. So it seems I needed the WMF framework to be installed. Shame on me. Here’s some explanation regarding the prerequisites: TechNet Gallery: DSC Resource Kit (All Modules). Using the WMF 5.0 installer got me further.

DSC_2b

Now off to creating the package again: again an error…

DSC_3

Now it seems to complain it can’t find the DSC resources… But I installed them?! The error in words:

VERBOSE: Parsing configuration script: C:\Users\Thomas\SkyDrive\Documenten\Work\Blog\DSC\DSC_SetTimeZone.ps1
VERBOSE: Loading module from path 'C:\Program Files
(x86)\WindowsPowerShell\Modules\xTimeZoneSource\DSCResources\xTimeZone\xTimeZone.psm1'.
Publish-AzureVMDscConfiguration : Configuration script 'C:\Users\Thomas\SkyDrive\Documenten\Work\Blog\DSC\DSC_SetTimeZo
ne.ps1' contained parse errors:
At C:\Users\Thomas\SkyDrive\Documenten\Work\Blog\DSC\DSC_SetTimeZone.ps1:16 char:9
+         xTimeZone TimeZoneExample
+         ~~~~~~~~~
Undefined DSC resource 'xTimeZone'. Use Import-DSCResource to import the resource.
At line:1 char:1

After some googling I found out that the PowerShell prompt imports the modules it finds in its module path (PSModulePath). As we are running from an x86 prompt, the folder that was loaded was different. Typically all DSC guides tell you to install DSC resources below C:\Program Files\WindowsPowerShell\Modules but in fact for the Azure PowerShell prompt you need to put them in C:\Program Files (x86)\WindowsPowerShell\Modules, or you have to modify the module path to include the x64 location…. I chose to copy the module to the x86 location:

DSC_3c
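For reference, the copy itself is just a one-liner (assuming the resource ended up in the default x64 module folder):

Copy-Item "C:\Program Files\WindowsPowerShell\Modules\xTimeZone" "C:\Program Files (x86)\WindowsPowerShell\Modules\xTimeZone" -Recurse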

And all seems fine now:

DSC_4

Applying the DSC to a Windows 2012 VM

The DSC script I created worked fine on a newly installed Windows 2012 R2 VM, but on a Windows 2012 VM the extension seemed to have trouble. Now that wasn’t supposed to happen… The good thing about the Azure DSC extension is that the logging is quite decent. Inside the VM you can find some log files in the following location: C:\WindowsAzure\Logs\Plugins\Microsoft.Powershell.DSC\1.10.1.0

The following extract comes from the DscExtensionHandler log file:

VERBOSE: [2015-05-28T21:43:17] Applying DSC configuration:
VERBOSE: [2015-05-28T21:43:17]     Sequence Number:              0
VERBOSE: [2015-05-28T21:43:17]     Configuration Package URL:   
https://setspn.blob.core.windows.net/windows-powershell-dsc/DSC_SetTimeZone.ps1.zip
VERBOSE: [2015-05-28T21:43:17]     ModuleSource:                
VERBOSE: [2015-05-28T21:43:17]     Configuration Module Version:
VERBOSE: [2015-05-28T21:43:17]     Configuration Container:      DSC_SetTimeZone.ps1
...
VERBOSE: [2015-05-28T21:44:27] [ERROR] Importing module xTimeZone failed with error - File C:\Program
Files\WindowsPowerShell\Modules\xTimeZone\DscResources\xTimeZone\xTimeZone.psm1 cannot be loaded because running scripts is
disabled on this system. For more information, see about_Execution_Policies at http://go.microsoft.com/fwlink/?LinkID=135170.

Now that’s a pretty well known message… Bummer. It seems the execution policy on the Windows 2012 machine is set to Restricted. Now there’s a way around that: scripts could be executed with the option “-ExecutionPolicy Bypass”. But we can’t control that as the DSC extension is responsible for launching the script. Kind of a bummer. The Windows 2012 R2 image seems to have RemoteSigned as its default execution policy…

Now this got me curious. Would the custom script extension (the one that runs a PowerShell script for you) also suffer from this? If it would not, I could have a small PowerShell script execute first that alters the execution policy!


Set-ExecutionPolicy remotesigned

I created a PowerShell script with this line in it, saved it to disk and then used AzCopy to copy it to a container in a storage account. Executing the script:

DSC_ExecScript
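In case the screenshot is hard to read, the call looks roughly like the sketch below. It assumes the classic Set-AzureVMCustomScriptExtension cmdlet and the current storage account; the service and VM names are examples, and the container and file name match the FileUris entry in the log further down.

$vm = Get-AzureVM -ServiceName "contoso-svc" -Name "SRV2012"
$vm | Set-AzureVMCustomScriptExtension -ContainerName "windows-powershell-dsc" -FileName "test.ps1" -Run "test.ps1" | Update-AzureVM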

After executing I can confirm that the execution policy has changed:

ExecPol

The logging for this extension can be found here: C:\WindowsAzure\Logs\Plugins\Microsoft.Compute.CustomScriptExtension\1.4. From the log file we can see that the PowerShell script extension runs scripts in a more robust way:

2015-05-28T21:36:58.7646763Z    [Info]:    HandlerSettings = ProtectedSettingsCertThumbprint: , ProtectedSettings: {}, PublicSettings: {FileUris: [https://storaccount.blob.core.windows.net/windows-powershell-dsc/test.ps1?sv=2014-02-14&sr=b&sig=eijcTn9I2kWuOPU1CK%2F9zQ3tAO1NIUrs8wT2gUE8z0o%3D&se=2015-05-29T21%3A07%3A00Z&sp=r], CommandToExecute: powershell -ExecutionPolicy Unrestricted -file test.ps1 }

As you can see this script is called while explicitly specifying the execution policy. Now we’ll be able to apply our DSC extension:

DSC_ExecDSC

And the log file contents:

VERBOSE: [2015-05-29T00:00:00] Import script as new module:
C:\Packages\Plugins\Microsoft.Powershell.DSC\1.10.1.0\bin\..\DSCWork\DSC_SetTimeZone.ps1.1\DSC_SetTimeZone.ps1
...
VERBOSE: [2015-05-29T00:00:12] Executing Start-DscConfiguration...
...
VERBOSE: [2015-05-29T00:00:20] [SRV2012]: LCM:  [ Start  Set      ]
VERBOSE: [2015-05-29T00:00:22] [SRV2012]: LCM:  [ Start  Resource ]  [[xTimeZone]TimeZoneExample]
VERBOSE: [2015-05-29T00:00:22] [SRV2012]: LCM:  [ Start  Test     ]  [[xTimeZone]TimeZoneExample]
VERBOSE: [2015-05-29T00:00:22] [SRV2012]: LCM:  [ End    Test     ]  [[xTimeZone]TimeZoneExample]  in 0.1100 seconds.
...

Conclusion

Configuring the time zone using DSC might be a bit overkill. But it’s an excellent exercise to get the hang of this DSC stuff. For a good resource on DSC check this tweet from me. I myself plan to create more DSC scripts in the near future. I tear VM’s up and down all the time. I would love to have a DSC configuration that creates me a Windows AD domain, a Microsoft Identity Manager installation, a ….
