add MSBuild binary log (.binlog) component detector #1250

Draft: wants to merge 10 commits into main


@brettfo (Member) commented Sep 19, 2024

Adds a new NuGet component detector to scan MSBuild binary log (*.binlog) files.

There was one major reason to add this: the current NuGet component detector, which scans project.assets.json, isn't always 100% accurate. Inaccuracies can occur when a NuGet package has a transitive dependency on a System.* package but the installed SDK ships a newer version of that assembly, which replaces the package's copy at build time.

One common example: a project has <PackageReference Include="Microsoft.Extensions.Configuration.Json" Version="6.0.0" />. That package has a transitive dependency on System.Text.Json/6.0.0. No matter the SDK used to restore and build the project, the project.assets.json file will report that the package System.Text.Json/6.0.0 was used and implies that it'll be present at runtime. However, if the 6.0.424 SDK or newer was used to build, the System.Text.Json.dll file from the 6.0.0 package is removed and replaced with a newer System.Text.Json.dll from the SDK's runtime package.

To avoid reporting a false positive for a package that was replaced at build time (and therefore won't be present at runtime), this PR adds the MSBuild binary log scanner.

If a binary log is generated with the /bl flag, every MSBuild event is recorded. The new detector in this PR scans each event for the relevant AddItem and RemoveItem elements (corresponding to things like <SomeItemGroup Include="..." ... /> and <SomeItemGroup Remove="..." ... />) and uses them to build a 100% accurate dependency set.
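To make the replay idea concrete, here is a minimal Python sketch of it (not the PR's C# code). It assumes the binlog's AddItem/RemoveItem events have already been decoded into a hypothetical ItemEvent record holding the item type, item spec, and NuGetPackageId/NuGetPackageVersion metadata; actually reading a .binlog needs an MSBuild log reader, which is out of scope here. The item specs below are made-up paths.

```python
from dataclasses import dataclass

# Item types assumed to carry NuGet package metadata (illustrative subset).
PACKAGE_ITEM_TYPES = {"RuntimeCopyLocalItems", "NativeCopyLocalItems", "ResourceCopyLocalItems"}


@dataclass(frozen=True)
class ItemEvent:
    """Hypothetical, already-decoded AddItem/RemoveItem event."""
    operation: str        # "add" or "remove"
    item_type: str        # e.g. "RuntimeCopyLocalItems"
    item_spec: str        # the item's identity, e.g. a path to a .dll
    package_id: str       # NuGetPackageId metadata
    package_version: str  # NuGetPackageVersion metadata


def replay(events):
    """Apply add/remove events in order; return the packages whose items survive."""
    live = {}  # (item type, lower-cased item spec) -> (package id, version)
    for event in events:
        if event.item_type not in PACKAGE_ITEM_TYPES:
            continue
        key = (event.item_type, event.item_spec.lower())
        if event.operation == "add":
            live[key] = (event.package_id, event.package_version)
        elif event.operation == "remove":
            live.pop(key, None)  # removing an item that isn't present is a no-op
    return sorted(set(live.values()))


events = [
    ItemEvent("add", "RuntimeCopyLocalItems", "lib/Microsoft.Extensions.Configuration.Json.dll",
              "Microsoft.Extensions.Configuration.Json", "6.0.0"),
    ItemEvent("add", "RuntimeCopyLocalItems", "lib/System.Text.Json.dll", "System.Text.Json", "6.0.0"),
    # Conflict resolution drops the package's copy in favor of the SDK's newer assembly.
    ItemEvent("remove", "RuntimeCopyLocalItems", "lib/System.Text.Json.dll", "System.Text.Json", "6.0.0"),
]
print(replay(events))  # [('Microsoft.Extensions.Configuration.Json', '6.0.0')]
```

Tracking individual items rather than whole packages means a package only disappears once every item it contributed has been removed.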

Most of the content of this PR supports the tests, which are made as reliable as possible in 2 different ways:

  1. Generate a fresh .binlog file for each test by running the appropriate dotnet build ... command. This ensures there are no large binary test assets added to this repo; they're simply generated as needed.
  2. Mock all necessary NuGet packages to prevent network access. To accomplish this, 3 common NuGet packages are faked for the tests: Microsoft.NETCore.App.Ref, Microsoft.AspNetCore.App.Ref, and Microsoft.WindowsDesktop.App.Ref.
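One way to fake a package for offline tests is to write a minimal .nupkg, which is a zip archive with a .nuspec at its root, into a local folder used as a package source. The Python sketch below assumes that approach: make_fake_nupkg, the version, and the placeholder file are hypothetical, the PR's actual fakes are not shown here, and real reference packs such as Microsoft.NETCore.App.Ref contain far more than this, so NuGet or the SDK may require additional parts.

```python
import io
import zipfile
from typing import Dict, Optional

# Minimal .nuspec; real reference packs contain much more than this.
NUSPEC_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd">
  <metadata>
    <id>{package_id}</id>
    <version>{version}</version>
    <authors>test</authors>
    <description>Fake package for offline tests.</description>
  </metadata>
</package>
"""


def make_fake_nupkg(package_id: str, version: str,
                    files: Optional[Dict[str, bytes]] = None) -> bytes:
    """Build an in-memory .nupkg (a zip archive) with a .nuspec at its root."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{package_id}.nuspec",
                         NUSPEC_TEMPLATE.format(package_id=package_id, version=version))
        for name, content in (files or {}).items():
            archive.writestr(name, content)
    return buffer.getvalue()


# Usage (hypothetical version): the bytes could be written to <feed>/<id>.<version>.nupkg.
package_bytes = make_fake_nupkg("Microsoft.NETCore.App.Ref", "8.0.0", {"placeholder.txt": b""})
```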

Future work

To fully prevent reporting false positives, it would be a good idea to merge the binary log detector and the project.assets.json detector: if a binary log is present and covers a given .csproj, only the binary log detector would be used, with the original detector as a fallback otherwise. Without this work, the binary log detector will correctly not report System.Text.Json/6.0.0, but the project.assets.json detector still will, so the false positive will still appear.
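The proposed merge rule could look roughly like this Python sketch: binary-log results replace assets-file results for every project the log covers, and assets-file results are kept for everything else. merge_detections and the project names are hypothetical, and lower-casing paths assumes a case-insensitive file system.

```python
def merge_detections(binlog_results, assets_results):
    """Prefer binary-log results for every project the log covers; otherwise fall back.

    Both arguments map a project path to a set of (package id, version) pairs.
    """
    # Case-insensitive keys (assumption: Windows-style path comparison).
    merged = {path.lower(): set(packages) for path, packages in assets_results.items()}
    for path, packages in binlog_results.items():
        merged[path.lower()] = set(packages)  # the binlog view replaces the assets view
    return merged


assets = {
    "App.csproj": {("Microsoft.Extensions.Configuration.Json", "6.0.0"), ("System.Text.Json", "6.0.0")},
    "Other.csproj": {("Newtonsoft.Json", "13.0.3")},
}
binlog = {"app.csproj": {("Microsoft.Extensions.Configuration.Json", "6.0.0")}}
merged = merge_detections(binlog, assets)
```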

@brettfo force-pushed the msbuild-binlog branch 3 times, most recently from 99ffda7 to 2f8d76a on September 19, 2024 18:40
codecov bot commented Sep 19, 2024

Codecov Report

Attention: Patch coverage is 96.27767% with 37 lines in your changes missing coverage. Please review.

Project coverage is 89.2%. Comparing base (8360853) to head (0a38a99).

| File | Patch % | Lines missing coverage |
|---|---|---|
| ...tDetection.Detectors.Tests/MSBuildTestUtilities.cs | 93.6% | 11 missing, 4 partials |
| ...rs/nuget/NuGetMSBuildBinaryLogComponentDetector.cs | 94.5% | 8 missing, 5 partials |
| ...tection.Detectors.Tests/Utilities/TemporaryFile.cs | 71.4% | 5 missing, 1 partial |
| ...sts/NuGetMSBuildBinaryLogComponentDetectorTests.cs | 99.3% | 3 missing |
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1250     +/-   ##
=======================================
+ Coverage   88.9%   89.2%   +0.2%     
=======================================
  Files        359     363      +4     
  Lines      27672   28663    +991     
  Branches    1784    1856     +72     
=======================================
+ Hits       24613   25569    +956     
- Misses      2674    2701     +27     
- Partials     385     393      +8     


if (!topLevelDependencies.TryGetValue(projectEvaluation.ProjectFile, out var topLevel))
{
topLevel = new(StringComparer.OrdinalIgnoreCase);
topLevelDependencies[projectEvaluation.ProjectFile] = topLevel;

Member

One project file may be evaluated and built several times; you'll often see several evaluations per .csproj. Should we pick the best evaluation somehow, or is it fine to do this for each evaluation? Restore does an evaluation, then the build does another, and each target framework is a separate evaluation. We probably want a union of all evaluations?

Member Author

I'll experiment a bit to see if I can find inaccurate results, but generally we only care about the item groups that NuGet populates, like @(RuntimeCopyLocalItems), and I think that only happens during the Restore evaluation.

Member

The restore evaluation is going to be pretty useless (it gets packages downloaded but doesn't produce any of the related items). You will definitely need to do all of the inner-build evaluations so that you get the superset of references, e.g.

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFrameworks>net8.0;net472</TargetFrameworks>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Newtonsoft.Json" Version="13.0.3"
                      Condition=" '$(TargetFramework)' == 'net8.0' " />
    <PackageReference Include="System.Text.Json" Version="8.0.4"
                      Condition=" '$(TargetFramework)' == 'net472' " />
  </ItemGroup>

</Project>

Member Author

I just pushed a commit that covers this scenario; check the unit test for some notes. The end result is that we scan the .binlog correctly, and after a dotnet build /bl we pick up and report both Newtonsoft.Json/13.0.3 and System.Text.Json/8.0.4. Ultimately, that's the end goal: map a .csproj (or rather anything that's not a .sln) to the set of package names and versions that came from it, regardless of which TFM they came from.
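The mapping described here (each non-.sln project file to the union of packages from all of its target-framework evaluations) can be sketched in Python as follows. The tuple shape of the input and the file names are assumptions for illustration, not the detector's real data model.

```python
from collections import defaultdict


def packages_by_project(evaluations):
    """Union the packages found in each evaluation, keyed by project file.

    evaluations: iterable of (project_file, target_framework, packages) tuples,
    where packages is a set of (package id, version) pairs.
    """
    result = defaultdict(set)
    for project_file, _target_framework, packages in evaluations:
        if project_file.lower().endswith(".sln"):
            continue  # solutions are not projects; don't attribute packages to them
        result[project_file] |= set(packages)
    return dict(result)


evaluations = [
    ("app.csproj", "net8.0", {("Newtonsoft.Json", "13.0.3")}),
    ("app.csproj", "net472", {("System.Text.Json", "8.0.4")}),
    ("all.sln", None, {("Newtonsoft.Json", "13.0.3")}),
]
print(packages_by_project(evaluations))
```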

topLevelDependencies[projectEvaluation.ProjectFile] = topLevel;
}

if (doRemoveOperation)

Member

I'm a bit worried that you might be getting AddItem/RemoveItem events from different evaluations interleaved, and since they're keyed by project path, this might get confused.

On the other hand, it shouldn't happen: the binlog is a tree, and you're walking it linearly, so in theory each evaluation is processed sequentially, one after another.

Member Author

What might be an indication that the traversal is bad? If I try to process RemoveItem but the item isn't already present? Or is that something that MSBuild won't allow?

Member

Yes, I guess, but as I said, maybe I'm just being paranoid.
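A cheap sanity check along these lines is to replay events per evaluation and record removals of items that were never added. This is a hypothetical Python sketch, not part of the PR; because MSBuild allows removing an item that isn't present, a hit is only a hint of interleaving, not proof. The evaluation IDs and item specs are made up.

```python
from collections import defaultdict


def replay_with_warnings(events):
    """Replay (evaluation id, operation, item spec) events per evaluation.

    Returns the surviving items per evaluation plus a list of suspicious
    removals (items removed from an evaluation that never added them).
    """
    live = defaultdict(set)
    warnings = []
    for evaluation_id, operation, item_spec in events:
        if operation == "add":
            live[evaluation_id].add(item_spec)
        elif operation == "remove":
            if item_spec in live[evaluation_id]:
                live[evaluation_id].discard(item_spec)
            else:
                warnings.append((evaluation_id, item_spec))
    return dict(live), warnings


live, warnings = replay_with_warnings([
    (1, "add", "a.dll"),
    (2, "remove", "a.dll"),  # evaluation 2 never added a.dll: suspicious
    (1, "remove", "a.dll"),
])
```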

projectDependencies[packageName] = packageVersion;
}

project = project.GetNearestParent<Project>();

Member

Hmm, why are you walking up the project chain? I don't think these items flow to the calling project?

Member Author

I found I needed this to track transitive dependencies. For example, if library.csproj has <PackageReference Include="Some.Package" /> and unitTests.csproj has <ProjectReference Include="..\library\library.csproj" />, this adds Some.Package first to library.csproj and then crawls up the chain to unitTests.csproj so that the dependency is properly reported for both projects. The only oddity (and I may have to special-case it) is that the .sln file also appears as a Project node, but I do seem to be getting the proper hierarchy.
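The propagation described here can be sketched in Python as a walk from each project up its chain of invoking projects, skipping .sln nodes. The parent_of map is an assumed stand-in for the parent links in the binlog tree, the walk assumes that chain has no cycles, and Some.Package's version is made up. The replies that follow question whether this walk is needed at all.

```python
from collections import defaultdict


def propagate_to_ancestors(direct_packages, parent_of):
    """Attribute each project's packages to every ancestor in the invocation chain.

    direct_packages: project path -> set of (package id, version) pairs.
    parent_of: project path -> the project that invoked it (None at the root).
    """
    result = defaultdict(set)
    for project, packages in direct_packages.items():
        node = project
        while node is not None:
            if not node.lower().endswith(".sln"):  # skip the solution node
                result[node] |= set(packages)
            node = parent_of.get(node)
    return dict(result)


direct = {"library.csproj": {("Some.Package", "1.0.0")}}
parents = {"library.csproj": "unitTests.csproj", "unitTests.csproj": "all.sln", "all.sln": None}
propagated = propagate_to_ancestors(direct, parents)
```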

Member

I don't think I understand this. unitTests.csproj should have Some.Package in its assets file and pull assets out of it in its ResolvePackageAssets. Can you share a log or project setup where this is necessary?

Member Author

We explicitly can't use the assets file, because it's not 100% correct, so we have to do it this way. The PR description explains a scenario where the assets file is wrong, and from my manual testing, this approach correctly reports a project and every package that ultimately came from building it. I couldn't think of a scenario where this wasn't the case, but let me know if I missed one; it would make a great test.

Member

I'm saying that the targets that read the assets file should produce Some.Package in the transitive case, so walking the project graph here doesn't make sense to me.

Member Author

> ...that read the assets file...

That's the problem: the assets file could be wrong, so I need to crawl the projects manually. The PR description explains a scenario where the assets file isn't correct.

Member

The assets file will still be a superset of what the project might use. You get the "real usage" by looking at the outcome of package resolution / conflict resolution.

If you walk the project graph you end up trying to replay parts of the build. We shouldn't do that when we can just observe what it did.

Also, the assets file isn't "incorrect" here; it is correct from NuGet's perspective. It's just that the build applies additional policy to decide whether it will actually use a package's contents. If that package contributes assets that are "older" than some other contribution, they are dropped. The most common case is the framework, as you mentioned; that's what we designed the feature for in the SDK. Technically it could be any conflict, though: when any two packages try to provide the same file, the build compares them to decide whose copy wins.
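A heavily simplified Python sketch of the conflict-resolution idea described here: when two contributors provide the same file, compare them to decide whose copy wins. The ordering used (assembly version, then file version, then a preference for the framework copy) is an assumption for illustration; the SDK's real rules have more cases. The version numbers in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One contributor of a given file (e.g. System.Text.Json.dll)."""
    source: str             # package id, or "framework"
    assembly_version: tuple
    file_version: tuple
    from_framework: bool


def pick_winner(a: Candidate, b: Candidate) -> Candidate:
    """Simplified tie-breaking: higher assembly version, then higher file
    version, then prefer the framework's copy."""
    if a.assembly_version != b.assembly_version:
        return a if a.assembly_version > b.assembly_version else b
    if a.file_version != b.file_version:
        return a if a.file_version > b.file_version else b
    if a.from_framework != b.from_framework:
        return a if a.from_framework else b
    return a  # still tied: keep the first contributor (arbitrary in this sketch)


package_copy = Candidate("System.Text.Json", (6, 0, 0, 0), (6, 0, 0, 1), from_framework=False)
framework_copy = Candidate("framework", (6, 0, 0, 0), (6, 0, 0, 2), from_framework=True)
print(pick_winner(package_copy, framework_copy).source)  # framework
```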

@KirillOsenkov (Member)

As an alternative, you could also see how binlogtool listnuget msbuild.binlog output.txt does it. It opens all embedded project.assets.json files, reads the contents into a model, and walks all of them to collect a flat list of all packages and versions used by the build.

I think your approach is fine too (?)

@jeffkl as an expert FYI, in case he wants to take a quick look
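For comparison, collecting a flat package list from a project.assets.json file can be sketched in Python by reading its "libraries" section, whose keys have the form Name/Version. This is not binlogtool's code (which, per the comment, reads the assets files embedded in the binlog); it only shows the parsing step, and the sample JSON is made up. It also illustrates the objection in the reply: System.Text.Json/6.0.0 would be listed even if the build later drops it.

```python
import json


def packages_from_assets(assets_json: str):
    """Return sorted (name, version) pairs for package libraries in a project.assets.json."""
    assets = json.loads(assets_json)
    packages = set()
    for key, library in assets.get("libraries", {}).items():
        if library.get("type") != "package":
            continue  # skip project-to-project references
        name, _, version = key.partition("/")
        packages.add((name, version))
    return sorted(packages)


sample = ('{"libraries": {"System.Text.Json/6.0.0": {"type": "package"}, '
          '"library/1.0.0": {"type": "project"}}}')
print(packages_from_assets(sample))  # [('System.Text.Json', '6.0.0')]
```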

@KirillOsenkov (Member)

Actually, we could diff the results of binlogtool listnuget with these results and see if there are any discrepancies. It would be like an oracle!

@brettfo (Member Author) commented Sep 19, 2024

> As an alternative, you could also see how binlogtool listnuget msbuild.binlog output.txt does it. It opens all embedded project.assets.json files, reads the contents into a model, and walks all of them to collect a flat list of all packages and versions used by the build.
>
> I think your approach is fine too (?)
>
> @jeffkl as an expert FYI, in case he wants to take a quick look

I explicitly don't want the contents of project.assets.json because they don't necessarily reflect what ends up in the output. There are more details in the original description, but in some cases the SDK will replace an assembly that NuGet has already resolved, so according to NuGet, a certain package ended up in the build, but in reality it's not there and shouldn't get reported.

@KirillOsenkov (Member)

I see, makes sense! 👍🏻

@brettfo (Member Author) commented Sep 19, 2024

For those interested, I ran a project referencing Microsoft.Extensions.Configuration.Json/6.0.0 through both the binlogtool suggested above and this PR. The results are as follows:

binlogtool:

Microsoft.Extensions.Configuration/6.0.0
Microsoft.Extensions.Configuration.Abstractions/6.0.0
Microsoft.Extensions.Configuration.FileExtensions/6.0.0
Microsoft.Extensions.Configuration.Json/6.0.0
Microsoft.Extensions.FileProviders.Abstractions/6.0.0
Microsoft.Extensions.FileProviders.Physical/6.0.0
Microsoft.Extensions.FileSystemGlobbing/6.0.0
Microsoft.Extensions.Primitives/6.0.0
System.Runtime.CompilerServices.Unsafe/6.0.0
System.Text.Encodings.Web/6.0.0
System.Text.Json/6.0.0

This PR:

Microsoft.Extensions.Configuration/6.0.0
Microsoft.Extensions.Configuration.Abstractions/6.0.0
Microsoft.Extensions.Configuration.FileExtensions/6.0.0
Microsoft.Extensions.Configuration.Json/6.0.0
Microsoft.Extensions.FileProviders.Abstractions/6.0.0
Microsoft.Extensions.FileProviders.Physical/6.0.0
Microsoft.Extensions.FileSystemGlobbing/6.0.0
Microsoft.Extensions.Primitives/6.0.0

The difference is that the SDK (6.0.425 on my machine) removed the System.* packages (System.Runtime.CompilerServices.Unsafe, System.Text.Encodings.Web, and System.Text.Json).
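Using the two lists above as data, the "oracle" diff suggested earlier reduces to a pair of set differences; this Python snippet shows it.

```python
BINLOGTOOL = """
Microsoft.Extensions.Configuration/6.0.0
Microsoft.Extensions.Configuration.Abstractions/6.0.0
Microsoft.Extensions.Configuration.FileExtensions/6.0.0
Microsoft.Extensions.Configuration.Json/6.0.0
Microsoft.Extensions.FileProviders.Abstractions/6.0.0
Microsoft.Extensions.FileProviders.Physical/6.0.0
Microsoft.Extensions.FileSystemGlobbing/6.0.0
Microsoft.Extensions.Primitives/6.0.0
System.Runtime.CompilerServices.Unsafe/6.0.0
System.Text.Encodings.Web/6.0.0
System.Text.Json/6.0.0
""".split()

THIS_PR = """
Microsoft.Extensions.Configuration/6.0.0
Microsoft.Extensions.Configuration.Abstractions/6.0.0
Microsoft.Extensions.Configuration.FileExtensions/6.0.0
Microsoft.Extensions.Configuration.Json/6.0.0
Microsoft.Extensions.FileProviders.Abstractions/6.0.0
Microsoft.Extensions.FileProviders.Physical/6.0.0
Microsoft.Extensions.FileSystemGlobbing/6.0.0
Microsoft.Extensions.Primitives/6.0.0
""".split()

only_in_binlogtool = sorted(set(BINLOGTOOL) - set(THIS_PR))
only_in_this_pr = sorted(set(THIS_PR) - set(BINLOGTOOL))
print(only_in_binlogtool)
# ['System.Runtime.CompilerServices.Unsafe/6.0.0', 'System.Text.Encodings.Web/6.0.0', 'System.Text.Json/6.0.0']
print(only_in_this_pr)  # []
```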


if (doRemoveOperation)
{
topLevel.Remove(packageName);

Member

Specifically, I think this is going to cause problems if one TF of a project references a thing and the other gets it removed by framework unification.

Member Author

Do you have an example of this? I'd like to add a test to make sure we don't remove anything that shouldn't be removed.

Member

Try this:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFrameworks>net472;net8.0</TargetFrameworks>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="System.Text.Json" Version="6.0.0" />
  </ItemGroup>

</Project>

This project DOES use STJ (System.Text.Json) 6.0.0, but ONLY for the net472 TF; in the net8.0 TF it should be removed by conflict resolution against the net8.0 framework.

Member Author

Thank you, this is a great example. I've updated it locally to track the add/remove operations per project evaluation ID so the evaluation of the project with net8.0 doesn't remove the add operation from net472. I'll work on adding a test for this.
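A Python sketch of the difference between keying add/remove operations by project file and keying them by evaluation, using the System.Text.Json example from this thread. The evaluation IDs are hypothetical, and the sketch assumes each target framework's build is recorded under its own evaluation ID.

```python
from collections import defaultdict


def replay_keyed_by_project(events):
    """Naive version: one package set per project file, shared by all evaluations."""
    packages = defaultdict(set)
    for project_file, _evaluation_id, operation, package in events:
        if operation == "add":
            packages[project_file].add(package)
        elif operation == "remove":
            packages[project_file].discard(package)
    return dict(packages)


def replay_keyed_by_evaluation(events):
    """Track adds/removes per (project, evaluation id), then union per project."""
    per_evaluation = defaultdict(set)
    for project_file, evaluation_id, operation, package in events:
        key = (project_file, evaluation_id)
        if operation == "add":
            per_evaluation[key].add(package)
        elif operation == "remove":
            per_evaluation[key].discard(package)
    result = defaultdict(set)
    for (project_file, _evaluation_id), packages in per_evaluation.items():
        result[project_file] |= packages
    return dict(result)


stj = ("System.Text.Json", "6.0.0")
events = [
    ("app.csproj", 1, "add", stj),     # net472 evaluation keeps the package
    ("app.csproj", 2, "add", stj),     # net8.0 evaluation adds it...
    ("app.csproj", 2, "remove", stj),  # ...then conflict resolution removes it
]
print(replay_keyed_by_project(events))     # {'app.csproj': set()}
print(replay_keyed_by_evaluation(events))  # {'app.csproj': {('System.Text.Json', '6.0.0')}}
```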


<PackageVersion Include="MSTest.TestFramework" Version="3.5.1" />
<PackageVersion Include="MSTest.Analyzers" Version="3.5.1" />
<PackageVersion Include="MSTest.TestAdapter" Version="3.5.1" />
<PackageVersion Include="Microsoft.Build.Framework" Version="17.5.0" />
<PackageVersion Include="Microsoft.Build.Locator" Version="1.6.1" />

Member

I have a concern here. Are you running as a standalone .NET 6 application currently? If so, you won't be able to load MSBuild from new SDKs. Have you tried on a machine that has only .NET 8 installed?

Member Author

This seems to work in the unit tests, which should only have .NET 6 installed (it's installing here, and if I understand that action correctly, it uses global.json to determine what to install, so there's no .NET 8 on the CI machine).

Member

Right, the problem will be when you run the tool on a different machine which has only .NET 8 installed.

This corresponds to a project with multiple TFMs where the same package is imported in each case, but with a different version each time.
["ResolvedSingleFileHostPack"] = ("NuGetPackageId", "NuGetPackageVersion"),
["ResolvedComHostPack"] = ("NuGetPackageId", "NuGetPackageVersion"),
["ResolvedIjwHostPack"] = ("NuGetPackageId", "NuGetPackageVersion"),
};

Member

Is it also possible to observe props/targets that NuGet might have imported from the package? For those you'd need to parse the path, but I do think you could reverse-map the package from any props or targets file under $(NuGetPackageRoot).
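A possible Python sketch of that reverse mapping, assuming the NuGet global packages folder layout <root>/<lower-case id>/<version>/...; package_from_import is a hypothetical helper and the example paths are made up. It would not cover packages resolved from other locations, such as fallback folders.

```python
from pathlib import PurePosixPath


def package_from_import(import_path: str, package_root: str):
    """Guess (package id, version) for a .props/.targets file under the NuGet
    global packages folder. Folder names there are lower-case, so the
    original id casing is lost."""
    # Normalize Windows separators and casing so both path styles compare equally.
    path = PurePosixPath(import_path.replace("\\", "/").lower())
    root = PurePosixPath(package_root.replace("\\", "/").lower())
    if path.suffix not in (".props", ".targets"):
        return None
    try:
        relative = path.relative_to(root)
    except ValueError:
        return None  # not under the package root
    if len(relative.parts) < 3:
        return None  # expected at least <id>/<version>/<file>
    return relative.parts[0], relative.parts[1]


print(package_from_import("/home/me/.nuget/packages/some.package/1.2.3/build/Some.Package.props",
                          "/home/me/.nuget/packages/"))  # ('some.package', '1.2.3')
```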

// regular restore operations
["NativeCopyLocalItems"] = ("NuGetPackageId", "NuGetPackageVersion"),
["ResourceCopyLocalItems"] = ("NuGetPackageId", "NuGetPackageVersion"),
["RuntimeCopyLocalItems"] = ("NuGetPackageId", "NuGetPackageVersion"),
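The lookup that the C# dictionary excerpted above appears to drive can be sketched in Python like this: given an item's type and metadata, return the package ID and version, or nothing. Only the entries visible in these excerpts are included, package_from_item is a hypothetical helper, and unlike MSBuild, the metadata lookup here is case-sensitive.

```python
# Item type -> (package id metadata name, package version metadata name),
# limited to the entries visible in the excerpts.
PACKAGE_METADATA_BY_ITEM_TYPE = {
    "ResolvedSingleFileHostPack": ("NuGetPackageId", "NuGetPackageVersion"),
    "ResolvedComHostPack": ("NuGetPackageId", "NuGetPackageVersion"),
    "ResolvedIjwHostPack": ("NuGetPackageId", "NuGetPackageVersion"),
    # regular restore operations
    "NativeCopyLocalItems": ("NuGetPackageId", "NuGetPackageVersion"),
    "ResourceCopyLocalItems": ("NuGetPackageId", "NuGetPackageVersion"),
    "RuntimeCopyLocalItems": ("NuGetPackageId", "NuGetPackageVersion"),
}


def package_from_item(item_type, metadata):
    """Return (package id, version) for an item, or None if the type or metadata is missing."""
    names = PACKAGE_METADATA_BY_ITEM_TYPE.get(item_type)
    if names is None:
        return None
    id_name, version_name = names
    package_id, version = metadata.get(id_name), metadata.get(version_name)
    if not package_id or not version:
        return None
    return package_id, version
```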