Automate the creation of a NOTICE.md file
When working on a project that’s being distributed you are often required to create a NOTICE file giving the necessary attribution to work you rely on. As written on the Apache site on this topic:
If the Work includes a “NOTICE” text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
It’s not hard or complex to create these files, but it’s a LOT of work when starting.
For reasons I can’t disclose, I was tasked to create such a file. This would require me to:
- Track down all project sites from our NuGet dependencies, copy the licenses, add the dependency, version number and license in the
NOTICE.md
file.
Or
- Automate the above
Can you guess what I did?
Of course, the automation.
Create the NOTICE.md file with a script
The first thing that needs to be done is iterate over all NuGet dependecies and get their corresponding licenses.
There’s a .NET tool for that called nuget-license. You can install this using the dotnet CLI.
dotnet tool install --global nuget-license
Once installed, navigate to the source folder of the project/solution and create the license information files using the nuget-license tool. I’ve used the following command.
nuget-license -i MySolution.sln -d .\license-information -o JsonPretty -fo licenses.json
This makes sure all the license information will be stored in a folder called license-information
, along with the validation output in the licenses.json
folder.
All information requred to create the NOTICE.md
file is now present on your filesystem.
I’ve created a small Python script to iterate over all files in the license-information
folder. Some licenses are provided in plain-text, others in HTML. To work with the HTML, I’m using the beautifulsoup4 package which you can install in your environment by running pip install beautifulsoup4
.
import os
import json
from bs4 import BeautifulSoup
# Define file paths
licenses_json_path = "./src/licenses.json"
license_folder_path = "./src/license-information"
output_file_path = "./NOTICE.md"
# Check if the JSON file and license folder exist
if not os.path.exists(licenses_json_path):
raise FileNotFoundError(f"licenses.json file not found: {licenses_json_path}")
if not os.path.exists(license_folder_path):
raise FileNotFoundError(f"License folder not found: {license_folder_path}")
# Load the JSON file
with open(licenses_json_path, "r", encoding="utf-8") as f:
licenses = json.load(f)
# Initialize the NOTICE.md content
notice_content = """# NOTICES AND INFORMATION
Do Not Translate or Localize
This software incorporates material from third parties. Below is a list of the open-source packages used in this project along with their respective licenses.
"""
# Iterate over each package in the JSON file
for package in licenses:
package_id = package.get("PackageId")
package_version = package.get("PackageVersion")
license_url = package.get("LicenseUrl")
project_url = package.get("PackageProjectUrl")
authors = package.get("Authors", "Unknown")
# Add package metadata to the NOTICE content
notice_content += f"""## {package_id} ({package_version})
- **Authors**: {authors}
- **License**: [{license_url}]({license_url})
- **Source**: [{project_url}]({project_url})
"""
# Check if a license file exists for the package
license_file_html = os.path.join(license_folder_path, f"{package_id}__{package_version}.html")
license_file_txt = os.path.join(license_folder_path, f"{package_id}__{package_version}.txt")
license_text = None # Initialize license_text as None
if os.path.exists(license_file_txt):
# If it's a .txt file, use the raw content
with open(license_file_txt, "r", encoding="utf-8") as txt_file:
license_text = txt_file.read().strip()
elif os.path.exists(license_file_html):
# If it's an .html file, parse it and extract the raw HTML content under the header "License text"
with open(license_file_html, "r", encoding="utf-8") as html_file:
soup = BeautifulSoup(html_file, "html.parser")
# Find the header with the text "License text" (h1, h2, or h3)
license_header = soup.find(lambda tag: tag.name in ["h1", "h2", "h3"] and tag.get_text(strip=True) == "License text")
if license_header:
# Collect all sibling elements until the next header (h1, h2, or h3)
raw_html_content = ""
for sibling in license_header.find_next_siblings():
if sibling.name in ["h1", "h2", "h3"]:
break # Stop if another header is encountered
raw_html_content += str(sibling)
license_text = raw_html_content.strip()
# Append the license text only if it exists
if license_text:
notice_content += f"""### License Text
{license_text}
"""
# Write the NOTICE.md file
with open(output_file_path, "w", encoding="utf-8") as f:
f.write(notice_content)
print(f"NOTICE.md file has been generated at: {output_file_path}")
I’ve used this script to create our initial notice file and it works quite well. You can even run this in your build pipelines, so it’s always reflecting the most recent state of the project.
This script has saved me a LOT of hours for the project I’m working and I hope it’ll help you too.