Saturday, January 24, 2026

JWT (pronounced jot) JSON Web Token

JSON Web Token (JWT) is an open standard (RFC 7519) that defines a compact and self-contained way for securely transmitting information between parties as a JSON object. This information can be verified and trusted because it is digitally signed. JWTs can be signed using a secret (with the HMAC algorithm) or a public/private key pair using RSA or ECDSA.

Although JWTs can be encrypted to also provide secrecy between parties, we will focus on signed tokens. Signed tokens can verify the integrity of the claims contained within them, while encrypted tokens hide those claims from other parties. When tokens are signed using public/private key pairs, the signature also certifies that only the party holding the private key is the one that signed it.

In its compact form, a JSON Web Token consists of three parts separated by dots (.), which are:

  • Header
  • Payload
  • Signature

Whenever the user wants to access a protected route or resource, the user agent should send the JWT, typically in the Authorization header using the Bearer scheme.
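
As a rough illustration (not part of the jwt.io introduction), a few lines of TypeScript for Node.js show that structure; this only decodes the header and payload and does not verify anything:

  // Minimal sketch, assuming Node.js 16+ for Buffer's 'base64url' support.
  // The token normally arrives as: Authorization: Bearer <header>.<payload>.<signature>
  function decodeJwt(token: string): { header: unknown; payload: unknown } {
    const [headerB64, payloadB64] = token.split('.'); // three parts separated by dots
    const decodePart = (part: string) =>
      JSON.parse(Buffer.from(part, 'base64url').toString('utf8')); // Base64URL-encoded JSON
    // The third part (the signature) is deliberately ignored: this is decoding, not verification.
    return { header: decodePart(headerB64), payload: decodePart(payloadB64) };
  }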

Validation and Verification

Validate a JWT to make sure the token makes sense, adheres to the expected standards, and contains the right data.

Verify a JWT to make sure the token hasn't been altered maliciously and comes from a trusted source.

JWT validation generally refers to checking the structure, format, and content of the JWT:

  • Structure: Ensuring the token has the standard three parts (header, payload, signature) separated by dots.
  • Format: Verifying that each part is correctly encoded (Base64URL) and that the payload contains expected claims.
  • Content: Checking if the claims within the payload are correct, such as expiration time (exp), issued at (iat), not before (nbf), among others, to ensure the token isn't expired, isn't used before its time, etc.

JWT verification involves confirming the authenticity and integrity of the token:

  • Signature Verification: This is the primary aspect of verification where the signature part of the JWT is checked against the header and payload. This is done using the algorithm specified in the header (like HMAC, RSA, or ECDSA) with a secret key or public key. If the signature doesn't match what's expected, the token might have been tampered with or is not from a trusted source.
  • Issuer Verification: Checking if the iss claim matches an expected issuer.
  • Audience Check: Ensuring the aud claim matches the expected audience.

source: https://www.jwt.io/introduction#when-to-use-json-web-tokens
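
Putting the validation and verification checks above into code, here is a minimal sketch using the jsonwebtoken npm package (my example, not from the jwt.io page; the secret, issuer, and audience values are placeholders):

  import jwt from 'jsonwebtoken'; // assumes the jsonwebtoken package is installed (and esModuleInterop for this import style)

  // Placeholder configuration values.
  const SECRET = process.env.JWT_SECRET ?? 'dev-only-secret';

  function verifyToken(token: string) {
    // verify() checks the signature and the standard time claims (exp, nbf) automatically;
    // issuer and audience are compared against the options below.
    return jwt.verify(token, SECRET, {
      algorithms: ['HS256'],              // accept only the algorithm you expect
      issuer: 'https://auth.example.com', // expected iss claim (placeholder)
      audience: 'my-api',                 // expected aud claim (placeholder)
    });
    // Throws (TokenExpiredError, JsonWebTokenError, ...) if any check fails.
  }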

Public and Private Keys with JWTs

With JWT, the possession and use of the key material are exactly the same as in any other context where cipher operations occur.

Signing

  • The private key is owned by the issuer and is used to compute the signature.
  • The public key can be shared with all parties that need to verify the signature.

Encryption

  • The private key is owned by the recipient and is used to decrypt the data.
  • The public key can be shared with any party that wants to send sensitive data to the recipient.

Encryption is rarely used with JWT. Most of the time, the HTTPS layer is sufficient and the token itself contains only information that is not sensitive.

The issuer of the token (the authentication server) has a private key to generate signed tokens (JWS). These tokens are sent to the clients (an API server, a web/native application). 

The clients can verify the token with the public key. The key is usually fetched using a public URI.
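
A sketch of that split, using Node's crypto module and the jsonwebtoken package (my example, not from the Stack Overflow answer); in practice the issuer loads its private key from secure storage and publishes the public key (e.g. via a JWKS URI) rather than generating a pair inline:

  import { generateKeyPairSync } from 'node:crypto';
  import jwt from 'jsonwebtoken';

  // Illustrative key pair in PEM format.
  const { publicKey, privateKey } = generateKeyPairSync('rsa', {
    modulusLength: 2048,
    publicKeyEncoding: { type: 'spki', format: 'pem' },
    privateKeyEncoding: { type: 'pkcs8', format: 'pem' },
  });

  // Issuer (authentication server): sign the claims with the PRIVATE key (JWS).
  const token = jwt.sign({ sub: 'user-123' }, privateKey, { algorithm: 'RS256', expiresIn: '1h' });

  // Client (API server, web/native app): verify with the PUBLIC key.
  const claims = jwt.verify(token, publicKey, { algorithms: ['RS256'] });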

If you have sensitive data that shall not be disclosed to a third party (phone numbers, personal addresses), then encrypted tokens (JWE) are highly recommended. In this case, each client (i.e., the recipient of a token) shall have a private key, and the issuer of the token must encrypt the token using the public key of each recipient. This means that the issuer of the token must be able to select the appropriate key for a given client.

source: https://stackoverflow.com/a/60540274/113701

Saturday, January 17, 2026

PKI (Public Key Infrastructure)

Public key infrastructure (PKI) refers to tools used to create and manage public keys for encryption, which is a common method of securing data transfers on the internet. PKI is built into all web browsers used today, and it helps secure public internet traffic. Organizations can use it to secure the communications they send back and forth internally and also to make sure connected devices can connect securely.

The most important concept associated with PKI is the cryptographic keys that are part of the encryption process and serve to authenticate different people or devices attempting to communicate with the network.

source: https://www.fortinet.com/resources/cyberglossary/public-key-infrastructure

The components of public key infrastructure include:
  • PKI keys: A key pair used for encryption. This protects data by making it unreadable to anyone except the intended recipient. In cryptography, each public key is paired with a private key. The public key is distributed freely and openly, while the private key is secret to the owner.
  • Digital certificates: Electronic credentials that link the certificate holder’s identity to a key pair that can be used to encrypt and sign information.
  • Certificate authority (CA): An entity that verifies identities and issues digital certificates.
  • Registration authority (RA): Responsible for accepting certificate requests and authenticating the individual or organization behind them.
  • Certificate repositories: Secure storage systems that hold digital certificates for lookup and validation.
  • Centralized management software: Software that lets organizations manage keys and digital certificates from one place.
  • Hardware security module (HSM): Physical devices that perform cryptographic operations and store private keys securely.

A digital certificate, sometimes called a “public key certificate,” is an electronic document used to identify the owner of a public key. This allows the recipient to confirm the key came from a legitimate source, mitigating the risk of an MITM (man in the middle) attack.

PKI certificates typically include:
  • Identifiable information, such as the certificate holder’s name, the certificate’s serial number, and its expiration date
  • A copy of the public key, which others can use to encrypt data and verify digital signatures, supporting both confidentiality and authentication
  • The digital signature of the issuing CA to confirm authenticity
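
As a rough illustration of reading those fields (my sketch, not from the source article), Node's built-in crypto module can parse a PEM certificate; the file name is a placeholder:

  import { readFileSync } from 'node:fs';
  import { X509Certificate } from 'node:crypto'; // available in Node.js 15.6+

  const cert = new X509Certificate(readFileSync('server.pem')); // placeholder path

  console.log(cert.subject);        // identifiable information about the certificate holder
  console.log(cert.serialNumber);   // the certificate's serial number
  console.log(cert.validTo);        // expiration date
  console.log(cert.issuer);         // the CA that issued (and signed) the certificate
  const holderKey = cert.publicKey; // the embedded public key, as a KeyObject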

A certificate authority (CA) is a trusted third-party organization that creates and issues digital certificates. They validate identities and help establish trust chains for secure digital communications.

All CAs maintain certificate revocation lists (CRLs), which document certificates revoked before their scheduled expiration date. This helps organizations identify certificates that are no longer valid or secure.

Saturday, January 10, 2026

TPU (Tensor Processing Unit)

Tensor Processing Units (TPUs) are a type of application-specific integrated circuit (ASIC) designed to address the growing computational demands of machine learning. TPUs are engineered specifically for tensor operations, which are fundamental to deep learning algorithms.

Due to their custom architecture optimized for matrix multiplication, a key operation in neural networks, they excel in processing large volumes of data and executing complex neural networks efficiently, enabling fast training and inference times.

source: https://www.datacamp.com/blog/tpu-vs-gpu-ai#what-is-a-tpu?-tenso
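
TPUs are programmed through frameworks such as TensorFlow or JAX rather than directly, but the core operation they accelerate is ordinary matrix multiplication; a plain (and deliberately naive) TypeScript version of that operation, for illustration only:

  // Naive matrix multiply: the workload TPUs implement in dedicated hardware
  // (systolic arrays), which is why they outperform general-purpose loops like this one.
  function matmul(a: number[][], b: number[][]): number[][] {
    const rows = a.length, inner = b.length, cols = b[0].length;
    const out = Array.from({ length: rows }, () => new Array<number>(cols).fill(0));
    for (let i = 0; i < rows; i++)
      for (let k = 0; k < inner; k++)
        for (let j = 0; j < cols; j++)
          out[i][j] += a[i][k] * b[k][j];
    return out;
  }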

Saturday, September 20, 2025

5S + 1 (process improvement)

  1. Sort - remove unneeded items from the immediate workspace
  2. Straighten or set in order - position items strategically in their place
  3. Shine or sweep - clean the area
  4. Standardize - make the first 3 S's part of daily activity
  5. Sustain or self-discipline - accountability, audit, and maintain
  6. Safety - periodic audits for a safe environment

source: https://www.linkedin.com/learning/lean-six-sigma-analyze-improve-and-control-tools/5s

Saturday, August 30, 2025

Benford's Law

Benford’s law describes the relative frequency distribution for leading digits of numbers in datasets. Leading digits with smaller values occur more frequently than larger values. This law states that approximately 30% of numbers start with a 1 while less than 5% start with a 9. According to this law, leading 1s appear 6.5 times as often as leading 9s! Benford’s law is also known as the First Digit Law.

If leading digits 1 – 9 had an equal probability, they’d each occur 11.1% of the time. However, that is not true in many datasets. The table under “Benford’s Law Formula” below shows the distribution of leading digits according to Benford’s law.

Analysis of datasets shows that many follow Benford’s law. For example, analysts have found that stock prices, population numbers, death rates, sports statistics, financial and tax information, and billing amounts often have leading digits that follow this distribution. 

Uses for Benford’s Law

Analysts have used it extensively to look for fraud and manipulation in financial records, tax returns, applications, and decision-making documents. They compare the distribution of leading digits in these datasets to Benford’s law. When the leading digits don’t follow the distribution, it’s a red flag for fraud in some datasets.

When Does Benford’s Law Apply and Not Apply

Benford’s law generally applies to data that fit some of the following guidelines:

  • Quantitative data.
  • Data that are measured rather than assigned.
  • Ranges over orders of magnitude.
  • Not artificially restricted by minimums or maximums.
  • Mixed populations.
  • Larger datasets are better.

Elaborations on Guidelines

Benford’s law often does not apply to assigned numbers, such as ID numbers, phone numbers, and zip codes.

It works best for data that range over multiple orders of magnitude, from very low to very high. You can cover the 10s, 100s, 1000s, and so on. For example, population and incomes can range from very low to very high.

Conversely, if the range of values is restricted, it affects the leading digits, and Benford’s law is less likely to apply. For example, human characteristics naturally fall into restricted ranges. Consequently, this distribution doesn’t apply to human ages, heights and weights. Similarly, limits imposed on potential values can also invalidate this law. Awards in small claims courts have an upper limit, which can negate Benford’s law.

Interestingly, mathematicians have proven that numbers from mixed populations follow Benford’s law. Mixed populations are things like all numbers pulled from a magazine issue. Obviously, those numbers will represent various topics and types of values. Benford himself did that with Reader’s Digest and newspapers. You can also combine data from different sources to achieve the same effect.

Like all distributions, larger datasets will produce observed relative frequencies that more closely approximate the theoretical values of Benford’s law. Smaller datasets can create relatively large deviations due to random error. Some analysts say datasets as small as 100 are acceptable, but most think a minimum size of 500 or even 1,000 is necessary.

Curiously, it will work in some cases where it should not. For example, it applies to house numbers even though those are assigned.

Benford’s Law Formula

Benford’s law formula is the following:

P(d) = log₁₀(1 + 1/d)

Where d = the values of the leading digits from 1 to 9.

The formula calculates the probability for each leading digit. The table below displays the probabilities that Benford’s law formula calculates for all digits.

Digit     Probability
1         30.1%
2         17.6%
3         12.5%
4         9.7%
5         7.9%
6         6.7%
7         5.8%
8         5.1%
9         4.6%

source: https://statisticsbyjim.com/probability/benfords-law/
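
A quick way to reproduce the table above and compare it with real data (my sketch, not from the source):

  // Benford's law: P(d) = log10(1 + 1/d) for leading digits d = 1..9.
  const benford = (d: number) => Math.log10(1 + 1 / d);

  // Observed leading-digit frequencies for an arbitrary dataset of numbers,
  // for comparison against the theoretical values.
  function leadingDigitFrequencies(data: number[]): number[] {
    const counts = new Array(9).fill(0);
    for (const x of data) {
      const firstDigit = Number(String(Math.abs(x)).replace(/[^1-9]/g, '')[0]); // first non-zero digit
      if (firstDigit) counts[firstDigit - 1]++;
    }
    return counts.map(c => c / data.length);
  }

  // Prints the theoretical distribution: 1 -> 30.1%, ..., 9 -> 4.6%.
  for (let d = 1; d <= 9; d++) {
    console.log(d, (benford(d) * 100).toFixed(1) + '%');
  }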

Saturday, August 23, 2025

monotropism

Monotropism is an individual's tendency to focus their attention on a small number of interests, or even a single interest, at any time, neglecting or not perceiving lesser interests.

The word mono ("one, single") here is chosen in contrast to poly ("many"); whereas -tropism points to "directional movement or growth".

A tendency to focus attention tightly can be seen as a state of "tunnel vision". While monotropism tends to cause people to miss things outside their attention tunnel, within it, their focused attention can lend itself to intense experiences, deep thinking, and more specifically, flow states. However, this form of hyperfocus makes it harder to redirect attention, including when starting and stopping tasks.

source: https://en.wikipedia.org/wiki/Monotropism

Saturday, July 19, 2025

SIFERS (front-end unit testing technique)

Simple Injectable Functions Explicitly Returning State (SIFERS) are a way to capture what the tests should do when setting up the testing environment as well as returning a mutable clean state. A SIFERS is just a function that accepts some overridable/injectable parameters and returns some form of state. 

source: https://medium.com/@kolodny/testing-with-sifers-c9d6bb5b362

AI Overview

Core Idea

Explicit State Management

SIFERS moves away from implicit state management (e.g., using beforeEach hooks) and instead uses a single function that explicitly defines and returns the necessary state for each test. 

Function as a Service

This setup function, often called setup(), acts as a service that provides the necessary mocks, dependencies, and initial state for the unit tests. 

No State Leakage

By returning new object references for each test, SIFERS prevents any unintended side effects or state leakage from one test to another. 

How it Works

1. Define a setup() function:

This function will be responsible for creating and returning the necessary state for your tests. It can accept parameters to customize the setup based on the specific test case.

2. Return state:

The setup() function returns an object containing all the necessary dependencies, mocks, and initial state for the test.

3. Use the returned state:

In each test case, you can access the state returned by the setup() function and use it to interact with the system under test.
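
A minimal TypeScript sketch of the pattern (mine, not from the article); CounterService and the Jest-style it/expect API are illustrative assumptions:

  // Hypothetical system under test.
  class CounterService {
    constructor(private logger: { log: (msg: string) => void }, public count = 0) {}
    increment() { this.count++; this.logger.log(`count=${this.count}`); }
  }

  // The SIFERS: a plain function with overridable parameters that explicitly
  // returns fresh state for every call -- no shared beforeEach variables.
  function setup(overrides: { initialCount?: number } = {}) {
    const logger = { messages: [] as string[], log(msg: string) { this.messages.push(msg); } };
    const service = new CounterService(logger, overrides.initialCount ?? 0);
    return { service, logger };
  }

  // Jest-style usage; each test gets its own state, so nothing leaks between tests.
  it('increments and logs', () => {
    const { service, logger } = setup({ initialCount: 5 });
    service.increment();
    expect(service.count).toBe(6);
    expect(logger.messages).toEqual(['count=6']);
  });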

Saturday, July 12, 2025

Passwordless Authentication

Passwordless Authentication is an authentication method that allows a user to gain access to an application or IT system without entering a password or answering security questions. Instead, the user provides some other form of evidence such as a fingerprint, proximity badge, or hardware token code. Passwordless Authentication is often used in conjunction with Multi-Factor Authentication (MFA) and Single Sign-On solutions to improve the user experience, strengthen security, and reduce IT operations expense and complexity.

source: https://www.cyberark.com/what-is/passwordless-authentication/

Saturday, July 5, 2025

Canonicalization, Normalization

In computer science, canonicalization (sometimes standardization or normalization) is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form. 

This can be done to compare different representations for equivalence.

URL

A canonical URL is a URL for defining the single source of truth for duplicate content.

XML

XML canonical form, briefly defined, removes whitespace within tags, uses particular character encodings, sorts namespace references and eliminates redundant ones, removes XML and DOCTYPE declarations, and transforms relative URIs into absolute URIs.

source: https://en.wikipedia.org/wiki/Canonicalization

Phone Number

A canonical phone address is a text string with the following structure:

+ CountryCode Space [(AreaCode) Space] SubscriberNumber

For example, +1 (425) 882-8080

source: https://tapiex.com/TPNet_Help/Canonical%20Addresses.htm
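
A small illustrative helper (my own, not from the source) that produces that canonical form, under the simplifying assumptions that the number is North American (country code 1) and the area code is always present:

  // Canonical form: + CountryCode Space (AreaCode) Space SubscriberNumber
  // Simplifying assumption: input is a North American number with exactly 10 digits.
  function toCanonicalPhone(raw: string, countryCode = '1'): string {
    const digits = raw.replace(/\D/g, ''); // strip spaces, dashes, dots, parentheses
    if (digits.length !== 10) throw new Error('expected a 10-digit national number');
    const area = digits.slice(0, 3);
    const subscriber = `${digits.slice(3, 6)}-${digits.slice(6)}`;
    return `+${countryCode} (${area}) ${subscriber}`;
  }

  console.log(toCanonicalPhone('425.882.8080')); // "+1 (425) 882-8080"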

Unicode Normalization Forms

Canonical and Compatibility Equivalence

Canonical equivalence is a fundamental equivalency between characters or sequences of characters which represent the same abstract character, and which when correctly displayed should always have the same visual appearance and behavior.

Normalization Forms

The Unicode Normalization Algorithm puts all combining marks in a specified order, and uses rules for decomposition and composition to transform each string into one of the Unicode Normalization Forms.

The four Unicode Normalization Forms are: 

  • Normalization Form D (NFD) = Canonical Decomposition
  • Normalization Form C (NFC) = Canonical Decomposition, followed by Canonical Composition
  • Normalization Form KD (NFKD) = Compatibility Decomposition
  • Normalization Form KC (NFKC) = Compatibility Decomposition, followed by Canonical Composition

source: https://unicode.org/reports/tr15/
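
JavaScript's built-in String.prototype.normalize() implements all four forms; a quick illustration (mine, not from the Unicode report):

  const composed = '\u00e9';    // "é" as a single precomposed code point
  const decomposed = 'e\u0301'; // "e" followed by a combining acute accent

  console.log(composed === decomposed);                  // false: different code point sequences
  console.log(composed.normalize('NFD') === decomposed); // true: canonical decomposition
  console.log(decomposed.normalize('NFC') === composed); // true: decomposition, then composition

  // Compatibility forms also fold "compatibility" variants, e.g. the ligature "ﬁ" (U+FB01):
  console.log('\uFB01'.normalize('NFKC')); // "fi"
  console.log('\uFB01'.normalize('NFC'));  // "ﬁ" (unchanged: only canonical mappings apply)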

See also data normalization (TWOTW)

Saturday, June 28, 2025

NLWeb and MCP

Natural Language Web

NLWeb, short for Natural Language Web, aims to be the fastest and easiest way to effectively turn your website into an AI app: a natural language interface for your website, using the model of your choice and your own data. Every NLWeb instance is also a Model Context Protocol (MCP) server, allowing websites to make their content discoverable and accessible to agents and other participants in the MCP ecosystem if they choose.

How does it work?

NLWeb leverages semi-structured formats like Schema.org, RSS and other data that websites already publish, combining them with LLM-powered tools to create natural language interfaces usable by both humans and AI agents. The NLWeb system enhances this structured data by incorporating external knowledge from the underlying LLMs for richer user experiences.

How do I get started?

The NLWeb GitHub repo contains everything you need to get started:

  • The lightweight code that controls the core service to handle natural language queries, as well as documentation on how this can be extended and customized.
  • Connectors to some of the most popular models and vector databases, as well as documentation to add other models of your choice.
  • Tools for adding your data in Schema.org, JSONL, RSS and other formats to your chosen vector database.
  • A web server frontend for the service and a simple UI that allows users to send queries to the web server.

source: https://news.microsoft.com/source/features/company-news/introducing-nlweb-bringing-conversational-interfaces-directly-to-the-web/

Written by Microsoft Corporate Blogs, published May 19, 2025

Model Context Protocol

The Model Context Protocol, or MCP for short, is a standard for connecting AI assistants to the systems where data resides.

MCP lets AI models draw data from sources like business tools and software to complete tasks, as well as from content repositories and app development environments.

MCP enables developers to build two-way connections between data sources and AI-powered applications (e.g., chatbots). Developers can expose data through “MCP servers” and build “MCP clients” — for instance, apps and workflows — that connect to those servers on command.

source: https://techcrunch.com/2024/11/25/anthropic-proposes-a-way-to-connect-data-to-ai-chatbots/

Transforming the Web with Natural Language: My NLWeb Presentation at Nashua CLOUD .NET & DevBoston

View the slides on SlideShare: https://www.slideshare.net/slideshow/transform-any-website-into-a-conversational-experience-with-nlweb/281034902

What Is NLWeb?

NLWeb (Natural Language Web) is a robust protocol and toolset developed by Microsoft that turns any traditional website into a conversational interface, leveraging the power of large language models. It’s built around the Model Context Protocol (MCP), allowing developers to process natural-language queries and respond using structured Schema.org JSON.

In my session, I demonstrated how NLWeb works, highlighting its design for flexibility (enabling the swapping out of models, vector databases, and embeddings), and how it seamlessly connects to data and APIs to deliver intelligent, real-time responses to users.

Real-World Impact

I also highlighted real-world use cases where NLWeb is already in action:

  • Tripadvisor – enabling users to plan family trips through conversation
  • Eventbrite – allowing event discovery through natural-language search
  • O’Reilly, Qdrant, Delish, Shopify, and others – showcasing early success in turning structured content into AI-driven UX

These examples demonstrate how businesses are already leveraging the potential of conversational web interfaces to drive engagement and discovery.

How to Get Started

For those interested in experimenting or building with NLWeb, here are a few resources I shared:

  • GitHub: https://github.com/microsoft/NLWeb
  • Quick start guide: docs/nlweb-hello-world.md
  • Local test interface: http://localhost:8000/static/debug.html
  • Azure deployment: docs/setup-azure.md

Whether you’re a developer, architect, or product leader, NLWeb offers a modern and modular approach to embedding LLM-driven intelligence into any web property.

source: https://udai.io/transforming-the-web-with-natural-language-my-nlweb-presentation-at-nashua-cloud-net-devboston/