Background
The Internet Computer protocol coordinates the creation and interaction of subnet blockchains created by standardized node machines run by independent owners and installed in independent data centers around the world to ensure decentralization. Compared with other blockchains, the Internet Computer has stronger requirements on the performance and availability of nodes.
This is because most node resources are dedicated to performing useful work, such as executing smart contracts and participating in threshold cryptography. These tasks need to be performed by all nodes of a given subnet blockchain, so having much lower replication than other blockchains is important for energy and cost efficiency.
The Internet Computer is designed in a way that allows anyone to become a Node Provider (NP) in a decentralized manner, with each Node Provider being verified and voted on by token holders through the Network Nervous System (NNS, the DAO that governs the Internet Computer). The NNS acts as a decentralized algorithmic authority responsible for overseeing the operations and development of the network, including expanding the capacity of the Internet Computer by adding more nodes.
Given this, it makes sense to measure each node's contribution and to let node providers diagnose node problems efficiently. With trustworthy node metrics, the compensation model for node providers can be adjusted to reward a node's actual contribution rather than paying a fixed monthly amount to cover hardware and operating costs.
Trustworthy metrics
Until now, the health of nodes has been measured by collecting and analyzing logs and metrics on infrastructure external to the Internet Computer. When a metric deviates from its expected value, the corresponding node providers and data centers are responsible for fixing the situation, but this process is not completely trustless.
Over the past few months, the Internet Computer protocol has been improved through the changes outlined below to allow the network nodes themselves to perform certain monitoring tasks in a fully automated, trustless manner. Now, any party can gather information about the health of any node and its contributions purely by interacting with the Internet Computer itself, without the need for additional trust assumptions.
On other blockchains, users typically need to process all blocks to infer this kind of information. On the Internet Computer, users can rely on chain key technology and threshold signatures to retrieve node metrics directly.
In the long term, the availability of verifiable node metrics will lead to further refinement and improvement of the node compensation process, as insights gained through decentralized monitoring will allow for automatic adjustments to payments based on a node’s performance, or rather lack thereof.
ICP Architecture for Trusted Node Metrics
How consensus has always worked
The job of the Internet Computer consensus layer is to order the inputs to a subnet so that all nodes in the subnet process them in the same order. The Internet Computer consensus protocol does this by creating a blockchain containing the inputs and handing the contents off to the message routing layer, which will ensure that the inputs reach their destination.
To do this, the consensus protocol relies on an unbiased and unpredictable pseudo-random function to determine which node should create the next block. If the selected node is not fast enough, the pseudo-random function chooses another node to generate a block.
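The selection-with-fallback idea can be illustrated with a minimal sketch. It is not the Internet Computer's actual algorithm: the hash-based ranking, the `beacon` value, and the `is_fast_enough` callback are all hypothetical stand-ins chosen only to show how a deterministic pseudo-random ordering yields a first choice and a sequence of fallbacks.

```python
import hashlib


def rank_nodes(beacon: bytes, height: int, node_ids):
    """Order nodes pseudo-randomly for one block height.

    Assumption: a SHA-256 hash over (beacon, height, node id) stands in
    for the protocol's unbiased, unpredictable pseudo-random function.
    """
    def score(node_id: str) -> bytes:
        data = beacon + height.to_bytes(8, "big") + node_id.encode()
        return hashlib.sha256(data).digest()

    return sorted(node_ids, key=score)


def select_block_maker(beacon: bytes, height: int, node_ids, is_fast_enough):
    """Return (producer, skipped): the first ranked node that is fast
    enough, plus the higher-ranked nodes that were passed over."""
    skipped = []
    for node_id in rank_nodes(beacon, height, node_ids):
        if is_fast_enough(node_id):
            return node_id, skipped
        skipped.append(node_id)
    return None, skipped
```

Because every node evaluates the same function over the same inputs, all of them agree on the ordering without exchanging extra messages, and the list of skipped nodes falls out of the selection for free.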
Recent changes
Consensus now provides Message Routing (MR) with information about which nodes succeeded in becoming block producers and which nodes failed to become block producers even though it was their turn.
In turn, the MR layer adds this information to the replicated state, which is threshold-signed by the nodes in the subnet to ensure that all honest nodes have the same state. For each node belonging to a subnet, the replicated state accumulates two counters: the number of blocks the node proposed successfully and the number of times it failed to propose.
For each of the last 60 days, this accumulated state is saved as a snapshot, taken from the last update to the replicated state before midnight, and appended to a snapshot queue (sorted in ascending chronological order). Snapshots in the queue are immutable; the current, still-changing state is not part of the queue.
Snapshots alone are not enough to make the data useful, because node providers and members of the ICP community may be interested in different time ranges. Therefore, there is now functionality to query a date range, which returns the difference between the values at the end and at the start of the range. Since subnet membership may change over time, a pruning mechanism is also needed.
When a new snapshot is about to be pushed, any node ID whose statistics have not changed compared to the previous snapshot is pruned. This must also be taken into account when computing the difference between snapshots for a range query.
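The snapshot queue, pruning rule, and range query described above can be sketched as follows. This is a simplified model under stated assumptions, not the replica's implementation: the class and method names are hypothetical, pruning is assumed to remove the node from the live counters as well, and a node absent from the start snapshot is treated as starting from zero.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStats:
    blocks_proposed: int = 0
    block_failures: int = 0


class MetricsHistory:
    """Hypothetical model of per-subnet metrics with daily snapshots."""

    def __init__(self, max_days: int = 60):
        self.live = {}                            # node_id -> NodeStats
        self.snapshots = deque(maxlen=max_days)   # (day, {node_id: NodeStats})

    def record(self, node_id: str, proposed: bool) -> None:
        """Accumulate one success or one failure for a node."""
        old = self.live.get(node_id, NodeStats())
        if proposed:
            new = NodeStats(old.blocks_proposed + 1, old.block_failures)
        else:
            new = NodeStats(old.blocks_proposed, old.block_failures + 1)
        self.live[node_id] = new

    def push_snapshot(self, day: int) -> None:
        """Prune nodes unchanged since the previous snapshot, then append."""
        previous = self.snapshots[-1][1] if self.snapshots else {}
        unchanged = [n for n, s in self.live.items() if previous.get(n) == s]
        for node_id in unchanged:
            del self.live[node_id]
        self.snapshots.append((day, dict(self.live)))

    def range_delta(self, start_day: int, end_day: int) -> dict:
        """Difference between end and start snapshots; a node missing
        from the start snapshot is counted from zero (assumption)."""
        by_day = dict(self.snapshots)
        start, end = by_day[start_day], by_day[end_day]
        zero = NodeStats()
        return {
            n: NodeStats(
                s.blocks_proposed - start.get(n, zero).blocks_proposed,
                s.block_failures - start.get(n, zero).block_failures,
            )
            for n, s in end.items()
        }
```

A node that is pruned and later becomes active again inside the queried range would need extra care that this sketch does not attempt, which is the kind of case the pruning caveat above refers to.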
To make this data available externally, the management canister exposes a new endpoint, node_metrics_history, which returns data from the snapshots of a given date range as explained above. More details are described in the IC interface specification.
Please note that this API is considered experimental. In other words, while feedback is greatly appreciated, canister developers must be aware that the API may evolve in non-backwards-compatible ways.
Since retrieving node metrics consumes resources (CPU, memory, bandwidth), the endpoint can only be called by canisters to prevent abuse, and each request for metrics is charged, making it more difficult for malicious users to exploit the interface for denial-of-service (DoS) attacks.
Tools for trusted node metrics
The DFINITY R&D team has created open source tooling that allows node providers and any other interested parties to pull metrics from the management canister for all subnets and inspect them in detail.
Additionally, it provides information about subnet membership changes (for example, when a node joins a subnet, it will not contribute blocks until it has completed state synchronization). The tool retrieves metrics from all subnets in parallel to reduce the time required to obtain them.
All data is retrieved via update calls to prevent potentially malicious nodes from providing false data. Typically, it takes less than 10 seconds to collect the latest metrics from all 37 subnets. The metrics can then be stored in a local file in JSON format and further analyzed by other tools.
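As an example of such further analysis, the sketch below computes a per-node failure rate from stored JSON. The layout of the file is an assumption made for illustration (a flat list of records with `node_id`, `num_blocks_proposed_total`, and `num_block_failures_total` fields); the file actually produced by the tool may be structured differently, so the field access would need adjusting.

```python
import json


def failure_rates(json_text: str) -> dict:
    """Map each node ID to failures / (proposals + failures).

    Assumes a hypothetical flat list of per-node records; nodes with
    no recorded attempts get a rate of 0.0.
    """
    rates = {}
    for entry in json.loads(json_text):
        proposed = entry["num_blocks_proposed_total"]
        failed = entry["num_block_failures_total"]
        attempts = proposed + failed
        rates[entry["node_id"]] = failed / attempts if attempts else 0.0
    return rates


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = json.dumps([
        {"node_id": "node-a", "num_blocks_proposed_total": 75,
         "num_block_failures_total": 25},
        {"node_id": "node-b", "num_blocks_proposed_total": 0,
         "num_block_failures_total": 0},
    ])
    for node, rate in failure_rates(sample).items():
        print(f"{node}: {rate:.2%}")
```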
Please refer to the following resources for more information:
dfinity.github.io/dre/trustworthy-metrics/trustworthy-metrics.html
A gateway to more milestones
The ability to obtain trustworthy node metrics brings the Internet Computer to the next milestone in transparency and operational efficiency. By providing clear insights into node performance, it lays the foundation for decentralized data-driven decision-making and future enhancements to the node compensation process.
Get trustworthy node metrics:
dfinity.github.io/dre/trustworthy-metrics/trustworthy-metrics.html
Join the discussion:
forum.dfinity.org/t/trustworthy-node-metrics-for-useful-work/22989
For more information about nodes on the Internet Computer:
internetcomputer.org/node-providers