Misplaced Pages

News server: Difference between revisions

Article snapshot taken from Wikipedia with Creative Commons Attribution-ShareAlike license.
Revision as of 22:11, 6 August 2007 (edit by 190.82.251.18, "rv repeated vandalism by DMacks") compared with the latest revision as of 01:30, 27 November 2024 (minor edit by Galzigler, "Transit server"); 94 intermediate revisions by 53 users are not shown.
{{short description|Type of server software}}
{{more citations needed|date=May 2024}}
[Figure: Usenet Provider Map]
A '''news server''' is a collection of software used to handle Usenet articles.<ref name="Usenet The Other Internet">{{cite news |last=Pegoraro |first=Rob |date=January 30, 1990 |title=Usenet: The 'Other' Internet
|url=https://www.washingtonpost.com/wp-srv/tech/ffwd/education/usenet.htm |newspaper=Washington Post |access-date=July 28, 2020}}</ref> The term may also refer to the computer itself when it is used primarily or solely for handling Usenet. Access to Usenet is available only through news server providers.


==Articles and posts==
End users often use the term "posting" to refer to a single message or file posted to Usenet. For articles containing plain text, a posting is synonymous with an article. For binary content such as pictures and other files, it is often necessary to split the content across multiple articles. The newsreader typically reassembles these multiple-article postings into a single unit automatically, usually by relying on part numbers in the Subject: headers. Most servers do not distinguish between single-part and multipart postings and deal only with the individual component articles.<ref name="Administering Usenet">{{cite book |last1=McDermott |first1=James |last2=Phillips |first2=John |date=May 1, 1997 |title=Administering Usenet News Servers: A Comprehensive Guide to Planning, Building, and Managing Internet and Intranet News Services |publisher=Addison-Wesley |isbn=020141967X}}</ref>
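The reassembly step can be sketched in Python. This is a minimal illustration rather than any particular newsreader's algorithm. It assumes the common convention of a trailing "(n/m)" or "[n/m]" part counter in the Subject: header and treats bodies as already-decoded bytes, although real binary postings are usually yEnc- or uuencoded. It keeps only postings for which every part has arrived. The function name `reassemble` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed convention: a trailing part counter such as "(2/5)" or "[2/5]".
PART_RE = re.compile(r"^(?P<base>.*?)\s*[(\[](?P<part>\d+)/(?P<total>\d+)[)\]]\s*$")


def reassemble(articles):
    """Join multipart postings from (subject, body) pairs.

    Returns {base subject: joined body} for postings whose parts 1..total
    are all present; incomplete postings and single articles are skipped.
    """
    parts = defaultdict(dict)  # base subject -> {part number: body}
    totals = {}                # base subject -> declared number of parts
    for subject, body in articles:
        m = PART_RE.match(subject)
        if m is None:
            continue  # no part counter: a single-article posting
        base = m.group("base")
        parts[base][int(m.group("part"))] = body
        totals[base] = int(m.group("total"))

    complete = {}
    for base, received in parts.items():
        wanted = range(1, totals[base] + 1)
        if all(n in received for n in wanted):
            complete[base] = b"".join(received[n] for n in wanted)
    return complete


# Parts may arrive in any order; the incomplete "notes.txt" set is left out.
print(reassemble([
    ("holiday.jpg (2/2)", b"world"),
    ("holiday.jpg (1/2)", b"hello "),
    ("notes.txt (1/3)", b"x"),
]))  # {'holiday.jpg': b'hello world'}
```

Because the server only sees individual articles, all of this bookkeeping happens on the client side.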


===Headers and overviews===
Each news article contains a complete set of header lines, but in common use the term "headers" also refers to the News Overview database.<ref name="Administering Usenet" /> The overview is a list of the most frequently used headers plus additional information such as article sizes. Client software typically retrieves it using the NNTP {{mono|XOVER}} command. Overviews make reading a newsgroup faster for both the client and the server, because the articles can be listed without opening each one individually.
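The following Python sketch parses one line of overview data as returned by XOVER (and by OVER in RFC 3977). It assumes the standard field order, with fields separated by tabs: article number, Subject, From, Date, Message-ID, References, byte count and line count. Any extra fields are ignored. The `OverviewEntry` class and `parse_overview_line` function are illustrative names, not part of any library.

```python
from dataclasses import dataclass, field


@dataclass
class OverviewEntry:
    number: int
    subject: str
    sender: str          # the From: header
    date: str
    message_id: str
    references: list = field(default_factory=list)
    byte_count: int = 0
    line_count: int = 0


def parse_overview_line(line: str) -> OverviewEntry:
    """Parse one tab-separated overview line (first eight fields only)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(fields)}")
    number, subject, sender, date, msgid, refs, nbytes, nlines = fields[:8]
    return OverviewEntry(
        number=int(number),
        subject=subject,
        sender=sender,
        date=date,
        message_id=msgid,
        references=refs.split(),  # space-separated Message-IDs
        byte_count=int(nbytes) if nbytes else 0,
        line_count=int(nlines) if nlines else 0,
    )
```

A newsreader can build its article list and threads from these entries alone. This is why reading overviews is cheaper than opening every article.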


If non-overview headers are required, such as when using a kill file, it may still be necessary to use the slower method of reading each article's complete headers.<ref name="Usenet The Other Internet" /> Many clients are unable to do this and limit filtering to what is available in the summaries.<ref name="Administering Usenet" />
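The sketch below applies simple kill-file rules to overview summaries to show this limitation in practice. Rules on headers found in a typical overview are evaluated directly. Rules on any other header are returned unevaluated, because checking them would require fetching every article's full header block. The rule format (a header name plus a regular expression) and the function name are assumptions for illustration, not any client's actual kill-file syntax.

```python
import re

# Headers normally available in overview data (the RFC 3977 default format).
OVERVIEW_HEADERS = {"subject", "from", "date", "message-id", "references"}


def apply_kill_rules(summaries, rules):
    """Filter overview summaries with (header, regex) kill rules.

    summaries are dicts keyed by lowercase header name. Returns
    (kept summaries, rules that could not be checked), where the second
    list holds rules naming headers absent from the overview.
    """
    usable, deferred = [], []
    for header, pattern in rules:
        if header.lower() in OVERVIEW_HEADERS:
            usable.append((header.lower(), re.compile(pattern, re.IGNORECASE)))
        else:
            deferred.append((header, pattern))
    kept = [s for s in summaries
            if not any(rx.search(s.get(h, "")) for h, rx in usable)]
    return kept, deferred
```

A client that cannot fetch full headers simply has to ignore the deferred rules. This is the restriction to "what is available in the summaries" described above.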


==News server attributes==
Among the operators and users of commercial news servers, common concerns are the continually increasing storage and network capacity requirements and their effects.<ref name="Administering Usenet" /> Completion (the ability of a server to successfully receive all traffic), retention (the amount of time articles are made available to readers) and overall system performance are frequent topics of discussion. With these increasing demands, it is common for the transit and reader server roles to be subdivided further into numbering, storage and front-end systems. Both insiders and outsiders continually monitor these server farms, and consumers often use measurements of these characteristics when choosing a commercial news service.


=== Speed ===
Speed, in relation to Usenet, is how quickly a server can deliver an article to the user. The server that the user connects to is typically part of a server farm in which many servers are dedicated to different tasks. How fast data can move through this farm is the first factor affecting delivery speed.{{citation needed|date=August 2020}}


Hard drive operations can severely bottleneck the movement of data through the farm, because retrieving article and overview information places heavy load on the drives.{{citation needed|date=August 2020}} To combat this, caching technology and cyclical file storage systems have been developed.{{citation needed|date=August 2020}}


Once the farm has delivered the data to the network, the provider has limited control over the speed to the user. The network path to each user is different, so some users will have good routes over which data flows quickly. Others will have overloaded routers between them and the provider, and these cause delays. In that case, about all a provider can do is try routing the traffic differently. If the ISP has limited connectivity to the network, routing changes may have little effect.


A user can often reduce the impact of network problems by using multiple connections. Some servers allow as many as 60 simultaneous connections, but this varies widely by provider.<ref>{{cite web |title=Usenet Server Connections Explained
|url=http://www.techsono.com/usenet/faq/server-connections |publisher=TechSono Engineering |access-date=July 28, 2020 }}</ref>


===Article sizes===
Article sizes are limited to what each news server will accept. Larger articles occupy more space, so a server that accepts them can hold fewer articles in total. This generally lets the server run with less overhead, which makes it more efficient, but it leaves fewer articles for users to access.{{citation needed|date=August 2020}}


===Retention===
Retention is how long the server keeps articles.<ref>{{cite web |title=Usenet Newsgroups Retention
|date=16 May 2020 |url=https://www.usenet.com/usenet-newsgroups-retention/ |publisher=Usenet.com |access-date=July 28, 2020 }}</ref> Historically, most users wanted retention long enough that they did not need to access the server every day. At the same time, they did not want it so long that it overwhelmed users with slow computers or network connections.<ref name="Usenet The Other Internet" /> Today, high-speed connections, large storage capacity and advanced search tools allow users to take advantage of extensive retention largely without these drawbacks.


Retention is generally quoted separately for text and binary articles, though it may also vary between groups within each category. Retention times vary greatly according to the amount of storage available on the servers and the continually increasing traffic. As of 2009, it was common for average news providers to have text retention of over 1000 days and binary retention of over 200 days.{{citation needed|date=August 2020}} Large news providers offer text retention of up to 2480 days and binary retention of 850 days or more.{{citation needed|date=August 2020}} Omicron's HW Media is currently the Usenet server with the highest binary retention, while Google is the Usenet server with the highest text retention.{{citation needed|date=August 2020}}
It can be difficult for end users to measure a server's retention accurately. One common method is to look at the oldest articles in a group and check their dates, but this is not always accurate. Some articles in a group may be retained longer than others, articles from remote servers do not always arrive promptly, and Date headers are sometimes simply incorrect. Detecting such anomalies requires sampling many or all articles, preferably in more than one newsgroup.
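One way to make such sampling more robust is sketched below in Python. The function parses a sample of Date: header values with the standard library's `email.utils.parsedate_to_datetime` and skips unparseable values. Before reporting an age, it discards the oldest fraction of the sample, so that a few mis-dated or unusually long-retained articles do not dominate the estimate. The function name and the 1% trim fraction are arbitrary assumptions, not established figures.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def estimate_retention_days(date_headers, now, trim_fraction=0.01):
    """Estimate retention (in days) from a sample of Date: header values."""
    ages = []
    for value in date_headers:
        try:
            posted = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            continue  # malformed Date: header, ignore it
        if posted.tzinfo is None:
            posted = posted.replace(tzinfo=timezone.utc)  # assume UTC
        ages.append((now - posted).total_seconds() / 86400)
    if not ages:
        return None
    ages.sort(reverse=True)                # oldest article first
    skip = int(len(ages) * trim_fraction)  # outliers to discard
    return ages[min(skip, len(ages) - 1)]


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
sample = ["Sun, 01 Jan 2023 00:00:00 +0000"] + ["Mon, 01 Jan 2024 00:00:00 +0000"] * 99
print(estimate_retention_days(sample, now))  # 30.0: the single 2023 outlier is trimmed
```

Sampling several newsgroups and comparing their estimates helps reveal the per-group differences mentioned above.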


Because news servers do not have unlimited storage, they can hold posts only for a limited time before they must delete them to make room for new posts. This is a particular problem for binary newsgroups, which transmit large volumes of articles.


For news servers provided by Internet service providers as part of a user's subscription package, typical retention is often only 2–4 days.{{citation needed|date=August 2020}} To cope with increasing Usenet traffic, many providers turn to a hybrid system. In this system, requests for old articles that are not found on the provider's own server are passed to another server with longer retention.
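The hybrid lookup can be sketched as a local lookup with a fallback. In this hypothetical Python sketch, `local_spool` stands for the provider's own storage and `backfill` for whatever mechanism requests an article from the longer-retention server. Caching the fetched article locally is an added assumption, not something every provider necessarily does.

```python
from typing import Callable, Dict, Optional


def make_fetcher(local_spool: Dict[str, bytes],
                 backfill: Callable[[str], Optional[bytes]]):
    """Return a lookup that falls back to a longer-retention server."""

    def fetch(message_id: str) -> Optional[bytes]:
        article = local_spool.get(message_id)
        if article is None:
            article = backfill(message_id)  # ask the other server
            if article is not None:
                local_spool[message_id] = article  # assumed: cache locally
        return article

    return fetch
```

To the reader the result looks like one server with long retention, although the provider only stores a few days of articles itself.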

===Completion===
Given the large number of articles transferred between servers and the large size of individual articles, their complete propagation to any one server farm is not guaranteed. The term "completion" is used to describe how well a service is keeping up with the traffic.{{citation needed|date=August 2020}}

The primary obstacle to calculating the completion percentage is knowing how many articles were posted in the first place. Looking at only one server, one cannot know how many articles were actually inserted throughout the network.{{citation needed|date=August 2020}} Articles may never leave the originating server, or may fail to reach the transit cloud. Very large articles are frequently dropped and tend to propagate less well than smaller ones.{{citation needed|date=August 2020}}

One way to measure completion is to access multiple servers and retrieve lists of articles. Because Message-ID: headers are nominally unique throughout the network, comparing the lists is mostly straightforward. This type of measurement has several practical limitations. It is impossible to obtain lists from all servers worldwide, many servers filter out ] or employ ], and some servers mask incompletion by hiding multipart binary sets with missing articles.{{citation needed|date=August 2020}} Propagation times and retention must also be taken into account: an article may simply not have arrived at a given server yet, or it may have been present but already expired.{{citation needed|date=August 2020}}
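The list comparison can be sketched with Python sets. In this sketch, the union of all sampled lists stands in for "everything posted". As noted above, this is only an approximation: articles that none of the sampled servers carry are invisible, and the sketch makes no adjustment for propagation delay or expiry. The function name and input shape are assumptions.

```python
def completion_report(server_lists):
    """Estimate each server's completion from sampled Message-ID lists.

    server_lists maps a server name to an iterable of Message-IDs seen
    there. Returns {name: (percent of union, sorted missing IDs)}.
    """
    sets = {name: set(ids) for name, ids in server_lists.items()}
    universe = set().union(*sets.values())
    report = {}
    for name, ids in sets.items():
        pct = 100.0 * len(ids) / len(universe) if universe else 100.0
        report[name] = (round(pct, 1), sorted(universe - ids))
    return report
```

In practice, the lists would be restricted to articles old enough to have propagated but young enough not to have expired on any of the sampled servers.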

==News server operation==
===Peering===
All Usenet servers peer with one or more other servers in order to exchange articles. Occasionally, new servers appear. Although there are several web resources which may aid in finding peers, a better resource is the newsgroup (Google Groups portal). {{citation needed|date=August 2020}}

As of 2020, text feeds can usually be obtained for free, while full binary feeds may be free or paid, depending on how many articles each server sends to the other. A full binary and text Usenet feed carries a large amount of data (as much as 30 terabytes a day), and transmitting it through an IP transit provider like ], ], or ] is expensive. For this reason, most Usenet providers engage in binary peering only when they are interconnected at an Internet exchange like ], ], or ].

===Spools===
When the server stores the body of an article, it places it in a disk storage area generically called a "spool".<ref name="Administering Usenet" /> There are several common ways in which the spool may be organized:

*One file per article is the oldest storage scheme, still in common use on smaller servers and replicated in many clients. Its performance depends directly on the underlying file system's ability to create, remove and locate files within a directory, and this scheme is often insufficient to keep up with modern Usenet traffic. It does, however, allow the greatest flexibility in managing the amount and location of storage used by the server. Nearly all current software using this scheme stores articles using the ] 2.10 layout.
*Cyclical storage has been in increasingly common use since the 1990s. In this storage method, articles are appended serially to large indexed container files. When the end of the file is reached, new articles are written at the beginning of the file, overwriting the oldest entries. On some servers, this overwriting is not performed, but instead new container files are created as older ones are deleted. The major advantages of this system include predictable storage requirements if an overwriting scheme is employed, and some freedom from dependency on the underlying performance of the operating system. There is, however, less flexibility to retain articles by age rather than space used, and traditional text manipulation tools such as ] are less well suited to analyzing these files. Some degree of article longevity control can be exercised by directing subsets of the ]s to specific sets of container files.
*In some cases, a ] or similar is used to contain the spool. This is most commonly seen with ] software that also offers an NNTP interface.
*Some servers, such as ], allow multiple storage schemes to be used at once. Various hybrid storage schemes have also been used in news servers, including different organizations of the file-per-article method, or smaller containers carrying perhaps 100 articles apiece.
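The cyclical scheme can be modeled in a few lines of Python. This in-memory sketch keeps a fixed-size buffer and an index from Message-ID to offset. When an article would run past the end, writing wraps to the start, and index entries for overwritten articles are dropped. Real cyclic stores, such as INN's CNFS, keep their containers on disk and maintain their indexes differently; the class here is purely illustrative.

```python
class CyclicSpool:
    """In-memory model of one cyclic (wrap-around) article container."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.write_pos = 0
        self.index = {}  # Message-ID -> (offset, length)

    def store(self, message_id, article):
        n = len(article)
        if n > self.capacity:
            raise ValueError("article is larger than the container")
        if self.write_pos + n > self.capacity:
            self.write_pos = 0  # end reached: wrap to the start
        start, end = self.write_pos, self.write_pos + n
        # Forget older articles whose bytes are about to be overwritten.
        self.index = {mid: (off, ln) for mid, (off, ln) in self.index.items()
                      if off + ln <= start or off >= end}
        self.buf[start:end] = article
        self.index[message_id] = (start, n)
        self.write_pos = end

    def fetch(self, message_id):
        loc = self.index.get(message_id)
        if loc is None:
            return None  # never stored, or already overwritten
        off, ln = loc
        return bytes(self.buf[off:off + ln])


spool = CyclicSpool(10)
for mid, body in [("<a>", b"aaaa"), ("<b>", b"bbbb"), ("<c>", b"cccc")]:
    spool.store(mid, body)
print(spool.fetch("<a>"), spool.fetch("<b>"), spool.fetch("<c>"))  # None b'bbbb' b'cccc'
```

The total storage used never exceeds the buffer size, which illustrates the predictability mentioned above. The oldest articles disappear based on space rather than age. Rebuilding the index on every write is for clarity only; a real implementation would avoid that cost.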

==Types of servers==
A ''reader server'' provides an interface to read and post articles, generally with the assistance of a ]. A ''transit server'' exchanges articles with other servers. Most servers can provide both functions.

===Transit server===
Modern transit servers usually use ] to exchange news continually over the ] and similar always-on connections. In the past, servers normally employed the ] protocol, which was designed for intermittent dial-up connections. Other ''ad hoc'' protocols, including ], are less commonly seen. News servers normally connect with multiple peers, with the redundancy helping to spread loads and ensure that articles are not lost. Smaller sites, called ''leaf nodes'', are connected to one other major server.<ref name="Administering Usenet" />

Articles are routed based on information found in the header lines defined in RFC 1036.{{citation needed|date=August 2020}} Of particular interest to a transit server are:

*''Message-ID'' – a globally unique key
*''Newsgroups'' – a list of one or more ]s where the article is intended to appear
*''Distribution'' – (optional) a supplement to Newsgroups, used to restrict circulation of articles.
*''Date'' – the time when the article was created
*''Path'' – a list of the servers an article passed through on its way to the local server
*''Expires'' – (optional) the time when it is requested that the article be deleted
*''Approved'' – (optional) indicates an article that has been accepted for a moderated newsgroup
*''Control'' – (optional) contains command requests

In most cases, the sending server controls the article transfer process. It compares the Newsgroups and Distribution of each newly arrived article against a set of patterns called ''newsfeeds'', listing each remote server and the newsgroups its operator wishes to receive. Some senders also examine the Path; if the receiving server appears in this line, it is not offered. Other local rules may also be added. The sender transmits matching articles' Message-IDs to the receiving server. The receiver indicates which Message-IDs it has not yet stored locally, and those articles are sent.<ref name="Administering Usenet" />
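The sender and receiver sides of this exchange can be sketched in Python. Several details are assumptions rather than any server's actual configuration format. Articles are plain dicts of header values. `fnmatch` stands in for the wildmat patterns real servers use. Within a pattern list the last matching pattern wins, and a leading "!" excludes. Distribution is compared against a set of accepted values only when one is given.

```python
from fnmatch import fnmatchcase


def wants_group(patterns, group):
    """Evaluate a newsfeeds-style pattern list for one newsgroup.

    Patterns are checked in order and the last match wins; a leading "!"
    excludes. fnmatch approximates the wildmat syntax real servers use.
    """
    wanted = False
    for pattern in patterns:
        negate = pattern.startswith("!")
        if fnmatchcase(group, pattern[1:] if negate else pattern):
            wanted = not negate
    return wanted


def articles_to_offer(articles, peer, patterns, peer_distributions=None):
    """Sender side: Message-IDs of stored articles to offer to one peer."""
    offer = []
    for art in articles:
        if peer in art["Path"].split("!"):
            continue  # the peer has already seen this article
        dist = art.get("Distribution")
        if dist and peer_distributions is not None:
            if not {d.strip() for d in dist.split(",")} & peer_distributions:
                continue  # circulation restricted away from this peer
        groups = [g.strip() for g in art["Newsgroups"].split(",")]
        if any(wants_group(patterns, g) for g in groups):
            offer.append(art["Message-ID"])
    return offer


def receiver_wants(offered, stored):
    """Receiver side: the offered Message-IDs not yet stored locally."""
    return [mid for mid in offered if mid not in stored]
```

Only the Message-IDs returned by `receiver_wants` would then be transferred in full. Offering IDs before sending articles keeps duplicate transfers cheap when several peers carry the same traffic.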

The receiving server examines the incoming articles. A message is normally discarded if the Message-ID is duplicated by an article already received (i.e., another server sent it in the meantime), the Date or Expires lines indicate that the article is too old, the header syntax appears to be invalid, the Approved header is missing for a moderated newsgroup, or additional local rules disallow it.{{citation needed|date=August 2020}} Most servers also maintain a list of active newsgroups. If the Newsgroups header of a new article does not match the active list, it may be discarded or placed in a special "junk" newsgroup. Once the article is stored, the server attempts to retransmit it to any servers in its own newsfeed list.<ref name="Administering Usenet" />
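The acceptance checks can be sketched as a single classification function in Python. The 10-day age limit is a hypothetical local setting. The history, active and moderated collections are simple sets standing in for a server's real databases, and the function name is illustrative. Header syntax checking is reduced here to requiring a non-empty Newsgroups list.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def _parse_date(value):
    """Parse an RFC 5322-style date, returning an aware datetime or None."""
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def classify_incoming(headers, history, active, moderated, now, max_age_days=10):
    """Return "accept", "junk" or "reject" for an incoming article."""
    msgid = headers.get("Message-ID")
    if not msgid or msgid in history:
        return "reject"  # missing, or already received from another peer
    posted = _parse_date(headers.get("Date", ""))
    if posted is None or now - posted > timedelta(days=max_age_days):
        return "reject"  # unparseable or too old
    expires = headers.get("Expires")
    if expires is not None:
        exp = _parse_date(expires)
        if exp is not None and exp < now:
            return "reject"  # already past its requested expiry time
    groups = [g.strip() for g in headers.get("Newsgroups", "").split(",") if g.strip()]
    if not groups:
        return "reject"  # invalid header syntax
    if "Approved" not in headers and any(g in moderated for g in groups):
        return "reject"  # moderated group without an Approved header
    if not any(g in active for g in groups):
        return "junk"    # no group on the active list
    return "accept"


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
article = {"Message-ID": "<n@example.org>", "Date": "Tue, 09 Jan 2024 12:00:00 +0000",
           "Newsgroups": "comp.misc"}
print(classify_incoming(article, set(), {"comp.misc"}, set(), now))  # accept
```

After an "accept" or "junk" result, a real server would also record the Message-ID in its history so that later copies are rejected as duplicates.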

Articles with Control lines are given special handling. They are typically filed in special "control" newsgroups and may cause the server to automatically carry out exceptional actions. The ''<code>newgroup</code>'' and ''<code>rmgroup</code>'' commands can cause newsgroups to be created or removed; ''<code>checkgroups</code>'' can be used to reconcile the local active list with a commonly accepted set; and ''<code>cancel</code>'' commands are used to request the deletion of a specific article. ''<code>ihave</code>'' and ''<code>sendme</code>'' are sometimes used with UUCP to transmit lists of offered and wanted Message-IDs. Other commands (''<code>version</code>'', ''<code>sendsys</code>'', and ''<code>uuname</code>'') are requests for server configuration details. These were once used to create network maps but are now generally obsolete.<ref name="Administering Usenet" />

===Reader server===
A reader server is one that makes the articles available in the hierarchical ] ] format originated by ] 2.10, or offers the NNTP or ] commands, for use by newsreaders. A reader server typically also works as a transit server, but it may operate independently or serve as an alternative interface to an ]. When receiving news, this type of server must perform the additional steps of filing articles into newsgroups and assigning sequential numbers within each group. An ''Xref'' line is usually added, listing all the groups where the message appears and its sequence number in each. Unlike Message-IDs, the numbers and ordering of articles will differ on each server. Related servers may, however, force agreement by operating in a slave mode and reusing their siblings' Xref lines. Reader servers typically also maintain a News Overview (NOV) database that allows newsreaders to quickly obtain message summaries and present messages in threaded form.<ref name="Administering Usenet" />
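The numbering step can be sketched in Python. The class keeps a per-group counter of the last article number used, assigns the next number in each carried group the article is posted to, and builds an Xref line of the form "Xref: host group:number ...". The class name and hostname are hypothetical, and only groups on the server's own list receive numbers.

```python
class ReaderNumbering:
    """Assigns per-group article numbers and builds an Xref line."""

    def __init__(self, hostname, groups):
        self.hostname = hostname
        self.high_water = {g: 0 for g in groups}  # last number used per group

    def file_article(self, newsgroups):
        entries = []
        # dict.fromkeys removes duplicate group names while keeping order
        for group in dict.fromkeys(g.strip() for g in newsgroups.split(",")):
            if group in self.high_water:  # only groups this server carries
                self.high_water[group] += 1
                entries.append(f"{group}:{self.high_water[group]}")
        if not entries:
            return None
        return f"Xref: {self.hostname} " + " ".join(entries)
```

A server operating in slave mode would skip the counters and copy the numbers from the Xref line supplied by its sibling. This is how related servers keep their numbering in agreement.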

Most reader servers support posting, either through NNTP or a special ''inews'' program.{{citation needed|date=August 2020}} When an article is posted, the process is much the same as when a transit server receives news, but with additional checks. For posting, the server will normally fill in missing Path and Message-ID lines and check the syntax of headers intended for human readers, such as ''From'' and ''Subject''. If the article is posted to a moderated group, the server will attempt to mail it to the newsgroup moderator if the Approved header is absent. Additional identity checks and filters are also typically applied at this point.<ref name="Administering Usenet" />
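The posting-time checks can be sketched in Python. The function verifies a few reader-facing headers and fills in Message-ID and Path only if they are missing. Several details are assumptions for illustration: the deliberately loose "@" check on From, the UUID-based Message-ID, and the "not-for-mail" Path tail, which is a widespread convention rather than a requirement. The function name is also hypothetical.

```python
import uuid


def prepare_post(headers, hostname):
    """Check reader-facing headers and fill in server-supplied ones.

    Returns a new dict; the caller's headers are not modified.
    """
    for name in ("From", "Subject", "Newsgroups"):
        if not headers.get(name, "").strip():
            raise ValueError(f"missing or empty {name} header")
    if "@" not in headers["From"]:
        raise ValueError("From header does not contain an address")
    post = dict(headers)
    # A random UUID makes the left-hand side unique; the host keeps it global.
    post.setdefault("Message-ID", f"<{uuid.uuid4().hex}@{hostname}>")
    # "not-for-mail" as the Path tail is a widespread convention, assumed here.
    post.setdefault("Path", f"{hostname}!not-for-mail")
    return post
```

The prepared article would then go through the same acceptance checks as an article arriving from a peer. Moderation handling and identity checks are omitted here.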

===Hybrid or cache server===
Smaller sites with limited network bandwidth may operate "sucking" or ] servers. These perform the same reader server role as conventional news servers, but they themselves act as newsreaders to exchange articles with other reader servers.{{citation needed|date=August 2020}} Hybrid servers give the server operator greater flexibility, because the set of received groups can be adjusted without manual intervention by remote server operators. They may also be the only available means to obtain articles from remote servers that do not offer conventional feeding.
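A sucking server typically tracks, for each group, the highest article number it has already pulled from the remote server. The Python sketch below shows only that bookkeeping. Given the low and high article numbers the remote server reports for a group (as NNTP's GROUP command does), it chooses the next batch of numbers to fetch. The function name and the batch cap are assumptions; fetching and re-posting the articles are left out.

```python
def plan_suck(local_high, remote_low, remote_high, max_batch=1000):
    """Choose which remote article numbers to fetch for one group.

    local_high:  highest remote article number already fetched
    remote_low, remote_high: the range the remote server currently reports
    max_batch:   cap per run, to limit load on the remote reader server
    """
    start = max(local_high + 1, remote_low)  # skip already-expired numbers
    if start > remote_high:
        return range(0)  # nothing new
    end = min(remote_high, start + max_batch - 1)
    return range(start, end + 1)
```

Capping each run is one way a sucking server might limit the excess activity on remote reader servers described below.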

Because hybrid servers usually use the posting function to send news, article headers are reformatted by the posting function and tracing information can be lost. Also, the delayed sucking process can result in excess activity on the remote reader servers. For these reasons, the use of hybrid servers is often discouraged or disallowed without prior agreement.<ref name="Administering Usenet" />

==See also==
*]

==References==
{{Reflist}}


==External links==
*{{dmoz|Computers/Software/Internet/Servers/Usenet|Usenet Servers}}
*{{dmoz|Computers/Usenet/Public_News_Servers|Public News Servers}}
* Blacklisted Usenet News Servers.
* Search Engine for Free News Servers.
* Search Engine for Free News Servers.
* Search Engine for Free News Servers.
* Free Web-Based Usenet Servers.


{{Usenetnav}}

Latest revision as of 01:30, 27 November 2024

Type of server software
This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.
Find sources: "News server" – news · newspapers · books · scholar · JSTOR (May 2024) (Learn how and when to remove this message)
Usenet Provider Map
Usenet Provider Map

A news server is a collection of software used to handle Usenet articles. It may also refer to a computer itself which is primarily or solely used for handling Usenet. Access to Usenet is only available through news server providers.

Articles and posts

End users often use the term "posting" to refer to a single message or file posted to Usenet. For articles containing plain text, this is synonymous with an article. For binary content such as pictures and files, it is often necessary to split the content among multiple articles. Typically through the use of numbered Subject: headers, the multiple-article postings are automatically reassembled into a single unit by the newsreader. Most servers do not distinguish between single and multiple-part postings, dealing only at the level of the individual component articles.

Headers and overviews

Each news article contains a complete set of header lines, but in common use the term "headers" is also used when referring to the News Overview database. The overview is a list of the most frequently used headers, and additional information such as article sizes, typically retrieved by the client software using the NNTP XOVER command. Overviews make reading a newsgroup faster for both the client and server by eliminating the need to open each individual article to present them in list form.

If non-overview headers are required, such as for when using a kill file, it may still be necessary to use the slower method of reading all the complete article headers. Many clients are unable to do this, and limit filtering to what is available in the summaries.

News server attributes

Among the operators and users of commercial news servers, common concerns are the continually increasing storage and network capacity requirements and their effects. Completion (the ability of a server to successfully receive all traffic), retention (the amount of time articles are made available to readers) and overall system performance. With the increasing demands, it is common for the transit and reader server roles to be subdivided further into numbering, storage and front end systems. These server farms are continually monitored by both insiders and outsiders, and measurements of these characteristics are often used by consumers when choosing a commercial news service.

Speed

Speed, in relation to Usenet, is how quickly a server can deliver an article to the user. The server that the user connects to is typically part of a server farm that has many servers dedicated to multiple tasks. How fast the data can move throughout this farm is the first thing that affects the speed of delivery.

The speed of data traveling throughout the farm can be severely bottlenecked through hard drive operations. Retrieving the article and overview information can cause massive stress on hard drives. To combat this, caching technology and cylindrical file storage systems have been developed.

Once the farm is able to deliver the data to the network, then the provider has limited control over the speed to the user. Since the network path to each user is different, some users will have good routes and the data will flow quickly. Other users will have overloaded routers between them and the provider which will cause delays. About all a provider can do in that case is try moving the traffic through a different route. If the ISP has limited connectivity to the network, routing changes may have little effect.

Frequently a user can reduce the impact of network problems by using multiple connections. Some servers allow as many as 60 simultaneous connections, but this varies widely based on the provider.

Article sizes

Article sizes are limited to what each news server will accept. The larger the article size, the more space it occupies, and thus the fewer articles on each server. This generally means that a server can run with less overhead which makes for a more efficient server, but gives less articles for users to access.

Retention

Retention is simply defined as how long the server keeps articles. Historically, most users want retention to be long enough so that they don't need to access the server every day but not overly long retention that can overwhelm users with slow computers or network connections. In the modern era, high speed connections, large storage capacity, and advanced search tools allows users to utilize extensive retention without any drawbacks.

Retention is generally quoted separately for text and binary articles, though it may also vary between groups within each category. Retention times vary greatly according to the amount of storage available on the servers and the continually increasing volume of traffic. As of 2009, it was common for average news providers to have text retention of over 1000 days and binary retention of over 200 days, while large news providers offered text retention of up to 2480 days and binary retention of 850 days or more. Omicron's HW Media is currently the Usenet server with the highest binary retention, while Google is the Usenet server with the highest text retention.

It can be difficult for end users to measure a server's retention accurately. One common method is to find the oldest articles in a group and examine their dates, but this is not always accurate: some articles in a group may be retained longer than others, articles from remote servers do not always arrive promptly, and Date headers are sometimes simply incorrect. Detecting such anomalies requires sampling many or all articles, preferably in more than one newsgroup.
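A minimal sketch of the sampling approach, assuming the Date headers of a sample of articles have already been collected: rather than trusting the single oldest article, it skips unparseable or future dates and discards a small fraction of the oldest ages as possible outliers. The `trim_fraction` parameter and its default are illustrative assumptions, not an established method.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def estimate_retention_days(date_headers, now, trim_fraction=0.01):
    """Estimate retention (in days) from sampled Date header values.

    trim_fraction (0 <= trim_fraction < 1) is the share of the oldest
    ages ignored as possible bogus dates or unusually long-lived articles.
    """
    ages = []
    for value in date_headers:
        try:
            dt = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            continue  # unparseable Date header: skip it
        if dt.tzinfo is None:  # "-0000" or no zone: treat as UTC
            dt = dt.replace(tzinfo=timezone.utc)
        age_days = (now - dt).total_seconds() / 86400
        if age_days >= 0:  # dates in the future are clearly wrong
            ages.append(age_days)
    if not ages:
        return None
    ages.sort(reverse=True)  # oldest first
    return ages[int(len(ages) * trim_fraction)]


now = datetime(2020, 1, 11, tzinfo=timezone.utc)
sample = ["Fri, 10 Jan 2020 00:00:00 +0000",
          "Wed, 01 Jan 2020 00:00:00 +0000",
          "not a date"]
print(estimate_retention_days(sample, now))  # 10.0
```

With small samples nothing is trimmed, so the result equals the oldest valid age; the trimming only matters once the sample is large, which matches the advice to sample many articles.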

Because news servers have finite storage, they can hold posts only for a limited time before deleting them to make room for new ones. This is a particular problem for binary newsgroups, which carry large volumes of articles.
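The link between storage and retention can be approximated with simple arithmetic: if a server receives a roughly constant volume of articles per day, its retention is about the spool capacity divided by the daily volume. The sketch assumes a constant feed size and a spool devoted entirely to articles. The spool sizes in the example are hypothetical; the 30 terabytes a day figure is the upper full-feed estimate given elsewhere in this article.

```python
TB = 10**12  # bytes in a (decimal) terabyte


def retention_days(spool_bytes, daily_feed_bytes):
    """Approximate retention: how many days of a constant feed fit in the spool."""
    if daily_feed_bytes <= 0:
        raise ValueError("daily feed volume must be positive")
    return spool_bytes / daily_feed_bytes


# Hypothetical 10 PB spool taking a 30 TB/day full feed:
print(round(retention_days(10_000 * TB, 30 * TB)))  # 333
# Hypothetical 100 TB spool taking the same feed:
print(round(retention_days(100 * TB, 30 * TB), 1))  # 3.3
```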

For news servers provided by Internet service providers as part of a user's subscription package, retention is often only 2–4 days. To cope with increasing Usenet traffic, many providers use a hybrid system: when a requested older article is not found on the provider's own server, it is requested from another server with longer retention.
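The hybrid arrangement can be sketched as a lookup that tries the local spool first and falls back to a longer-retention server on a miss. In this Python sketch the local spool is a plain dictionary and `backend` is a hypothetical callable standing in for a connection to the other server; whether a provider keeps a local copy of fetched articles (`cache_fetched` here) is an assumption that varies by deployment.

```python
from typing import Callable, Dict, Optional, Tuple


class HybridArticleSource:
    """Serve articles locally, falling back to a longer-retention server."""

    def __init__(self, local: Dict[str, bytes],
                 backend: Callable[[str], Optional[bytes]],
                 cache_fetched: bool = True):
        self.local = local              # Message-ID -> article on this server
        self.backend = backend          # hypothetical fetch from the other server
        self.cache_fetched = cache_fetched

    def get(self, message_id: str) -> Tuple[Optional[bytes], str]:
        article = self.local.get(message_id)
        if article is not None:
            return article, "local"
        article = self.backend(message_id)  # miss: ask the longer-retention server
        if article is None:
            return None, "missing"
        if self.cache_fetched:
            self.local[message_id] = article  # keep a copy for later readers
        return article, "backend"


far_server = {"<old@example.org>": b"old body"}
src = HybridArticleSource({"<new@example.org>": b"new body"}, far_server.get)
print(src.get("<old@example.org>"))  # (b'old body', 'backend')
```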

Completion

Given the large number of articles transferred between servers and the large size of individual articles, their complete propagation to any one server farm is not guaranteed. The term "completion" is used to describe how well a service is keeping up with the traffic.

The primary obstacle to calculating a completion percentage is knowing how many articles were posted in the first place. Looking at only one server, one cannot know how many articles were actually inserted throughout the network. Articles may never leave the originating server, or may fail to find their way out to the transit cloud. Very large articles are frequently dropped and tend to propagate less well than smaller ones.

One way to measure completion is to access multiple servers and retrieve lists of articles. Because Message-ID headers are nominally unique throughout the network, comparing the lists is mostly straightforward. Practical limitations of this type of measurement include the impossibility of obtaining lists from every server worldwide, the fact that many servers filter out spam or employ Usenet Death Penalties, and the fact that some servers mask incompletion by hiding multipart binary sets with missing articles. Propagation times and retention must also be taken into account: an article may simply not yet have arrived at a given server, or it may have been present but already expired.
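A minimal sketch of the list-comparison method, assuming Message-ID lists have already been retrieved from each server: the union of all lists serves as a stand-in for everything that was posted. Because no set of servers sees every article, the union is only a lower bound, so these percentages tend to overstate completion, and the limitations described above still apply. The server names are hypothetical.

```python
def completion_report(server_lists):
    """Map each server name to (fraction of the union it holds, missing IDs).

    server_lists: dict of server name -> iterable of Message-IDs.
    """
    sets = {name: set(ids) for name, ids in server_lists.items()}
    union = set().union(*sets.values())  # proxy for "everything posted"
    report = {}
    for name, ids in sets.items():
        fraction = len(ids) / len(union) if union else None
        report[name] = (fraction, union - ids)
    return report


lists = {
    "server-a": ["<1@x>", "<2@x>", "<3@x>", "<4@x>"],
    "server-b": ["<1@x>", "<2@x>", "<5@x>"],
}
for name, (fraction, missing) in completion_report(lists).items():
    print(name, f"{fraction:.0%}", sorted(missing))
# server-a 80% ['<5@x>']
# server-b 60% ['<3@x>', '<4@x>']
```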

News server operation

Peering

All Usenet servers peer with one or more other servers in order to exchange articles. Occasionally, new servers appear. Although there are several web resources which may aid in finding peers, a better resource is the newsgroup news.admin.peering (Google Groups portal).

As of 2020, text feeds can usually be obtained for free, while full binary feeds may be free or paid, depending on how many articles each server sends to the other. Because of the large amount of data in a full binary and text Usenet feed (which can be as high as 30 terabytes a day) and the high cost of transmitting that data through an IP transit provider such as Cogent, Telia, or Zayo, most Usenet providers will engage in binary peering only when they are interconnected at an Internet exchange such as AMS-IX, SIX, or DE-CIX.

Spools

When the server stores the body of an article, it places it in a disk storage area generically called a "spool". There are several common ways in which the spool may be organized:

  • One file per article is the oldest storage scheme, still in common use on smaller servers and replicated in many clients. Its performance capability is a direct function of the underlying operating system's ability to create, remove and locate files within a directory, and often this scheme is insufficient to keep up with modern Usenet traffic. It does, however, allow for the greatest flexibility in managing the amount and location of storage used by the server. Nearly all current software using this scheme stores articles using the B News 2.10 layout.
  • Cyclical storage has been in increasingly common use since the 1990s. In this storage method, articles are appended serially to large indexed container files. When the end of the file is reached, new articles are written at the beginning of the file, overwriting the oldest entries. On some servers, this overwriting is not performed, but instead new container files are created as older ones are deleted. The major advantages of this system include predictable storage requirements if an overwriting scheme is employed, and some freedom from dependency on the underlying performance of the operating system. There is, however, less flexibility to retain articles by age rather than space used, and traditional text manipulation tools such as grep are less well suited to analyzing these files. Some degree of article longevity control can be exercised by directing subsets of the newsgroups to specific sets of container files.
  • In some cases, a relational database or similar is used to contain the spool. This is most commonly seen with Internet forum software that also offers an NNTP interface.
  • Some servers, such as INN, allow multiple storage schemes to be used at once. Various hybrid storage schemes have also been used in news servers, including different organizations of the file-per-article method, or smaller containers carrying perhaps 100 articles apiece.
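The cyclical scheme can be illustrated with a toy in-memory version: articles are appended to a fixed-size buffer, writing wraps to the start when the end is reached, and any older article whose bytes are overwritten is dropped from the index. This Python sketch is not modeled on any particular server's on-disk format; a single `bytearray` stands in for a container file, and articles are never split across the end of the buffer.

```python
class CyclicSpool:
    """Toy cyclic container: a fixed-size buffer reused from the start."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.pos = 0      # offset where the next article will be written
        self.index = {}   # Message-ID -> (offset, length)

    def _evict(self, start, end):
        # Drop every indexed article whose bytes overlap [start, end).
        doomed = [mid for mid, (off, n) in self.index.items()
                  if off < end and start < off + n]
        for mid in doomed:
            del self.index[mid]

    def store(self, message_id, article):
        n = len(article)
        if n > self.size:
            raise ValueError("article is larger than the container")
        if self.pos + n > self.size:
            self.pos = 0  # wrap around instead of splitting the article
        start, end = self.pos, self.pos + n
        self._evict(start, end)  # entries in this range are about to be overwritten
        self.buf[start:end] = article
        self.index[message_id] = (start, n)
        self.pos = end
        return start

    def fetch(self, message_id):
        entry = self.index.get(message_id)
        if entry is None:
            return None  # never stored, or already overwritten
        off, n = entry
        return bytes(self.buf[off:off + n])
```

Storage use is fixed at `size` bytes no matter how many articles arrive, which is the predictability the text describes; the cost is that how long an article survives depends on traffic volume, not on its age. The linear scan in `_evict` keeps the sketch short; since entries are written sequentially, they could instead be tracked in write order.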

Types of servers

A reader server provides an interface to read and post articles, generally with the assistance of a news client. A transit server exchanges articles with other servers. Most servers can provide both functions.

Transit server

Modern transit servers usually use NNTP to exchange news continually over the Internet and similar always-on connections. In the past, servers normally employed the UUCP protocol, which was designed for intermittent dial-up connections. Other ad hoc protocols, including e-mail, are less commonly seen. News servers normally connect with multiple peers; this redundancy helps spread load and ensures that articles are not lost. Smaller sites, called leaf nodes, connect to only a single major server.

Articles are routed based on information found in the header lines defined in RFC 1036. Of particular interest to a transit server are:

  • Message-ID – a globally unique key
  • Newsgroups – a list of one or more newsgroups where the article is intended to appear
  • Distribution – (optional) a supplement to Newsgroups, used to restrict circulation of articles
  • Date – the time when the article was created
  • Path – a list of the servers an article passed through on its way to the local server
  • Expires – (optional) the time when it is requested that the article be deleted
  • Approved – (optional) indicates an article that has been accepted for a moderated newsgroup
  • Control – (optional) contains command requests

In most cases, the sending server controls the article transfer process. It compares the Newsgroups and Distribution headers of each newly arrived article against a set of patterns called newsfeeds, which lists each remote server and the newsgroups its operator wishes to receive. Some senders also examine the Path; if the receiving server appears in this line, the article is not offered to it. Other local rules may also be added. The sender transmits the Message-IDs of matching articles to the receiving server, which indicates those it has not yet stored locally, and those articles are then sent.
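The sender's side of this exchange can be sketched as follows. The newsfeeds entry is modeled as an ordered list of patterns in which a leading `!` excludes groups and the last matching pattern wins. This is similar in spirit to the wildmat patterns used by servers such as INN, but Python's `fnmatch` is only an approximation of wildmat, and the Distribution header and other local rules are ignored. Headers are plain dictionaries, and the server names are hypothetical.

```python
from fnmatch import fnmatchcase


def peer_wants(patterns, newsgroups):
    """True if any newsgroup is selected by the ordered pattern list."""
    for group in newsgroups:
        selected = False
        for pattern in patterns:
            negate = pattern.startswith("!")
            if fnmatchcase(group, pattern[1:] if negate else pattern):
                selected = not negate  # last matching pattern wins
        if selected:
            return True
    return False


def select_offers(articles, peer, patterns):
    """Message-IDs to offer a peer: wanted groups, peer not already in Path."""
    offers = []
    for art in articles:
        groups = [g.strip() for g in art["Newsgroups"].split(",")]
        if peer in art["Path"].split("!"):
            continue  # the peer has already handled this article
        if peer_wants(patterns, groups):
            offers.append(art["Message-ID"])
    return offers


def receiver_reply(offered, history):
    """The receiver asks only for Message-IDs it has not stored yet."""
    return [mid for mid in offered if mid not in history]


articles = [
    {"Message-ID": "<1@x.example>", "Newsgroups": "comp.lang.c",
     "Path": "a.example!not-for-mail"},
    {"Message-ID": "<2@x.example>", "Newsgroups": "comp.binaries.misc",
     "Path": "a.example!not-for-mail"},
    {"Message-ID": "<3@x.example>", "Newsgroups": "comp.lang.c",
     "Path": "peer.example!a.example!not-for-mail"},
]
offered = select_offers(articles, "peer.example", ["comp.*", "!comp.binaries.*"])
print(offered)  # ['<1@x.example>']
```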

The receiving server examines the incoming articles. A message is normally discarded if the Message-ID is duplicated by an article already received (i.e., another server sent it in the meantime), the Date or Expires lines indicate that the article is too old, the header syntax appears to be invalid, the Approved header is missing for a moderated newsgroup, or additional local rules disallow it. Most servers also maintain a list of active newsgroups. If the Newsgroups header of a new article does not match the active list, it may be discarded or placed in a special "junk" newsgroup. Once the article is stored, the server attempts to retransmit it to any servers in its own newsfeed list.
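The receiving server's checks can be sketched as a single classification function. The 14-day age cutoff, the header dictionary, and the simple Message-ID syntax test are illustrative assumptions; real servers apply fuller header validation and configurable local rules, and record accepted Message-IDs in a history database, which the caller is assumed to do here.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def _parse_date(value):
    dt = parsedate_to_datetime(value)
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def classify_incoming(headers, history, active, moderated, now, max_age_days=14):
    """Return 'accept', 'junk', or 'reject:<reason>' for an incoming article."""
    mid = headers.get("Message-ID", "")
    if not (mid.startswith("<") and mid.endswith(">") and "@" in mid):
        return "reject:bad-message-id"
    if mid in history:
        return "reject:duplicate"  # another peer sent it in the meantime
    try:
        date = _parse_date(headers["Date"])
        expires = _parse_date(headers["Expires"]) if "Expires" in headers else None
    except (KeyError, TypeError, ValueError):
        return "reject:bad-date"
    if now - date > timedelta(days=max_age_days):
        return "reject:too-old"
    if expires is not None and expires < now:
        return "reject:expired"
    groups = [g.strip() for g in headers.get("Newsgroups", "").split(",") if g.strip()]
    if not groups:
        return "reject:no-newsgroups"
    if any(g in moderated for g in groups) and "Approved" not in headers:
        return "reject:unapproved"
    if not any(g in active for g in groups):
        return "junk"  # no carried group: file it in the junk group
    return "accept"


now = datetime(2020, 1, 11, tzinfo=timezone.utc)
art = {"Message-ID": "<1@example.org>", "Date": "Fri, 10 Jan 2020 00:00:00 +0000",
       "Newsgroups": "comp.lang.c"}
print(classify_incoming(art, set(), {"comp.lang.c"}, set(), now))  # accept
```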

Articles with Control lines are given special handling. They are typically filed in special "control" newsgroups and may cause the server to carry out exceptional actions automatically. The newgroup and rmgroup commands can cause newsgroups to be created or removed; checkgroups can be used to reconcile the local active list with a commonly accepted set; and cancel commands are used to request the deletion of a specific article. ihave and sendme are sometimes used with UUCP to transmit lists of offered and wanted Message-IDs. Other commands (version, sendsys, and uuname) request details of a server's configuration. Once used to create network maps, they are now generally obsolete.
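A simplified dispatcher for some of the control commands described above. The active list is modeled as a dictionary from group name to a flag (`y` for unmoderated, `m` for moderated, loosely following INN's active file), and the spool as a dictionary keyed by Message-ID; checkgroups, ihave, and sendme are left unimplemented. Real servers verify who issued a control message, often by checking a cryptographic signature, before acting on it. That step is deliberately omitted, so this sketch should not be read as a safe policy.

```python
def handle_control(control, active, spool):
    """Act on a Control header value and return a short description."""
    parts = control.split()
    if not parts:
        return "ignored: empty"
    command, args = parts[0].lower(), parts[1:]
    if command == "newgroup" and args:
        moderated = len(args) > 1 and args[1].lower() == "moderated"
        active[args[0]] = "m" if moderated else "y"
        return f"created {args[0]}"
    if command == "rmgroup" and args:
        active.pop(args[0], None)
        return f"removed {args[0]}"
    if command == "cancel" and args:
        spool.pop(args[0], None)  # delete the named article if present
        return f"cancelled {args[0]}"
    if command in ("version", "sendsys", "uuname"):
        return "ignored: obsolete"
    return "ignored: unsupported"


active = {}
spool = {"<1@example.org>": b"body"}
print(handle_control("newgroup comp.test moderated", active, spool))  # created comp.test
print(active)  # {'comp.test': 'm'}
```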

Reader server

A reader server is one that makes articles available to newsreaders, either in the hierarchical disk directory format originated by B News 2.10 or through NNTP or IMAP commands. A reader server typically also works as a transit server, but it may operate independently or serve as an alternative interface to an Internet forum. When receiving news, this type of server must perform the additional steps of filing articles into newsgroups and assigning sequential numbers within each group. An Xref line is usually added, listing all the groups where the message appears and its sequence number in each. Unlike Message-IDs, the numbers and ordering of articles differ on each server, but related servers may force agreement by operating in a slave mode, reusing their siblings' Xref lines. Reader servers typically also maintain a News Overview (NOV) database that allows newsreaders to quickly obtain message summaries and present messages in threaded form.
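The extra filing step a reader server performs can be sketched as follows: each carried group keeps a counter of its highest article number, a new article receives the next number in every carried group it is filed in, and an Xref line records the results. The hostname and groups are hypothetical, and the overview entries here keep only the article number, subject, and Message-ID, whereas a real NOV record holds more fields.

```python
class ReaderSpool:
    """Assigns per-group article numbers and builds Xref lines."""

    def __init__(self, hostname, groups):
        self.hostname = hostname
        self.high = {g: 0 for g in groups}       # highest number used per group
        self.overview = {g: [] for g in groups}  # simplified overview records

    def file_article(self, message_id, newsgroups, subject=""):
        xref = []
        for group in newsgroups:
            if group not in self.high:
                continue  # group not carried on this server
            self.high[group] += 1
            number = self.high[group]
            xref.append(f"{group}:{number}")
            self.overview[group].append((number, subject, message_id))
        if not xref:
            return None
        return f"Xref: {self.hostname} " + " ".join(xref)


spool = ReaderSpool("news.example.net", ["comp.lang.c", "misc.test"])
print(spool.file_article("<1@example.org>", ["comp.lang.c"], "first"))
# Xref: news.example.net comp.lang.c:1
print(spool.file_article("<2@example.org>", ["comp.lang.c", "misc.test"], "crosspost"))
# Xref: news.example.net comp.lang.c:2 misc.test:1
```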

Most reader servers support posting, either through NNTP or a special inews program. When an article is posted, the process is much the same as when a transit server receives news, but with additional checks. For posting, the server will normally fill in missing Path and Message-ID lines and check the syntax of headers intended for human readers, such as From and Subject. If the article is posted to a moderated group, the server will attempt to mail it to the newsgroup moderator if the Approved header is absent. Additional identity checks and filters are also typically applied at this point.
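The posting-time additions can be sketched as a function that checks the human-readable headers, fills in a missing Message-ID and Path, and diverts unapproved articles for moderated groups to the moderator. The `mail_moderator` callable and hostname are hypothetical, the generated Message-ID format is only one of many acceptable forms, and the `not-for-mail` Path tail follows a convention used by INN rather than a requirement.

```python
import uuid


def prepare_post(headers, hostname, moderated, mail_moderator):
    """Return (status, headers); status is 'posted', 'mailed', or 'rejected'."""
    h = dict(headers)  # do not modify the caller's dictionary
    for name in ("From", "Subject", "Newsgroups"):
        if not h.get(name, "").strip():
            return "rejected", h  # required header missing or empty
    if "Message-ID" not in h:
        h["Message-ID"] = f"<{uuid.uuid4().hex}@{hostname}>"
    if "Path" not in h:
        h["Path"] = f"{hostname}!not-for-mail"
    groups = [g.strip() for g in h["Newsgroups"].split(",")]
    if any(g in moderated for g in groups) and "Approved" not in h:
        mail_moderator(h)  # hand the article to the moderator instead
        return "mailed", h
    return "posted", h


status, h = prepare_post({"From": "user@example.org", "Subject": "hello",
                          "Newsgroups": "misc.test"},
                         "news.example.net", set(), print)
print(status, h["Path"])  # posted news.example.net!not-for-mail
```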

Hybrid or cache server

Smaller sites with limited network bandwidth may operate "sucking" or cache servers. These perform the same reader server role as conventional news servers, but themselves act as newsreaders to exchange articles with other reader servers. Hybrid servers give the operator greater flexibility, since the set of received groups can be adjusted without manual intervention. They may also be the only available means of obtaining articles from remote servers that do not offer conventional feeding.

Because hybrid servers usually use the posting function to send news, article headers are reformatted by the posting function and tracing information can be lost. Also, the delayed sucking process can result in excess activity on the remote reader servers. For these reasons, the use of hybrid servers is often discouraged or disallowed without prior agreement.


References

  1. Pegoraro, Rob (January 30, 1990). "Usenet: The 'Other' Internet". Washington Post. Retrieved July 28, 2020.
  2. McDermott, James; Phillips, John (May 1, 1997). Administering Usenet News Servers: A Comprehensive Guide to Planning, Building, and Managing Internet and Intranet News Services. Addison-Wesley. ISBN 020141967X.
  3. "Usenet Server Connections Explained". TechSono Engineering. Retrieved July 28, 2020.
  4. "Usenet Newsgroups Retention". Usenet.com. May 16, 2020. Retrieved July 28, 2020.
