Abstract:
Controlled vocabularies have been frequently used in information retrieval
systems. Controlling these vocabularies and evaluating the utility of their terms are
two critical issues. This research examines the development of Persian subject
headings through statistical analyses. It was conducted on
more than 450,000 records extracted from the electronic version of the National
Bibliography of Iran (NBI). The data were processed using data mining
techniques. Correlation analysis was performed to determine the relationship
between the number of items in NBI and the number of Persian subject headings
as well as between the rank of each subject heading and its frequency of use in NBI. The
count of new subject headings, plotted against the count of newly catalogued materials in NBI,
grew linearly at the beginning and then logarithmically once the number of
catalogued materials reached 3,200. The analysis of the frequency of use of distinct
headings within NBI yielded three classes: most used, frequently used, and normally used
subject headings. The findings partly agree with Lancaster’s prediction, as he
states that a controlled vocabulary will grow very fast in the beginning. It was
also found that the majority of subject headings are rarely used in NBI. This is due
to the absence of a mechanism to control the creation of new headings.
Machine-generated summary:
Keywords: National Bibliography of Iran, Persian Subject Headings, Controlled Vocabularies, Use Frequency.
A part of the current research focuses on the size of Persian Subject Headings, a controlled vocabulary developed by the NBI as a tool for cataloguing Persian books.
Making use of the method that Zipf (1949) established to study the distribution of words in natural texts may help evaluate the utility of terms within the controlled vocabularies.
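As an illustration of a Zipf-style analysis applied to a controlled vocabulary, the sketch below ranks headings by how often they are assigned and estimates the slope of log(frequency) against log(rank). It is a minimal sketch and not the procedure used in the study. The function names and the toy data are hypothetical, and the code assumes each heading assignment is available as a plain string.

```python
from collections import Counter
import math


def rank_frequency(assignments):
    """Return (rank, heading, frequency) tuples, most used heading first."""
    counts = Counter(assignments)
    # Sort by descending frequency; ties are broken alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, heading, freq)
            for rank, (heading, freq) in enumerate(ordered, start=1)]


def zipf_slope(table):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(freq) for _, _, freq in table]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy data: frequencies 12, 6, 4, 3 follow 12 / rank exactly.
assignments = ["History"] * 12 + ["Poetry"] * 6 + ["Law"] * 4 + ["Botany"] * 3
table = rank_frequency(assignments)
slope = zipf_slope(table)
```

A slope close to -1 corresponds to the classic Zipf distribution. A long tail of headings with a frequency of one or two would indicate terms of low utility.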
They reported that a log-normal (logarithmic) relationship exists between total index entries and the distribution of term usage. However, Cleverdon et al. (1966), in a study of the factors determining the performance of indexing systems, found a log-log (power-law) relationship between the number of documents and the number of index terms.
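The difference between a logarithmic and a power-law relationship can be checked numerically by fitting both forms to the same data. The sketch below is one possible way to do this and is not taken from either cited study. The function names, the synthetic data, and the choice to compare the two fits by sum of squared errors on the original scale are all assumptions made for illustration.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sse(observed, predicted):
    """Sum of squared errors on the original scale."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def compare_models(docs, terms):
    """Fit terms = a + b*ln(docs) and ln(terms) = c + k*ln(docs)."""
    log_docs = [math.log(d) for d in docs]
    # Logarithmic model: terms grow linearly in ln(docs).
    a, b = linear_fit(log_docs, terms)
    log_pred = [a + b * x for x in log_docs]
    # Power-law model: a straight line on log-log axes.
    c, k = linear_fit(log_docs, [math.log(t) for t in terms])
    pow_pred = [math.exp(c + k * x) for x in log_docs]
    return {
        "logarithmic_sse": sse(terms, log_pred),
        "power_law_sse": sse(terms, pow_pred),
        "power_law_exponent": k,
    }


# Synthetic data (assumption): terms = 5 * docs ** 0.5, an exact power law.
docs = [100, 200, 400, 800, 1600]
terms = [5 * d ** 0.5 for d in docs]
result = compare_models(docs, terms)
```

On real catalogue data neither form will fit exactly, so the two error values would only indicate which form describes the observed growth more closely.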
Materials and Methods
There is a relationship between the expansion of NBI items and the development of the Persian subject headings list.
The first one focused on obtaining data to find the relationship between the inclusion of items in NBI and the development of Persian subject headings.
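The kind of data this step calls for can be sketched as a growth curve: the number of distinct headings seen after each record is added, taken in cataloguing order. This is a minimal sketch under stated assumptions and not the authors' extraction procedure. The function name and the toy records are hypothetical, and each record is assumed to be a list of heading strings.

```python
def heading_growth(records):
    """Return (records_so_far, distinct_headings_so_far) for each record."""
    seen = set()
    curve = []
    for count, headings in enumerate(records, start=1):
        # A heading counts as new only the first time it is assigned.
        seen.update(headings)
        curve.append((count, len(seen)))
    return curve


# Hypothetical toy records, listed in cataloguing order.
records = [
    ["History"],
    ["History", "Iran"],
    ["Poetry"],
    ["Iran"],
    ["Poetry", "Law"],
]
curve = heading_growth(records)
```

Plotting the second value against the first shows how quickly the vocabulary grows and where growth flattens as more records reuse existing headings.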
Conclusion
The main objectives of this research were: 1- to find the relationship between the inclusion of new items in NBI and the need to create further headings; and 2- to determine the frequency distribution of Persian subject headings in NBI.