This is the first half of a two-part workshop. The second half, presented by Wright, can be found at: Part II : What do I do with it?
Note: There is an accompanying slide presentation
This document is also part of a larger set of documents: An Introduction to Using Data at DPLS
Introduction: What is a Data Library?
One of the first data archives, if not the first, was the Roper Center, established in 1946 as an archive for public opinion polls. In 1962 the Inter-university Consortium for Political Research was established as a political data archive. During the 1960s a number of data archives providing local service were established, including those at Madison, Wisconsin and Berkeley, California. Data libraries, as distinguished from archives (which make an archival commitment to preserve data in perpetuity), were also created, at Princeton and Northwestern, to name two. In Europe, National Data Centers such as ZA - Zentralarkiv fuer Europaeische Sozialforschung, The ESRC Archive at Essex, and the Norwegian Social Science Data Services were also established during this period.
In the mid-1960s, at Wisconsin, the Data and Program Library Service (DPLS), and its administrative parent, the Data and Computation Center (DACC) were established as a result of an articulated demand for assistance in social science research problems related to large-scale data collection and computation. The DPLS was created because faculty members of the social sciences, especially in economics, political science and sociology, became convinced that the University needed a facility for managing the increasing quantity of available machine-readable social science data being produced on the campus and elsewhere. They recognized the importance of preserving data, collected often at considerable cost, which had significant subsequent value for other researchers and students. Thus, the DPLS was designated as the local campus repository for quantitative social science machine-readable data. Its major functions were defined as acquisition, storage, maintenance, and dissemination of data files.
Although the DPLS was considered a campus library, it was not incorporated into the campus library system because of the system's inability to accept computer data as a legitimate information resource. This perception that data files were either too difficult or too obscure for traditional libraries persisted through the 1970s, and as a result other data libraries that formed during this period were normally housed within academic departments, research institutes, or computer centers. The perception of data as a marginal resource has changed; the major functions of the DPLS have not. DPLS continues to be both a data library, acquiring and facilitating access to data as they become available from individual researchers, other archives, national data repositories and profit and nonprofit agencies, and a data archive, acquiring, preserving and disseminating studies created by faculty, students and researchers on its campus.
A. The Nature of Information-Seeking
Information-seeking can be defined as any activity of an individual that is undertaken to identify a message that satisfies a perceived need. In other words, information seeking begins when someone perceives that the current state of possessed knowledge is less than that needed to deal with some issue or problem (Krikelas).
Traditional library literature points out the importance of looking at the information seeking patterns of researchers and the activities associated with each kind of information gathering process. This emphasis in part refocuses attention on library patrons, rather than on the library and its materials in isolation from users (e.g., the difference between user studies and circulation studies). This approach is reflected in much of the library literature from the 1970's to the present, and is exemplified by the use of terms such as "customer service", "commitment to service", and "customer satisfaction", as well as emphasizing the customer-oriented mindset (Weingard).
Because most of our users are in the process of gathering and using information, and our goal is to assist them, the processes they undertake are important to examine. The following model represents the general characteristics of information-seeking, in particular in the social sciences. This model is based on a grounded-theory approach to information-seeking patterns as developed by researchers at the University of Shelton in the U.K. The eight categories are not part of the conscious vocabulary of researchers, and the model itself does not necessarily represent any explicit conceptualization of those activities by the researcher. Instead, the model should be thought of as a set of categories that, taken together, can be used to describe or explain the components of information-seeking patterns. It highlights the key features of those components and provides a framework in which the perception and activities that make up those patterns can be related in a coherent form (Ellis).
The process of deriving the categories and their properties, and the elucidation of their relationship to each other--as well as the overall structure--is one of the most creative and intellectually demanding parts of the researcher's task. From a traditional viewpoint the library's role is most significant in the selecting process: functions such as cataloguing and inter-library loan play an important part in it. Somewhat to the surprise of the profession, library user surveys such as INFROSS (Investigation into Information Requirements of the Social Sciences) in the 1970's and INISS (Information Needs in Social Services) in the 1980's showed that, while social scientists attached a high level of importance to the locating of references, they relied on citations from journals to identify those resources, or on colleagues and subject experts, rather than "traditional bibliographic tools" (Folster).
However, as we will see, in more specialized libraries such as the data library, where staff are expert and often involved in projects from their inception to their conclusion, a more universal role is available, particularly in the search for relevant sources. This differentiates data libraries in an important way from traditional libraries, which have served primarily as a source for obtaining previously identified information, but not as a resource to identify relevant information. We will more closely examine such differences, as well as the similarities, between traditional reference and data library reference.
B. Data Reference and the Traditional Library
Data library operations do not occur in a vacuum. A data reference service benefits from collaboration with "traditional" library reference services and government documents departments. The value of such collaboration has been recognized for decades (Jones), but recent trends point to it becoming even more critical as an increasing number of traditional libraries directly receive research data on CD-ROM from government sources and private vendors. Furthermore, there has been an accompanying change in traditional libraries: from providing access only to information about data to providing access to the data themselves.
Traditional reference is often viewed as a spectrum, stretching from Ready Reference at one end to Research Projects at the other. In between there is a continuum of reference types such as the Selective Dissemination of Information, where the library staff notifies a user on an ongoing basis about new sources of information; or Individual Instruction, where the librarian provides instruction in using the library or a specific tool. In the data library we do all of these, but the majority of what we do resides closer to the Research Projects end of the spectrum, focusing on in-depth assistance and often relying upon non-traditional strategies for identifying information. In this respect there are close similarities between data library reference and the reference that is performed in both government documents libraries and traditional archives (Gray, Geraci).
It is useful to look at these similarities. Like archival material, data files can be seen as "unpublished", primary source material. In secondary analysis data are often used for purposes that differ from those for which they were originally collected. In both data libraries and traditional archives highly specialized collections are well-known by their expert staff, and in the traditional archive, like the data library, the most valuable finding aid is often the reference person.
In government documents parallels can be drawn from typical reference questions such as "Which agency collects the information?" to queries we commonly hear in the data library like "What agency collects the data?" or "What survey might contain those variables?". Like government documents librarians, data librarians are often very much concerned with source material, and identifying the appropriate print document can be an important first step to locating the original data. For example, the question "Which statistical report contains information on unemployment and years of education?" may precede "What data were used to produce this table or report?"
A common thread among these three environments is the importance of adequate and experienced staffing. Avoiding turnover and having sufficient staff to support the level of service required for research project reference is essential. In government documents, traditional archives and data archives there is a shared concern for source material, and the need to go beyond traditional resources and strategies to locate information. Networking with colleagues, contacts with producing agencies, and maintaining relationships with past researchers can all play an important role in the search.
During the reference interview the librarian makes a series of choices regarding what topics to address by questioning. The topics chosen will depend on a variety of factors, and may include the following (Bopp and Smith):
This is a wonderful and instructive list. As we look more closely at the data reference interview we will see variations on the themes that it outlines. Keeping this list in mind while progressing through a reference interview can play an important part in enhancing the quality of the effort.
It is recognized that both patrons and librarians come into the interview process with unique frames of reference that include sets of assumptions or preconditions. While such preconditions are recognized as useful in the reference interview situation, they must be tested through communication during the negotiation process so that they do not contribute to irrelevant or inappropriate responses.
In all conversations with users it is important to use clear, direct sentence structure. The use of jargon, which as we will see is unavoidable, must be done with caution. The employment of encouraging questions or comments such as "Can you give me an example?" or "I see what you mean" can facilitate the query process, as will the utilization of open-ended questions.
Open-ended questions frequently begin with words such as what, where, and how. They are to be distinguished from closed questions, which ask the user to make a definite (often "either-or") response. Both types of questions are used in reference work but they serve very different purposes, and often it is the open-ended questions that are both the most useful and the most difficult to form. An example of an open-ended question in response to a user asking "Do you have any data on elections?" might be "What kind of election data are you seeking?" rather than "Local or national elections?". Simply put, open-ended questions reveal possibilities for further discussion, while closed questions tend to focus narrowly and distinctly on a particular subject or source and have the role of further refining the professional's understanding of the client's information need.
It is also important to use neutral questions, which help avoid the pitfall of making premature judgements about a user's needs based on the professional's experience, biases, or history with that client. The importance of objectivity versus subjectivity, and keeping a close watch over non-neutral observations, will be more fully dealt with later.
Finally, research shows that most people listen at 25% efficiency. This doesn't mean that they actually heard 25 percent of what was said but rather that of all they heard, they got 25 percent of it right. This is largely because, although we listen, we are often also planning what we will say next. The reality of this startling statistic is that while almost everyone physically hears, listening requires an additional ingredient: the act of attending. In the library literature this is often referred to as active listening: that is, focusing on what a person is saying, concentrating on them, and paying attention to them. Active listening includes asking clarifying questions and understanding the nature of the problem fully before offering advice. It is an acquired skill obtained through observation and practice, and refined via feedback from users. It is always important to be wary of things that get in the way of active listening such as emotions, egos, and other distractions.
"The use of subject description to represent intellectual content is clearly a linguistic act of some kind" (Blair).
There is a growing amount of library literature that focuses on the importance of language in the reference interview. Much of this literature deals with the fact that there are a potentially unlimited number of ways that a 'document' can be requested (for example by author, title, subject; singularly and in combination). Traditional library literature refers to context versus subject document representation, describing the first as representing the context which produced the document (or in which it currently exists), and being fairly straightforward and unambiguous (e.g., author name, title, etc.), but the latter, based on "intellectual content" and the selection of subject "terms", as being possibly ambiguous and complex. Later in the context of data library reference we will use the terms "citation" and "topical" searching to represent similar concepts.
In the interaction between an inquirer and a human advisor (defined here as someone who understands what the inquirer wants rather than someone who simply shows them how to operate an information system -- that is, someone aiding intellectual access rather than physical access), a great number of subject descriptions and search query alternatives can exist. However, as in an interaction with a computerized system, the search queries must be constructed from a set of mutually understandable terms.
This may be easier said than done. One might look to professional indexing as a model for the development of such mutually understandable language, but traditionally an indexer's job is to accurately describe the content and context of documents regardless of how the inquirer might describe that content, rather than in ways that would be understandable to the inquirer (this is a variation of the material-based versus user-based dichotomy discussed previously). Furthermore, studies of inter-indexer consistency have found that agreement in the selection of single subject terms is rarely greater than 75% (Zunde). What this means is that even among expert professionals, there are linguistic variations and often disagreement about the language of description. Language differences between librarians and users will obviously be even more pronounced. This is particularly true in highly specialized fields such as economics, where the use of jargon is commonplace. This issue of language is so important that Blair puts it on the top of his list of the most significant features of the information-seeking or subject-searching process:
If this list looks familiar it is because it is a variation on the list of topics provided by Bopp and Smith, combined with the Shelton information-seeking model. It clearly outlines the critical issues of information-seeking as they relate to the reference process, and identifies what Blair sees as the most important features of information-seeking whether via the reference interview or a computerized search. This list also accentuates another somewhat surprising set of statistics. User studies show that there is an overwhelming preference for human rather than computer assistance, and face-to-face contact has consistently appeared at the top of source preference lists. This is an important consideration when decisions are made for resource allocation and the future direction of service. Why is this true? Four reasons can be postulated:
Bopp and Smith write about defining the problem situation within the reference interview; it is the first of their list of topic choices. This is a holistic approach to the reference interview, where we need to consider context and environment. What is the context of the query being made? What is the problem environment of the question? The problem environment provides the context from which the need for information arises. Or in other words, information needs result from problems arising from specific situations. This is a spin on the definitions of information-seeking made by Krikelas, and it stresses the importance of differentiating problems from questions. The problem is a compression of the user situation with all of the important elements intact. A question, however, does not retain all those elements that make up a problem. It is not the situation made smaller, but only a part of the situation. In other words:
In the holistic approach the entire experience of the user applies to the entire search process. It demands that the librarian be able to respond not only to the single question -- What do you want to know? -- but also to the companion questions -- How and why is the information needed? How is it likely to help? What does the user know already? What is expected? What are the parameters of the problem? These are questions that, as data librarians, we ask every day.
In the context of data libraries and archives "data" means computer-readable data. We acquire, store and disseminate data for secondary research. This implies that the data collected for a primary purpose are then made available for research by other individuals or groups. This research may seek to replicate analyses already carried out by primary researchers in order to verify, extend, or elaborate upon the original results, or to analyze the data from an entirely different perspective. Censuses and large surveys carried out by governments for their own policy purposes are particularly rich sources of data for further exploration.
For many, their introduction to using data is in an introductory course in statistical analysis. In that context the data are often in preformatted statistical package files. It can be quite a shock, therefore, the first time someone comes looking for their own research data and is presented with raw ASCII files.
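To make that shock concrete, the following is a minimal sketch of what reading a raw ASCII file involves: every variable has to be cut out of each line by column position, as documented in the study's codebook. The variable names, column positions, and data lines here are hypothetical illustrations and do not come from any real study.

```python
# Hypothetical codebook: variable name -> (start column, end column),
# 1-based and inclusive, the way printed codebooks usually list them.
CODEBOOK = {
    "case_id": (1, 4),
    "age": (5, 6),
    "sex": (7, 7),
    "income": (8, 13),
}

def parse_record(line, codebook=CODEBOOK):
    """Slice one fixed-width line into named integer variables."""
    record = {}
    for name, (start, end) in codebook.items():
        field = line[start - 1:end]
        # A blank field is treated as missing (an assumption of this sketch).
        record[name] = int(field) if field.strip() else None
    return record

# Two made-up raw data lines, 13 columns each.
raw_lines = [
    "0001341 52000",
    "0002272 38500",
]
records = [parse_record(line) for line in raw_lines]
```

A statistical package file carries this column map and the variable names inside it, which is why users who have only seen preformatted files may never have needed a codebook before.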
| Data Structures |
| The relationship between the records in a data set and its fields |
| constitutes the data structure: |
| Logical Record or Rectangular structure: each line of data or each |
| record contains all of the variables for a single observation. |
| Multiple Record or Card-image structure: several lines of data |
| contain all of the variables for a particular observation. |
| Hierarchical structure: contains multiple levels of related records |
| within the same data file. For example, a file containing both |
| household records and household member records. |
| Relational structure: multiple files that can be merged on |
| the basis of a predefined structure or variable (the relationship). |
| For example, a file containing student data and another file |
| containing their course transcripts data. |
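The structures in the box above can be sketched in a few lines of Python. This is an illustration only: the field names, record-type codes ("H" and "P"), and sample values are hypothetical, and real files would first have to be read from disk according to their codebooks.

```python
from collections import defaultdict

# Card-image structure: several lines ("cards") per observation.
# Each tuple is (case id, card number, variables found on that card).
card_image = [
    ("001", 1, {"age": 34, "sex": 1}),
    ("001", 2, {"income": 52000}),
    ("002", 1, {"age": 27, "sex": 2}),
    ("002", 2, {"income": 38500}),
]

def to_rectangular(cards):
    """Combine every card for a case into one logical record."""
    cases = defaultdict(dict)
    for case_id, _card_number, variables in cards:
        cases[case_id].update(variables)
    return [{"case_id": case_id, **variables}
            for case_id, variables in sorted(cases.items())]

# Hierarchical structure: household ("H") records, each followed by
# its member ("P") records, all in the same file.
hierarchical = [
    ("H", {"household_id": 10, "region": 3}),
    ("P", {"person_no": 1, "age": 41}),
    ("P", {"person_no": 2, "age": 39}),
    ("H", {"household_id": 11, "region": 1}),
    ("P", {"person_no": 1, "age": 67}),
]

def flatten_hierarchy(rows):
    """Attach each person record to the household record preceding it."""
    people, household = [], None
    for record_type, fields in rows:
        if record_type == "H":
            household = fields
        elif household is not None:
            people.append({**household, **fields})
    return people

# Relational structure: separate files linked by a shared variable.
students = [
    {"student_id": 1, "major": "Sociology"},
    {"student_id": 2, "major": "Economics"},
]
transcripts = [
    {"student_id": 1, "course": "STAT 301", "grade": "A"},
    {"student_id": 1, "course": "SOC 210", "grade": "B"},
    {"student_id": 2, "course": "ECON 101", "grade": "A"},
]

def merge_on(parents, children, key):
    """Attach the matching parent row to each child row."""
    index = {row[key]: row for row in parents}
    return [{**index[row[key]], **row}
            for row in children if row[key] in index]

rectangular = to_rectangular(card_image)
people = flatten_hierarchy(hierarchical)
merged = merge_on(students, transcripts, "student_id")
```

In each case the result is a rectangular list of records, the form most statistical packages expect, which is why users often need help getting from the structure they are given to the one they can analyze.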
A. Putting the Search into Context
Most data users who walk into your library will go through a process that consists of four separate steps:
These users possess varying levels of experience as they make their way through this sequence, which is not necessarily linear; for example, it may not be until the user reaches Step 4 that they realize that the data do not meet their needs. Like many of the lists of processes we discuss, any of these actions may be the final one; any action may be omitted; any action may result in returning to an earlier action or in starting an entirely new effort. While often somewhat sequential, the actions can take place in virtually any order and users may, in fact, go through the sequence or parts of it numerous times.
One complicating factor in facilitating the sequence is a lack of standardized terminology, thesauri, or other tools to assist users in phrasing their questions. As we have seen, the prevalence of jargon, and the lack of mutually understandable terminology, complicate the interaction between users and librarians. Along that same line, there is an analogous deficiency of standardized terminology to describe data, its formatting, and storage. These two factors can make our job more difficult because they impede the communication process. They also emphasize the importance of co-instruction: both users and librarians benefit from teaching each other the language needed to facilitate communication.
The four steps described above must be seen in the context of the entire research process, which is a superset of the information-seeking process we examined earlier. The research process is what unites our users with a common purpose. Understanding this process is a necessary part of understanding how to assist them. The research process itself has undergone considerable transformation in the last several decades, in large part due to technological change. But, for most of our users, it can be divided into five basic parts:
While the research paradigm includes a wide number and variety of players, the role of the data library can be seen as ubiquitous; we have a potential role to play in all five parts. Some of these roles are obvious to us, others may be less so and are possible only after the staff gains experience and confidence.
Although the potential role of the data library spans all five steps, Step 1, the identification of sources, is at the core of the reference interview, and is one of the most difficult things that we do -- as we rely on all the communication and search skills we can muster. The more complex the problem, the more difficult finding appropriate data can be. The less experience the user has using data, the longer it can take to help them refine their initial perspective and expectations to a feasible scope. But while locating the perfect data set can be an arduous task, it must always be kept in mind that it is only one step in the research process.
B. The Data Reference Interview -- In General
Distilling what we've seen of the traditional literature, the goals of the data reference librarian should be:
It may take several iterations of each step in the consultation process for a user to refine a project appropriately.
As we have seen, data reference services employ most of the processes of traditional reference. There are significant differences, though, such as the level of abstraction that is involved in working with data, and the widely varied levels of user experience and sophistication. Another difference between doing traditional reference and data reference is that we continue to see the user long after the initial reference encounter. In short, finding appropriate data will probably not be the end of our interaction. In this respect what we do tends to rest at the end of the reference interview spectrum opposite what is called "ready reference". We often see a user many times over an extended period, and for a variety of reasons. It is therefore important to encourage users to keep track of sources used or avenues pursued (or even titles considered!) during the span of the interaction. Most users will not expect librarians to be all-knowing, but they may need to be reminded that, to avoid duplication of effort, it is their responsibility to keep track of ground covered and paths taken.
The different questions you ask and resources you offer depend on the needs of the individual user. The most important thing is to listen closely to what the user says, and ask questions that help them to define their needs. Furthermore, it is a challenge to communicate explicit information in the needed form, at the time it is needed, and without information overload: you may know a lot about a particular product, but the user needs only the appropriate information at the appropriate time. This is the difference between simply providing information and being informative. Feedback plays a role in ascertaining if you are providing the right kind of information in the right amounts. Ask questions like "Do I understand your question correctly?" or "Am I telling you what you want to know?".
The following considerations describe data reference issues that may or may not be relevant to any given circumstance. In addition, the order in which they are broached varies. Some issues may not initially be seen as important by users, but will be very important to bring up early on in the process, such as the computer resources available to the user.
C. The Data Reference Interview -- Broken Down
1. Does the user have a specific query?
Many users come in quite well prepared. The two basic query types identified in the library literature (context and subject) can be expanded upon in the data library environment to citation and topical:
Keep in mind that there may also be multiple topics, for example, union membership and voting behavior. The topical query is one place where the librarian can learn a lot from the user: never be afraid of sounding dumb or asking a question when a user makes use of an acronym you don't know or a term you do not understand. Most users will be thrilled to exercise their instructional skills on you.
| What is the Unit of Analysis? |
| One of the most important ideas in a research project |
| is the Unit of Analysis. It is the major entity that you are |
| analyzing in your study. The analysis you do in your |
| study determines what the unit is. For example it can be: |
| individuals |
| groups |
| artifacts (newspapers, photos) |
| geographical units (town, census tract, state) |
| social interactions (divorces, arrests) |
Is the user in the right place? Can you tell whether it is a problem or a question that brings the user to you? The user may say he or she wants data, but data may mean a few statistics to be put in a paper or millions of census records to be run through sophisticated statistical analyses.
Distinguish between a request for information and a request for source materials to be analyzed in research.
Sometimes what someone really wants are just some pre-packaged statistics, or to look at a codebook or a questionnaire. Other times, someone is writing a research proposal and simply wants to know if you have the data or if they are going to have to buy it. With growing CD-ROM publication, fewer statistics are available in print form and more people may be coming to you with simple information requests for a few numbers or a table. Requests for source materials typically require a higher level of service than that for information.
A good initial question might be something like:
Do you want data you can analyze using a computer or are you looking for tables of information?
Another pragmatic approach is to ask:
Do you have an idea of the number of measures you will need?
If a user is looking for tables or only a few measures, you may need to direct them to a resource such as the American Public Opinion Polls or even something as basic as the Statistical Abstract of the United States. However, the distinction between an information request and a request for source materials does not necessarily emerge from these sorts of questions; further queries may be necessary on the basis of context and use.
You may want to ask questions that focus on what they want to do with the data:
These sorts of questions can help define what limits may need to be set for the data search. The scope and depth of the user's project and length of time involved are extremely important, and knowing these limitations is essential. In this respect the data reference interview or negotiation can become a lengthy and continuing process. The complex, multi-step and multifaceted search strategies we undertake will differ based on the type of research question and the computing resources available. The skills discussed earlier like active listening, constructing open-ended questions, and paraphrasing, enhance the process of defining the problem environment.
| There are three basic types of questions that research projects can address: |
| Descriptive: when a study is designed primarily to describe what is going on |
| or what exists. Public opinion polls are often primarily descriptive in nature. |
| Relational: when a study is designed to look at relationships between two or |
| more variables. For example the relationship between gender and voting behavior. |
| Causal: when a study is designed to determine whether one or more variables causes |
| or affects one or more outcome variables. For example, a public opinion poll that |
| tries to determine whether a recent political advertising campaign changed voter |
| preference. |
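The difference between the first two question types in the box above shows up directly in the tables a user will want to build. The sketch below uses made-up respondent records and hypothetical function names. A descriptive question needs the distribution of one variable, while a relational question needs the joint distribution of two. No extra function is offered for causal questions, because answering them depends on the study design (for example, measurements before and after the advertising campaign) rather than on a different kind of table.

```python
from collections import Counter

# Made-up respondent records for illustration only.
respondents = [
    {"gender": "female", "voted": True},
    {"gender": "female", "voted": True},
    {"gender": "female", "voted": False},
    {"gender": "male", "voted": True},
    {"gender": "male", "voted": False},
]

def describe(rows, variable):
    """Descriptive: frequency distribution of a single variable."""
    return Counter(row[variable] for row in rows)

def crosstab(rows, row_variable, column_variable):
    """Relational: joint frequencies of two variables."""
    return Counter((row[row_variable], row[column_variable]) for row in rows)

turnout = describe(respondents, "voted")
turnout_by_gender = crosstab(respondents, "gender", "voted")
```

Asking the user which of these tables they picture in the final paper is often a quick way to find out how many variables, and therefore what kind of data file, the project really needs.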
Every project has a goal. A topic to be addressed. A question to be answered. The user needs to define the topic precisely enough to identify appropriate data, and may have done so before coming to the data library. Being able to discuss the project or topic with the user will help to further clarify the search strategy. Most users will be willing if not anxious to discuss these issues with an attentive listener.
The way a user answers these questions will affect the advice you offer. For example,
Throughout this process it is important to remember that researchers are best served when they are well prepared. Familiarity with the secondary literature of the topic, particularly citations to literature which utilize a particular data set, will be of great value.
Many of the complexities of locating and accessing data derive from the fact that data files come in many shapes and sizes, and that they come from many, many different sources. Access points to data files via reference tools and other resources have traditionally been problematic. For example, indexing at the variable level is still rare. The quality of both data and documentation varies significantly between and even within individual producers.
This portion of the reference interview should ideally result in a list of measures or "variables" and a list of possible bibliographic resources or data sources, from most probable to least probable.
| What is a hypothesis? |
| A hypothesis is a specific statement of prediction. It describes in |
| concrete (rather than theoretical) terms what you expect will |
| happen in your study. Not all studies have hypotheses -- |
| sometimes a study is designed to be exploratory or inductive. |
| A single study may have one or more hypotheses. |
| An alternative hypothesis is the hypothesis that you support. |
| The null hypothesis describes all the remaining possible outcomes. |
The size and complexity of a project need to match the user's computer capabilities.
| Time in Research |
| Cross-sectional: a study that takes place at a single point in time. |
| Longitudinal: takes place over time (2 or more waves of measurement). |
| Repeated measures: two or a few waves of measurement for use in |
| an analysis of variance (ANOVA). |
| Time Series: many waves (20 or more) of measurement for use in |
| time series analysis. |
Temper any advice with the eventual outcome in mind. People writing a short-term class paper for an introductory-level class may not need to concern themselves with the more esoteric aspects of data and their documentation. Conversely, dissertators need to be made aware of all the resources that may accompany a given data set (for example, bibliographies, special user's guides, flowcharts, appendices and supplements).
Topics such as sample design, measurement, method of collection, and quality of the data are all important in determining whether a set of data is appropriate to the research problem, its context, and the proposed techniques.
The user needs to understand how data measurement and quality are documented. Quality and other collection issues should be addressed in the study's codebook. But note that the quality of codebooks, and of data documentation in general, varies widely. With the advent of easily accessible data on CD-ROM there has been an accompanying decrease in the quality of the product documentation. A potential pitfall is the "if it's easy it must be right" misconception. A good example is the vast amount of country-level data made available via inter-governmental organizations: important issues of harmonization and comparability are often not fully elucidated. Another example is studies that use "missing" data indicators of various kinds, and the complex skip patterns that may exist. A cursory look at codebooks may not reveal complexities such as the complicated skip patterns in data sets such as the NLSY or PSID. Census data have separate flag variables that indicate when a data point has been "allocated" (estimated) or suppressed, which should be extracted along with the actual data points. Other issues that users need to think about, with or without the help of the data librarian, can include:
| A variable is any entity that can take on different values. For example: gender. |
| An attribute is a specific value on a variable. For example: male and female. |
| An independent variable is what you or nature manipulates. |
| The dependent variable is what is affected by the independent variable. |
| For example, if you are studying the effects of a new educational program on |
| student achievement, the program is the independent variable and your measures |
| of achievement are the dependent variables. |
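The point made earlier about extracting Census allocation flags along with the data points can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the field names (income, income_flag), the flag code 1 meaning "allocated", and the records themselves are hypothetical and do not come from any actual Census file. The idea is simply that keeping the flag next to the value lets the user see how much of a variable was estimated rather than reported.

```python
# Hypothetical extract: each record keeps its value AND its allocation flag.
# Field names and the flag code (1 = allocated) are illustrative assumptions.
records = [
    {"id": 1, "income": 32000, "income_flag": 0},
    {"id": 2, "income": 41000, "income_flag": 1},  # allocated (estimated)
    {"id": 3, "income": 28000, "income_flag": 0},
    {"id": 4, "income": 55000, "income_flag": 1},  # allocated (estimated)
]


def split_by_allocation(rows, value_field, flag_field, allocated_code=1):
    """Return (reported_values, allocated_values) using the flag variable."""
    reported = [r[value_field] for r in rows if r[flag_field] != allocated_code]
    allocated = [r[value_field] for r in rows if r[flag_field] == allocated_code]
    return reported, allocated


def mean(values):
    """Arithmetic mean, or None for an empty list."""
    return sum(values) / len(values) if values else None


reported, allocated = split_by_allocation(records, "income", "income_flag")
share_allocated = len(allocated) / len(records)

print("Share of records with allocated income:", share_allocated)
print("Mean of reported values only:", mean(reported))
print("Mean of all values:", mean(reported + allocated))
```

Had the flag variable been left out of the extract, the two means could not be compared, and the user would have no way of knowing how heavily the results lean on estimated values.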
Finally, depending on how the user answers any or all of these questions, the reference person can point them to the resources they will need to locate and then understand their data. It is possible that their need will be clearly identifiable (often they will already know which data set they want; at other times it will be obvious that, for example, the NLSY is a perfect match). Possible resources might be a printed codebook or reference work, the ICPSR on-line catalog, or the Internet. Once they have found their data it is important to make clear what else they can expect in terms of assistance from the data library, and where they can go for further help.
Be careful that any advice you give is objective and well-considered. As experts, our opinions are valuable, and offering advice based on years of experience is one of the most useful services you can provide. However, there is a trap: after hearing the same questions year after year, your answers can become automatic. When you give advice, always remember to say why you are saying what you are saying, and when you offer an opinion, always cite your sources.
8. Who's got the bug?
There are types of reference that we do that are unique to us, such as troubleshooting data problems. Inevitably users will come back to you explaining that they are having problems with their data. Often these problems will be in the context of data manipulation or statistical analysis. The "who's got the bug?" game is an important and unavoidable part of the reference process in the data library. Experience will help deal with troubleshooting, but it is also imperative to make clear to your users what your capabilities and limitations are, and what level of service you are willing to provide. Frequently users will come in explaining that there is a problem with their data, but often the actual problem is with their statistical code, or in how they interpreted the codebook. It helps to have a cursory understanding of SAS and SPSS, particularly the data definition modules. Being able to produce and read frequencies is helpful. Other common problems are caused by users' incomplete understanding of skip patterns, allocation flags, or universes. In these cases it is often helpful to sit down with the user and the documentation and pore over relevant sections. Beware of being overly sanctimonious; usually the error is the user's, but occasionally the data are bad, and in those cases you need to have a procedure in place to keep track of such data-related problems. "Who's got the bug?" can become a wonderfully challenging and interesting process, as well as a learning experience.
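As a rough illustration of why reading frequencies helps in troubleshooting, the sketch below builds a simple frequency table and shows how undeclared missing-value codes distort a mean. It is written in Python rather than SAS or SPSS purely for compactness; the response values and the assumption that codes 8 and 9 mean "don't know" and "not asked (skipped)" are hypothetical, and in practice would have to be confirmed against the study's codebook.

```python
from collections import Counter

# Hypothetical responses to one survey item. Codes 8 and 9 are ASSUMED here
# to be missing-data codes; the real codes must come from the codebook.
responses = [1, 2, 2, 9, 1, 3, 9, 9, 2, 8]
MISSING_CODES = {8, 9}


def frequencies(values):
    """Return a list of (code, count, percent) tuples sorted by code."""
    counts = Counter(values)
    total = len(values)
    return [(code, n, round(100 * n / total, 1))
            for code, n in sorted(counts.items())]


table = frequencies(responses)
print("code  count  percent")
for code, n, pct in table:
    print(f"{code:>4}  {n:>5}  {pct:>7}")

# A mean that treats the missing codes as real values is misleading.
naive_mean = sum(responses) / len(responses)
valid = [v for v in responses if v not in MISSING_CODES]
valid_mean = sum(valid) / len(valid)
print("Mean including missing codes:", naive_mean)
print("Mean of valid responses only:", round(valid_mean, 2))
```

A frequency table like this is often the quickest way to spot the problem: a cluster of cases at an out-of-range code usually points to a missing-data indicator or a skip pattern that the user's program has not accounted for, rather than to bad data.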
D. The Importance of Evaluation and Experience.
The importance of inquirer evaluation cannot be overstated, and evaluation should be an ongoing process to measure the effectiveness of the reference interview and the data library as a whole. The nature of the one-on-one reference interview provides a perfect opportunity to obtain feedback directly or indirectly from users. As we've seen, experience plays an important role in designing an effective search strategy. The best way to gain experience is to work through the process with a researcher, asking about the value of the resources, learning why data files were selected or rejected, and acquiring knowledge about the content of various files especially in the context of their use. Because of the close and ongoing relationships we often develop with our users, we are able to work as an intermediary at a profound level, becoming a valuable member of the research team and increasing our ability to provide assistance to future users. Learning to identify information needs allows us to create new services and reshape existing service delivery patterns.
It is also vital to stay on top of your profession and what your users are doing. Be familiar with your data and its documentation. Keep up with error reports and other data-related information. Subscribe to newsletters and read what your users are writing. Go to brownbags and other talks. When you go to conferences save participant lists and any directories you may find. When you find a good human source of information write down their name or get their business card.
E. The Data Library -- Providing What Services?
(Jim Jacobs' Levels of Service article: Providing Data Services for Machine-Readable Information in an Academic Library: Some Levels of Service, The Public-Access Computer Systems Review vol. 2, no. 1 (1991): 144-160).
And if all of this isn't enough, remember the importance of having a clear mission statement that defines what services your library and its staff will and will not provide. Jacobs' article lists levels of service for four categories by increasing degree of complexity. There is quite a bit of overlap between the categories because, in part, the functions themselves overlap. Also, it's clear that most libraries will not be able to supply all levels of service for all categories; choices will have to be made based on resources, staff, and other variables. It's useful to have a mission statement in writing that you can provide to users to help them put your library into context. Despite this, don't be surprised if your users frequently ask for help beyond your normal limits.
Levels of General Data Services:
Levels of Computing Services:
Levels of Library Data Services:
Levels of Reference Data Services:
Some possible reference services to think about include (Geraci, Humphrey, Jacobs):
F. DPLS Searching the Web and Introduction to Using Data at DPLS
I'm sure anyone reading this can come up with their own list of Commandments. The list presented here was derived from conversations with colleagues. It presents salient concepts useful to keep in mind for anyone doing data reference.
Much of what has been presented in this workshop is based on readings both general to the library profession and specific to data librarianship. Parts of it (for example Section E in its entirety) are shamelessly derived from the work of others (in that case, Jacobs, Geraci and Humphrey). It has been a joy to read the old familiar pieces as well as the excellent work currently being produced by writers like Weingard and Durrance.
There was a first wave of publishing in the 1970s as the profession got its act together. This wave is exemplified by Rowe's article, which expressed concern over the continued marginalization of data services in the eyes of traditional libraries. In the mid-1980s a second wave of publishing was stimulated by the increasing recognition by traditional librarians that data were a legitimate and important resource. This wave is typified by Dionne's article, which provides basic information to help reference librarians begin to work with machine-readable data.
The following list is by no means an exhaustive review of the literature, but rather a list of those publications used in the creation of this workshop.
Blair, David C. Language and Representation in Information Retrieval. (New York: Elsevier Science Publishers, 1990).
Bopp, Richard E. and Linda C. Smith. Reference and Information Services. (Englewood, CO: Libraries Unlimited, Inc, 1995).
Dionne, JoAnn. "Numeric Social Science Databases and the Library," Choice (22, 646-652, 1985).
Durrance, Joan C. "Information Needs: Old Song, New Tune," in Rethinking the Library in the Information Age. (Washington: U.S. Department of Education, 1986(?)).
Ellis, David. "Modeling the Information-Seeking Patterns of Academic Researchers: A Grounded Theory Approach," Library Quarterly (63 1993).
Folster, Mary B. "Information Seeking Patterns: Social Sciences," The Reference Librarian (No. 49/50, 1995).
Geraci, Diane, Chuck Humphrey and Jim Jacobs. "Data: Services and Collections (Part 2)," in Management of Machine-Readable Social Science Information (ICPSR Summer School, August, 1996).
Gray, Ann S. and Diane Geraci. "Complex Reference Services: Data Files for Social Research," The Reference Librarian (No. 48 1995).
Grover, Robert and Janet Carabell. "Toward Better Information Service: Diagnosing Information Needs," Special Libraries (Winter 1995).
Hunt, Patrick J. "Interpreters as Well as Gatherers: The Librarian of Tomorrow...Today," Special Libraries (Summer 1995).
Jacobs, Jim. "Providing Data Services for Machine-Readable Information in an Academic Library: Some Levels of Service," The Public-Access Computer Systems Review (2 1991).
Jones, Ray and Colleen Seale. "Expanding Networks: Reference Services for MRDF," Reference Services Review (16(1-2), 1988).
Krikelas, James. "Information-seeking Behavior: Patterns and Concepts," Drexel Library Quarterly (Spring 1983).
Rowe, Judith S. "Expanding Social Science Reference Service to Meet the Needs of Patrons More Adequately," Library Trends (Winter 1982).
Weingard, Darlene. Customer Service Excellence. (Chicago and London: American Library Association, 1997).
Zunde, P. and M.E. Dexter. American Documentation. (Volume 20, No. 3: 1969).