
The main advantage of the book, in my opinion, is its detailed explanation of classical statistical techniques. The text contains many examples, which will definitely help a novice understand the logic behind the statistical tests. The book can thus be used as a supplement to more practically oriented textbooks.

Another strong point is the systematic discussion and comparison of parametric and non-parametric methods offered in Chapter 3. Since linguistic data tend to deviate from normality, this approach is very welcome.


That being said, there are a few concerns. First, I have some doubts that the book fully achieves its goal formulated in the preface, namely, to demonstrate how statistics can contribute to linguistic studies.

Unfortunately, the examples and topics covered in the book are too limited from a theoretical point of view. Most illustrations come from foreign language acquisition. This is a bit odd, since quantitative methods have been applied extremely productively in many areas of contemporary linguistic research, especially within the usage-based paradigm and in variationist research, psycholinguistics and typology. In addition, the data in the examples are often fictional or come from an unnamed source, especially in the first chapters of the book.

Second, in the age of the statistical software boom, it is somewhat surprising to find no practical guidelines on how to perform statistical tests with the help of existing packages (for instance, SPSS, which is used extensively by the author). After all, these calculations are no longer done with pencil and paper. It would be useful, therefore, if the book contained at least an appendix with the relevant code. Another problematic issue is the imprecise use of statistical terminology.

Consider the following small-scale errors, for example: the term 'probability ratios' (26) should be replaced by simply 'probabilities' or 'proportions'. Some errors are, however, more serious on conceptual grounds. One assumption made in the book is erroneous: in fact, there exist perfectly legitimate solutions that enable one to incorporate categorical predictors.

The strategy of fitting the regression model described in the book is also questionable. A model with 8 independent variables and only 32 observations runs a huge risk of overfitting. As a result, such a model cannot be extrapolated to new data, which makes it useless (see Harrell). Finally, the book contains a few minor misprints, which can be a source of confusion for beginners. For instance, the mean, median and mode should be in the reverse order in Figure 1.

## References

- Analyzing Linguistic Data. Cambridge: Cambridge University Press.
- Dunning, Ted. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics 19(1).
- Gries, Stefan Th. Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics 13(4).
- Statistics for Linguistics with R: A Practical Introduction. Berlin: De Gruyter Mouton.
- Harrell, Frank E. Regression Modeling Strategies. New York: Springer.
- International Journal of Corpus Linguistics 17(1).
- Foundations of Statistical Natural Language Processing.

Her thesis was based on multivariate statistical analyses of periphrastic causatives in Netherlandic and Belgian Dutch.

Parametric tests include the t-test for independent and paired samples, analysis of variance (ANOVA), Pearson's correlation coefficient and simple linear regression, while the non-parametric section deals with the Mann-Whitney U-test, the sign test, the chi-squared test, the median test and Spearman's rank correlation.
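To make the contrast between a parametric and a non-parametric measure concrete, the following minimal Python sketch computes Pearson's correlation coefficient and Spearman's rank correlation by hand. The function names and the toy data are illustrative assumptions and are not taken from the book under review. Spearman's coefficient is computed here as Pearson's coefficient applied to ranks, with tied values receiving their average rank.

```python
import math


def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def average_ranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(pearson(x, y))   # below 1, because the relationship is not linear
print(spearman(x, y))  # the ranks agree perfectly
```

Because the rank-based measure only uses the ordering of the values, the outlying last observation does not affect it, which is one reason rank-based methods suit data that deviate from normality.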


The author explains the underlying assumptions and theoretical principles of each test, and provides extensive illustrations. Chapter 4 describes four multivariate statistical methods, among them cluster analysis in its hierarchical and non-hierarchical variants. As in the previous chapter, the assumptions that should be met are discussed for each method. The author walks the reader through all the main conceptual steps of each analysis. Most calculations in this chapter are done by the author with the help of SPSS.

Chapters 5 and 6 deal with some fundamental issues in corpus linguistics related to word frequency lists and collocation measures. Chapter 5 is probably the most heterogeneous one of the book. First, it discusses at length the usefulness of different ways of sorting frequency lists, and illustrates Dunning's method of finding the keywords in a text or corpus.

The 'keyness' is determined with the help of the log-likelihood test. The method is illustrated by computing the keyness of words in one of Barack Obama's speeches. The reference corpus, which is used to measure the degree of unexpectedness of the words in Obama's speech, is, somewhat surprisingly, the British National Corpus. In addition, the author mentions different types of corpus annotation, and suggests a method of comparing wordlists from different domains with the help of meta-frequency lists, which are conceptually similar to the popular Venn diagrams.
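For readers who want to see the calculation behind keyness, here is a minimal Python sketch of the log-likelihood (G2) statistic computed on a 2x2 contingency table (word versus all other words, target text versus reference corpus). The function name, its parameters and the counts in the example are illustrative assumptions. They are not taken from the book, and the sketch is not claimed to reproduce the author's own computation.

```python
import math


def log_likelihood_keyness(freq_target, size_target, freq_ref, size_ref):
    """G2 statistic for one word in a target text and a reference corpus.

    freq_* are the word's token counts, size_* the total token counts.
    """
    observed = [
        [freq_target, size_target - freq_target],  # target: word, other words
        [freq_ref, size_ref - freq_ref],           # reference: word, other words
    ]
    total = size_target + size_ref
    row_totals = [size_target, size_ref]
    col_totals = [freq_target + freq_ref, total - freq_target - freq_ref]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


# Hypothetical counts: 50 hits in a 1,000-token speech,
# 100 hits in a 10,000-token reference corpus.
score = log_likelihood_keyness(50, 1000, 100, 10000)
print(score > 3.84)  # 3.84 is the chi-squared critical value (df = 1, p = 0.05)
```

A word with the same relative frequency in both corpora scores zero. The statistic itself does not show whether the word is over- or under-represented in the target text, so the two relative frequencies need to be compared separately.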

Next, the author moves on to discuss type and token distribution in a corpus, as well as Zipf's law. Finally, he describes how to measure the dispersion of a word in a corpus by using Gries' DP (i.e. Deviation of Proportions) measure. Chapter 6 turns to the Key Words In Context format and collocation. It discusses four association measures. After that, the author introduces the notion of lexical constellations, which reflect hierarchical and asymmetric relationships between collocates.
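The dispersion measure mentioned above can be sketched in a few lines of Python. DP compares, for each corpus part, the share of the word's occurrences found in that part with the share of the corpus that the part makes up, and takes half the sum of the absolute differences. The function name, argument names and counts below are illustrative assumptions, not material from the book.

```python
def deviation_of_proportions(word_counts, part_sizes):
    """Gries' DP for one word across the parts of a corpus.

    word_counts[i] is the word's frequency in part i,
    part_sizes[i] is the total number of tokens in part i.
    """
    total_word = sum(word_counts)
    total_size = sum(part_sizes)
    if total_word == 0 or total_size == 0:
        raise ValueError("the word and the corpus must both be non-empty")
    return 0.5 * sum(
        abs(count / total_word - size / total_size)
        for count, size in zip(word_counts, part_sizes)
    )


# Hypothetical corpus with three parts of 100 tokens each.
print(deviation_of_proportions([10, 10, 10], [100, 100, 100]))  # evenly spread
print(deviation_of_proportions([30, 0, 0], [100, 100, 100]))    # all in one part
```

Values near 0 indicate that the word is spread across the parts in proportion to their sizes, while values approaching 1 indicate that it is concentrated in a small portion of the corpus.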
