Title: | Read 'CEP' and Legacy 'CANOCO' Files |
---|---|
Description: | Read Condensed Cornell Ecology Program ('CEP') and legacy 'CANOCO' files into R data frames. |
Authors: | Jari Oksanen [aut, cre] , Gavin L. Simpson [aut] |
Maintainer: | Jari Oksanen <[email protected]> |
License: | MIT + file LICENCE |
Version: | 1.2-2 |
Built: | 2024-11-13 04:43:17 UTC |
Source: | https://github.com/vegandevs/cepreader |
readCEP
reads a file formatted by relaxed strict CEP format
used by Canoco
software, among others.
readCEP(file, maxdata = 10000, positive = TRUE, sparseMatrix = FALSE, ...)
readCEP(file, maxdata = 10000, positive = TRUE, sparseMatrix = FALSE, ...)
file |
File name (character variable). |
maxdata |
Maximum number of non-zero entries. |
positive |
Read only positive data entries (normal in community data). |
sparseMatrix |
Return data as a sparse matrix of Matrix
package (see |
... |
Other arguments passed to |
Cornell Ecology Programs (CEP) introduced several data formats
designed for punched cards. One of these was the ‘condensed
strict’ format which was adopted by popular software DECORANA
and TWINSPAN
. Later, Cajo ter Braak (1984) wrote
Canoco
based on DECORANA
, where he adopted the format,
but relaxed it somewhat (that's why I call it “relaxed
strict” format). Further, he introduced a more ordinary
“free” format, and allowed the use of classical Fortran style
“open” format with fixed field widths. This function should
be able to deal with all these Canoco
formats, whereas it
cannot read many of the traditional CEP alternatives.
All variants of CEP formats have:
Two or three title cards, most importantly specifying the
format (or word FREE
) and the number of items per record
(number of species and sites for “open” and “free”
formats).
Data in one of three accepted formats:
Condensed format: First number on the line is the site identifier (an integer), and it is followed by pairs (“couplets”) of numbers identifying the species and its abundance (an integer and a floating point number).
Open Fortran format, where the first number on the line must be the site number, followed by abundance values in fields of fixed widths. Empty fields are interpreted as zeros.
“Free” format, where the numbers are interpreted as abundance values. These numbers must be separated by blank space, and zeros must be written as zeros.
Species and site names, given in Fortran format
(10A8)
: Ten names per line, eight columns for each.
With option positive = TRUE
the function removes all lines
and columns with zero or negative marginal sums. In community data
with only positive entries, this removes empty sites and species.
If data entries can be negative, this ruins data, and such data sets
should be read in with option positive = FALSE
.
Returns a data frame (default), where columns are species and rows are
sites. Column and row names are taken from the CEP file, and changed
into unique R names by make.names
after stripping the
blanks.
Alternatively the function can return a sparse matrix of Matrix package. This can give considerably saving in disk storage. However, typically the sparse matrix will be expanded to full dense matrix in community analysis. For instance, centred or standardized data matrices are dense. Moreover, some functions may be unable to analyse sparse matrices, but you must cast these to ordinary dense data matrices or data frames before the analysis.
The function calls an external Fortran program to manipulate data so that they can be read into R. The function launches a separate program to read the data. This can trigger a warning with some security settings, and users may need to give permission for the operation.
If you transfer files between operating systems or platforms, you
should always check that your file is formatted to your current
platform. For instance, if you transfer files from Windows to Linux,
you should change the files to unix
format, or your session may
crash when Fortran program tries to read the invisible characters that
Windows uses at the end of each line.
This function was included in vegan up to version 2.4-5. It
was moved to a package of its own because compiled functions using
Fortran I/O can interfere with compiled code written in other
languages. This can disturb vegan, other loaded packages and
packages that depend on vegan. Current versions of vegan
have function read.cep
that uses R code to
read in “condensed” data. However, the R code is not as
robust as the Fortran code in this package, and it can fail to read
or read correctly several legacy files, and does not read
“open” and “free” format data.
Jari Oksanen
Ter Braak, C.J.F. (1984–): CANOCO – a FORTRAN program for canonical community ordination by [partial] [detrended] [canonical] correspondence analysis, principal components analysis and redundancy analysis. TNO Inst. of Applied Computer Sci., Stat. Dept. Wageningen, The Netherlands.
## classic example cepfile <- file.path(path.package("cepreader"), "testdata", "dune.spe") ## peek at the file structure head(readLines(cepfile), n=10) tail(readLines(cepfile), n=10) ## as a data frame readCEP(cepfile) ## as a sparse matrix (Matrix package) readCEP(cepfile, sparseMatrix = TRUE)
## classic example cepfile <- file.path(path.package("cepreader"), "testdata", "dune.spe") ## peek at the file structure head(readLines(cepfile), n=10) tail(readLines(cepfile), n=10) ## as a data frame readCEP(cepfile) ## as a sparse matrix (Matrix package) readCEP(cepfile, sparseMatrix = TRUE)