Improvements on many Coreutils functions
Coreutils: joining files with a wildcard
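#!/usr/bin/env bash
# Join every file matching sample*.tsv on its first field.
# join(1) requires each input to be sorted on the join field.
A=(sample*.tsv)
set -- "${A[@]}"
recursive_join() {
  # Join standard input with the first remaining file, then recurse.
  if [ $# -eq 1 ]; then
    join - "$1" > output.tsv
  else
    f=$1; shift
    join - "$f" | recursive_join "$@"
  fi
}
if [ $# -le 2 ]; then
  join "$@" > output.tsv
else
  f1=$1; f2=$2; shift 2
  join "$f1" "$f2" | recursive_join "$@"
fi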
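The recursion is not essential. The following is a minimal sketch, assuming every input is already sorted on its first field (a requirement of join itself). It folds the files together in a loop through a temporary file. join_all is a hypothetical name used only here.

```shell
# Hypothetical helper: join any number of files on their first field
# without recursion. Inputs must already be sorted on that field.
join_all() {
  local acc next f
  if [ $# -lt 2 ]; then
    echo "join_all: need at least two files" >&2
    return 1
  fi
  acc=$(mktemp) || return 1
  next=$(mktemp) || { rm -f "$acc"; return 1; }
  # Start with the first two files, then join each remaining file
  # against the accumulated result.
  join "$1" "$2" > "$acc"
  shift 2
  for f in "$@"; do
    join "$acc" "$f" > "$next"
    mv "$next" "$acc"
  done
  cat "$acc"
  rm -f "$acc" "$next"
}
```

Usage: join_all sample*.tsv > output.tsv. Unlike the recursive version, it writes to standard output, so the caller chooses the destination file.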
Vectools: joining files with a wildcard
vectools join sample*.tsv > output.tsv
Easy-to-use machine learning
# Calculate dipeptide composition of positive and negative sets.
vectools ncomp --kmer-len 2 pos.faa > pos.vec
vectools ncomp --kmer-len 2 neg.faa > neg.vec
# Find the best parameters via grid search, k-fold cross-validation,
# and independent set testing. Then build the SVM model.
vectools svmtrain \
--folds 5 --kernel rbf \
--best-metrics ca.best_stats \
--model ca.model \
pos.vec neg.vec
# Get dipeptide composition from multi-fasta of unknowns.
vectools ncomp -r --kmer-len 2 unknowns.faa > unknowns.vec
# Predict classes of unknowns.
vectools svmclassify -r --model ca.model unknowns.vec > preds.tsv
#class class_ID score
0 pos.vec 0.42779
1 neg.vec -0.30745
0 pos.vec 0.16307
....
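The predictions file can be summarized with standard tools. The following is a minimal awk sketch. It assumes preds.tsv has the layout shown above: a header line starting with '#', followed by whitespace-separated class, class_ID and score columns. count_predictions is a hypothetical helper, not a vectools command.

```shell
# Hypothetical helper: count how many unknowns were assigned to each
# class_ID (second column), skipping the '#' header line.
count_predictions() {
  awk '!/^#/ && NF >= 3 { n[$2]++ } END { for (c in n) print c, n[c] }' "$1" | sort
}
```

For the three rows shown above, it would print neg.vec 1 and pos.vec 2.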
Extensive documentation
usage: vectools slice [-h] [--keep-cols [KEEP_COLS]]
[--remove-cols [REMOVE_COLS]] [-c] [-r] [-d [DELIMITER]]
[--roundto ROUNDTO]
[matrices [matrices ...]]
positional arguments:
matrices Matrices to slice.
optional arguments:
-h, --help show this help message and exit
--keep-cols [KEEP_COLS] The columns which should be kept (comma-separated).
--remove-cols [REMOVE_COLS] The columns to remove (comma-separated). Ignored if --keep-cols is present.
-c, --column-titles The matrix has column titles.
-r, --row-titles The matrix has row titles.
-d [DELIMITER], --delimiter [DELIMITER] The character separating columns. default: TAB
--roundto ROUNDTO Round to n decimal places.
This function slices a matrix.
Specify which columns to keep (--keep-cols) or to remove (--remove-cols)
as a comma-separated list such as 1,3,7 or 1,4:7,9:
To remove rows instead, see the chop function.
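The column-list syntax can be illustrated with a small sketch. expand_cols is hypothetical and not part of vectools. It makes three assumptions: indices are 0-based, ranges are inclusive (consistent with --remove-cols 1:2 dropping both c and d in the examples), and an open range such as 9: runs to the last column.

```shell
# Hypothetical helper, not part of vectools: expand a column list such as
# "1,4:7,9:" into one 0-based index per line for a matrix with NCOLS columns.
# Assumed semantics: ranges are inclusive, "a:" runs to the last column,
# and ":b" starts at column 0.
expand_cols() {
  local spec=$1 ncols=$2 item start end i
  local IFS=,
  for item in $spec; do
    case $item in
      *:*)
        start=${item%%:*}
        end=${item#*:}
        i=${start:-0}
        end=${end:-$((ncols - 1))}
        while [ "$i" -le "$end" ]; do
          echo "$i"
          i=$((i + 1))
        done
        ;;
      *)
        echo "$item"
        ;;
    esac
  done
}
```

For example, expand_cols 1,4:7,9: 11 prints 1, 4, 5, 6, 7, 9 and 10, one per line.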
#Examples:
$ cat matrix.tsv
id,c,d,e
a,1,2,3
b,3,4,5
$ vectools slice --keep-cols 0,2 --delimiter , --row-titles --column-titles matrix.tsv
id,c,e
a,1,3
b,3,5
$ vectools slice --remove-cols 1:2 --delimiter , --column-titles matrix.tsv
id,e
a,3
b,5
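For comparison, coreutils cut can reproduce both slices. However, cut numbers fields from 1, knows nothing about row or column titles, and needs the GNU-specific --complement option to express removal.

```shell
# Recreate the example matrix from above.
printf 'id,c,d,e\na,1,2,3\nb,3,4,5\n' > matrix.tsv

# Same result as: vectools slice --keep-cols 0,2 --delimiter , --row-titles --column-titles
# cut counts fields from 1 and includes the row-title column, so keep 1, 2 and 4.
cut -d, -f1,2,4 matrix.tsv

# Same result as: vectools slice --remove-cols 1:2 --delimiter , --column-titles
# --complement (GNU cut only) drops the listed fields instead of keeping them.
cut -d, --complement -f2-3 matrix.tsv
```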
Extensive Unit and Integration Testing
Scenario: run params test for max r # features/analysis.feature:134
Given the file test containing # features/steps/matrix_helper.py:81 0.000s
"""
0,1,2,3,4,5
1,2,3,4,5,6
2,3,4,5,6,7
3,4,5,6,7,8
"""
Given file test as parameter # features/steps/matrix_helper.py:104 0.000s
Given parameter --delimiter = , # features/steps/matrix_helper.py:11 0.000s
Given last parameter --row-titles # features/steps/matrix_helper.py:73 0.000s
When we run minimum from analysis with tmpfile # features/steps/analysis.py:114 0.001s
Then we expect the matrix # features/steps/matrix_helper.py:53 0.000s
"""
1,2,3,4,5
"""