A command-line utility for vector manipulation.

Vectools increases the development speed and reproducibility of bioinformatics analyses by offering a high-quality alternative to writing custom scripts.

  • Improvements on many Coreutils functions
  • Linear Algebra
  • Machine learning
cat f1.tsv   cat f2.tsv   cat f3.tsv 
A 1          A 3          A 5 
B 2          B 4          B 6

cat i1.tsv
C   10  11  12
D   13  14  15

vectools join -s 1 f*.tsv | \
    vectools add -r - i1.tsv

A   11  14  17
B   15  18  21
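
To make the arithmetic in the pipeline above explicit, here is a minimal Python sketch of the same two steps: joining the three files on their row titles, then adding the second matrix element-wise. It is an illustration only, not vectools' implementation. The helper names `join_by_title` and `add_rows` are hypothetical, and pairing rows by position (A with C, B with D) is an assumption inferred from the output shown above.

```python
from functools import reduce


def join_by_title(left, right):
    """Inner-join two {row_title: [values]} tables on their row titles."""
    return {k: left[k] + right[k] for k in left if k in right}


def add_rows(base, other):
    """Add two tables element-wise, pairing rows by position (assumed).

    The row titles of the base table are kept, as in the example output.
    """
    return {
        title: [a + b for a, b in zip(row, other_row)]
        for (title, row), other_row in zip(base.items(), other.values())
    }


# Contents of f1.tsv, f2.tsv, f3.tsv and i1.tsv from the example above.
f1 = {"A": [1], "B": [2]}
f2 = {"A": [3], "B": [4]}
f3 = {"A": [5], "B": [6]}
i1 = {"C": [10, 11, 12], "D": [13, 14, 15]}

joined = reduce(join_by_title, [f1, f2, f3])  # {"A": [1, 3, 5], "B": [2, 4, 6]}
result = add_rows(joined, i1)

for title, row in result.items():
    print(title, *row, sep="\t")
```

Run as written, this prints the same two rows as the vectools pipeline: `A 11 14 17` and `B 15 18 21`.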


Features

Improvements on many Coreutils functions

Coreutils: joining files with a wildcard
A=(sample*.tsv)
set -- "${A[@]}"
# Join stdin with each remaining file in turn; the last join writes the result.
recursive_join() {
    if [ $# -eq 1 ]; then
        join - "$1" > output.tsv
    else
        f=$1; shift
        join - "$f" | recursive_join "$@"
    fi
}
if [ $# -le 2 ]; then
    join "$@" > output.tsv
else
    f1=$1; f2=$2; shift 2
    join "$f1" "$f2" | recursive_join "$@"
fi
Vectools: joining files with a wildcard
vectools join sample*.tsv > output.tsv

Easy-to-use machine learning

# Calculate dipeptide composition of positive and negative sets. 
vectools ncomp --kmer-len 2 pos.faa > pos.vec
vectools ncomp --kmer-len 2 neg.faa > neg.vec

# Find best parameters via grid search, k-fold testing, 
# and independent set testing. Then build SVM model. 
vectools svmtrain                \
   --folds 5 --kernel rbf        \
   --best-metrics ca.best_stats  \
   --model ca.model              \
   pos.vec neg.vec

# Get dipeptide composition from multi-fasta of unknowns. 
vectools ncomp -r --kmer-len 2 unknowns.faa > unknowns.vec

# Predict classes of unknowns. 
vectools svmclassify -r --model ca.model unknowns.vec > preds.tsv

#class    class_ID    score
0    pos.vec    0.42779
1    neg.vec    -0.30745
0    pos.vec    0.16307
....
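
For readers unfamiliar with the feature used in the workflow above: dipeptide composition describes a protein by how often each pair of adjacent amino acids occurs. The sketch below shows one common way to compute it. It is not the `ncomp` implementation; the function name `kmer_composition`, the 20-letter alphabet and its ordering, and the normalization to relative frequencies are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet and column order


def kmer_composition(sequence, k=2, alphabet=AMINO_ACIDS):
    """Return the relative frequency of every possible k-mer, in a fixed order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Slide a window of length k over the sequence and count each k-mer.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Only k-mers made of alphabet letters contribute to the total.
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


# Hypothetical toy sequence; its 2-mers are AC, CA, AC, CD.
vec = kmer_composition("ACACD")
print(len(vec))         # 400 features for k=2 over 20 amino acids
print(vec[1], vec[20])  # frequencies of "AC" and "CA"
```

Each sequence becomes a fixed-length vector (400 values for k=2), which is what allows sequences of different lengths to be fed to the same SVM.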

Extensive documentation


usage: vectools slice [-h] [--keep-cols [KEEP_COLS]]
                      [--remove-cols [REMOVE_COLS]] [-c] [-r] [-d [DELIMITER]]
                      [--roundto ROUNDTO]
                      [matrices [matrices ...]]

positional arguments:
  matrices              Matrices to add to a base matrix.

optional arguments:
  -h, --help            show this help message and exit
  --keep-cols [KEEP_COLS]
                        The columns which should be kept (comma-separated).
  --remove-cols [REMOVE_COLS]
                        The columns to remove (comma-separated). Omitted if
                        --keep-cols is present.
  -c, --column-titles   The matrix has column titles.
  -r, --row-titles      The matrix has row titles.
  -d [DELIMITER], --delimiter [DELIMITER]
                        The character separating columns. default: TAB
  --roundto ROUNDTO     Round to n decimal places.

    This function is to slice a matrix
    You have to say which columns you want to keep (--keep-cols) or to remove (--remove-cols)
    in a comma separated list like 1,3,7 or 1,4:7,9:

    See function chop if you want to remove rows

    #Examples:

    $ cat matrix.tsv
    id,c,d,e
    a,1,2,3
    b,3,4,5

    $ vectools slice --keep-cols 0,2 --delimiter , --row-titles --column-titles matrix.tsv
    id,c,e
    a,1,3
    b,3,5

    $ vectools slice --remove-cols 1:2 --delimiter , --column-titles matrix.tsv
    id,e
    a,3
    b,5 
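
The column specification described in the help text above (for example `1,4:7,9:`) can be sketched in Python as follows. This is an illustration, not the vectools source: the helper names `parse_columns` and `remove_columns` are hypothetical, and treating `start:end` as inclusive on both ends is an assumption inferred from the `--remove-cols 1:2` example, which drops both column `c` and column `d`.

```python
def parse_columns(spec, n_cols):
    """Expand a spec such as '1,4:7,9:' into a sorted list of column indices."""
    selected = set()
    for part in spec.split(","):
        if ":" in part:
            start, _, end = part.partition(":")
            lo = int(start) if start else 0
            hi = int(end) if end else n_cols - 1  # open end runs to the last column
            selected.update(range(lo, hi + 1))    # inclusive upper bound (assumed)
        else:
            selected.add(int(part))
    # Ignore indices that fall outside the matrix.
    return sorted(i for i in selected if 0 <= i < n_cols)


def remove_columns(rows, spec):
    """Drop the columns named by spec from every row."""
    drop = set(parse_columns(spec, len(rows[0])))
    return [[v for i, v in enumerate(row) if i not in drop] for row in rows]


# matrix.tsv from the help text above, without row-title handling.
matrix = [["id", "c", "d", "e"], ["a", "1", "2", "3"], ["b", "3", "4", "5"]]
sliced = remove_columns(matrix, "1:2")
for row in sliced:
    print(",".join(row))
```

With the matrix from the help text, removing `1:2` leaves the `id` and `e` columns, matching the second example above.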

Extensive Unit and Integration Testing


  Scenario: run params test for max r              # features/analysis.feature:134
    Given the file test containing                 # features/steps/matrix_helper.py:81 0.000s
      """
      0,1,2,3,4,5
      1,2,3,4,5,6
      2,3,4,5,6,7
      3,4,5,6,7,8
      """
    Given file test as parameter                   # features/steps/matrix_helper.py:104 0.000s
    Given parameter --delimiter = ,                # features/steps/matrix_helper.py:11 0.000s
    Given last parameter --row-titles              # features/steps/matrix_helper.py:73 0.000s
    When we run minimum from analysis with tmpfile # features/steps/analysis.py:114 0.001s
    Then we expect the matrix                      # features/steps/matrix_helper.py:53 0.000s
      """
      1,2,3,4,5
      """
       

Install

Linux & Mac

# Install
python3 -m pip install vectools

# Upgrade
python3 -m pip install vectools --upgrade

More Examples

Improvements on many Coreutils functions

Coreutils: multi-column joins
join -j4 -t $'\t' -o 1.1,1.2,1.3,2.3 \
  <(awk '{print $0"\t"$1"-"$2}' A.tsv | sort -k4) \
  <(awk '{print $0"\t"$1"-"$2}' B.tsv | sort -k4)
Vectools: multi-column joins
vectools join -j INNER -s 2 -k1 0,1 -k2 0,1 \
    A.tsv B.tsv 
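
Both commands above join two tables on a composite key made of the first two columns. A minimal Python sketch of that idea follows. It mirrors the output layout of the Coreutils version (key columns, then the value from A, then the value from B) and makes no claim about how the `-s`, `-k1`, or `-k2` options are implemented in vectools. The function name `inner_join` and the sample rows are hypothetical.

```python
def inner_join(left, right, left_keys=(0, 1), right_keys=(0, 1)):
    """Inner-join two lists of rows on a composite (multi-column) key."""
    # Index the right table by its key, keeping only the non-key columns.
    index = {}
    for row in right:
        key = tuple(row[i] for i in right_keys)
        rest = [v for i, v in enumerate(row) if i not in right_keys]
        index.setdefault(key, []).append(rest)

    joined = []
    for row in left:
        key = tuple(row[i] for i in left_keys)
        # Rows without a partner in the right table are dropped (inner join).
        for rest in index.get(key, []):
            joined.append(list(row) + rest)
    return joined


# Hypothetical stand-ins for A.tsv and B.tsv.
a = [["x", "1", "0.5"], ["x", "2", "0.7"], ["y", "1", "0.9"]]
b = [["x", "1", "10"], ["y", "1", "30"], ["z", "9", "40"]]

for row in inner_join(a, b):
    print("\t".join(row))
```

Using a tuple as the dictionary key avoids the string concatenation trick (`$1"-"$2`) needed in the awk version, which can collide when the key values themselves contain the separator.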


About

Please Cite: Manuscript coming soon.

Get Help

Documentation

Wiki: Bitbucket/wiki

Issues

Bug reports: Bitbucket/issues