

COSMO - Constrained search for motifs

COSMO version 1.0 (Release date: 4/11/06)

For further information on how to interpret this output, please access http://cosmoweb.berkeley.edu.

You can download these results in XML format here.

Date job was run: 13:36 Tue 04/11/2006
Web page expires: 13:36 Tue 04/18/2006
Run time: 5 minutes 15 seconds
Run on server: biostat-05.berkeley.edu

Reference

If you use this software in your research, please cite

O. Bembom, M.J. van der Laan (2006). Supervised detection of conserved motifs in DNA sequences with cosmo. UC Berkeley Division of Biostatistics Working Paper Series. Working Paper 209. http://www.bepress.com/ucbbiostat/paper209.

cosmo makes use of the donlp2() function by Peter Spellucci. The use of donlp2 must be acknowledged in any publication which contains results obtained with cosmo or parts of it. Citation of the author's name and netlib-source is suitable.


Call summary

This information can also be useful if you wish to report a problem with the cosmo software.

command: cosmo cosmo.seqs.13573 -minw 7 -maxw 12 -zoops -tcm -addfree -con cosmo.cons.13573 -xml -status

Constraints   numConSets: 2   nonempty: TRUE
              criterion: pwmCV   fold: 2   trunc: 100
Distribution  OOPS: FALSE   ZOOPS: TRUE   TCM: TRUE
              approx: OVER   cutfac: 5
              criterion: eval   fold: 5   trunc: 100
Width         minw: 7   maxw: 12
              criterion: eval   fold: 5   trunc: 100
NumSites      minsites: 2   maxsites: 50
              criterion: eval   fold: 5   trunc: 100
Starts        type: eval   number: 5
Background    source: same   cvorder: TRUE   fold: 5
Data          numseqs: 20   numnucs: 4000   rev: TRUE

Training set

Sequence Name   Length   Sequence Name   Length
TestSeq0:138    200      TestSeq1:144    200
TestSeq2:95     200      TestSeq3:168    200
TestSeq4:8      200      TestSeq5:51     200
TestSeq6:44     200      TestSeq7:128    200
TestSeq8:73     200      TestSeq9:116    200
TestSeq10:3     200      TestSeq11:175   200
TestSeq12:161   200      TestSeq13:147   200
TestSeq14:88    200      TestSeq15:12    200
TestSeq16:6     200      TestSeq17:130   200
TestSeq18:28    200      TestSeq19:121   200

Constraints

Original constraint file    Interpreted constraints
@ ConstraintSet 1           @ Constraint set 1
>IntervalSetup              >IntervalSetup
Length: 3 bp                Length: 3 bp
Length: variable            Length: variable
Length: 3 bp                Length: 3 bp
>IcBounds                   >ICBounds
Interval: 2                 Interval: 1
Bounds: 0 to 0.8            Bounds: 1.000 to 2.000
>IcBounds                   >ICBounds
Interval: 1                 Interval: 2
Bounds: 1.0 to 2.0          Bounds: 0.000 to 0.800
>Pal                        >Palindrome
Intervals: 1 and 3          Intervals: 1 and 3
ErrorTol: 0.05              ErrorTol: 0.050
                            @ Constraint set 2
                            >IntervalSetup
                            Length: variable

Estimated background model

Order of background Markov model chosen by likelihood-based CV: 1

Kullback-Leibler divergences for candidate orders 0 to 6:

Order   KL divergence
0       271.68
1       269.72
2       269.83
3       319.49
4       5783.6
5       inf
6       inf
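
The reported order is consistent with picking the candidate that has the smallest Kullback-Leibler divergence. A minimal Python sketch of that comparison, using the values from the table above (the variable names are illustrative and not part of cosmo):

```python
import math

# KL divergences copied from the table above, keyed by candidate Markov order.
kl_by_order = {
    0: 271.68,
    1: 269.72,
    2: 269.83,
    3: 319.49,
    4: 5783.6,
    5: math.inf,
    6: math.inf,
}

# Pick the order whose divergence is smallest.
best_order = min(kl_by_order, key=kl_by_order.get)
print(best_order)
```

Orders 5 and 6 report an infinite divergence, so they can never win this comparison.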

Estimated transition matrix for order 0

Prefix   A        C        G        T
-        0.3043   0.1943   0.1870   0.3145

Estimated transition matrix for order 1

Prefix   A        C        G        T
A        0.2973   0.1949   0.1775   0.3303
C        0.2826   0.2116   0.2142   0.2916
G        0.2520   0.2278   0.1887   0.3315
T        0.3562   0.1629   0.1789   0.3019
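
As an illustration of how these matrices can be read, the sketch below scores a sequence under the order-1 background: each row gives the distribution of the next nucleotide given the preceding one. Using the order-0 frequencies for the first nucleotide is an assumption made here for simplicity, and the function name is hypothetical rather than part of cosmo.

```python
import math

# Order-0 frequencies (assumed here as the distribution of the first nucleotide).
ORDER0 = {"A": 0.3043, "C": 0.1943, "G": 0.1870, "T": 0.3145}

# Order-1 transition matrix from the table above: ORDER1[prefix][next].
ORDER1 = {
    "A": {"A": 0.2973, "C": 0.1949, "G": 0.1775, "T": 0.3303},
    "C": {"A": 0.2826, "C": 0.2116, "G": 0.2142, "T": 0.2916},
    "G": {"A": 0.2520, "C": 0.2278, "G": 0.1887, "T": 0.3315},
    "T": {"A": 0.3562, "C": 0.1629, "G": 0.1789, "T": 0.3019},
}


def background_log_prob(seq):
    """Natural-log probability of seq under the order-1 Markov background."""
    seq = seq.upper()
    logp = math.log(ORDER0[seq[0]])
    # Each later nucleotide is conditioned on the one immediately before it.
    for prev, cur in zip(seq, seq[1:]):
        logp += math.log(ORDER1[prev][cur])
    return logp


print(background_log_prob("CGCGTGCG"))
```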

Summary of candidate models

Constraints   Distribution   Width   wCrit       distCrit    conCrit
1             ZOOPS          7       1.01e+03    ---         ---
1             ZOOPS          8       1.46e-06    1.46e-06    1.27
1             ZOOPS          9       17.6        ---         ---
1             ZOOPS          10      0.0404      ---         ---
1             ZOOPS          11      23.1        ---         ---
1             ZOOPS          12      78.9        ---         ---
1             TCM            7       111         ---         ---
1             TCM            8       1.62        1.62        ---
1             TCM            9       652         ---         ---
1             TCM            10      46.3        ---         ---
1             TCM            11      710         ---         ---
1             TCM            12      3.59e+03    ---         ---
2             ZOOPS          7       0.00147     ---         ---
2             ZOOPS          8       1.54e-06    1.54e-06    1.36
2             ZOOPS          9       1.07e-05    ---         ---
2             ZOOPS          10      4.6e-05     ---         ---
2             ZOOPS          11      0.000596    ---         ---
2             ZOOPS          12      0.0125      ---         ---
2             TCM            7       0.944       ---         ---
2             TCM            8       0.0223      0.0223      ---
2             TCM            9       0.324       ---         ---
2             TCM            10      0.353       ---         ---
2             TCM            11      1.36        ---         ---
2             TCM            12      22.6        ---         ---

Selected model

Parameter      Choice   Criterion      Criterion value
Constraints    1        PWM-based CV   1.27
Distribution   ZOOPS    E-value        1.46e-06
Width          8        E-value        1.46e-06
NumSites       19       E-value        1.46e-06
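
The selected model can be reproduced from the candidate table with a three-step comparison: the smallest E-value picks the width within each constraint set and distribution, the smallest E-value then picks the distribution within each constraint set, and the smallest PWM-based CV value picks the constraint set. The Python sketch below is inferred from the reported numbers only; it is an assumption about the selection order, not cosmo's actual code, and all names are illustrative.

```python
# (constraint set, distribution) -> {width: E-value}, copied from the candidate table.
w_crit = {
    (1, "ZOOPS"): {7: 1.01e+03, 8: 1.46e-06, 9: 17.6, 10: 0.0404, 11: 23.1, 12: 78.9},
    (1, "TCM"): {7: 111, 8: 1.62, 9: 652, 10: 46.3, 11: 710, 12: 3.59e+03},
    (2, "ZOOPS"): {7: 0.00147, 8: 1.54e-06, 9: 1.07e-05, 10: 4.6e-05, 11: 0.000596, 12: 0.0125},
    (2, "TCM"): {7: 0.944, 8: 0.0223, 9: 0.324, 10: 0.353, 11: 1.36, 12: 22.6},
}

# PWM-based CV value (conCrit) reported for each constraint set.
con_crit = {1: 1.27, 2: 1.36}

# Step 1: best width for every (constraint set, distribution) pair.
best_width = {key: min(evals, key=evals.get) for key, evals in w_crit.items()}

# Step 2: best distribution per constraint set, compared at the best width.
best_dist = {}
for (con, dist), width in best_width.items():
    evalue = w_crit[(con, dist)][width]
    if con not in best_dist or evalue < best_dist[con][1]:
        best_dist[con] = (dist, evalue, width)

# Step 3: best constraint set by the PWM-based CV criterion.
best_con = min(con_crit, key=con_crit.get)
dist, evalue, width = best_dist[best_con]
selected = (best_con, dist, width, evalue)
print(selected)
```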

Estimated position weight matrix

Nuc\Pos   1       2       3       4       5       6       7       8
A         0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
C         0.950   0.178   0.780   0.153   0.044   0.270   0.872   0.000
G         0.050   0.822   0.220   0.577   0.484   0.730   0.128   1.000
T         0.000   0.000   0.000   0.269   0.472   0.000   0.000   0.000
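
The estimated matrix can be checked against constraint set 1. With a width of 8 and two fixed 3 bp intervals, the variable middle interval covers positions 4 and 5. The Python sketch below computes the information content per position and the palindrome deviation between intervals 1 and 3. Two assumptions are made here and may differ from cosmo's own definitions: information content is measured in bits against a uniform reference (2 minus the entropy), and the bounds are compared position by position.

```python
import math

# Estimated position weight matrix from the table above (0-based positions).
PWM = {
    "A": [0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000],
    "C": [0.950, 0.178, 0.780, 0.153, 0.044, 0.270, 0.872, 0.000],
    "G": [0.050, 0.822, 0.220, 0.577, 0.484, 0.730, 0.128, 1.000],
    "T": [0.000, 0.000, 0.000, 0.269, 0.472, 0.000, 0.000, 0.000],
}
WIDTH = 8
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def information_content(pos):
    """Bits of information at a position: 2 minus the entropy (uniform reference assumed)."""
    probs = [PWM[nuc][pos] for nuc in "ACGT"]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 - entropy


def palindrome_deviation(pos):
    """Largest gap between a position and the reverse complement of its mirror position."""
    mirror = WIDTH - 1 - pos
    return max(abs(PWM[nuc][pos] - PWM[COMPLEMENT[nuc]][mirror]) for nuc in "ACGT")


ic = [information_content(pos) for pos in range(WIDTH)]
max_dev = max(palindrome_deviation(pos) for pos in range(3))

print([round(x, 3) for x in ic])
print(round(max_dev, 3))
```

Under these assumptions, positions 1 to 3 fall inside the 1.0 to 2.0 bound, positions 4 and 5 stay below 0.8, and the palindrome deviation sits at the 0.05 error tolerance.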

Sequence logo of discovered motif


Alignment of discovered sites (E-value: 1.46e-06)

Sequence       Strand  Start  Prob    Site
TestSeq8:73    +       73     0.9988  atctagacta CGCGTGCG gtggtattga
TestSeq3:168   -       168    0.9980  agccgctaga CGGCCGCG atatgatccc
TestSeq13:147  -       147    0.9980  agtcagccaa CGCCAGCG cttggtattt
TestSeq0:138   -       138    0.9977  accgattccg CGCCAGCG tatcgatact
TestSeq5:51    -       51     0.9974  gatgcttgca CGCCCCCG tgcatatctg
TestSeq4:8     -       8      0.9971     ttccaat CGCACCCG tttttaacaa
TestSeq10:3    -       3      0.9960          ct CGGCAGCG gttcagagta
TestSeq7:128   -       128    0.9950  atataatatc CGCAGGCG tttaaccggc
TestSeq12:161  -       161    0.9941  ttgcctaact CGGACGGG gactcataaa
TestSeq2:95    +       95     0.9930  taacgcggta CCCCGGCG atcacaaatt
TestSeq18:28   -       28     0.9930  gtatatacta CGGACGGG atactgtacc
TestSeq14:88   -       88     0.9572  ttgagaaggt CGCAAGCC tcgtgtagta
TestSeq17:130  -       130    0.9393  ttttcttcga CGCCCGGC aatgaatcat
TestSeq15:12   -       12     0.9151  cactgttttt CGCACGGG aaagggcctc
TestSeq9:116   +       116    0.8329  ccttgcattc GCCGTGCG acccgcgcca
TestSeq11:175  +       175    0.7721  acgacagcat GCCCTGCG atgtttgcga
TestSeq16:6    +       6      0.7677       ttatt CCCGCCGG atcaggaatc
TestSeq6:44    -       44     0.6169  agttccgctg CGGGCGCG tattagtgtc
TestSeq1:144   -       144    0.4328  atcaattatt CCCCCGGC ccttttttct
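
Several minus-strand sites contain an A, although the estimated matrix assigns A a probability of zero at every position. This is consistent with sites being printed as they read on the given strand of the input sequence, so that minus-strand sites match the motif only after reverse complementing. The Python sketch below works under that assumption, which is inferred from the output rather than confirmed by it, and uses five sites from the alignment above.

```python
# Translation table for complementing uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site):
    """Reverse complement of an uppercase DNA string."""
    return site.translate(COMPLEMENT)[::-1]


# (sequence name, strand, site) for a few rows of the alignment above.
sites = [
    ("TestSeq8:73", "+", "CGCGTGCG"),
    ("TestSeq3:168", "-", "CGGCCGCG"),
    ("TestSeq0:138", "-", "CGCCAGCG"),
    ("TestSeq14:88", "-", "CGCAAGCC"),
    ("TestSeq9:116", "+", "GCCGTGCG"),
]

# Put every site in motif orientation: plus-strand sites are kept as printed.
oriented = [
    site if strand == "+" else reverse_complement(site)
    for _, strand, site in sites
]

for (name, strand, _), motif_view in zip(sites, oriented):
    print(name, strand, motif_view)
```

After orientation, none of these five sites contains an A, which agrees with the all-zero A row of the position weight matrix.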

Posterior probability plot


Explanation of cosmo results

The cosmo results consist of: