texture

active
ARFF
Publicly available Visibility: public Uploaded 29-07-2016 by Rafael G. Mantovani

0 likes downloaded by 11 people , 18 total downloads 0 issues 0 downvotes


Author: Laboratory of Image Processing and Pattern Recognition (INPG-LTIRF), Grenoble - France.
Source: [ELENA project](https://www.elen.ucl.ac.be/neural-nets/Research/Projects/ELENA/databases/REAL/texture/)
Please cite: None
####1. Summary
This database was generated by the Laboratory of Image Processing and Pattern Recognition (INPG-LTIRF) in the development of the Esprit project ELENA No. 6891 and the Esprit working group ATHOS No. 6620.
```
(a) Original source:
P. Brodatz "Textures: A Photographic Album for Artists and Designers",
Dover Publications, Inc., New York, 1966.
(b) Creation: Laboratory of Image Processing and Pattern Recognition
Institut National Polytechnique de Grenoble INPG
Laboratoire de Traitement d'Image et de Reconnaissance de Formes LTIRF
Av. Felix Viallet, 46
F-38031 Grenoble Cedex
France
(c) Contact: Dr. A. Guerin-Dugue, INPG-LTIRF, guerin@tirf.inpg.fr
```
####2. Past Usage:
This database was used privately at the TIRF laboratory. It was created to study texture discrimination with high-order statistics.
```
A. Guerin-Dugue, C. Aviles-Cruz, "High Order Statistics from Natural Textured Images",
In ATHOS workshop on System Identification and High Order Statistics, Sophia-Antipolis, France, September 1993.
Guerin-Dugue, A. and others, Deliverable R3-B4-P - Task B4: Benchmarks, Technical report,
Elena-NervesII "Enhanced Learning for Evolutive Neural Architecture", ESPRIT-Basic Research Project Number 6891,
June 1995.
```
####3. Relevant Information:
The aim is to distinguish between 11 different textures (Grass lawn, Pressed calf leather, Handmade paper, Raffia looped to a high pile, Cotton canvas, ...), each pattern (pixel) being characterised by 40 attributes built by the estimation of fourth order modified moments in four orientations: 0, 45, 90 and 135 degrees.
A statistical method based on the extraction of fourth-order moments, called "fourth order modified moments" (mm4) [Guerin93], was developed for the characterization of natural micro-textures. This method measures, for each texture, the deviation from a first-order Gauss-Markov process. The features were estimated in four directions (0, 45, 90 and 135 degrees) to take into account the possible orientations of the textures. Only correlations between the current pixel, the first neighbourhood and the second neighbourhood are taken into account. This small neighbourhood is adapted to the fine-grain property of the textures.
The data set contains 11 classes of 500 instances and each class refers to a type of texture in the Brodatz album.
The database dimension is 40, plus one for the class label. The 40 attributes were built by estimating the following ten fourth-order modified moments in each of the four orientations (0, 45, 90 and 135 degrees): mm4(000), mm4(001), mm4(002), mm4(011), mm4(012), mm4(022), mm4(111), mm4(112), mm4(122) and mm4(222).
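The source does not state how the 40 attributes are ordered. The sketch below assumes four consecutive blocks of ten moments, one block per orientation. That assumption is at least consistent with the summary statistics in section 5, where attributes 1, 11, 21 and 31 have identical values. The naming scheme itself is hypothetical.

```python
# Assumed layout (not confirmed by the source): four consecutive blocks of
# ten moments, one block per orientation, in the order 0, 45, 90, 135 degrees.
ORIENTATIONS = (0, 45, 90, 135)
MOMENTS = ("000", "001", "002", "011", "012", "022", "111", "112", "122", "222")


def attribute_name(index: int) -> str:
    """Return a descriptive (hypothetical) name for a 1-based attribute index."""
    if not 1 <= index <= len(ORIENTATIONS) * len(MOMENTS):
        raise ValueError(f"attribute index out of range: {index}")
    block, offset = divmod(index - 1, len(MOMENTS))
    return f"mm4({MOMENTS[offset]})@{ORIENTATIONS[block]}deg"


names = [attribute_name(i) for i in range(1, 41)]
print(names[0], names[10], names[39])
# → mm4(000)@0deg mm4(000)@45deg mm4(222)@135deg
```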
!! Patterns are always sorted by class, in increasing order of class label, in every dataset of the texture database (texture.dat, texture_CR.dat, texture_PCA.dat, texture_DFA.dat).
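Because the patterns are sorted by class, taking the first or last rows as a test set would leave some classes out entirely. A minimal sketch of a shuffled split follows. The function name, the 20% test fraction and the seed are illustrative choices, and the count of 5500 patterns is derived from the 11 classes of 500 instances stated above.

```python
import random


def shuffled_split(n_patterns, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) after shuffling indices 0..n_patterns-1."""
    indices = list(range(n_patterns))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_test = int(round(n_patterns * test_fraction))
    return indices[n_test:], indices[:n_test]


# 11 classes x 500 instances = 5500 patterns
train_idx, test_idx = shuffled_split(5500)
print(len(train_idx), len(test_idx))
```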
####4. Class:
The class label is a code for the following classes:
```
Class label  Class
2            Grass lawn (D09)
3            Pressed calf leather (D24)
4            Handmade paper (D57)
6            Raffia looped to a high pile (D84)
7            Cotton canvas (D77)
8            Pigskin (D92)
9            Beach sand (D28)
10           Beach sand (D29)
12           Oriental straw cloth (D53)
13           Oriental straw cloth (D78)
14           Oriental grass fiber cloth (D79)
```
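The class codes are not contiguous (5 and 11 are unused), which matters for tools that expect labels 0..c-1. The sketch below copies the mapping from the table above and derives a contiguous index. The names `CLASS_NAMES` and `LABEL_TO_INDEX` are illustrative.

```python
# Class codes and texture names as listed in the class table (Brodatz plate in parentheses).
CLASS_NAMES = {
    2: "Grass lawn (D09)",
    3: "Pressed calf leather (D24)",
    4: "Handmade paper (D57)",
    6: "Raffia looped to a high pile (D84)",
    7: "Cotton canvas (D77)",
    8: "Pigskin (D92)",
    9: "Beach sand (D28)",
    10: "Beach sand (D29)",
    12: "Oriental straw cloth (D53)",
    13: "Oriental straw cloth (D78)",
    14: "Oriental grass fiber cloth (D79)",
}

# Map the original codes to contiguous indices 0..10, preserving their order.
LABEL_TO_INDEX = {label: i for i, label in enumerate(sorted(CLASS_NAMES))}

print(LABEL_TO_INDEX[2], LABEL_TO_INDEX[6], LABEL_TO_INDEX[14])
# → 0 3 10
```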
####5. Summary Statistics:
The table below gives, for each attribute of the database, the dynamic range (Min and Max values), the mean value and the standard deviation.
```
Attribute Min Max Mean Standard deviation
1 -1.4495 0.7741 -1.0983 0.2034
2 -1.2004 0.3297 -0.5867 0.2055
3 -1.3099 0.3441 -0.5838 0.3135
4 -1.1104 0.5878 -0.4046 0.2302
5 -1.0534 0.4387 -0.3307 0.2360
6 -1.0029 0.4515 -0.2422 0.2225
7 -1.2076 0.5246 -0.6026 0.2003
8 -1.0799 0.3980 -0.4322 0.2210
9 -1.0570 0.4369 -0.3317 0.2361
10 -1.2580 0.3546 -0.5978 0.3268
11 -1.4495 0.7741 -1.0983 0.2034
12 -1.0831 0.3715 -0.5929 0.2056
13 -1.1194 0.6347 -0.4019 0.3368
14 -1.0182 0.1573 -0.6270 0.1390
15 -0.9435 0.1642 -0.4482 0.1952
16 -0.9944 0.0357 -0.5763 0.1587
17 -1.1722 0.0201 -0.7331 0.1955
18 -1.0174 0.1155 -0.4919 0.2335
19 -1.0044 0.0833 -0.4727 0.2257
20 -1.1800 0.4392 -0.4831 0.3484
21 -1.4495 0.7741 -1.0983 0.2034
22 -1.2275 0.5963 -0.7363 0.2220
23 -1.3412 0.4464 -0.7771 0.3290
24 -1.1774 0.6882 -0.5770 0.2646
25 -1.1369 0.4098 -0.5085 0.2538
26 -1.1099 0.3725 -0.4038 0.2515
27 -1.2393 0.6120 -0.7279 0.2278
28 -1.1540 0.4221 -0.5863 0.2446
29 -1.1323 0.3916 -0.5090 0.2526
30 -1.4224 0.4718 -0.7708 0.3264
31 -1.4495 0.7741 -1.0983 0.2034
32 -1.1789 0.5647 -0.6463 0.1890
33 -1.1473 0.6755 -0.4919 0.3304
34 -1.1228 0.3132 -0.6435 0.1441
35 -1.0145 0.3396 -0.4918 0.1922
36 -1.0298 0.1560 -0.5934 0.1704
37 -1.2534 0.0899 -0.7795 0.1641
38 -1.0966 0.1944 -0.5541 0.2111
39 -1.0765 0.2019 -0.5230 0.2015
40 -1.2155 0.4647 -0.5677 0.3091
```
The attribute values lie in the range [-1.45, 0.775]. The database obtained by centering and reducing each attribute of the Texture database is available on the ftp server in the `REAL/texture/texture_CR.dat.Z` file.
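"Centering and reduction" is read here as subtracting each attribute's mean and dividing by its standard deviation. That reading is an assumption, as is the use of the population standard deviation (`ddof=0`); the source does not say which estimator produced texture_CR.dat. A minimal NumPy sketch on synthetic data:

```python
import numpy as np


def center_and_reduce(X):
    """Standardize each column to zero mean and unit standard deviation.

    Assumption: population standard deviation (ddof=0); the original
    ELENA preprocessing may have used a different estimator.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std_safe = np.where(std > 0, std, 1.0)  # leave constant columns unscaled
    return (X - mean) / std_safe, mean, std


# Synthetic stand-in for the raw attributes (not the real data).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=-0.6, scale=0.25, size=(100, 4))
Z_demo, mean_demo, std_demo = center_and_reduce(X_demo)
print(np.allclose(Z_demo.mean(axis=0), 0.0), np.allclose(Z_demo.std(axis=0), 1.0))
```

The returned mean and standard deviation should be kept so that the same transform can be applied to new patterns.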
####6. Confusion matrix.
The following confusion matrix of the k-NN classifier was obtained with a leave-one-out error counting method on the texture_CR.dat database. k was set to 1 in order to reach the minimum mean error rate: 1.0 +/- 0.8%.
```
Class 2 3 4 6 7 8 9 10 12 13 14
2 97.0 1.0 0.4 0.0 0.0 0.0 1.6 0.0 0.0 0.0 0.0
3 0.2 99.0 0.0 0.0 0.0 0.0 0.4 0.0 0.0 0.0 0.4
4 1.0 0.0 98.8 0.0 0.0 0.0 0.2 0.0 0.0 0.0 0.0
6 0.0 0.0 0.0 99.4 0.0 0.0 0.0 0.6 0.0 0.0 0.0
7 0.0 0.0 0.0 0.0 100.0 0.0 0.0 0.0 0.0 0.0 0.0
8 0.0 0.0 0.0 0.0 0.0 98.6 0.0 1.4 0.0 0.0 0.0
9 0.4 0.0 0.2 0.0 0.0 0.2 98.8 0.4 0.0 0.0 0.0
10 0.0 0.0 0.0 0.0 0.0 1.4 0.0 98.6 0.0 0.0 0.0
12 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 100.0 0.0 0.0
13 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 99.8 0.2
14 0.0 0.4 0.0 0.0 0.0 0.4 0.0 0.0 0.2 0.0 99.0
```
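A leave-one-out 1-NN evaluation can be sketched in a few lines of NumPy. This is an illustration with Euclidean distance on synthetic data, not the code used to produce the matrix above; the function name is hypothetical. The last lines check that the diagonal of the published matrix averages to 99.0%, which matches the quoted 1.0% mean error rate (the classes are balanced).

```python
import numpy as np


def loo_1nn_confusion(X, y):
    """Leave-one-out 1-NN confusion matrix, in percent per true class (rows)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    # Pairwise squared Euclidean distances (dense n x n matrix).
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # leave-one-out: a pattern cannot match itself
    pred = y[d2.argmin(axis=1)]
    counts = np.zeros((labels.size, labels.size))
    np.add.at(counts, (np.searchsorted(labels, y), np.searchsorted(labels, pred)), 1)
    return labels, 100.0 * counts / counts.sum(axis=1, keepdims=True)


# Synthetic, well-separated two-class example (not the texture data).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
y_demo = np.array([2] * 20 + [3] * 20)
labels, cm = loo_1nn_confusion(X_demo, y_demo)
print(cm)

# Diagonal of the published confusion matrix, copied from the table.
published_diag = [97.0, 99.0, 98.8, 99.4, 100.0, 98.6, 98.8, 98.6, 100.0, 99.8, 99.0]
mean_error = 100.0 - sum(published_diag) / len(published_diag)
print(round(mean_error, 2))  # → 1.0
```

For the full database the dense distance matrix has 5500 x 5500 entries, which is still manageable in memory.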
####7. Result of the Principal Component Analysis:
Principal Component Analysis (PCA) is a classical method in pattern recognition [Duda73]. PCA linearly reduces the sample dimension to give the best representation in fewer dimensions while keeping the maximum inertia. The best axis for representation is, however, not necessarily the best axis for discrimination. After PCA, features are selected according to the percentage of the initial inertia covered by each axis, and the number of features is determined by the percentage of initial inertia to keep for the classification process.
This selection method has been applied to the texture_CR database. When quasi-linear correlations exist between some initial features, these redundant dimensions are removed by PCA, and this preprocessing is then recommended. In this case, before PCA, the determinant of the data covariance matrix is near zero; the database is thus badly conditioned for any process that uses this information (the quadratic classifier, for example).
The following file is available for the texture database: texture_PCA.dat.Z. It is the projection of the texture_CR database onto its principal components, sorted in decreasing order of the related inertia percentage. To work on the database projected onto its first x principal components, keep only the first x attributes of the texture_PCA.dat database together with the class labels (last attribute).
The table below gives the inertia percentages associated with the eigenvalues of the principal component axes, sorted in decreasing order of inertia percentage. 99.85 percent of the total database inertia remains if the first 20 principal components are kept.
```
Eigenvalue   Value   Inertia percentage   Cumulated inertia
1 30.267500000 75.6687000000 75.6687000000
2 3.6512500000 9.1281300000 84.7969000000
3 2.2937000000 5.7342400000 90.5311000000
4 1.7039700000 4.2599300000 94.7910000000
5 0.6716540000 1.6791300000 96.4702000000
6 0.5015290000 1.2538200000 97.7240000000
7 0.1922830000 0.4807070000 98.2047000000
8 0.1561070000 0.3902670000 98.5950000000
9 0.1099570000 0.2748920000 98.8699000000
10 0.0890891000 0.2227230000 99.0926000000
11 0.0656016000 0.1640040000 99.2566000000
12 0.0489988000 0.1224970000 99.3791000000
13 0.0433819000 0.1084550000 99.4875000000
14 0.0345022000 0.0862554000 99.5738000000
15 0.0299203000 0.0748007000 99.6486000000
16 0.0248857000 0.0622141000 99.7108000000
17 0.0167901000 0.0419752000 99.7528000000
18 0.0161633000 0.0404083000 99.7932000000
19 0.0128898000 0.0322246000 99.8254000000
20 0.0113884000 0.0284710000 99.8539000000
21 0.0078481400 0.0196204000 99.8735000000
22 0.0071527800 0.0178820000 99.8914000000
23 0.0067661400 0.0169153000 99.9083000000
24 0.0053149500 0.0132874000 99.9216000000
25 0.0051102600 0.0127757000 99.9344000000
26 0.0047116600 0.0117792000 99.9461000000
27 0.0036193700 0.0090484300 99.9552000000
28 0.0033222000 0.0083054900 99.9635000000
29 0.0030722400 0.0076806100 99.9712000000
30 0.0026373300 0.0065933300 99.9778000000
31 0.0020996800 0.0052492000 99.9830000000
32 0.0019376500 0.0048441200 99.9879000000
33 0.0015642300 0.0039105700 99.9918000000
34 0.0009679080 0.0024197700 99.9942000000
35 0.0009578000 0.0023945000 99.9966000000
36 0.0007379780 0.0018449400 99.9984000000
37 0.0006280250 0.0015700600 100.000000000
38 0.0000000040 0.0000000099 100.000000000
39 0.0000000001 0.0000000003 100.000000000
40 0.0000000008 0.0000000019 100.000000000
```
This matrix can be found in the texture_EV.dat file.
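The inertia percentage of an axis is its eigenvalue divided by the sum of all eigenvalues. In the table the eigenvalues appear to sum to 40, the number of standardized attributes, so the first axis gives 30.2675 / 40, about 75.67%. The sketch below computes the same quantities from a data matrix with NumPy. The function names and the synthetic data are illustrative. The duplicated column in the demo produces a near-zero eigenvalue, much like the last three rows of the table.

```python
import numpy as np


def pca_inertia(X):
    """Return eigenvalues, eigenvectors, inertia % and cumulated inertia %,
    sorted by decreasing eigenvalue of the covariance matrix of X."""
    X = np.asarray(X, dtype=float)
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    percent = 100.0 * eigvals / eigvals.sum()
    return eigvals, eigvecs, percent, np.cumsum(percent)


def components_needed(cumulated, target_percent):
    """Smallest number of leading components whose cumulated inertia reaches the target."""
    k = int(np.searchsorted(cumulated, target_percent)) + 1
    return min(k, len(cumulated))


# Synthetic data: three independent columns plus an exact copy of the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X_demo = np.column_stack([base, base[:, 0]])
eigvals, eigvecs, percent, cumulated = pca_inertia(X_demo)
k = components_needed(cumulated, 99.9)
scores = (X_demo - X_demo.mean(axis=0)) @ eigvecs[:, :k]  # projection on the first k axes
print(k, scores.shape)
```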
Discriminant Factorial Analysis (DFA) can be applied to a learning database in which each learning sample belongs to a particular class [Duda73]. The number of discriminant features selected by DFA is determined by the number of classes (c) and the number of input dimensions (d): it equals min(d, c-1). In the usual case where d is greater than c, the output dimension equals the number of classes minus one, and the discriminant axes are chosen to maximize the between-class variance and minimize the within-class variance.
The discrimination power (the ratio of the projected between-class variance to the projected within-class variance) is not the same for each discriminant axis: it decreases from one axis to the next. For a problem with many classes, this preprocessing will therefore not always be efficient, as the last output features will be less discriminant. The analysis uses the inverse of the global covariance matrix, so that matrix must be well conditioned (for example, a preliminary PCA must be applied to remove the linearly correlated dimensions).
DFA has been applied to the first 18 principal components of the texture_PCA database (that is, keeping only the first 18 attributes before applying the DFA preprocessing) to build the texture_DFA.dat.Z database file, which has 10 dimensions (the texture database having 11 classes). For the texture database, experiments showed that DFA preprocessing is very useful and most of the time improved classifier performance.
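The following is a minimal sketch of a discriminant analysis of this kind, written as the classical Fisher formulation (eigenvectors of the inverse within-class scatter times the between-class scatter). It is not the ELENA implementation. Using the within-class scatter rather than the global covariance is an assumption, and the function name and synthetic data are illustrative. For the texture data the same function would be fed the first 18 principal components and would return min(18, 11-1) = 10 axes.

```python
import numpy as np


def dfa_fit(X, y):
    """Return (mean, axes, ratios): at most min(d, c-1) discriminant axes,
    sorted by decreasing between/within variance ratio."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    d = X.shape[1]
    mean = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
    # Solve Sw^-1 Sb v = lambda v; Sw must be well conditioned (hence the prior PCA).
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    eigvals, eigvecs = eigvals.real, eigvecs.real
    order = np.argsort(eigvals)[::-1][: min(d, classes.size - 1)]
    return mean, eigvecs[:, order], eigvals[order]


# Synthetic example: 3 classes in 5 dimensions, so 2 discriminant axes.
rng = np.random.default_rng(0)
centers = rng.normal(0, 3, size=(3, 5))
X_demo = np.vstack([centers[k] + rng.normal(size=(30, 5)) for k in range(3)])
y_demo = np.repeat([2, 3, 4], 30)
mean, axes, ratios = dfa_fit(X_demo, y_demo)
Z_demo = (X_demo - mean) @ axes  # projected patterns
print(Z_demo.shape, ratios)
```

The returned ratios decrease from the first axis to the last, which is the loss of discrimination power described above.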
[Duda73] Duda, R.O. and Hart, P.E., Pattern Classification and Scene Analysis, John Wiley & Sons, 1973.

Feature | Type | Unique values | Missing values |
---|---|---|---|
Class (target) | nominal | 11 | 0 |
V1 | numeric | 861 | 0 |
V2 | numeric | 979 | 0 |
V3 | numeric | 1199 | 0 |
V4 | numeric | 1072 | 0 |
V5 | numeric | 1025 | 0 |
V6 | numeric | 961 | 0 |
V7 | numeric | 965 | 0 |
V8 | numeric | 1003 | 0 |
V9 | numeric | 1032 | 0 |
V10 | numeric | 1234 | 0 |
V11 | numeric | 861 | 0 |
V12 | numeric | 894 | 0 |
V13 | numeric | 1300 | 0 |
V14 | numeric | 696 | 0 |
V15 | numeric | 810 | 0 |
V16 | numeric | 727 | 0 |
V17 | numeric | 805 | 0 |
V18 | numeric | 899 | 0 |
V19 | numeric | 852 | 0 |
V20 | numeric | 1282 | 0 |
V21 | numeric | 861 | 0 |
V22 | numeric | 990 | 0 |
V23 | numeric | 1223 | 0 |
V24 | numeric | 1150 | 0 |
V25 | numeric | 1112 | 0 |
V26 | numeric | 1100 | 0 |
V27 | numeric | 1010 | 0 |
V28 | numeric | 1082 | 0 |
V29 | numeric | 1110 | 0 |
V30 | numeric | 1217 | 0 |
V31 | numeric | 861 | 0 |
V32 | numeric | 898 | 0 |
V33 | numeric | 1309 | 0 |
V34 | numeric | 742 | 0 |
V35 | numeric | 838 | 0 |
V36 | numeric | 750 | 0 |
V37 | numeric | 797 | 0 |
V38 | numeric | 921 | 0 |
V39 | numeric | 865 | 0 |
V40 | numeric | 1252 | 0 |

Quality | Value |
---|---|
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation. | |
Second quartile (Median) of kurtosis among attributes of the numeric type. | 0.03 |
Average number of distinct values among the attributes of the nominal type. | 11 |
Second quartile (Median) of means among attributes of the numeric type. | -0.58 |
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation. | |
Second quartile (Median) of mutual information between the nominal attributes and the target attribute. | |
Second quartile (Median) of skewness among attributes of the numeric type. | 0.03 |
Second quartile (Median) of standard deviation of attributes of the numeric type. | 0.22 |
Third quartile of kurtosis among attributes of the numeric type. | 1.09 |
Minimal mutual information between the nominal attributes and the target attribute. | |
Maximum mutual information between the nominal attributes and the target attribute. | |
The minimal number of distinct values among attributes of the nominal type. | 11 |
Third quartile of mutual information between the nominal attributes and the target attribute. | |
The maximum number of distinct values among attributes of the nominal type. | 11 |
Third quartile of skewness among attributes of the numeric type. | 0.52 |
First quartile of kurtosis among attributes of the numeric type. | -0.62 |
Third quartile of standard deviation of attributes of the numeric type. | 0.25 |
Standard deviation of the number of distinct values among attributes of the nominal type. | 0 |
First quartile of mutual information between the nominal attributes and the target attribute. | |
First quartile of skewness among attributes of the numeric type. | -0.29 |
First quartile of standard deviation of attributes of the numeric type. | 0.2 |
Average mutual information between the nominal attributes and the target attribute. | |