Data Analysis - reading text files and processing them with Matlab
In this article, we're going to read text files with Matlab, perform data analysis or processing, and finally write our results to another text file. The procedure is easily adaptable to many situations.
Let's assume that we have three text files (it could be hundreds). They must all have the same format and share a base file name followed by a numbered tag (otherwise it is harder to automate the reading process).
For example, we have files 'data_sheet1.txt', 'data_sheet2.txt' and 'data_sheet3.txt'. So, the base file name is 'data_sheet', the numbered tag is 1, 2 or 3, respectively, and they all end with the '.txt' extension.
Let the content of each file be something simple. For example, the hypothetical content of 'data_sheet1.txt' is:
dummy line 1
dummy line 2
dummy line 3

1 1234
2 2345
3 4320
4 4567
5 9876
This file has four header lines (three dummy lines and one blank line) at the beginning, followed by the real data in two columns.
In our case, the content for 'data_sheet2.txt' is:
dummy line 1
dummy line 2
dummy line 3

1 12340
2 23452
3 43203
4 45674
5 98765
and the content for 'data_sheet3.txt' is:
dummy line 1
dummy line 2
dummy line 3

1 123
2 234
3 432
4 456
5 987
Note that all three files have four header lines at the beginning, and all of them have the relevant data in the same format, with the same number of elements (two columns and five rows). The exact number of columns or rows is not relevant for our purpose, but the files have to keep the same format or structure.
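If you want to follow along, the three sample files can be created from Matlab itself. The script below is only a sketch under the assumptions stated in this article (three dummy lines plus one blank line as header, then two numeric columns); the variable names 'ids' and 'values' are made up for the illustration.

```matlab
% Sketch: create the three sample files described in this article.
% 'ids' and 'values' are illustrative names, not part of the original recipe.
ids = (1:5)';
values = {[1234 2345 4320 4567 9876]', ...
          [12340 23452 43203 45674 98765]', ...
          [123 234 432 456 987]'};

for k = 1:numel(values)
    fid = fopen(['data_sheet' num2str(k) '.txt'], 'w');
    if fid == -1
        error('Could not create file number %d.', k);
    end
    % Three dummy lines plus one blank line = four header lines
    fprintf(fid, 'dummy line 1\ndummy line 2\ndummy line 3\n\n');
    % fprintf walks through its argument column by column,
    % so pass a 2-by-5 matrix to get one "id value" pair per line
    fprintf(fid, '%d %d\n', [ids values{k}]');
    fclose(fid);
end
```

Running it in an empty working folder leaves 'data_sheet1.txt', 'data_sheet2.txt' and 'data_sheet3.txt' ready to be read.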
We are going to use the Matlab functions 'fopen', 'textscan' and 'num2str' to read data from all those '.txt' files (it's a good idea to investigate those three functions a little bit, but I'll give you the recipe).
We are not interested in the four header lines at the beginning of the files. We want to read the first column of the first file (which is the same for all the files, let's say for identification purposes) and the second column of each of the files, so we want to end up with something like
1 1234 12340 123
2 2345 23452 234
3 4320 43203 432
4 4567 45674 456
5 9876 98765 987
In this way, we have all the information in one matrix, and we can do data analysis on it afterwards.
This is the function that I propose to read the files. It has two input parameters (the base file name and the number of files to read) and one output (a cell array with your relevant data). Fair, isn't it?
To automatically change the name of the file we use an array in this form:
[BaseFile num2str(i) '.txt']
This array concatenates the string BaseFile (input parameter) with a counting number (by converting the iteration counter into a string), and then appends the '.txt' extension.
For the first file, the idea could be represented by:
[BaseFile '1' '.txt'], or better [BaseFile '1.txt']
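To see the names this produces, the short script below prints them for the three files of this article. The 'sprintf' form is shown only as an equivalent alternative; it is not used in the original recipe.

```matlab
BaseFile = 'data_sheet';   % base name used in this article
n = 3;                     % number of files

for i = 1:n
    % Concatenation, as described above
    name_a = [BaseFile num2str(i) '.txt'];
    % Equivalent alternative using sprintf (an assumption, not the article's method)
    name_b = sprintf('%s%d.txt', BaseFile, i);
    fprintf('%s  %s\n', name_a, name_b);
end
```

If your files happened to use zero-padded tags (a hypothetical 'data_sheet007.txt', for instance), a format such as '%s%03d.txt' in 'sprintf' would build those names.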
The full code would be:
function R = get_data(BaseFile, n)
% Open the first file
d(1) = fopen([BaseFile '1.txt']);
% Read the first two columns, skip the first 4 header lines
R = textscan(d(1), '%f %f', 'headerLines', 4);
% Close the file, you don't need it any longer
fclose(d(1));
for i = 2 : n
    % Open consecutively each of the remaining files
    d(i) = fopen([BaseFile num2str(i) '.txt']);
    % Skip the first column of the new file (an '*' to do this)
    % and keep on building the array
    R = [R textscan(d(i), '%*f %f', 'headerLines', 4)];
    % Close the file
    fclose(d(i));
end
You end up with your data in cell array R. The 'textscan' function produces a cell array (not an ordinary array), so you have to convert it if you need a numeric matrix.
How are you going to use the above function to read text files and process data from Matlab?
This is one suggestion. You may process it the way you want...
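The function above assumes that every file opens correctly. As a sketch only, a variant could check the value returned by 'fopen' (it is -1 when the file cannot be opened) and stop with a clear message. The name 'get_data_checked' and the error identifier are hypothetical; the reading logic is the same as in the function above.

```matlab
function R = get_data_checked(BaseFile, n)
% GET_DATA_CHECKED  Hypothetical variant of get_data that checks each fopen.
% Returns a 1-by-(n+1) cell array: the ID column, then one data column per file.

R = cell(1, n + 1);
for i = 1:n
    file_name = [BaseFile num2str(i) '.txt'];
    fid = fopen(file_name);
    if fid == -1
        % fopen returns -1 when the file cannot be opened
        error('get_data_checked:fileNotFound', 'Could not open %s', file_name);
    end
    if i == 1
        % First file: keep both columns
        C = textscan(fid, '%f %f', 'HeaderLines', 4);
        R(1:2) = C;
    else
        % Remaining files: skip the first column with '%*f'
        R(i + 1) = textscan(fid, '%*f %f', 'HeaderLines', 4);
    end
    fclose(fid);
end
end
```

Each cell holds one column, so R{1} is the identification column and R{2} is the data column of the first file.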
% Reset your memory and clear your screen
clear; clc
% Provide base file name and number of files to be read
BaseFile = 'data_sheet';
n = 3;
% Use the developed function to read data
R = get_data(BaseFile, n);
% Transform your cell array into an ordinary matrix
% and show your data
my_data = cell2mat(R)
At this point 'my_data' is a matrix that holds the information as you need it (exactly as shown before).
You can study it, or plot it... or perform data analysis of any kind...
% Calculate the average of each column and show it
my_average = mean(my_data)
% Calculate the standard deviation for each column
my_std = std(my_data)
% Calculate the maximum
my_max = max(my_data)
% Calculate the minimum
my_min = min(my_data)
% Arrange your information to be saved
my_results = [my_average' my_std' my_max' my_min']
% Save your 'my_results' matrix in file 'data_out.txt'
save data_out.txt my_results -ascii
Done! Now you have a text file with your data analysis or processed information.
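To check what was actually written, you can read a saved file back with 'load'. The sketch below uses a small illustrative matrix and a hypothetical file name, 'data_out_check.txt', so that it does not overwrite your real results.

```matlab
% Illustrative matrix (not the article's results): statistics of the values 1..5
x = (1:5)';
my_results = [mean(x) std(x) max(x) min(x)];

% Function form of the save command; the file name is hypothetical
save('data_out_check.txt', 'my_results', '-ascii');

% load parses a plain numeric text file into a matrix
check = load('data_out_check.txt');

% -ascii keeps 8 significant digits, so compare with a tolerance
difference = max(abs(check(:) - my_results(:)));
fprintf('Largest difference after the round trip: %g\n', difference);
```

If you need more digits in the file, adding the '-double' option after '-ascii' stores 16 significant digits instead of 8.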