The example program below demonstrates the basics of working with data that comes from a text file. The program is designed to work with a text file that contains a list of temperature readings, listed one per line. The program reads the readings from the text file, puts them in a list, and then determines and prints the lowest and highest temperature readings found in the file.
    temps = []
    f = open('temps.txt')
    for line in f.readlines():
        temps.append(float(line))
    f.close()

    lowest = temps[0]
    highest = temps[0]
    for t in temps:
        if t < lowest:
            lowest = t
        if t > highest:
            highest = t

    print('Lowest temp = ' + str(lowest))
    print('Highest temp = ' + str(highest))
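For comparison, here is a sketch of the same computation written with a few Python conveniences that the program above does not use: iterating over the lines directly instead of calling readlines(), skipping blank lines, and letting the built-in min() and max() functions replace the hand-written loop. The function name lowestAndHighest is hypothetical, and the sketch assumes every non-blank line holds a single number.

```python
def lowestAndHighest(lines):
    """Return (lowest, highest) for an iterable of text lines, one reading per line.

    Hypothetical helper: `lines` can be an open file object or any list of strings.
    Blank lines are skipped; any other non-numeric line raises ValueError, and so
    does an input with no readings at all (min() of an empty list).
    """
    temps = []
    for line in lines:
        if line.strip():               # ignore blank lines, such as an empty last line
            temps.append(float(line))  # float() tolerates the trailing '\n'
    return min(temps), max(temps)

# A list of strings standing in for the lines of temps.txt
print(lowestAndHighest(['71.5\n', '64.0\n', '\n', '88.2\n']))  # (64.0, 88.2)

# With the real file (assumes temps.txt exists in the working directory):
# with open('temps.txt') as f:
#     lowest, highest = lowestAndHighest(f)
```

Because a file object yields its lines one at a time when you loop over it, the same function works on a list of test strings and on an open file.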
Here are some things to note in the temperature program.

- The open() function opens a file for reading and returns a file object. The parameter to open() specifies the name of the file to open.
- f.readlines() returns a list of the lines in the text file, each still ending in its newline character. We set up a for loop to iterate over this list of lines; float() ignores the trailing newline when it converts each line to a number.
- When we are done reading, we call the file object's close() method to close the file.

Since we always have to take care to close a file after we are done working with it, it may be helpful to use an alternative construction to manage opening and closing the file. The Python with statement is useful for this purpose.
In place of
    f = open('temps.txt')
    for line in f.readlines():
        temps.append(float(line))
    f.close()
we can do
    with open('temps.txt') as f:
        for line in f.readlines():
            temps.append(float(line))
Once execution leaves the body of the with statement, the file is closed for us automatically. This also holds when something goes wrong: if an exception is raised anywhere in the body, Python exits the with block and closes the file before the error propagates to the rest of the program.
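One way to convince yourself that the file really is closed after an error is to check the file object's closed attribute after an exception has forced execution out of the with block. The sketch below is only an illustration: the helper name closedAfterError and the throwaway temps.txt written into a temporary directory are assumptions made for the demonstration.

```python
import os
import tempfile

def closedAfterError(path):
    """Read numbers from path inside a with block and report whether the file ended up closed."""
    f = None
    try:
        with open(path) as f:
            for line in f:
                float(line)  # raises ValueError on a non-numeric line
    except ValueError:
        pass  # by the time we get here, the with block has already closed the file
    return f.closed

# Build a small test file in a temporary directory so no real data file is overwritten
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'temps.txt')
    with open(path, 'w') as out:
        out.write('71.5\nnot a number\n')
    print(closedAfterError(path))  # True
```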
The next example is a short program that I used to generate some random data for the temps.txt data file.
    import random

    f = open('temps.txt', 'w')
    for n in range(0, 50):
        f.write('{:2.1f}\n'.format(random.random() * 100))
    f.close()
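The same generator can also be written with a with statement so that the file is closed automatically. The sketch below additionally uses a random.Random instance with an optional seed, so the same data file can be regenerated later if needed; the function name writeRandomTemps and its count and seed parameters are illustrative additions, not part of the original program.

```python
import random

def writeRandomTemps(fileName, count=50, seed=None):
    """Write `count` random readings, one per line, to fileName (hypothetical helper)."""
    rng = random.Random(seed)  # the same seed always produces the same sequence
    with open(fileName, 'w') as f:  # 'w' creates the file or truncates an existing one
        for n in range(count):
            # rng.random() is in [0.0, 1.0), so the scaled value is in [0.0, 100.0);
            # rounding to one decimal place can still produce the string '100.0'
            f.write('{:2.1f}\n'.format(rng.random() * 100))
```

Calling writeRandomTemps('temps.txt', seed=42) twice writes identical files, which can make it easier to compare program output across runs.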
Here are some things to note in this program.

- We use the open() function to open a file. The optional second parameter to open() is a file mode specifier. Since we are opening this file for writing, we use the 'w' mode specifier, which creates the file if it does not exist and discards its previous contents if it does.
- To write text to the file we call the file object's write() method. The parameter we pass to write() is the string of text that we want written to the file. Unlike print(), write() does not add a line break by itself, so we have to make sure that the string ends in the newline character, \n. Here the format string '{:2.1f}\n' produces each reading with one digit after the decimal point, followed by the newline.
- We use the random() function from the random module to generate a random float from 0.0 up to, but not including, 1.0. Multiplying this number by 100 scales it to the range from 0.0 up to (but not including) 100.0.
- We call the close() method to close the file when we are done writing to it.

In the next few examples we are going to be reading data from text files. In every case the data will be arranged as a data series, with a list of data items on each line of the file. The following Python function will serve as a generic data reading function to load the raw data from the text file. The function reads the individual lines of the input file as text strings and then uses the string split() method to split each line into a list of strings for the individual data items. Each of those lists is then passed to a conversion function, lineToData(), which we define below.
    def readData(fileName):
        """Generic data reading function: reads lines in a text file and splits them into lists."""
        data = []
        with open(fileName) as f:
            for line in f.readlines():
                data.append(lineToData(line.split()))
        return data
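To see exactly what split() hands to lineToData, here are a few calls on sample lines. Called with no argument, split() splits on any run of whitespace and drops leading and trailing whitespace, including the newline that readlines() leaves at the end of each line. Splitting on a single space, by contrast, keeps the newline attached to the last item.

```python
line = '1935 32.1\n'              # a line as readlines() returns it, newline included

print(line.split())               # ['1935', '32.1']
print(' 1940\t 30.5 \n'.split())  # ['1940', '30.5']  (tabs and repeated spaces are fine)
print(line.split(' '))            # ['1935', '32.1\n']  (the newline stays attached)
```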
The next step will typically be to convert the strings in our data lists into a data format that is appropriate for our particular application. For example, in the program below we are going to replicate the linear regression example I showed a few lectures back. We will be working with an input file, farm.txt, that looks like this:
    1935 32.1
    1940 30.5
    1945 24.4
    1950 23
    1955 19.1
    1960 15.6
    1965 12.4
    1970 9.7
    1975 8.9
    1980 7.2
When readData calls split() on the first line of this file, the result is the list

    ['1935', '32.1']
I would like to convert that pair of strings into a tuple containing a combination of an integer and a float. Here is a simple data cleaning function that can perform that transformation:
    def lineToData(line):
        """Converts a raw line list into an appropriate data format."""
        return (int(line[0]), float(line[1]))
Since readData() calls lineToData() on every line it reads, a single call now gives us the data in the format that we need:

    pairs = readData('farm.txt')
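Putting the two functions together, the sketch below writes the first three rows of the farm data to a temporary farm.txt and reads them back, so you can check the shape of pairs without the real data file. The temporary-directory setup is only for the demonstration; the two functions are the ones from this section, with the colon at the end of lineToData's def line in place.

```python
import os
import tempfile

def lineToData(line):
    """Converts a raw line list into an appropriate data format."""
    return (int(line[0]), float(line[1]))

def readData(fileName):
    """Generic data reading function: reads lines in a text file and splits them into lists."""
    data = []
    with open(fileName) as f:
        for line in f.readlines():
            data.append(lineToData(line.split()))
    return data

# Write a three-row sample of farm.txt into a temporary directory, then read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'farm.txt')
    with open(path, 'w') as out:
        out.write('1935 32.1\n1940 30.5\n1945 24.4\n')
    pairs = readData(path)

print(pairs)  # [(1935, 32.1), (1940, 30.5), (1945, 24.4)]
```

Because readData calls lineToData by name, it is only generic up to a point: reading a file with a different layout means redefining lineToData. Passing the conversion function to readData as a parameter is one way to remove that coupling.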
Here now is the program that reads the farm population data and performs the regression analysis on the data. Note the function definitions that help us to perform key parts of the regression computation.
    def lineToData(line):
        """Converts a raw line list into an appropriate data format."""
        return (int(line[0]), float(line[1]))

    def readData(fileName):
        """Generic data reading function: reads lines in a text file and splits them into lists."""
        data = []
        with open(fileName) as f:
            for line in f.readlines():
                data.append(lineToData(line.split()))
        return data

    def means(pairs):
        """Returns the means (xMean, yMean) of a list of (x, y) pairs."""
        xSum = 0
        ySum = 0
        for x, y in pairs:
            xSum += x
            ySum += y
        N = len(pairs)
        return xSum / N, ySum / N

    def covariance(pairs, means):
        """Returns the sum of (x - xMean) * (y - yMean) over all pairs.

        The sum is not divided by N; that factor would cancel in regressionCoeffs anyway."""
        total = 0
        for x, y in pairs:
            total += (x - means[0]) * (y - means[1])
        return total

    def xVariance(pairs, xMean):
        """Returns the sum of (x - xMean) ** 2 over all pairs, likewise not divided by N."""
        total = 0
        for x, y in pairs:
            total += (x - xMean) * (x - xMean)
        return total

    def regressionCoeffs(pairs):
        """Computes linear regression coefficients (a,b) from a list of (x,y) pairs."""
        m = means(pairs)
        beta = covariance(pairs, m) / xVariance(pairs, m[0])
        alpha = m[1] - beta * m[0]
        return (alpha, beta)

    pairs = readData('farm.txt')
    a, b = regressionCoeffs(pairs)
    for x, y in pairs:
        prediction = a + x * b
        print('Year: {:d} Prediction: {:5.2f} Actual: {:5.2f}'.format(x, prediction, y))
The output produced by this program is:

    Year: 1935 Prediction: 31.49 Actual: 32.10
    Year: 1940 Prediction: 28.56 Actual: 30.50
    Year: 1945 Prediction: 25.62 Actual: 24.40
    Year: 1950 Prediction: 22.69 Actual: 23.00
    Year: 1955 Prediction: 19.76 Actual: 19.10
    Year: 1960 Prediction: 16.82 Actual: 15.60
    Year: 1965 Prediction: 13.89 Actual: 12.40
    Year: 1970 Prediction: 10.96 Actual: 9.70
    Year: 1975 Prediction: 8.02 Actual: 8.90
    Year: 1980 Prediction: 5.09 Actual: 7.20
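The predictions stay within about 2.1 of the actual values (the largest gap is in 1980: 5.09 predicted vs. 7.20 actual), which looks about right for a linear fit to this data.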
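For reference, these are the formulas that regressionCoeffs() implements, with x̄ and ȳ standing for the two means returned by means(). Note that covariance() and xVariance() return sums rather than averages; that works because dividing both by N would cancel in the ratio for beta. As a worked check, plugging the ten rows of farm.txt into these formulas reproduces the first prediction in the output (values rounded).

```latex
\beta = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2},
\qquad
\alpha = \bar{y} - \beta\,\bar{x}

\bar{x} = 1957.5, \quad \bar{y} = 18.29, \quad
\beta = \frac{-1210.25}{2062.5} \approx -0.5868, \quad
\alpha \approx 1166.93

\hat{y}(1935) = \alpha + 1935\,\beta = \bar{y} + \beta\,(1935 - \bar{x})
  \approx 18.29 + 0.5868 \times 22.5 \approx 31.49
```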
Write a Python program that reads two lists of integers from files named 'one.txt' and 'two.txt' and then determines which numbers from the first file do not appear in the second file. Construct a list of these numbers and then write the list out to a third file named 'diff.txt'.
To submit your work for grading, compress your entire project folder into a ZIP archive and send me that archive as an attachment to an email message.