WebFaction
Community site: login faq

Hello all,

I have a loop where I am saving thousands of values into a sql database through django. The program memory keeps growing until the process is killed by webfaction. Below is the loop:

    while(nextRow):
        columnNumber = 0
        time = 0
        for column in row:
            if columnNumber == 0:
                time = column
            value = Value.objects.create(time=time,value=column,parameter_id=parameters[columnNumber].id)
            trip.values.add(value)
            columnNumber = columnNumber + 1
        trip.duration = float(time)
        row = nextRow
        try:
            nextRow = reader.next()
        except:
            break

I know that if I comment out Value.objects.create() the loop runs minutes faster and the memory stays in check.

Any thoughts on memory management here or how to decrease the time of "create()" ?

Thanks, Eric

asked 30 Nov '11, 16:40

Eric Walter
114
accept rate: 0%


The first thing would be to not do this in a loop-- how are you getting the nextRow collection? And can you update multiple items at once? Maybe with a couple of queries? You definitely shouldn't be looping over every row & column like that. Do you have to stay outside the ORM?

permanent link

answered 30 Nov '11, 17:05

tclancy
861615
accept rate: 6%

Hello. Thanks for your response. I cannot use the update multiple items at once because I am creating new entries so the link does not apply. It appears the only solution will be to add all the values to an array and then run raw SQL queries to create the values and then add them to the proper trip.

(01 Dec '11, 10:55) Eric Walter

Can you provide more details? It's not very often I need to fall back to raw SQL in Django and when I do it's almost always for SELECT, not INSERT or UPDATE. Is it possible to see more of the code?

(01 Dec '11, 14:10) tclancy

This is not an answer but I am posting code so this should be cleaner:

I have modified my code to this and the reader is a csv reader:

time = 0
valuesToSave = []
while(nextRow):
        columnNumber = 0
        for column in row:

            if columnNumber == 0:
                time = column

            value = Value(time=time,value=column,parameter_id=parameters[columnNumber].id)
            valuesToSave.append(value)
            #value = Value.objects.create(time=time,value=column,parameter_id=parameters[columnNumber].id)
            #trip.values.add(value)
            columnNumber = columnNumber + 1
        row = nextRow
        try:
            nextRow = reader.next()
            time = 0
        except:
            break

trip.save()

valuesToSave[0].save()
trip.values.add(valuesToSave[0])

nextID = valuesToSave[0].id + 1
valuesToSave.remove(valuesToSave[0])

sql_saveValues(valuesToSave)

valueIDs = range(nextID, nextID+len(valuesToSave))

sql_addValuesToTrip(valueIDs, trip.id)

This code works pretty quickly. However, when finished, the program is taking up over 200MB of memory and doesn't release any of it.

Thoughts?

-Eric

(01 Dec '11, 14:18) Eric Walter

Sorry, I really don't understand the code. Without seeing the models, I can't really understand why you're stepping out to SQL so much. I did try to clean up the top of the logic a bit here: https://gist.github.com/1424301. You might want to consider more meaningful names for your variables as well if you have a model named "Value" with a field named "value".

(02 Dec '11, 12:34) tclancy

I have narrowed things down and the problem is the array I am building: "valuesToSave". I am using 200MB of memory before I even run the SQL code. How is it that this array is not being released?

(06 Dec '11, 09:45) Eric Walter

Most likely, the memory use grows because you never commit the transaction. Even if you aren't using a table type in MySQL that supports transactions, the Django ORM is most likely holding off on sending any changes to the database server until you commit the transaction.

See this chapter on transactions. One concern, however, is that if there's an error midway through the execution of your code, the database will be half-updated. That's life.

Before you begin, you could back up your database so you can revert to the backed up copy if you need to. Use mysqldump for that.

permanent link

answered 02 Dec '11, 16:54

Ken K
2324
accept rate: 0%

Here is the code I am using to save to the database. I am writing a StringIO that will do all queries at once and then I commit right after.

def sql_addValuesToTrip(valueIDs, tripID): codeToAddAll = StringIO() codeToAddAll.write('INSERT INTO databaseName.models_trip_values (id, trip_id, value_id) VALUES ')

for i in range(0, len(valueIDs)):
    value = valueIDs[i]
    codeToAddAll.write("(NULL, '")
    codeToAddAll.write(str(tripID))
    codeToAddAll.write("', '")
    codeToAddAll.write(str(value))
    codeToAddAll.write("'), ")

codeString = codeToAddAll.getvalue()
codeString = codeString[:len(codeString)-2] + ""
cursor = connection.cursor()
cursor.execute(codeString)

transaction.commit_unless_managed()

return

def sql_saveValues(values): codeToAddAll = StringIO() codeToAddAll.write('INSERT INTO databaseName.models_value (id, parameter_id, time, value) VALUES ')

for i in range(0, len(values)):
    value = values[i]
    codeToAddAll.write("(NULL, '")
    codeToAddAll.write(str(value.parameter_id))
    codeToAddAll.write("', '")
    codeToAddAll.write(str(value.time))
    codeToAddAll.write("', '")
    codeToAddAll.write(str(value.value))
    codeToAddAll.write("'), ")

codeString = codeToAddAll.getvalue()
codeString = codeString[:len(codeString)-2] + ""
cursor = connection.cursor()
cursor.execute(codeString)
transaction.commit_unless_managed()
(05 Dec '11, 15:59) Eric Walter

I guess python has trouble freeing arrays. Thus a new approach was done combining raw sql and not storing any arrays. This way I only built up about 60MB of memory and then went down to 30MB on completion.

permanent link

answered 06 Dec '11, 10:47

Eric Walter
114
accept rate: 0%

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×909
×337

question asked: 30 Nov '11, 16:40

question was seen: 5,250 times

last updated: 06 Dec '11, 10:47

WEBFACTION
REACH US
SUPPORT
AFFILIATE PROGRAM
LEGAL
© COPYRIGHT 2003-2019 SWARMA LIMITED - WEBFACTION IS A SERVICE OF SWARMA LIMITED
REGISTERED IN ENGLAND AND WALES 5729350 - VAT REGISTRATION NUMBER 877397162
5TH FLOOR, THE OLD VINYL FACTORY, HAYES, UB3 1HA, UNITED KINGDOM