Bad Fritz (im not b(r)ad ;-P) (badfritz) wrote in lj_biz,

the LJ directory ..

yup, the directory has been down for months now ..
yup, i want it back, too ..

but maybe in a different way ...

before i elaborate on my ideas for solutions, i had better make sure that i have understood the problem .. ;-)

why is the directory down ?
as far as i understand, the sql queries generated by the directory page create such a tremendous load on the master db server that it brings the whole site to its knees.

but whats so radically different about the directory sql queries compared to the sql queries that generate some user's friends view ?

would the server load be tolerable if it were limited to paid accounts ?

some maybe silly ideas for solutions ..

1) directory only for paid accounts
there are about 400.000 users, 10.000 (= 2.5 %) of them have paid accounts.
so if the load scales with the number of users who are allowed to run queries, limiting the directory to paid accounts would reduce the server load by about 97.5 %
but maybe that would still be too much load ?

2) dedicated directory db server
if the queries ran on a dedicated directory db server instead of the master db server, they could only bring that directory server to its knees. i dont care very much about query results being up to date to the second. if the master db were copied every hour/day/week/whatever to that directory server, the load on the master server would be quite small.
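
a rough sketch of idea 2, just to show the shape of it: a scheduled job copies the master db to a separate directory db, and directory queries only ever touch the copy. sqlite is used purely as a stand-in so the example runs anywhere; the real servers would need their own replication or dump/restore mechanism. `refresh_directory_copy`, `search_directory` and the `users` table are made-up names, not anything from the LJ code.

```python
import sqlite3

def refresh_directory_copy(master, directory):
    """overwrite the directory db with a snapshot of the master db.

    hypothetical helper: meant to be called by a scheduled job
    (hourly/daily/weekly), never by a user request.
    """
    master.backup(directory)  # sqlite3.Connection.backup, python 3.7+

def search_directory(directory, country):
    """directory queries read only from the copy, never from the master."""
    rows = directory.execute(
        "SELECT name FROM users WHERE country = ? ORDER BY name", (country,)
    )
    return [name for (name,) in rows]
```

results are only as fresh as the last copy, which is the trade-off described above.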

3) filter out refreshes
i picked up somewhere that users hit refresh when the query is slow, which creates even more load on the server. would it be possible to record the time when a user started a query and block any refreshes by that user for x minutes ?
hmm, i guess that would require another db table keeping track of the trigger-happy users .. ;-)
or what about limiting the number of queries a user can start per hour/day/week ?
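
a minimal sketch of both variants (cooldown after a query start, plus a daily quota), assuming an in-memory dict instead of the extra db table. `QueryThrottle`, the 5 minute cooldown and the limit of 10 per day are placeholders i made up for illustration.

```python
import time

class QueryThrottle:
    """remembers when each user started directory queries (hypothetical helper)."""

    def __init__(self, cooldown_seconds=300, max_per_day=10, clock=time.time):
        self.cooldown_seconds = cooldown_seconds
        self.max_per_day = max_per_day
        self.clock = clock      # injectable so the behaviour can be tested
        self.started = {}       # user -> start times within the last 24 hours

    def allow(self, user):
        """return True and record the start if the user may run a query now."""
        now = self.clock()
        recent = [t for t in self.started.get(user, []) if t > now - 86400]
        if recent and now - recent[-1] < self.cooldown_seconds:
            allowed = False     # looks like a refresh: still inside the cooldown
        elif len(recent) >= self.max_per_day:
            allowed = False     # daily quota used up
        else:
            recent.append(now)
            allowed = True
        self.started[user] = recent
        return allowed
```

the directory page would call `allow(user)` before building any sql and show a "please wait" page when it returns False.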

4) standard reports for most frequently used queries
i think many queries are quite similar, e.g. some user wants to find users in his town/state/country.

some queries representing the most frequently used directory queries could be coded and run by the server at regular intervals (e.g. once a week or every night, when the table of users for the random journal link is updated). the resulting text files of those queries could be put into some public read-only directory, so that every user can grab them.

some possible standard reports could be:

  • users by country - list all users from country X, sorted by town,
    e.g. "user_fr.txt" lists all french users

  • users by u.s. state - list all users from u.s. state X, sorted by town,
    e.g. "user_usca.txt" lists all users from california

  • biggest communities - list all (or the first N) communities, sorted descending by number of members,
    e.g. "comm_big100.txt" lists the 100 biggest communities

  • most active communities - list all (or the first N) communities, sorted descending by number of posts,
    e.g. "comm_act100.txt" lists the 100 most active communities

  • new communities - list all (or the first N) communities, sorted descending by creation date,
    e.g. "comm_new50.txt" lists the 50 newest communities

  • ... i think you get the idea ...

it would be quite easy to add more standard reports, once a basic framework exists.

standard reports would avoid thousands of users running the same queries again and again.
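
the basic framework could be as small as a table of file names and queries. this is only a sketch under assumptions: the schema (`users(name, country, town)`, `communities(name, members)`) is invented, sqlite stands in for the real db, and `REPORTS`, `render_report` and `write_reports` are made-up names.

```python
import sqlite3
from pathlib import Path

# hypothetical schema: users(name, country, town), communities(name, members)
REPORTS = {
    "user_fr.txt": (
        "SELECT town, name FROM users WHERE country = ? ORDER BY town, name",
        ("fr",),
    ),
    "comm_big100.txt": (
        "SELECT name, members FROM communities ORDER BY members DESC LIMIT 100",
        (),
    ),
}

def render_report(db, sql, params):
    """run one report query and return the result as tab-separated text."""
    rows = db.execute(sql, params).fetchall()
    return "".join("\t".join(str(col) for col in row) + "\n" for row in rows)

def write_reports(db, out_dir):
    """write every report in REPORTS to out_dir; meant for a nightly/weekly job."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for filename, (sql, params) in REPORTS.items():
        text = render_report(db, sql, params)
        (out_dir / filename).write_text(text, encoding="utf-8")
    return sorted(REPORTS)
```

adding a new standard report then means adding one entry to `REPORTS`; each query runs once per interval no matter how many users download the file.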

5) "batch mode" queries
when the directory was up, queries were "live", which means the user gave the query parameters, and then the query was run.

instead some sort of batch mode could be used. the user enters the query parameters, which are put in some queue db table. the user gets some response page like:

thanks for your request.
your request is number N in the queue.
that means you can expect a result by email in approximately X days/hours/minutes ..

some server background task(s) would work through the query request queue, running one query after the other.
the query result would be emailed to the user.

if two queues and tasks with different priorities were used, it would be possible to have fast high priority queries for paid accounts and slow low priority queries for free users.
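
a small sketch of the queue, with one simplification: instead of two separate queues it uses a single in-memory heap with two priority levels, which gives the same ordering (paid before free, first come first served within each level). a real version would keep the requests in a db table and email the results; `QueryQueue` and its methods are hypothetical names.

```python
import heapq
import itertools

PAID, FREE = 0, 1  # lower number is served first

class QueryQueue:
    """in-memory stand-in for the query request queue table."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # keeps arrival order within a priority

    def submit(self, user, params, paid):
        """queue a request and return its current position (1 = next to run)."""
        key = (PAID if paid else FREE, next(self._counter))
        heapq.heappush(self._heap, (key, user, params))
        return 1 + sum(1 for other_key, _, _ in self._heap if other_key < key)

    def next_request(self):
        """called by the background task; returns (user, params) or None."""
        if not self._heap:
            return None
        _, user, params = heapq.heappop(self._heap)
        return user, params
```

the number returned by `submit` is the N for the "your request is number N in the queue" page. note that with strict priorities a steady stream of paid requests could keep free requests waiting for a long time.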

6) reducing query complexity
there is one directory page, and it allows a lot of different parameters.
wouldnt it be better to have several different smaller query pages, each with a smaller set of possible parameters ?
that would reduce the complexity of the possible queries.
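
to illustrate, each small page could accept only its own short list of parameters and reject everything else before any sql is built. the page names and most of the parameter names here are invented for the example (only town/state/country come up above); `check_params` is a hypothetical helper.

```python
# hypothetical pages and parameter sets, for illustration only
PAGE_PARAMS = {
    "by_location": {"country", "state", "town"},
    "by_interest": {"interest"},
    "by_age": {"age_min", "age_max"},
}

def check_params(page, params):
    """return the non-empty parameters for a page, or raise on foreign ones."""
    extra = set(params) - PAGE_PARAMS[page]
    if extra:
        raise ValueError(f"{page} does not accept: {sorted(extra)}")
    return {k: v for k, v in params.items() if v not in ("", None)}
```

with fewer parameters per page there are fewer possible query shapes, so each one is easier to tune or to replace with a standard report.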

i think these ideas - especially when combined - could reduce the server load generated by queries significantly.

what do you think ?
