Page Menu
Home
c4science
Search
Configure Global Search
Log In
Files
F92592003
LinesOfALarge.php
No One
Temporary
Actions
Download File
Edit File
Delete File
View Transforms
Subscribe
Mute Notifications
Award Token
Subscribers
None
File Metadata
Details
File Info
Storage
Attached
Created
Thu, Nov 21, 19:14
Size
6 KB
Mime Type
text/x-php
Expires
Sat, Nov 23, 19:14 (2 d)
Engine
blob
Format
Raw Data
Handle
22466291
Attached To
rPHU libphutil
LinesOfALarge.php
View Options
<?php
/**
* Abstraction for processing large inputs without holding them in memory. This
* class implements line-oriented, buffered reads of some external stream, where
* a "line" is characterized by some delimiter character. This provides a
* straightforward interface for most large-input tasks, with relatively good
* performance.
*
* If your stream is not large, it is generally more efficient (and certainly
* simpler) to read the entire stream first and then process it (e.g., with
* `explode()`).
*
* This class is abstract. The concrete implementations avialable are:
*
* - @{class:LinesOfALargeFile}, for reading large files; and
* - @{class:LinesOfALargeExecFuture}, for reading large output from
* subprocesses.
*
* For example:
*
* foreach (new LinesOfALargeFile('/path/to/file.log') as $line) {
* // ...
* }
*
* By default, a line is delimited by "\n". The delimiting character is
* not returned. You can change the charater with @{method:setDelimiter}. The
* last part of the file is returned as the last $line, even if it does not
* include a terminating character (if it does, the terminating character is
* stripped).
*
* @task config Configuration
* @task internals Internals
* @task iterator Iterator Interface
* @group filesystem
*/
abstract
class
LinesOfALarge
implements
Iterator
{
private
$pos
;
private
$buf
;
private
$num
;
private
$line
;
private
$valid
;
private
$eof
;
private
$delimiter
=
"
\n
"
;
/* -( Configuration )------------------------------------------------------ */
/**
* Change the "line" delimiter character, which defaults to "\n". This is
* used to determine where each line ends.
*
* @param string A one-byte delimiter character.
* @return this
* @task config
*/
final
public
function
setDelimiter
(
$character
)
{
if
(
strlen
(
$character
)
!==
1
)
{
throw
new
Exception
(
"Delimiter character MUST be one byte in length."
);
}
$this
->
delimiter
=
$character
;
return
$this
;
}
/* -( Internals )---------------------------------------------------------- */
/**
* Hook, called before @{method:rewind()}. Allows a concrete implementation
* to open resources or reset state.
*
* @return void
* @task internals
*/
abstract
protected
function
willRewind
();
/**
* Called when the iterator needs more data. The subclass should return more
* data, or empty string to indicate end-of-stream.
*
* @return string Data, or empty string for end-of-stream.
* @task internals
*/
abstract
protected
function
readMore
();
/* -( Iterator Interface )------------------------------------------------- */
/**
* @task iterator
*/
final
public
function
rewind
()
{
$this
->
willRewind
();
$this
->
buf
=
''
;
$this
->
pos
=
0
;
$this
->
num
=
0
;
$this
->
eof
=
false
;
$this
->
valid
=
true
;
$this
->
next
();
}
/**
* @task iterator
*/
final
public
function
key
()
{
return
$this
->
num
;
}
/**
* @task iterator
*/
final
public
function
current
()
{
return
$this
->
line
;
}
/**
* @task iterator
*/
final
public
function
valid
()
{
return
$this
->
valid
;
}
/**
* @task iterator
*/
final
public
function
next
()
{
// Consume the stream a chunk at a time into an internal buffer, then
// read lines out of that buffer. This gives us flexibility (stream sources
// only need to be able to read blocks of bytes) and performance (we can
// read in reasonably-sized chunks of many lines), at the cost of some
// complexity in buffer management.
// We do this in a loop to avoid recursion when consuming more bytes, in
// case the size of a line is very large compared to the chunk size we
// read.
while
(
true
)
{
if
(
strlen
(
$this
->
buf
))
{
// If we already have some data buffered, try to get the next line from
// the buffer. Search through the buffer for a delimiter. This should be
// the common case.
$endl
=
strpos
(
$this
->
buf
,
$this
->
delimiter
,
$this
->
pos
);
if
(
$endl
!==
false
)
{
// We found a delimiter, so return the line it delimits. We leave
// the buffer as-is so we don't need to reallocate it, in case it is
// large relative to the size of a line. Instead, we move our cursor
// within the buffer forward.
$this
->
num
++;
$this
->
line
=
substr
(
$this
->
buf
,
$this
->
pos
,
(
$endl
-
$this
->
pos
));
$this
->
pos
=
$endl
+
1
;
return
;
}
// We only have part of a line left in the buffer (no delimiter in the
// remaining piece), so throw away the part we've already emitted and
// continue below.
$this
->
buf
=
substr
(
$this
->
buf
,
$this
->
pos
);
$this
->
pos
=
0
;
}
// We weren't able to produce the next line from the bytes we already had
// buffered, so read more bytes from the input stream.
if
(
$this
->
eof
)
{
// NOTE: We keep track of EOF (an empty read) so we don't make any more
// reads afterward. Normally, we'll return from the first EOF read,
// emit the line, and then next() will be called again. Without tracking
// EOF, we'll attempt another read. A well-behaved impelmentation should
// still return empty string, but we can protect against any issues
// here by keeping a flag.
$more
=
''
;
}
else
{
$more
=
$this
->
readMore
();
}
if
(
strlen
(
$more
))
{
// We got some bytes, so add them to the buffer and then try again.
$this
->
buf
.=
$more
;
continue
;
}
else
{
// No more bytes. If we have a buffer, return its contents. We
// potentially return part of a line here if the last line had no
// delimiter, but that currently seems reasonable as a default
// behaivor. If we don't have a buffer, we're done.
$this
->
eof
=
true
;
if
(
strlen
(
$this
->
buf
))
{
$this
->
num
++;
$this
->
line
=
$this
->
buf
;
$this
->
buf
=
null
;
}
else
{
$this
->
valid
=
false
;
}
break
;
}
}
}
}
Event Timeline
Log In to Comment