orbiter
3c3cb78555
- removed a lot of garbage and bloated code from GuiHandler.
- transformed log lines to String before they are stored because the
storage space is about 1:250 (45kb for one line before transformation,
180 bytes afterwards)
- this saves up to 10MB RAM so we can increase the number of lines to
1000 again.
11 years ago
Michael Peter Christen
5afa6e3aee
Automatically flush the log cache if a short memory status is reached.
For the default of 200 lines this can flush about 10MB.
11 years ago
Michael Peter Christen
030d0776ff
Enhanced crawl start for very, very large crawl lists (i.e. > 5000)
which had a problem because of badly used concurrency.
This fix also caused a redesign of the whole host deletion process.
This should fix bug http://bugs.yacy.net/view.php?id=250
11 years ago
Michael Peter Christen
6aabc4e5c8
reduced logging line memory, 10000 lines had filled up 450MB! grrr.
(thank you, a bomb from the past)
11 years ago
Michael Peter Christen
1a8783147b
enhanced computation of number of solr documents.
11 years ago
Michael Peter Christen
4948c39e48
added concurrency for mass crawl check
11 years ago
Michael Peter Christen
1b4fa2947d
- fixed a problem which occurred when a document was not recognized with
the right content domain (i.e. identifying that it is an image, text
etc.) because it used the file extension and not an existing mime type
assignment.
- fixed the new setting that images shall be loaded for a better image
search.
- both fixes together now make it possible to crawl
commons.wikimedia.org which makes use of 'funny' document names (i.e.
ending with .jpg while the document is html)
11 years ago
Michael Peter Christen
82621bead0
When doing bootstrapping, always accept one seedlist-File without
checking the date of the file. This should help to start the peer in
case that the user has a completely wrong date setting.
11 years ago
Michael Peter Christen
16e3b357b3
replaced old tag cloud and adopted design a bit
11 years ago
Michael Peter Christen
dc38d35986
added matching in url field in Table_API_p search
11 years ago
Michael Peter Christen
691d7e70fa
added hint to development/commit rss feed
11 years ago
Michael Peter Christen
b81859c751
Show an RSS icon in the top right corner of search results. This replaces
the 'API' icon which was the link for the opensearch result which is an
extension of RSS. Since it is more appropriate to visualize an RSS link
with an RSS icon, this API icon was changed here.
11 years ago
Michael Peter Christen
1a09771be8
fixed sitemap crawl start
11 years ago
orbiter
b743e6d79f
- prevent crawl filters from having empty (never-match) content
- rewrite the description of the options "Restrict to start domain(s)"
and "Restrict to sub-path(s)" to an explanation, that the restriction
applies to all links in the link list of the option "From Link-List of
URL" if this option is selected
- allow "Restrict to sub-path(s)" if the "From Link-List of URL" is
selected. This is supported in the crawl start.
11 years ago
orbiter
20bbde8665
fix for mustmatch regex computation: result had correct semantics, but
may have contained multiple same expressions within the disjunction of
domain-restrictions. This fix removes the redundant restrictions and
makes the regex shorter.
11 years ago
orbiter
f597fdb602
make it easier to filter properties (case insensitive)
11 years ago
Michael Peter Christen
c833d02cf5
fixed webgraph postprocessing (did nothing and repeated to do this...)
11 years ago
Michael Peter Christen
74d0256e93
enhanced postprocessing: fixed bugs, enable proper postprocessing also
without the harvestingkey, remove crawl profiles after postprocessing,
speed-up for clickdepth computation.
11 years ago
Michael Peter Christen
299f51cb7f
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
reger
e7a596afda
Merge branch 'master' of git://gitorious.org/yacy/rc1.git
11 years ago
reger
37d24f3318
make use of declared static string ACTION_LOCATION
11 years ago
Michael Peter Christen
7b69c438f7
more methods for the table class
11 years ago
Michael Peter Christen
820b896146
Replaced the iframe loading from yacy.net for donations with the
loading of this iframe from the local host. To make this more flexible,
this iframe is loaded once after startup from yacy.net.
11 years ago
sixcooler
dfb73c9519
bump to httpclient-4.3.1 - a bugfix release
11 years ago
reger
0d4efabaa8
fix YaCy version string in proxy headers
(config parameter vString no longer used)
11 years ago
sixcooler
d9a02ed277
NPE fix for my last commit
11 years ago
sixcooler
61f627eb85
fix for ssl-connections from proxy-usage staying in close-wait-state
+ some extra 'close' in HttpClient
11 years ago
Michael Peter Christen
91fa99e9bb
added new icon/image for latest commit
11 years ago
Michael Peter Christen
9fac9249bc
- replaced 'edit' link with a clone symbol in Table_API_p since that is
what it does: it clones the crawl, it does not change the crawl.
- moved the appearance of this clone link to the type column since this
makes it visible also if the URL column is not visible.
11 years ago
Michael Peter Christen
0f6db6ad5b
Merge remote-tracking branch 'jensbees/crawlexpert-post'
11 years ago
bhoerdzn
3fcf7a94c5
rolling back wrong merge
11 years ago
Jens Bertram
3252c1ec39
Merge upstream/master into crawlexpert-post
11 years ago
Michael Peter Christen
d328cc4a83
fix for didyoumean, added also more asian alphabets
11 years ago
Michael Peter Christen
90c8577840
enhanced ranking; patches to replace old ranking
11 years ago
Jens Bertram
9f6b98d374
Merge master into crawlexpert-post
11 years ago
bhoerdzn
6e33be4ce6
reverting local changes to project.xml
11 years ago
bhoerdzn
a3824dfbaa
check URL on initial load, if set
11 years ago
bhoerdzn
52f49d475b
add a hidden field for "crawlingstart" since jQuery omits the submit button value
11 years ago
bhoerdzn
b0c0ec2dec
link recorded crawl starts back to "CrawlStartExpert_p" in "Process Scheduler"
11 years ago
bhoerdzn
d64d45361c
use integer types for boolean values
11 years ago
bhoerdzn
eda123d6fd
remove debugging code intercepting post requests
11 years ago
bhoerdzn
5057f27bbd
fix typo in parsing "cachePolicy" parameter
11 years ago
bhoerdzn
98f5c9018d
Fixed template vars for "deleteold". Fixed parsing "deleteold" parameter. Stop "setState" overwriting "deleteold" state on load.
11 years ago
bhoerdzn
a6a62986d4
correct state handling for country code restriction
11 years ago
bhoerdzn
4066b85155
correctly set initial state for load filters
11 years ago
bhoerdzn
8c91c3e7cd
set form boolean values to 0 & 1 instead of false & true
11 years ago
bhoerdzn
c27fabc88e
fixed wrong parameter check
11 years ago
bhoerdzn
2214bf5396
Remove some post parameters, if they are set to default values, as their values are already set by YaCy. Added some documentation.
11 years ago
Michael Peter Christen
1b61bd40ed
- Added new solr field url_file_name_tokens_t which stores the file name
tokens. This can be used to enhance the ranking.
- Added also a rating_i field as basis for later usage.
- enhanced the tokenization process.
11 years ago
orbiter
6efa7532d2
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago